
6.1. Linear Transformations

But first, I want us to think about matrix-vector multiplication as something more than just number crunching.

In Chapter 5.3, the running example was the matrix

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}$$

To multiply $A$ by a vector on the right, that vector must be in $\mathbb{R}^3$, and the result will be a vector in $\mathbb{R}^5$.

Put another way, if we consider the function $T(\vec x) = A \vec x$, $T$ maps elements of $\mathbb{R}^3$ to elements of $\mathbb{R}^5$, i.e.

$$T: \mathbb{R}^3 \to \mathbb{R}^5$$

I've chosen the letter $T$ to denote that $T$ is a linear transformation.

Every linear transformation is of the form $T(\vec x) = A \vec x$. For our purposes, linear transformations and matrix-vector multiplication are the same thing, though in general linear transformations are a more abstract concept (just like how vector spaces can be made up of functions, for example).

For example, the function

$$f(\vec x) = f\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 \\ x_2 \end{bmatrix}$$

is a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^3$, and is equivalent to

$$f(\vec x) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 \\ x_2 \end{bmatrix} = \underbrace{\begin{bmatrix} 2 & 3 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{\text{matrix}\: A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{\vec x}$$
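As a quick sanity check, here's a small NumPy sketch (my own illustration, using the numbers from this example) showing that evaluating $f$ coordinate-wise agrees with multiplying by the matrix $A$:

```python
import numpy as np

# The matrix A from the example above.
A = np.array([[2, 3],
              [1, 0],
              [0, 1]])

def f(x):
    """Evaluate f coordinate-wise, exactly as written in its definition."""
    x1, x2 = x
    return np.array([2 * x1 + 3 * x2, x1, x2])

x = np.array([4, -1])
print(f(x))    # [ 5  4 -1]
print(A @ x)   # [ 5  4 -1], the same vector
```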

The function $g(x) = 3x$ is also a linear transformation, from $\mathbb{R}$ to $\mathbb{R}$.

A non-example of a linear transformation is

$$h(\vec x) = \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}$$

because no matrix multiplied by $\vec x$ will produce $\begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}$.

Another non-example, perhaps surprisingly, is

$$k(x) = -2x + 5$$

This is the equation of a line in $\mathbb{R}^2$, which is linear in some sense, but it's not a linear transformation, since it doesn't satisfy the two properties of linearity. For $k(x)$ to be a linear transformation, we'd need

$$k(cx) = ck(x)$$

for any $c, x \in \mathbb{R}$. But, if we consider $c = 3$ and $x = 1$ as an example, we get

$$k(cx) = k(3 \cdot 1) = k(3) = -2 \cdot 3 + 5 = -1 \\ ck(x) = 3k(1) = 3(-2 \cdot 1 + 5) = 3(3) = 9$$

which are not equal. $k(x) = -2x + 5$ is an example of an affine transformation, which in general is any function $f: \mathbb{R}^d \to \mathbb{R}^n$ that can be written as $f(\vec x) = A \vec x + \vec b$, where $A$ is an $n \times d$ matrix and $\vec b \in \mathbb{R}^n$.
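Here's a tiny numerical version of that check (a sketch I'm adding for illustration; it just re-does the arithmetic above in code):

```python
# Check homogeneity, k(c * x) == c * k(x), for the affine map k
# and for the linear map g(x) = 3x.
def k(x):
    return -2 * x + 5

def g(x):
    return 3 * x

c, x = 3, 1
print(k(c * x), c * k(x))   # -1 vs 9: not equal, so k is not linear
print(g(c * x), c * g(x))   # 9 vs 9: equal (g passes this check)
```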

From $\mathbb{R}^n$ to $\mathbb{R}^n$

While linear transformations exist from $\mathbb{R}^2$ to $\mathbb{R}^5$ or $\mathbb{R}^{99}$ to $\mathbb{R}^4$, it's in some ways easiest to think about linear transformations with the same domain and codomain, i.e. transformations of the form $T: \mathbb{R}^n \to \mathbb{R}^n$. This will allow us to explore how transformations stretch, rotate, and reflect vectors in the same space. Linear transformations with the same domain ($\mathbb{R}^n$) and codomain ($\mathbb{R}^n$) are represented by $n \times n$ matrices, which gives us a useful setting to think about the invertibility of square matrices, beyond just looking at a bunch of numbers.

To start, let’s consider the linear transformation defined by the matrix

$$A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}$$

What happens to a vector in $\mathbb{R}^2$ when we multiply it by $A$? Let's visualize the effect of $A$ on several vectors in $\mathbb{R}^2$.

Image produced in Jupyter

$A$ scales, or stretches, the input space by a factor of 2 in the $x$-direction and a factor of $-1/3$ in the $y$-direction.

Scaling

Another way of visualizing $A$ is to think about how it transforms the two standard basis vectors of $\mathbb{R}^2$, which are

$${\color{#3d81f6}{\vec u_x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}}}, \quad {\color{#3d81f6}{\vec u_y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}}}$$

(In the past I've called these $\vec e_1$ and $\vec e_2$, but I'll use ${\color{#3d81f6}{\vec u_x}}$ and ${\color{#3d81f6}{\vec u_y}}$ here since I'll also use $E$ to represent a matrix shortly.)

Note that ${\color{orange}{A \vec u_x}} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$ is just the first column of $A$, and similarly ${\color{orange}{A \vec u_y}} = \begin{bmatrix} 0 \\ -1/3 \end{bmatrix}$ is the second column of $A$.

$$A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}$$
Image produced in Jupyter

In addition to drawing ${\color{#3d81f6}{\vec u_x}}$ and ${\color{#3d81f6}{\vec u_y}}$ on the left and their transformed counterparts ${\color{orange}{A \vec u_x}}$ and ${\color{orange}{A \vec u_y}}$ on the right, I've also shaded in how the unit square, which is the square containing ${\color{#3d81f6}{\vec u_x}}$ and ${\color{#3d81f6}{\vec u_y}}$, gets transformed. Here, it gets stretched from a square to a rectangle.

Remember that any vector $\vec v \in \mathbb{R}^2$ is a linear combination of ${\color{#3d81f6}{\vec u_x}}$ and ${\color{#3d81f6}{\vec u_y}}$. For instance,

$$\begin{bmatrix} 2 \\ -1 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \underbrace{2 {\color{#3d81f6}{\vec u_x}} - {\color{#3d81f6}{\vec u_y}}}_{\vec v}$$

So, multiplying $A$ by $\vec v$ is equivalent to multiplying $A$ by a linear combination of $\vec u_x$ and $\vec u_y$.

$$A \begin{bmatrix} 2 \\ -1 \end{bmatrix} = A (2 {\color{#3d81f6}{\vec u_x}} - {\color{#3d81f6}{\vec u_y}}) = 2{\color{orange}{A \vec u_x}} - {\color{orange}{A \vec u_y}}$$

and the result is a linear combination of ${\color{orange}{A \vec u_x}}$ and ${\color{orange}{A \vec u_y}}$ with the same coefficients! If this sounds confusing, just remember that ${\color{orange}{A \vec u_x}}$ and ${\color{orange}{A \vec u_y}}$ are just the first and second columns of $A$, respectively.
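Here's a quick NumPy check of that claim, using the matrix $A$ and the vector $\vec v$ from this example:

```python
import numpy as np

A = np.array([[2, 0],
              [0, -1/3]])
v = np.array([2, -1])

# A @ v is the same linear combination of A's columns (A u_x and A u_y)
# as v is of the standard basis vectors.
print(A @ v)                   # [4.         0.33333333]
print(2 * A[:, 0] - A[:, 1])   # [4.         0.33333333], the same vector
```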

So, as we move through the following examples, think of the transformed basis vectors ${\color{orange}{A \vec u_x}}$ and ${\color{orange}{A \vec u_y}}$ as a new set of "building blocks" that define the transformed space (which is the column space of $A$).

$A$ is a diagonal matrix, which means it scales vectors. Note that any vector in $\mathbb{R}^2$ can be transformed by $A$, not just vectors on or within the unit square; I'm just using these two basis vectors to visualize the transformation.

Rotations and Orthogonal Matrices

What might a non-diagonal matrix do? Let’s consider

$$B = \begin{bmatrix} \sqrt{2} / 2 & -\sqrt{2} / 2 \\ \sqrt{2} / 2 & \sqrt{2} / 2 \end{bmatrix}$$
Image produced in Jupyter

Just to continue the previous example, the vector $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$ is transformed into

$$B \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \sqrt{2} / 2 & -\sqrt{2} / 2 \\ \sqrt{2} / 2 & \sqrt{2} / 2 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = 2 \underbrace{\begin{bmatrix} \sqrt{2} / 2 \\ \sqrt{2} / 2 \end{bmatrix}}_{\color{orange}{B \vec u_x}} - \underbrace{\begin{bmatrix} -\sqrt{2} / 2 \\ \sqrt{2} / 2 \end{bmatrix}}_{\color{orange}{B \vec u_y}} = \begin{bmatrix} 3 \sqrt{2} / 2 \\ \sqrt{2} / 2 \end{bmatrix}$$
Image produced in Jupyter

$B$ is an orthogonal matrix, which means that its columns are unit vectors and are orthogonal to one another.

$$\underbrace{B^TB = BB^T = I}_\text{condition for an orthogonal matrix}$$

Orthogonal matrices rotate (and possibly reflect) vectors in the input space. In general, a matrix that rotates vectors by $\theta$ (radians) counterclockwise in $\mathbb{R}^2$ is given by

$$R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$$

$B = R\left(\frac{\pi}{4}\right)$ rotates vectors by $\frac{\pi}{4}$ radians, i.e. $45^\circ$.

Rotations are more difficult to visualize in $\mathbb{R}^3$ and higher dimensions, but in Homework 5, you'll prove that orthogonal matrices preserve norms, i.e. if $Q$ is an $n \times n$ orthogonal matrix and $x \in \mathbb{R}^n$, then $\|Qx\| = \|x\|$. So, even though an orthogonal matrix might be doing something harder to describe in $\mathbb{R}^{15}$, we know that it isn't changing the lengths of the vectors it's multiplying.
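As a quick numerical check of the norm-preservation property (a sketch using the rotation matrix $B = R(\pi/4)$ from above):

```python
import numpy as np

theta = np.pi / 4
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # B = R(pi/4)

x = np.array([2, -1])
print(np.linalg.norm(x))       # 2.2360679...
print(np.linalg.norm(B @ x))   # 2.2360679..., same length, different direction
print(np.round(B.T @ B, 10))   # the identity matrix (up to rounding)
```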

To drive home the point I made earlier, any vector $\vec v = \begin{bmatrix} x \\ y \end{bmatrix}$, once multiplied by $B$, ends up transforming into

$$B \underbrace{\begin{bmatrix} x \\ y \end{bmatrix}}_{\vec v} = x ({\color{orange}{B \vec u_x}}) + y ({\color{orange}{B \vec u_y}})$$

Composing Transformations

We can even apply multiple transformations one after another. This is called composing transformations. For instance,

$$C = \begin{bmatrix} \sqrt{2} & -\sqrt{2} \\ -\sqrt{2} / 6 & -\sqrt{2} / 6 \end{bmatrix}$$

is just

$$C = AB = \underbrace{\begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}}_{\text{scale}} \underbrace{\begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}}_{\text{rotate}}$$
Image produced in Jupyter

Note that $C$ rotates the input vector, and then scales it. Read the operations from right to left, since $C\vec x = AB \vec x = A(B \vec x)$.

$C = AB$ is different from

$$D = \begin{bmatrix} \sqrt{2} & \sqrt{2} / 6 \\ \sqrt{2} & -\sqrt{2} / 6 \end{bmatrix} = BA$$
Image produced in Jupyter
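Order matters when composing transformations, since matrix multiplication is not commutative. Here's a quick NumPy check (a sketch using $A$ and $B$ from above):

```python
import numpy as np

A = np.array([[2, 0],
              [0, -1/3]])              # scale
s = np.sqrt(2) / 2
B = np.array([[s, -s],
              [s,  s]])                # rotate by 45 degrees

print(A @ B)     # C: rotate first, then scale
print(B @ A)     # D: scale first, then rotate; a different matrix!

x = np.array([2, -1])
print((A @ B) @ x, A @ (B @ x))   # same result, since C x = A(B x)
```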

Shears

$$E = \begin{bmatrix} 1 & -2/3 \\ 0 & 1 \end{bmatrix}$$
Image produced in Jupyter

$E$ is a shear matrix. Think of a shear as a transformation that slants the input space along one axis, while keeping the other axis fixed. What helps me interpret shears is looking at them formulaically.

$$E \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & -2/3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x - \frac{2}{3}y \\ y \end{bmatrix}$$

Note that the $y$-coordinate of input vectors in $\mathbb{R}^2$ remains unchanged, while the $x$-coordinate is shifted by $-\frac{2}{3}y$, which results in a slanted shape.

Similarly, $F$ is a shear matrix that keeps the $x$-coordinate fixed, but shifts the $y$-coordinate, resulting in a slanted shape that is tilted downwards.

$$F = \begin{bmatrix} 1 & 0 \\ -5/4 & 1 \end{bmatrix}$$
Image produced in Jupyter
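To see the shear formulas in action, here's a small sketch using $E$ and $F$ as defined above:

```python
import numpy as np

E = np.array([[1, -2/3],
              [0,    1]])   # shears horizontally; y stays fixed
F = np.array([[   1, 0],
              [-5/4, 1]])   # shears vertically; x stays fixed

v = np.array([3, 3])
print(E @ v)   # [1. 3.]        x is shifted by -2/3 * y, y is unchanged
print(F @ v)   # [ 3.   -0.75]  y is shifted by -5/4 * x, x is unchanged
```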

Projections

So far we’ve looked at scaling, rotation, and shear matrices. Yet another type is a projection matrix.

$$G = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
Image produced in Jupyter

$G \vec x$ projects $\vec x$ onto the $x$-axis and throws away the $y$-coordinate. Note that $G$ maps the unit square to a line, not another four-sided shape.

You might also notice that, unlike the matrices we've seen so far, $\text{colsp}(G)$ is not all of $\mathbb{R}^2$, but rather it's just a line in $\mathbb{R}^2$, since $G$'s columns are not linearly independent.

$H$ below works similarly.

$$H = \begin{bmatrix} 1/2 & -1 \\ 1 & -2 \end{bmatrix}$$
Image produced in Jupyter

$\text{colsp}(H)$ is the line spanned by $\begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$, so $H \vec x$ will always be some vector on this line.

Put another way, if $\vec v = \begin{bmatrix} x \\ y \end{bmatrix}$, then $H \vec v$ is

$$H \underbrace{\begin{bmatrix} x \\ y \end{bmatrix}}_{\vec v} = x ({\color{orange}{H \vec u_x}}) + y ({\color{orange}{H \vec u_y}})$$

but since ${\color{orange}{H \vec u_x}}$ and ${\color{orange}{H \vec u_y}}$ are both on the line spanned by $\begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$, $H \vec v$ is really just a scalar multiple of $\begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$.
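One way to convince yourself of this numerically (a sketch using $H$ from above; the random input vectors are just for illustration):

```python
import numpy as np

H = np.array([[1/2, -1],
              [  1, -2]])

rng = np.random.default_rng(0)
for _ in range(3):
    v = rng.uniform(-5, 5, size=2)
    Hv = H @ v
    # Every output lies on the line spanned by [1/2, 1]: the second
    # entry is always twice the first entry.
    print(Hv, Hv[1] / Hv[0])   # the ratio is always 2.0
```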

Arbitrary Matrices

Finally, I’ll comment that not all linear transformations have a nice, intuitive interpretation. For instance, consider

$$J = \begin{bmatrix} 1/3 & -1 \\ 1 & -1/2 \end{bmatrix}$$
Image produced in Jupyter

$J$ turns the unit square into a parallelogram. In fact, so did $A$, $B$, $C$, $D$, $E$, and $F$; all of these transformations map the unit square to a parallelogram, with some additional properties (e.g. $A$'s parallelogram was a rectangle, $B$'s had equal sides, etc.).

There's no need to memorize the names of these transformations – after all, they only apply in $\mathbb{R}^2$ and perhaps $\mathbb{R}^3$, where we can visualize them.

Speaking of $\mathbb{R}^3$, an arbitrary $3 \times 3$ matrix can be thought of as a transformation that maps the unit cube to a parallelepiped (the generalization of a parallelogram to three dimensions).

$$K = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 1/2 \\ 0 & -1 & 1/2 \end{bmatrix}$$
Interactive plot produced in Jupyter

What do you notice about the transformation defined by $L$, and how it relates to $L$'s columns? (Drag the plot around to see the main point.)

$$L = \begin{bmatrix} 1 & 1/2 & 0 \\ 1 & 1/2 & 0 \\ 1 & 1/2 & 1 \end{bmatrix}$$
Interactive plot produced in Jupyter

Since $L$'s columns are linearly dependent, $L$ maps the unit cube to a flat parallelogram.
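You can verify that linear dependence directly (a sketch; `np.linalg.matrix_rank` counts the number of linearly independent columns):

```python
import numpy as np

L = np.array([[1, 1/2, 0],
              [1, 1/2, 0],
              [1, 1/2, 1]])

# The second column is exactly half the first column, so L's columns
# only span a 2-dimensional plane inside R^3.
print(np.allclose(L[:, 1], 0.5 * L[:, 0]))   # True
print(np.linalg.matrix_rank(L))              # 2, not 3
```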

The Determinant

It turns out that there’s a formula for

  • area of the parallelogram formed by transforming the unit square by a $2 \times 2$ matrix $A$

  • volume of the parallelepiped formed by transforming the unit cube by a $3 \times 3$ matrix $A$

  • in general, the $n$-dimensional "volume" of the object formed by transforming the unit $n$-cube by an $n \times n$ matrix $A$

That formula is called the determinant of $A$, and is denoted $\text{det}(A)$.

Why do we care? Remember, the goal of this section is to find the inverse of a square matrix $A$, if it exists, and the determinant will give us one way to check if it does.

In the case of the projection matrices $G$ and $H$ above, we saw that their columns were linearly dependent, and so the transformations $G$ and $H$ mapped the unit square to a line with no area. Similarly, $L$ above mapped the unit cube to a flat parallelogram with no volume. In all other transformations, the matrices' columns were linearly independent, so the resulting object had a non-zero area (in the case of $2 \times 2$ matrices) or volume (in the case of $3 \times 3$ matrices).

So, how do we find $\text{det}(A)$? Unfortunately, the formula is only convenient for $2 \times 2$ matrices: for $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, we have $\text{det}(A) = ad - bc$.

For example, in the transformation

$$J = \begin{bmatrix} 1/3 & -1 \\ 1 & -1/2 \end{bmatrix}$$

the area of the parallelogram formed by transforming the unit square is

$$\text{det}(J) = \frac{1}{3}\left(-\frac{1}{2}\right) - (-1)(1) = -\frac{1}{6} + 1 = \frac{5}{6}$$
Image produced in Jupyter

Note that a determinant can be negative! So, to be precise, $|\text{det}(A)|$ describes the $n$-dimensional volume of the object formed by transforming the unit $n$-cube by $A$.

The sign of the determinant depends on the order of the columns of the matrix. For example, swap the columns of $J$ and its determinant would be $-\frac{5}{6}$. (If $A \vec u_x$ is "to the right" of $A \vec u_y$, the determinant is positive, like with the standard basis vectors; if $A \vec u_x$ is "to the left" of $A \vec u_y$, the determinant is negative.)
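In code, `np.linalg.det` computes determinants for us. Here's a quick check on $J$, including the column swap (the slicing trick `J[:, ::-1]` is just one way to swap the two columns):

```python
import numpy as np

J = np.array([[1/3,   -1],
              [  1, -1/2]])

print(np.linalg.det(J))           # 0.8333..., i.e. 5/6
print(np.linalg.det(J[:, ::-1]))  # -0.8333..., swapping the columns flips the sign
```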

The determinant of an $n \times n$ matrix can be expressed recursively using a weighted sum of determinants of smaller $(n-1) \times (n-1)$ matrices, called minors. For example, if $A$ is the $3 \times 3$ matrix

$$A = \begin{bmatrix} {\color{#3d81f6} a} & {\color{#3d81f6} b} & {\color{#3d81f6} c} \\ {\color{orange} d} & {\color{#d81a60} e} & {\color{#004d40} f} \\ {\color{orange} g} & {\color{#d81a60} h} & {\color{#004d40} i} \end{bmatrix}$$

then $\text{det}(A)$ is

$$\text{det}(A) = {\color{#3d81f6} a} \begin{vmatrix} {\color{#d81a60} e} & {\color{#004d40} f} \\ {\color{#d81a60} h} & {\color{#004d40} i} \end{vmatrix} - {\color{#3d81f6} b} \begin{vmatrix} {\color{orange} d} & {\color{#004d40} f} \\ {\color{orange} g} & {\color{#004d40} i} \end{vmatrix} + {\color{#3d81f6} c} \begin{vmatrix} {\color{orange} d} & {\color{#d81a60} e} \\ {\color{orange} g} & {\color{#d81a60} h} \end{vmatrix}$$

The matrix $\begin{bmatrix} e & f \\ h & i \end{bmatrix}$ is a minor of $A$; it's formed by deleting the first row and first column of $A$. Note the alternating signs in the formula. This formula generalizes to $n \times n$ matrices, but is not practical for anything larger than $3 \times 3$. (What is the runtime of this algorithm?)
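For the curious, here's what that recursion looks like in code (a sketch of cofactor expansion along the first row, purely for illustration; in practice you'd call `np.linalg.det`, which uses a much faster method):

```python
import numpy as np

def det_cofactor(A):
    """Determinant via cofactor expansion along the first row.
    Roughly O(n!) work, so only useful for tiny matrices."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(A[1:], j, axis=1)   # delete row 0 and column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

J = np.array([[1/3, -1], [1, -1/2]])
print(det_cofactor(J), np.linalg.det(J))   # both are 5/6
```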

The computation of the determinant is not super important. The big idea is that the determinant of $A$ is a single number that tells us whether $A$'s transformation "loses a dimension" or not.

Some useful properties of the determinant are that, for any $n \times n$ matrices $A$ and $B$,

  1. $\text{det}(A) = \text{det}(A^T)$

  2. $\text{det}(AB) = \text{det}(A) \, \text{det}(B)$

  3. $\text{det}(cA) = c^n \, \text{det}(A)$ (notice the exponent!)

  4. If $B$ results from swapping two of $A$'s columns (or rows), then

    $\text{det}(B) = -\text{det}(A)$

The proofs of these properties are beyond the scope of our course. But, it’s worthwhile to think about what they mean in English in the context of linear transformations.

  1. $\text{det}(A) = \text{det}(A^T)$ implies that the rows of $A$ (which are the columns of $A^T$) create the same "volume" as the columns of $A$.

  2. $\text{det}(AB) = \text{det}(A) \, \text{det}(B)$ matches our intuition that linear transformations can be composed: $AB \vec x$ is the result of applying $B$ to $\vec x$, then $A$ to the result, and each transformation scales volumes by its own determinant.

  3. $\text{det}(cA) = c^n \, \text{det}(A)$: multiplying $A$ by $c$ multiplies each of $A$'s $n$ columns by $c$, and so the volume of the resulting object is scaled by $c \times c \times \cdots \times c = c^n$.

  4. If $B$ results from swapping two of $A$'s columns (or rows), then $\text{det}(B) = -\text{det}(A)$ reflects that swapping two columns reverses the orientation of the transformation, flipping the signed volume from positive to negative (or vice versa) while preserving its magnitude.
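You can spot-check all four properties numerically (a sketch with random $3 \times 3$ matrices; the seed and tolerance choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
c, n = 2.5, 3
det = np.linalg.det

print(np.isclose(det(A), det(A.T)))             # property 1
print(np.isclose(det(A @ B), det(A) * det(B)))  # property 2
print(np.isclose(det(c * A), c**n * det(A)))    # property 3

A_swapped = A[:, [1, 0, 2]]                     # swap the first two columns
print(np.isclose(det(A_swapped), -det(A)))      # property 4
```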

We’ve built intuition for linear transformations; next we return to inverses and how to compute them.