
2.9. Inverses

The big concept introduced in Chapter 2.8 was the rank of a matrix, \text{rank}(A), which is the number of linearly independent columns of a matrix. The rank of a matrix told us the dimension of its column space, \text{colsp}(A), which is the set of all vectors that can be created with linear combinations of A’s columns.

The ideas of rank, column space, and linear independence are all closely related, and they all apply to matrices of any shape – wide, tall, or square. This is good, because data from the real world is rarely square: we usually have many more observations (n) than features (d).

The big idea in Chapter 2.9 is that of the inverse of a matrix, A^{-1}, which only applies to square matrices. But, inverses and square matrices can still be useful for our rectangular matrices with real data, since the matrix A^TA is square, no matter the shape of A. And, as we discussed in Chapter 2.8 and as you’re seeing in Homework 5, the matrix A^TA has close ties to the correlations between the columns of A, which you already know have some connection to linear regression.


Motivation: What is an Inverse?

Scalar Addition and Multiplication

In “regular” addition and multiplication, you’re already familiar with the idea of an inverse. In addition, the inverse of a number a is another number a' that satisfies

a + a' = a' + a = 0

0 is the “identity” element in addition, since adding it to any number a doesn’t change the value of a. Of course, a' = -a, so -a is the additive inverse of a. Any number a has an additive inverse; for example, the additive inverse of 2 is -2, and the additive inverse of -2 is 2.

In multiplication, the inverse of a number a is another number a' that satisfies

a \cdot a' = a' \cdot a = 1

1 is the “identity” element in multiplication, since multiplying it by any number a doesn’t change the value of a. Most numbers have a multiplicative inverse of a' = \frac{1}{a}, but 0 does not! Since

0 \cdot a' = 0

for every a', there is no a' that makes 0 \cdot a' equal to 1.

When a multiplicative inverse exists, we can use it to solve equations like

2x = 5

by multiplying both sides by the multiplicative inverse of 2, which is \frac{1}{2}.

\frac{1}{2} (2x) = \frac{1}{2} (5) \implies x = \frac{5}{2}

Matrices

If A is an n \times d matrix, its additive inverse is just -A, which comes from negating each element of A. That part is not all that interesting. What we’re more interested in is the multiplicative inverse of a matrix, which is just referred to as the inverse of the matrix.

Suppose A is some n \times d matrix. We’d like to find an inverse, A', such that when A is multiplied by A' in any order, the result is the identity element for matrix multiplication, I. Recall, the n \times n identity matrix is a matrix with 1s on the diagonal (moving from the top left to the bottom right), and 0s everywhere else. The 3 \times 3 identity matrix is

I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

I_3 \vec x = \vec x for any \vec x \in \mathbb{R}^3, and B I_3 = I_3 B = B for any B \in \mathbb{R}^{3 \times 3}. The same holds true for any I_n, where n is the dimension of the space.

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}

If A is an n \times d matrix, we’d need a single matrix A' that satisfies

AA' = A'A = I

But, in order to evaluate AA', we’d need A' to be of shape d \times \text{something}, and in order to evaluate A'A, we’d need A' to be of shape \text{something} \times n. The only scenario where these are both possible is when A is a square matrix, i.e. n = d, and A' is also a square matrix of shape n \times n.

For example, consider the 2 \times 2 matrix

A = \begin{bmatrix} 2 & 4 \\ 3 & 5\end{bmatrix}

It turns out that A does have an inverse! Its inverse, denoted by A^{-1}, is

A^{-1} = \begin{bmatrix} -5/2 & 2 \\ 3/2 & -1\end{bmatrix}

This is because A^{-1} is the matrix that satisfies both

AA^{-1} = \begin{bmatrix} 2 & 4 \\ 3 & 5\end{bmatrix} \begin{bmatrix} -5/2 & 2 \\ 3/2 & -1\end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix} = I

and

A^{-1}A = \begin{bmatrix} -5/2 & 2 \\ 3/2 & -1\end{bmatrix} \begin{bmatrix} 2 & 4 \\ 3 & 5\end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix} = I

numpy is good at finding inverses, when they exist.

import numpy as np

A = np.array([[2, 4],
              [3, 5]])

np.linalg.inv(A)
array([[-2.5,  2. ],
       [ 1.5, -1. ]])

# The top-left and bottom-right elements are 1s,
# and the top-right and bottom-left elements are 0s.
# 4.4408921e-16 is just 0, with some floating point error baked in.
A @ np.linalg.inv(A)
array([[1.0000000e+00, 0.0000000e+00],
       [4.4408921e-16, 1.0000000e+00]])

But, not all square matrices are invertible, just like not all numbers have multiplicative inverses. What happens if we try to invert

B = \begin{bmatrix} 1 & 2 \\ 2 & 4\end{bmatrix}
B = np.array([[1, 2], 
              [2, 4]])

np.linalg.inv(B)
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Cell In[17], line 4
      1 B = np.array([[1, 2], 
      2               [2, 4]])
----> 4 np.linalg.inv(B)

File ~/miniforge3/envs/pds/lib/python3.10/site-packages/numpy/linalg/linalg.py:561, in inv(a)
    559 signature = 'D->D' if isComplexType(t) else 'd->d'
    560 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 561 ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
    562 return wrap(ainv.astype(result_t, copy=False))

File ~/miniforge3/envs/pds/lib/python3.10/site-packages/numpy/linalg/linalg.py:112, in _raise_linalgerror_singular(err, flag)
    111 def _raise_linalgerror_singular(err, flag):
--> 112     raise LinAlgError("Singular matrix")

LinAlgError: Singular matrix

We’re told the matrix is singular, meaning that it’s not invertible.
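This connects back to Chapter 2.8: B’s second column is just 2 times its first, so its columns are linearly dependent. Here’s a quick check of that in code (a sketch; np.linalg.matrix_rank is numpy’s built-in rank function):

import numpy as np

B = np.array([[1, 2],
              [2, 4]])

# B's columns are linearly dependent (the second is 2 times the first),
# so its rank is 1, not 2.
np.linalg.matrix_rank(B)   # evaluates to 1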

The point of Chapter 2.9 is to understand why some square matrices are invertible and others are not, and what the inverse of an invertible matrix really means.


Linear Transformations

But first, I want us to think about matrix-vector multiplication as something more than just number crunching.

In Chapter 2.8, the running example was the matrix

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

To multiply A by a vector on the right, that vector must be in \mathbb{R}^3, and the result will be a vector in \mathbb{R}^5.

Put another way, if we consider the function T(\vec x) = A \vec x, T maps elements of \mathbb{R}^3 to elements of \mathbb{R}^5, i.e.

T: \mathbb{R}^3 \to \mathbb{R}^5

I’ve chosen the letter T to denote that T is a linear transformation.

Every linear transformation is of the form T(\vec x) = A \vec x. For our purposes, linear transformations and matrix-vector multiplication are the same thing, though in general linear transformations are a more abstract concept (just like how vector spaces can be made up of functions, for example).

For example, the function

f(\vec x) = f\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 \\ x_2 \end{bmatrix}

is a linear transformation from \mathbb{R}^2 to \mathbb{R}^3, and is equivalent to

f(\vec x) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 \\ x_2 \end{bmatrix} = \underbrace{\begin{bmatrix} 2 & 3 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{\text{matrix}\: A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{\vec x}
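To see the equivalence in code, here’s a small numpy sketch (the function name f and the test vector are my own, for illustration): computing f coordinate-by-coordinate gives the same result as multiplying by the matrix A.

import numpy as np

A = np.array([[2, 3],
              [1, 0],
              [0, 1]])

def f(x):
    # f, computed "by hand", straight from the definition above.
    x1, x2 = x
    return np.array([2 * x1 + 3 * x2, x1, x2])

x = np.array([4, -1])

f(x)      # array([ 5,  4, -1])
A @ x     # array([ 5,  4, -1]), the same result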

The function g(x) = 3x is also a linear transformation, from \mathbb{R} to \mathbb{R}.

A non-example of a linear transformation is

h(\vec x) = \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}

because no matrix multiplied by \vec x will produce \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}.

Another non-example, perhaps surprisingly, is

k(x) = -2x + 5

This is the equation of a line in \mathbb{R}^2, which is linear in some sense, but it’s not a linear transformation, since it doesn’t satisfy the two properties of linearity (additivity and homogeneity). For instance, for k(x) to be a linear transformation, we’d need

k(cx) = ck(x)

for any c, x \in \mathbb{R}. But, if we consider c = 3 and x = 1 as an example, we get

k(cx) = k(3 \cdot 1) = k(3) = -2 \cdot 3 + 5 = -1 \\ ck(x) = 3 k(1) = 3 (-2 \cdot 1 + 5) = 3(3) = 9

which are not equal. k(x) = -2x + 5 is an example of an affine transformation, which in general is any function f: \mathbb{R}^d \to \mathbb{R}^n that can be written as f(\vec x) = A \vec x + \vec b, where A is an n \times d matrix and \vec b \in \mathbb{R}^n.

From \mathbb{R}^n to \mathbb{R}^n

While linear transformations exist from \mathbb{R}^2 to \mathbb{R}^5 or \mathbb{R}^{99} to \mathbb{R}^4, it’s in some ways easiest to think about linear transformations with the same domain and codomain, i.e. transformations of the form T: \mathbb{R}^n \to \mathbb{R}^n. This will allow us to explore how transformations stretch, rotate, and reflect vectors in the same space. Linear transformations with the same domain (\mathbb{R}^n) and codomain (\mathbb{R}^n) are represented by n \times n matrices, which gives us a useful setting to think about the invertibility of square matrices, beyond just looking at a bunch of numbers.

To start, let’s consider the linear transformation defined by the matrix

A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}

What happens to a vector in \mathbb{R}^2 when we multiply it by A? Let’s visualize the effect of A on several vectors in \mathbb{R}^2.

Image produced in Jupyter

A scales, or stretches, the input space by a factor of 2 in the x-direction and a factor of -1/3 in the y-direction.

Scaling

Another way of visualizing A is to think about how it transforms the two standard basis vectors of \mathbb{R}^2, which are

{\color{#3d81f6}{\vec u_x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}}}, \quad \color{#3d81f6}{\vec u_y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}}

(In the past I’ve called these \vec e_1 and \vec e_2, but I’ll use \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y} here since I’ll also use E to represent a matrix shortly.)

Note that \color{orange}{A \vec u_x} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} is just the first column of A, and similarly \color{orange}{A \vec u_y} = \begin{bmatrix} 0 \\ -1/3 \end{bmatrix} is the second column of A.

A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}
Image produced in Jupyter

In addition to drawing \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y} on the left and their transformed counterparts \color{orange}{A \vec u_x} and \color{orange}{A \vec u_y} on the right, I’ve also shaded in how the unit square, which is the square containing \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y}, gets transformed. Here, it gets stretched from a square to a rectangle.

Remember that any vector \vec v \in \mathbb{R}^2 is a linear combination of \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y}. For instance,

\begin{bmatrix} 7 \\ 4 \end{bmatrix} = 7 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \underbrace{7 {\color{#3d81f6}{\vec u_x}} + 4 {\color{#3d81f6}{\vec u_y}}}_{\vec v}

So, multiplying A by \vec v is equivalent to multiplying A by a linear combination of \vec u_x and \vec u_y.

A \begin{bmatrix} 7 \\ 4 \end{bmatrix} = A (7 {\color{#3d81f6}{\vec u_x}} + 4 {\color{#3d81f6}{\vec u_y}}) = 7{\color{orange}{A \vec u_x}} + 4{\color{orange}{A \vec u_y}}

and the result is a linear combination of \color{orange}{A \vec u_x} and \color{orange}{A \vec u_y} with the same coefficients!

So, as we move through the following examples, think of the transformed basis vectors \color{orange}{A \vec u_x} and \color{orange}{A \vec u_y} as a new set of “building blocks” that define the transformed space (which is the column space of A).

A is a diagonal matrix, which means it scales vectors. Note that any vector in \mathbb{R}^2 can be transformed by A, not just vectors on or within the unit square; I’m just using these two basis vectors to visualize the transformation.
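Here’s a quick numpy check of that “same coefficients” fact, using the vector \begin{bmatrix} 7 \\ 4 \end{bmatrix} from above (a sketch; nothing here is specific to this particular A):

import numpy as np

A = np.array([[2, 0],
              [0, -1/3]])

u_x = np.array([1, 0])
u_y = np.array([0, 1])
v = 7 * u_x + 4 * u_y            # the vector [7, 4]

# Multiplying A by v gives the same linear combination of the
# transformed basis vectors, with the same coefficients 7 and 4.
A @ v                            # array([14.        , -1.33333333])
7 * (A @ u_x) + 4 * (A @ u_y)    # array([14.        , -1.33333333])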

Rotations and Orthogonal Matrices

What might a non-diagonal matrix do? Let’s consider

B = \begin{bmatrix} \sqrt{2} / 2 & -\sqrt{2} / 2 \\ \sqrt{2} / 2 & \sqrt{2} / 2 \end{bmatrix}

Image produced in Jupyter

B is an orthogonal matrix, which means that its columns are unit vectors and are orthogonal to one another.

\underbrace{B^TB = BB^T = I}_\text{condition for an orthogonal matrix}

Orthogonal matrices rotate (and possibly reflect) vectors in the input space. In general, a matrix that rotates vectors by \theta (radians) counterclockwise is given by

R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}

B = R(\frac{\pi}{4}) rotates vectors by \frac{\pi}{4} radians, i.e. 45^\circ.

To drive home the point I made earlier, any vector \vec v = \begin{bmatrix} x \\ y \end{bmatrix}, once multiplied by B, ends up transforming into

B \underbrace{\begin{bmatrix} x \\ y \end{bmatrix}}_{\vec v} = x ({\color{orange}{B \vec u_x}}) + y ({\color{orange}{B \vec u_y}})
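If you’d like to experiment with rotations yourself, here’s a small sketch (the helper name rotation_matrix is mine) that builds R(\theta), confirms R(\frac{\pi}{4}) matches B, and checks the orthogonality condition:

import numpy as np

def rotation_matrix(theta):
    # R(theta) rotates vectors counterclockwise by theta radians.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

B = rotation_matrix(np.pi / 4)
B                                  # array([[ 0.70710678, -0.70710678],
                                   #        [ 0.70710678,  0.70710678]])

# Orthogonal: B^T B is the identity, up to floating point error.
np.allclose(B.T @ B, np.eye(2))    # True

# Rotating u_x by 45 degrees lands on [sqrt(2)/2, sqrt(2)/2].
B @ np.array([1, 0])               # array([0.70710678, 0.70710678])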

Composing Transformations

We can even apply multiple transformations one after another. This is called composing transformations. For instance,

C = \begin{bmatrix} \sqrt{2} & -\sqrt{2} \\ -\sqrt{2} / 6 & -\sqrt{2} / 6 \end{bmatrix}

is just

C = AB = \underbrace{\begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}}_{\text{scale}} \underbrace{\begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}}_{\text{rotate}}

Image produced in Jupyter

Note that C rotates the input vector, and then scales it. Read the operations from right to left, since C\vec x = AB \vec x = A(B \vec x).

C = AB is different from

D = \begin{bmatrix} \sqrt{2} & \sqrt{2} / 6 \\ \sqrt{2} & -\sqrt{2} / 6 \end{bmatrix} = B A
Image produced in Jupyter
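A quick numpy check that the order of composition matters (A and B here are the scaling and rotation matrices from above):

import numpy as np

A = np.array([[2, 0],
              [0, -1/3]])
B = np.array([[np.sqrt(2)/2, -np.sqrt(2)/2],
              [np.sqrt(2)/2,  np.sqrt(2)/2]])

C = A @ B   # rotate first, then scale
D = B @ A   # scale first, then rotate

# C and D are different matrices, so AB and BA are different transformations.
np.allclose(C, D)   # False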

Shears

E = \begin{bmatrix} 1 & -2/3 \\ 0 & 1 \end{bmatrix}

Image produced in Jupyter

E is a shear matrix. Think of a shear as a transformation that slants the input space along one axis, while keeping the other axis fixed. What helps me interpret shears is looking at them formulaically.

E \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & -2/3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x - 2/3 y \\ y \end{bmatrix}

Note that the y-coordinate of each input vector in \mathbb{R}^2 remains unchanged, while the x-coordinate is shifted by -\frac{2}{3}y, which results in a slanted shape.

Similarly, F is a shear matrix that keeps the x-coordinate fixed, but shifts the y-coordinate, resulting in a slanted shape that is tilted downwards.

F = \begin{bmatrix} 1 & 0 \\ -5/4 & 1 \end{bmatrix}
Image produced in Jupyter

Projections

So far we’ve looked at scaling, rotation, and shear matrices. Yet another type is a projection matrix.

G = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}

Image produced in Jupyter

G \vec x projects \vec x onto the x-axis and throws away the y-coordinate. Note that G maps the unit square to a line, not another four-sided shape.

You might also notice that, unlike the matrices we’ve seen so far, \text{colsp}(G) is not all of \mathbb{R}^2, but rather it’s just a line in \mathbb{R}^2, since G’s columns are not linearly independent.

H below works similarly.

H = \begin{bmatrix} 1 / 2 & -1 \\ 1 & -2 \end{bmatrix}

Image produced in Jupyter

\text{colsp}(H) is the line spanned by \begin{bmatrix} 1 / 2 \\ 1 \end{bmatrix}, so H \vec x will always be some vector on this line.

Put another way, if \vec v = \begin{bmatrix} x \\ y \end{bmatrix}, then H \vec v is

H \underbrace{\begin{bmatrix} x \\ y \end{bmatrix}}_{\vec v} = x ({\color{orange}{H \vec u_x}}) + y ({\color{orange}{H \vec u_y}})

but since \color{orange}{H \vec u_x} and \color{orange}{H \vec u_y} are both on the line spanned by \begin{bmatrix} 1 / 2 \\ 1 \end{bmatrix}, H \vec v is really just a scalar multiple of \begin{bmatrix} 1 / 2 \\ 1 \end{bmatrix}.
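Here’s a numpy sketch of that: no matter which \vec v we pick, H \vec v lands on the line spanned by \begin{bmatrix} 1/2 \\ 1 \end{bmatrix} (the random vectors below are just for illustration):

import numpy as np

H = np.array([[1/2, -1],
              [1,   -2]])

# H has rank 1, since its second column is -2 times its first.
np.linalg.matrix_rank(H)        # 1

rng = np.random.default_rng(23)
for _ in range(3):
    v = rng.standard_normal(2)
    out = H @ v
    # Each output is a scalar multiple of [1/2, 1], so the ratio of its
    # second coordinate to its first is always 2 (unless the output is 0).
    print(out, out[1] / out[0])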

Arbitrary Matrices

Finally, I’ll comment that not all linear transformations have a nice, intuitive interpretation. For instance, consider

J = \begin{bmatrix} 1 / 3 & -1 \\ 1 & -1 / 2 \end{bmatrix}

Image produced in Jupyter

J turns the unit square into a parallelogram. In fact, so did A, B, C, D, E, and F; all of these transformations map the unit square to a parallelogram, with some additional properties (e.g. A’s parallelogram was a rectangle, B’s had equal sides, etc.).

There’s no need to memorize the names of these transformations – after all, they only apply in \mathbb{R}^2 and perhaps \mathbb{R}^3 where we can visualize.

Speaking of \mathbb{R}^3, an arbitrary 3 \times 3 matrix can be thought of as a transformation that maps the unit cube to a parallelepiped (the generalization of a parallelogram to three dimensions).

K = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 1/2 \\ 0 & -1 & 1 / 2 \end{bmatrix}

Interactive figure produced in Jupyter

What do you notice about the transformation defined by L, and how it relates to L’s columns? (Drag the plot around to see the main point.)

L = \begin{bmatrix} 1 & 1/2 & 0 \\ 1 & 1/2 & 0 \\ 1 & 1/2 & 1 \end{bmatrix}

Interactive figure produced in Jupyter

Since L’s columns are linearly dependent, L maps the unit cube to a flat parallelogram.
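You can confirm that dependence numerically (a quick sketch):

import numpy as np

L = np.array([[1, 1/2, 0],
              [1, 1/2, 0],
              [1, 1/2, 1]])

# Only 2 of L's 3 columns are linearly independent, so the image of the
# unit cube is flat: it has no volume.
np.linalg.matrix_rank(L)   # 2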

The Determinant

It turns out that there’s a formula for the

  • area of the parallelogram formed by transforming the unit square by a 2 \times 2 matrix A

  • volume of the parallelepiped formed by transforming the unit cube by a 3 \times 3 matrix A

  • in general, the n-dimensional “volume” of the object formed by transforming the unit n-cube by an n \times n matrix A

That formula is called the determinant. Why do we care? Remember, the goal of this section is to find the inverse of a square matrix A, if it exists, and the determinant will give us one way to check if it does.

In the case of the projection matrices G and H above, we saw that their columns were linearly dependent, and so the transformations G and H mapped the unit square to a line with no area. Similarly above, L mapped the unit cube to a flat parallelogram with no volume. In all other transformations, the matrices’ columns were linearly independent, so the resulting object had a non-zero area (in the case of 2 \times 2 matrices) or volume (in the case of 3 \times 3 matrices).

So, how do we find the determinant of A, denoted \text{det}(A)? Unfortunately, the formula is only convenient for 2 \times 2 matrices: if A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, then \text{det}(A) = ad - bc.

For example, in the transformation

J = \begin{bmatrix} 1 / 3 & -1 \\ 1 & -1 / 2 \end{bmatrix}

the area of the parallelogram formed by transforming the unit square is

\text{det}(J) = \frac{1}{3}\left(-\frac{1}{2}\right) - (-1)(1) = -\frac{1}{6} + 1 = \frac{5}{6}
Image produced in Jupyter
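numpy computes determinants directly, which is a nice way to double-check the arithmetic above (a quick sketch):

import numpy as np

J = np.array([[1/3, -1],
              [1,   -1/2]])

np.linalg.det(J)   # approximately 0.8333, i.e. 5/6, up to floating point error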

Note that a determinant can be negative! Then, the absolute value of the determinant gives the area of the parallelogram. The sign of the determinant depends on the order of the columns of the matrix; swap the columns of J and its determinant would be -\frac{5}{6}. (If A \vec u_x is “to the right” of A \vec u_y, the determinant is positive, like with the standard basis vectors; if A \vec u_x is “to the left” of A \vec u_y, the determinant is negative.)

The determinant of an n \times n matrix can be expressed recursively using a weighted sum of determinants of smaller (n-1) \times (n-1) matrices, called minors. We’ll explore this idea more when necessary, but the computation is not super important: the big idea is that the determinant of A is a single number that tells us whether A’s transformation “loses a dimension” or not.

Two useful properties of the determinant are that

\text{det}(A) = \text{det}(A^T)

and

\text{det}(AB) = \text{det}(A) \text{det}(B)

if A and B are both n \times n matrices.
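Both properties are easy to sanity-check numerically with random matrices (a sketch, not a proof):

import numpy as np

rng = np.random.default_rng(59)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# det(A) = det(A^T)
np.allclose(np.linalg.det(A), np.linalg.det(A.T))                       # True

# det(AB) = det(A) det(B)
np.allclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))  # True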


Inverting a Transformation

Remember, the big goal of this section is to find the inverse of a square matrix A.

Since each square matrix A corresponds to a linear transformation, we can think of A^{-1} (A’s inverse) as “reversing” or “undoing” the transformation.

For example, if A scales vectors, A^{-1} should scale by the reciprocal, so that applying A and then A^{-1} returns the original vector.

The simplest case involves a diagonal matrix, like

A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}

Image produced in Jupyter

To undo the effect of A, we can apply the transformation

A^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & -3 \end{bmatrix}

Many of the transformations we looked at involving 2 \times 2 matrices are reversible, and hence the corresponding matrices are invertible. If the matrix rotates by \theta, the inverse is a rotation by -\theta. If the matrix shears by “dragging to the right”, the inverse is a shear by “dragging to the left”.

Another way of visualizing whether a transformation is reversible is to ask whether, given a vector (point) on the right, there is exactly one corresponding vector (point) on the left. By exactly one, I mean not 0, and not multiple.

Here, we visualize

F = \begin{bmatrix} 1 & 0 \\ -5/4 & 1 \end{bmatrix}

Given any vector \vec b of the form \vec b = F \vec x, there is exactly one vector \vec x that satisfies this equation.

Image produced in Jupyter

On the other hand, if we look at

H = \begin{bmatrix} 1 / 2 & -1 \\ 1 & -2 \end{bmatrix}

the same does not hold true. Given any vector \vec b \in \text{colsp}(H) on the right, there are infinitely many vectors \vec x such that H \vec x = \vec b. The vectors in pink on the left are all sent to the same vector on the right, \color{#d81a60}{\begin{bmatrix} -1 \\ -2 \end{bmatrix}}.

Image produced in Jupyter

And, there are no vectors on the left that get sent to \begin{bmatrix} -2 \\ 2 \end{bmatrix} on the right. Any vector in \mathbb{R}^2 that isn’t on the line spanned by \begin{bmatrix} -1 \\ -2 \end{bmatrix} is unreachable.

You may recall the following key ideas from discrete math:

  • A function is invertible if and only if it is both one-to-one (injective) and onto (surjective).

  • A function is one-to-one if and only if no two inputs get sent to the same output, i.e. f(x_1) = f(x_2) implies x_1 = x_2.

  • A function is onto if every element of the codomain is an output of the function, i.e. for every y \in Y, there exists an x \in X such that f(x) = y.

The transformation represented by H is neither one-to-one (because \begin{bmatrix} 0 \\ 1 \end{bmatrix} and \begin{bmatrix} 1 \\ 1.5 \end{bmatrix} get sent to the same point) nor onto (because \begin{bmatrix} -2 \\ 0 \end{bmatrix} isn’t mapped to by any vector in \mathbb{R}^2).

In order for a linear transformation to be invertible, it must be both one-to-one and onto, i.e. it must be a bijection. Again, don’t worry if these terms seem foreign: I’ve provided them here to help build connections to other courses if you’ve taken them. If not, the rest of my coverage should still be sufficient.


Inverting a Matrix

The big idea I’m trying to get across is that an n \times n matrix A is invertible if and only if the corresponding linear transformation can be “undone”.

That is, A is invertible if and only if given any vector \vec b \in \mathbb{R}^n, there is exactly one vector \vec x \in \mathbb{R}^n such that A \vec x = \vec b. If the visual intuition from earlier didn’t make this clear, here’s another concrete example. Consider

A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}

\text{rank}(A) = 2, since A’s first two columns are scalar multiples of one another. Let’s consider two possible \vec b’s, each depicting a different case.

  1. \vec b = \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix}. This \vec b is in \text{colsp}(A). The issue with A is that there are infinitely many linear combinations of the columns of A that equal this \vec b.

\underbrace{\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}}_{\text{an } \vec x} = \underbrace{\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} -5 \\ 4 \\ 0 \end{bmatrix}}_{\text{another } \vec x} = ... = \underbrace{\begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix}}_{\vec b}

  2. \vec b = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}. This \vec b is not in \text{colsp}(A), meaning there is no \vec x \in \mathbb{R}^3 such that A \vec x = \vec b.

\underbrace{\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\text{no such } \vec x} = \underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}}_{\vec b}

But, if A’s columns were linearly independent, they’d span all of \mathbb{R}^3, and so A \vec x = \vec b would have a unique solution \vec x for any \vec b we think of.
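numpy’s np.linalg.solve surfaces this exact distinction: it finds the unique solution when one exists, and raises an error for the singular A above. Here’s a sketch (the modified matrix A_full is my own, hypothetical example with linearly independent columns):

import numpy as np

A = np.array([[1, 2, 0],
              [1, 2, 0],
              [1, 2, 1]])
b = np.array([3, 3, 3])

# A has rank 2, so there is no unique solution;
# np.linalg.solve(A, b) raises LinAlgError: Singular matrix.

# With linearly independent columns, the solution is unique.
A_full = np.array([[1, 2, 0],
                   [1, 0, 0],
                   [1, 2, 1]])
np.linalg.solve(A_full, b)   # the unique solution, [3, 0, 0]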

Definition

For an n \times n matrix A, the following statements are equivalent – if one of them is true, all of them are true, and A is called invertible:

  • A’s columns are linearly independent.

  • \text{rank}(A) = n, i.e. \text{colsp}(A) = \mathbb{R}^n.

  • \text{det}(A) \neq 0.

  • For every \vec b \in \mathbb{R}^n, there is exactly one \vec x \in \mathbb{R}^n such that A \vec x = \vec b.

  • There is a matrix A^{-1}, called the inverse of A, such that AA^{-1} = A^{-1}A = I.

These properties are sometimes together called the invertible matrix theorem. This is not an exhaustive list, either, and we’ll see other equivalent properties as time goes on.

Inverse of a 2 \times 2 Matrix

As was the case with the determinant, the general formula for the inverse of a matrix is only convenient for 2 \times 2 matrices. If A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, then

A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

You could solve for this formula by hand, by finding scalars e, f, g, and h such that

\underbrace{\begin{bmatrix} a & b \\ c & d \end{bmatrix}}_A \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

Note that the formula above involves division by ad - bc. If ad - bc = 0, then A is not invertible. But ad - bc is just the determinant of A!

Let’s test it out on some examples. If

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}

then

A^{-1} = \frac{1}{1 \cdot 4 - 2 \cdot 3} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \frac{1}{-2} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}

and indeed both

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

and

\begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

hold.
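The 2 \times 2 formula is short enough to code up directly. Here’s a sketch (the function name inverse_2x2 is mine) that reproduces the result above and agrees with numpy:

import numpy as np

def inverse_2x2(M):
    # Swap the diagonal entries, negate the off-diagonal entries,
    # and divide by the determinant ad - bc.
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return np.array([[d, -b],
                     [-c, a]]) / det

A = np.array([[1, 2],
              [3, 4]])

inverse_2x2(A)                                   # array([[-2. ,  1. ],
                                                 #        [ 1.5, -0.5]])
np.allclose(inverse_2x2(A), np.linalg.inv(A))    # True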

On the other hand,

B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}

is not invertible, since its columns are linearly dependent.

Beyond 2 \times 2 Matrices

For matrices larger than 2 \times 2, the calculation of the inverse is not as straightforward; there’s no simple formula. In the 3 \times 3 case, we’d need to find 9 scalars c_{ij} such that

\underbrace{\begin{bmatrix} 3 & 7 & 1 \\ -2 & 5 & 0 \\ 4 & 2 & 0 \end{bmatrix}}_A \underbrace{\begin{bmatrix} {\color{#3d81f6}c_{11}} & {\color{orange}c_{12}} & {\color{#d81a60}c_{13}} \\ {\color{#3d81f6}c_{21}} & {\color{orange}c_{22}} & {\color{#d81a60}c_{23}} \\ {\color{#3d81f6}c_{31}} & {\color{orange}c_{32}} & {\color{#d81a60}c_{33}} \end{bmatrix}}_{C = A^{-1}} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{I}

This involves solving a system of 3 equations in 3 unknowns, 3 times – one per column of the identity matrix. One such system is

\begin{align*} 3 {\color{#3d81f6}c_{11}} + 7 {\color{#3d81f6}c_{21}} + {\color{#3d81f6}c_{31}} &= 1 \\ -2 {\color{#3d81f6}c_{11}} + 5 {\color{#3d81f6}c_{21}} &= 0 \\ 4 {\color{#3d81f6}c_{11}} + 2 {\color{#3d81f6}c_{21}} &= 0 \end{align*}

You can quickly see how this becomes a pain to solve by hand. Instead, we can use one of two strategies:

  1. Using row reduction, also known as Gaussian elimination, which is an efficient method for solving systems of linear equations without needing to write out each equation explicitly. Row reduction can be used to both find the rank and inverse of a matrix, among other things. More traditional linear algebra courses spend a considerable amount of time on this concept, though I’ve intentionally avoided it in this course to instead spend time on conceptual ideas most relevant to machine learning. That said, you’ll get some practice with it in a future homework.

  2. Using a pre-built function in numpy that does the row reduction for us.

At the end of this section, I give you some advice on how to (and not to) compute the inverse of a matrix in code.
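Just to make the “one column at a time” idea concrete, here’s a numpy sketch: each column of A^{-1} is found by solving A \vec c = \vec e_i, where \vec e_i is a column of the identity matrix, and np.linalg.solve does the row reduction for us.

import numpy as np

A = np.array([[3, 7, 1],
              [-2, 5, 0],
              [4, 2, 0]])

I = np.eye(3)

# Solve A c = e_i for each standard basis vector e_i;
# the solutions are the columns of the inverse.
columns = [np.linalg.solve(A, I[:, i]) for i in range(3)]
A_inv = np.column_stack(columns)

np.allclose(A_inv, np.linalg.inv(A))   # True
np.allclose(A @ A_inv, np.eye(3))      # True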


More Examples

As we’ve come to expect, let’s work through some examples that illustrate important ideas.

Example: Inverting a Product

  1. Suppose A and B are both invertible n \times n matrices. Is AB invertible? If so, what is its inverse?

  2. Suppose A = BC, and A, B, and C are all invertible n \times n matrices. What is the inverse of B?

Example: Inverting a Sum

Suppose A and B are both invertible n \times n matrices. Is A + B invertible? If so, what is its inverse?

Example: Inverting X^TX

Suppose X is an n \times d matrix. Note that X is not square, and so it is not invertible. However, X^TX is a square matrix, and so it is possible that it is invertible.

Explain why X^TX is invertible if and only if X’s columns are linearly independent.

Example: Orthogonal Matrices

Recall, an n \times n matrix Q is orthogonal if Q^T Q = QQ^T = I.

What is the inverse of Q? Explain how this relates to the formula for rotation matrices in \mathbb{R}^2 from earlier,

R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}

Computing the Inverse in Code

I’d like to comment on the computational aspect of finding inverses. Remember that if A is an n \times n matrix, then

A \vec x = \vec b

is a system of n equations in n unknowns.

\underbrace{\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{\vec x} = \underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}}_{\vec b}

One of the big (conceptual) usages of the inverse is to solve such a system. If A is invertible, then we can solve for \vec x by multiplying both sides on the left by A^{-1}:

A^{-1} A \vec x = A^{-1} \vec b \implies \vec x = A^{-1} \vec b

But, in practice, finding A^{-1} is less efficient and more prone to floating point errors than solving the system of n equations in n unknowns directly for the specific \vec b we care about.

  • Solving A \vec x = \vec b involves solving just one system of n equations in n unknowns.

  • Finding A^{-1} involves solving a system of n equations in n unknowns, n times! Each one has the same coefficient matrix A but a different right-hand side \vec b, corresponding to the columns of the identity matrix (which are the standard basis vectors in \mathbb{R}^n).

The more floating point operations we need to do, the more error is introduced into the final results.

An illustrative example is to come!
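In the meantime, here’s a quick sketch of both routes on a random system (np.linalg.solve solves the single system directly, without ever forming A^{-1}):

import numpy as np

rng = np.random.default_rng(2)
n = 500
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

x_solve = np.linalg.solve(A, b)   # solves the one system we care about
x_inv = np.linalg.inv(A) @ b      # computes all of A^{-1} first, then multiplies

# Both give (essentially) the same answer, but solve does less work
# and accumulates less floating point error.
np.allclose(x_solve, x_inv)       # True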