
5.4. Null Space and the Rank-Nullity Theorem

As we’ve seen, $\text{colsp}(A)$ is a subspace of $\mathbb{R}^n$ that contains all possible results of $A \vec x$ for $\vec x \in \mathbb{R}^d$. Think of $\text{colsp}(A)$ as the “reach” of the columns of $A$.

If the columns of $A$ are linearly independent, then the only way to create $\vec 0$ through a linear combination of the columns of $A$ is if all the coefficients in the linear combination are 0, i.e. if $\vec x = \vec 0$ in $A \vec x$.

But, if $A$’s columns aren’t all linearly independent, then there will be some non-zero vectors $\vec x$ where $A \vec x = \vec 0$. This follows from the definition of linear independence, which says that if there’s a non-zero linear combination of a collection of vectors that produces $\vec 0$, the vectors aren’t linearly independent.

It turns out that it’s worthwhile to study the set of vectors $\vec x$ that get sent to $\vec 0$ when multiplied by $A$. This set is called the null space of $A$:

$$\text{nullsp}(A) = \{ \vec x \in \mathbb{R}^d : A \vec x = \vec 0 \}$$

Sometimes, the null space is also called the kernel of $A$, though we will mostly avoid that term in our class, since kernel often means something different in the context of machine learning.

Important: $\text{nullsp}(A)$ is a subspace of $\mathbb{R}^{\boxed{d}}$, since it is made up of vectors that get multiplied by $A$, an $n \times d$ matrix.

Let’s return to the example $A$ from earlier.

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}$$

$\text{nullsp}(A)$ is the set of all vectors $\vec x \in \mathbb{R}^3$ where $A \vec x = \vec 0$. $\text{rank}(A) = 2$, so there will be some non-zero vectors $\vec x$ in the null space. But what are they?

To find them, we need to find the general solution to

$$\underbrace{\begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\vec x} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\vec 0}$$

We can write this as the system of equations

$$\begin{align} 5x_1 + 3x_2 + 2x_3 &= 0 \tag{1} \\ - x_2 + x_3 &= 0 \tag{2} \\ 3x_1 + 4x_2 - x_3 &= 0 \tag{3} \\ 6x_1 + 2x_2 + 4x_3 &= 0 \tag{4} \\ x_1 + x_3 &= 0 \tag{5} \end{align}$$

Equation (2) tells us $x_2 = x_3$, and equation (5) tells us $x_1 = -x_3$. (You can verify that any vector satisfying these two conditions also satisfies equations (1), (3), and (4).) So, any vector in $\text{nullsp}(A)$ is of the form

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -x_3 \\ x_3 \\ x_3 \end{bmatrix} = x_3 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \quad x_3 \in \mathbb{R}$$

So,

$$\text{nullsp}(A) = \text{span}\left( \left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\} \right)$$

For example, $\begin{bmatrix} -5 \\ 5 \\ 5 \end{bmatrix} \in \text{nullsp}(A)$. Remember that vectors in $\text{nullsp}(A)$ are in $\mathbb{R}^3$, since $A$ is a $5 \times 3$ matrix, while vectors in $\text{colsp}(A)$ are in $\mathbb{R}^5$.
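We can sanity check this in Python:

import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

# Any multiple of [-1, 1, 1] should be sent to the zero vector.
A @ np.array([-5, 5, 5])
array([0, 0, 0, 0, 0])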

In this example, $\text{nullsp}(A)$ is a 1-dimensional subspace of $\mathbb{R}^3$. We also know from earlier that $\text{rank}(A) = 2$. And curiously, $1 + 2 = 3$, the number of columns in $A$. This is not a coincidence, and sheds light on an important theorem.

Rank-Nullity Theorem: If $A$ is an $n \times d$ matrix, then

$$\text{rank}(A) + \dim(\text{nullsp}(A)) = d$$

The proof of this theorem is beyond the scope of our course. But, this is such an important theorem that it’s sometimes called the fundamental theorem of linear algebra. It tells us, for one, that the dimension of the null space is equal to the number of columns minus the rank. “Nullity” is just another word for the dimension of the null space.
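We can also watch the theorem in action numerically. Assuming scipy is installed (scipy.linalg.null_space returns an orthonormal basis for the null space of its input), and continuing with the $A$ defined above:

from scipy.linalg import null_space

# rank 2 + nullity 1 = 3 columns, as rank-nullity promises
print(np.linalg.matrix_rank(A))    # 2
print(null_space(A).shape[1])      # 1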

Let’s see how it can be used in practice. Some of these examples are taken from Gilbert Strang’s book.

Example: Linearly Independent Columns

Let $A = \begin{bmatrix} 3 & 1 \\ 9 & -3 \\ 0 & 6 \\ 3 & 2 \\ 1 & 1 \end{bmatrix}$.

What is $\text{nullsp}(A)$?
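If you want to check your answer numerically, rank-nullity plus numpy’s np.linalg.matrix_rank settles it (a check, not a substitute for the reasoning; I’m naming this matrix M to avoid clobbering our running example A):

M = np.array([[3, 1],
              [9, -3],
              [0, 6],
              [3, 2],
              [1, 1]])

# M has 2 columns, so the nullity is 2 - matrix_rank(M)
print(np.linalg.matrix_rank(M))    # 2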

Example: Describing Spaces

Describe the column space, row space, and null space of the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$$

Example: Thinking Abstractly

Suppose $A$ is a $3 \times 4$ matrix with rank 3. Describe $\text{colsp}(A)$ and $\text{nullsp}(A^T)$. The latter is the null space of $A^T$, and is sometimes called the left null space of $A$, as it’s a null space of $A$ when multiplied by vectors on the left, like in $\vec y^T A$ (which performs the same calculation as $A^T \vec y$; the results are just transposed).

Example: Thinking Even More Abstractly

If $A$ is a $7 \times 9$ matrix with rank 5, find the dimensions of each of the following.

  1. $\text{colsp}(A)$

  2. $\text{colsp}(A^T)$

  3. $\text{nullsp}(A)$

  4. $\text{nullsp}(A^T)$

Example: Row Space and Null Space

Suppose $\vec r \in \text{colsp}(A^T)$ and $\vec n \in \text{nullsp}(A)$, meaning $\vec r$ is in the row space of $A$ and $\vec n$ is in the null space of $A$.

Prove that $\vec r$ and $\vec n$ must be orthogonal. Often, this is phrased as “the row space and null space are orthogonal complements”.

Example: Rank of $AB$ vs. Rank of $A$ or $B$

Suppose $A$ is an $n \times d$ matrix, and $B$ is a $d \times p$ matrix. Explain why the column space of $AB$ is a subset of the column space of $A$, and the row space of $AB$ is a subset of the row space of $B$. What does this imply about the rank of $AB$?

Example: Rank of $X^TX$

Prove that $\text{rank}(X^T X) = \text{rank}(X)$ for any $n \times d$ matrix $X$.

The matrix $X^TX$ is hugely important for our regression problem, and you’ll also see in Homework 5 that it helps define the covariance matrix of our data.
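Since you’re asked for a proof, code can’t do the job, but a quick numerical spot-check is still reassuring. Below, I deliberately build a rank-deficient $X$ (this construction is mine, not from the text) using numpy’s random generator:

rng = np.random.default_rng(42)
X = rng.normal(size=(10, 3))
X = np.hstack([X, X[:, [0]] + X[:, [1]]])  # 10 x 4 matrix with rank 3 by construction

print(np.linalg.matrix_rank(X))        # 3
print(np.linalg.matrix_rank(X.T @ X))  # 3, matching rank(X)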

Summary


In this section, we defined the null space and derived the rank-nullity relationship, $\text{rank}(A) + \dim(\text{nullsp}(A)) = d$, which ties together the column space, row space, null space, and left null space of a matrix.


Matrix Decompositions

In the coming chapters, we’ll spend a lot of our time decomposing matrices into smaller pieces, each of which gives us some insight into the data stored in the matrix. This is not a new concept: in earlier math courses, you’ve learned to write a number as a product of prime factors, and to factor quadratics like $x^2 - 5x + 6$ into $(x-2)(x-3)$.

The “ultimate” decomposition for the purposes of machine learning is the singular value decomposition (SVD), which decomposes a matrix into the product of three other matrices.

$$X = U \Sigma V^T$$

where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix. This decomposition will allow us to solve the dimensionality reduction problem we first alluded to in Chapter 1.1.
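We’ll build up to the SVD properly, but as a preview, numpy can already compute it for us. A minimal sketch (the matrix here is arbitrary):

X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# U has orthonormal columns, s holds the singular values, Vt is V transposed
print(np.allclose(X, U @ np.diag(s) @ Vt))  # True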

We’re not yet ready for that. For now, we’ll introduce a decomposition that ties together ideas from this section, and allows us to prove the fact that the number of linearly independent columns of $A$ is equal to the number of linearly independent rows of $A$, i.e. that $\text{rank}(A) = \text{rank}(A^T)$.

CR Decomposition

Suppose $A$ is an $n \times d$ matrix with rank $r$. This tells us that $A$ has $r$ linearly independent columns (and rows), and the remaining $d - r$ columns can be written as linear combinations of the $r$ “good” columns.

Let’s continue thinking about

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}$$

Recall, $\text{rank}(A) = 2$, and

$$\text{column 3} = \text{column 1} - \text{column 2}$$

Define the matrix $C$ as containing just the linearly independent columns of $A$, i.e. $C$’s columns are a basis for $\text{colsp}(A)$. Notice that $C$ is a $5 \times 2$ matrix; its number of columns is equal to the rank of $A$.

$$C = \begin{bmatrix} 5 & 3 \\ 0 & -1 \\ 3 & 4 \\ 6 & 2 \\ 1 & 0 \end{bmatrix}$$

I’d like to produce a “mixing” matrix $R$ such that $A = CR$. Here, $R$ will tell us how to “mix” the columns of $C$ (which are linearly independent) to produce the columns of $A$. Since $A$ is $5 \times 3$ and $C$ is $5 \times 2$, $R$ must be $2 \times 3$ in order for the dimensions to work out.

  • Column 1 of $A$ is just $1 (\text{column 1 of } C)$

  • Column 2 of $A$ is just $1 (\text{column 2 of } C)$

  • Column 3 of $A$ is ${\color{orange}1} (\text{column 1 of } C) - {\color{orange}1} (\text{column 2 of } C)$

So, $R$ must be

$$R = \begin{bmatrix} 1 & 0 & {\color{orange}1} \\ 0 & 1 & {\color{orange}-1} \end{bmatrix}$$

So, the CR decomposition of $A$ is

$$\underbrace{\begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 5 & 3 \\ 0 & -1 \\ 3 & 4 \\ 6 & 2 \\ 1 & 0 \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}}_{R}$$

Python can tell us quickly that we did the decomposition correctly.

C = np.array([[5, 3],
              [0, -1],
              [3, 4],
              [6, 2],
              [1, 0]])

R = np.array([[1, 0, 1],
              [0, 1, -1]])

C @ R
array([[ 5,  3,  2],
       [ 0, -1,  1],
       [ 3,  4, -1],
       [ 6,  2,  4],
       [ 1,  0,  1]])

By definition, $C$’s columns are a basis for the column space of $A$, $\text{colsp}(A)$, since $C$’s columns are linearly independent and span $\text{colsp}(A)$.

But, there’s another fact hiding in plain sight: $R$’s rows are a basis for the row space of $A$, $\text{colsp}(A^T)$! The row space of $A$ is the set of all linear combinations of $A$’s rows, which is a subspace of $\mathbb{R}^3$, and any vector in that subspace can be written as a linear combination of $R$’s two rows, $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$. (In fact, the first row of $R$ here is exactly the last row of $A$, and the second is $-1$ times the second row of $A$, though in general $R$’s rows need not resemble rows of $A$.)

To be precise, a CR decomposition of $A$ consists of a matrix $C$ of shape $n \times r$ that contains $A$’s linearly independent columns, and a matrix $R$ of shape $r \times d$ that contains the coefficients needed to write each column of $A$ as a linear combination of the columns of $C$. As we’re seeing above, in addition to $C$’s columns being linearly independent, $R$’s rows are also linearly independent. I won’t prove this fact here, but it’s a good thought exercise to think about why it must be true.

Notice that I’ve said a CR decomposition, not the CR decomposition, because this decomposition is not unique. Think about why the following is also a CR decomposition of $A$.

C = np.array([[5, 2],
              [0, 1],
              [3, -1],
              [6, 4],
              [1, 1]])

R = np.array([[1, 1, 0],
              [0, -1, 1]])

C @ R
array([[ 5,  3,  2],
       [ 0, -1,  1],
       [ 3,  4, -1],
       [ 6,  2,  4],
       [ 1,  0,  1]])

You can think of $\text{rank}(A)$ as being the minimum possible number of columns in $C$ to make $A = CR$ possible.

Here, since $\text{rank}(A) = 2$, the minimum possible number of columns in $C$ is 2; no $C$ with just a single column would allow $A = CR$ to work, and while a $C$ with 3 columns would work, 2 is the minimum.

Why is $\text{rank}(A) = \text{rank}(A^T)$?

The reason I’ve introduced the CR decomposition is that it will allow us to see why $\text{rank}(A) = \text{rank}(A^T)$ for any matrix $A$, i.e. that the number of linearly independent columns of $A$ is equal to the number of linearly independent rows of $A$.

For now, let’s suppose we don’t know that this is true, and just interpret $\text{rank}(A)$ as the number of linearly independent columns of $A$.

Suppose, as we typically do, that $A$ is an $n \times d$ matrix with rank $r$, and that $A = CR$ is a CR decomposition of $A$ into matrices

  • $C$ (with shape $n \times r$), and

  • $R$ (with shape $r \times d$)

As mentioned above, $C$’s columns are linearly independent, and $R$’s rows are linearly independent.

What happens if we transpose $A = CR$? We get

$$A^T = (CR)^T = R^T C^T$$

This tells us that $A^T$ can be written as the product of $R^T$ (shape $d \times r$) and $C^T$ (shape $r \times n$).

Key insight: this is just a CR decomposition of $A^T$, using $R^T$ and $C^T$ instead of $C$ and $R$!

$$\underbrace{\begin{bmatrix} 5 & 0 & 3 & 6 & 1 \\ 3 & -1 & 4 & 2 & 0 \\ 2 & 1 & -1 & 4 & 1 \end{bmatrix}}_{A^T} = \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix}}_{R^T} \underbrace{\begin{bmatrix} 5 & 0 & 3 & 6 & 1 \\ 3 & -1 & 4 & 2 & 0 \end{bmatrix}}_{C^T}$$

The columns of $A^T$ are just the rows of $A$, and the columns of $R^T$ are just the rows of $R$. Since we knew that the rows of $R$ were linearly independent (using my earlier assumption), we know that the columns of $R^T$ are also linearly independent, which makes this a CR decomposition of $A^T$.

Because $A^T$ has a CR decomposition using a matrix with $r$ linearly independent columns ($R^T$), it must have a rank of $r$. But since $\text{rank}(A) = r$, we must have

$$\text{rank}(A) = \text{rank}(A^T)$$

I skipped some minor steps in this proof, but the main idea is that the CR decomposition $A = CR$ gives us two things:

  1. A basis for the column space of $A$, stored in $C$’s columns


  2. A basis for the row space of $A$, stored in $R$’s rows


Remember that $C$ is carefully constructed to contain $A$’s linearly independent columns; any arbitrary decomposition $A = XY$ might not have these properties.
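As one final check, we can verify both halves of this story numerically, reusing $A$ and the most recent $C$ and $R$ arrays from above:

# Transposing A = CR gives A^T = R^T C^T, and the ranks agree
print(np.allclose(A.T, R.T @ C.T))                           # True
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))  # 2 2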