As we saw in Chapter 5.3, colsp(A) is a subspace of Rn that contains all possible results of Ax for x∈Rd. Think of colsp(A) as the “reach” of the columns of A.
If the columns of A are linearly independent, then the only way to create 0 through a linear combination of the columns of A is if all the coefficients in the linear combination are 0, i.e. if x=0 in Ax.
But, if A’s columns aren’t all linearly independent, then there will be some non-zero vectors x where Ax=0. This follows from the definition of linear independence, which says that if there’s a non-zero linear combination of a collection of vectors that produces 0, the vectors aren’t linearly independent.
It turns out that it’s worthwhile to study the set of vectors x that get sent to 0 when multiplied by A. This set is called the null space of A, written nullsp(A) = {x ∈ Rd : Ax = 0}.
Sometimes, the null space is also called the kernel of A, though we will mostly avoid that term in our class, since kernel often means something different in the context of machine learning.
Important: nullsp(A) is a subspace of Rd, since it is made up of vectors that get multiplied by A, an n×d matrix. Vectors in nullsp(A) are in Rd, while vectors in colsp(A) are in Rn.
Let’s return to the example A from earlier.
A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}
nullsp(A) is the set of all vectors x∈R3 where Ax=0. rank(A)=2, which is less than 3 (the number of columns of A), so there will be some non-zero vectors x in the null space. But what are they?
There’s a shortcut to finding the null space of a matrix, which works if you happen to already know the relationship between the columns of A, or if the relationship is simple enough to eyeball. In Chapter 5.3, we saw that
column 3=column 1−column 2
or, in other words,
column 1−column 2−column 3=0
This, right away, tells us that taking 1 of column 1, subtracting 1 of column 2, and subtracting 1 of column 3 will give us the zero vector. But Ax is just a linear combination of the columns in A using the coefficients in x, so this means that
A \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} = 0
So, we’ve found a vector in nullsp(A). Any scalar multiple of this vector will also be in nullsp(A).
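As a quick sanity check, a minimal numpy sketch (the matrix is the example A from above):

```python
import numpy as np

# The 5x3 example matrix A from above.
A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

x = np.array([1, -1, -1])

print(A @ x)        # the zero vector: x is in nullsp(A)
print(A @ (7 * x))  # any scalar multiple of x is in nullsp(A) too
```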
In this example, nullsp(A) is a 1-dimensional subspace of R3. We also know from earlier that rank(A)=2. And curiously, 1+2=3, the number of columns in A. This is not a coincidence, and sheds light on an important theorem.
The theorem in question is the rank-nullity theorem: if A is an n×d matrix, then rank(A) + dim(nullsp(A)) = d, the number of columns of A. The proof of this theorem is beyond the scope of our course. But, this is such an important theorem that it’s sometimes called the fundamental theorem of linear algebra. It tells us, for one, that the dimension of the null space is equal to the number of columns minus the rank. “Nullity” is just another word for the dimension of the null space.
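We can spot-check rank-nullity on the example A from this section. This is a sketch using numpy; counting near-zero singular values is one standard numerical way to measure the nullity (we'll properly meet singular values later in the course), and the 1e-10 threshold is an arbitrary choice of mine:

```python
import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

d = A.shape[1]                   # number of columns: 3
rank = np.linalg.matrix_rank(A)  # 2
# One numerical way to get the nullity: count the (near-)zero singular values.
nullity = int(np.sum(np.linalg.svd(A, compute_uv=False) < 1e-10))

print(rank, nullity, d)  # rank + nullity equals d
```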
Let’s see how it can be used in practice. Some of these examples are taken from Gilbert Strang’s book.
Describe the column space, row space, and null space of the matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}
Additionally, find a basis for the null space.
Solution
A only has one linearly independent column; columns 2 and 3 are both just multiples of column 1. So, rank(A)=1.
The column space of A is the span of the first column, i.e.
colsp(A) = span\left(\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}\right)
This is a 1-dimensional subspace of R2; 1 comes from the rank of A, and 2 comes from the fact that each column of A is a vector in R2.
The row space of A is the span of the first row, i.e.
colsp(A^T) = span\left(\left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}\right)
This is a 1-dimensional subspace of R3. Remember that the number of linearly independent columns and rows of a matrix are always the same, which is why we know the dimensions of the column space and row space are the same.
The null space of A is the set of all vectors x∈R3 where Ax=0. This is a 2-dimensional subspace of R3. 2 came from the rank-nullity theorem, which says that the dimension of the null space is equal to the number of columns minus the rank, or
3−1=2
here.
Can we say more about the null space? It is the set of all vectors x∈R3 where Ax=0. This is equivalent to the set of all vectors x = \begin{bmatrix} x \\ y \\ z \end{bmatrix} where

x + 2y + 3z = 0
2x + 4y + 6z = 0

The two equations above are equivalent, since the second is just twice the first. So, the null space is the set of all vectors x = \begin{bmatrix} x \\ y \\ z \end{bmatrix} where x + 2y + 3z = 0. This is a plane in R3, as we’d expect from a 2-dimensional subspace.
Can we find a basis for the null space? Yes, we can. My preferred method is to leverage the fact that we know the relationship between the columns of A:
column 2=2(column 1)⟹2(column 1)−column 2=0
column 3=3(column 1)⟹3(column 1)−column 3=0
So, this tells us that
A \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
and
A \begin{bmatrix} 3 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \begin{bmatrix} 3 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
So, we’ve found two linearly independent vectors in the null space; since nullsp(A) is 2-dimensional, these must be a basis.
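We can verify both claims with numpy (a quick sketch; A and the basis vectors come from the worked solution above):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])

# The two null-space vectors found above, stacked as the columns of B.
B = np.array([[2, 3],
              [-1, 0],
              [0, -1]])

print(A @ B)                     # a 2x2 block of zeros: both vectors are in nullsp(A)
print(np.linalg.matrix_rank(B))  # 2: the two vectors are linearly independent
```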
Suppose A is a 3×4 matrix with rank 3. Describe colsp(A) and nullsp(AT). The latter is the null space of AT, and is sometimes called the left null space of A, as it’s a null space of A when multiplied by vectors on the left, like in yTA (which performs the same calculation as ATy; the results are just transposed).
Solution
When faced with a problem like this, I like drawing out a rectangle that roughly shows me the dimensions of the matrix.
A = \begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}
The rank tells us that there are 3 linearly independent columns. Since the columns themselves are in R3, this must mean that
colsp(A)=all of R3
since any 3 linearly independent vectors in R3 will span the entire space. (The existence of the 4th linearly dependent column doesn’t change this fact, it just means that the linear combinations of the four columns won’t be unique.)
The null space of AT is the set of all vectors y where ATy = 0.
nullsp(AT) is a collection of vectors in R3. What is the dimension of this space? Rank-nullity tells us that
rank(A)+dim(nullsp(A))=number of columns of A
We can’t use this directly, since the matrix we’re dealing with is AT, not A. So, replacing A with AT in the equation above, we get
rank(AT)+dim(nullsp(AT))=number of columns of AT
But, rank(AT)=rank(A), and the number of columns of AT is the same as the number of rows of A. So,
rank(A) + dim(nullsp(AT)) = number of rows of A
dim(nullsp(AT)) = number of rows of A − rank(A)
But since A has 3 rows and a rank of 3, we know that
dim(nullsp(AT))=3−3=0
So, nullsp(AT) is a 0-dimensional subspace of R3, meaning it only contains the zero vector.
nullsp(AT)={0}
More intuitively, using the results of the previous example, we know that AT’s columns are linearly independent (since there are 3 of them and rank(AT)=3). So, the only vector in nullsp(AT) is the zero vector.
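As a numerical illustration, here is a sketch with a hypothetical 3×4 rank-3 matrix I made up for this purpose (the first three columns are independent, the fourth is a combination of them):

```python
import numpy as np

# A hypothetical 3x4 matrix with rank 3.
A = np.array([[1, 0, 0, 1],
              [0, 1, 0, 2],
              [0, 0, 1, 3]])

print(np.linalg.matrix_rank(A))    # 3, so colsp(A) is all of R3
print(np.linalg.matrix_rank(A.T))  # also 3: AT's columns are independent,
                                   # so nullsp(AT) contains only the zero vector
```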
Find a matrix A such that \begin{bmatrix} 3 \\ 1 \end{bmatrix} ∈ colsp(A) and \begin{bmatrix} 1 \\ 3 \end{bmatrix} ∈ nullsp(A).
Solution
A needs to satisfy both of the following conditions.
\begin{bmatrix} 3 \\ 1 \end{bmatrix} ∈ colsp(A)
\begin{bmatrix} 1 \\ 3 \end{bmatrix} ∈ nullsp(A)
Since A can be multiplied by a vector in R2, and elements in colsp(A) are in R2, we know that A must be a 2×2 matrix. So, let
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
The second condition gives
A \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} a + 3b \\ c + 3d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
which implies
a = −3b, \quad c = −3d
Hence
A = \begin{bmatrix} -3b & b \\ -3d & d \end{bmatrix}
A’s columns are scalar multiples of one another, and are not linearly independent. This is what we’d expect, given that A has a non-trivial null space. (Remember, the trivial null space is just {0}; since A’s null space has more than just 0 in it, its columns are not linearly independent.)
Now, we just need to make sure we pick b and d such that \begin{bmatrix} 3 \\ 1 \end{bmatrix} ∈ colsp(A). The easy solution is to choose b=3 and d=1, which makes \begin{bmatrix} 3 \\ 1 \end{bmatrix} literally one of the columns of A, which means it must be in colsp(A). This gives
A = \begin{bmatrix} -9 & 3 \\ -3 & 1 \end{bmatrix}
This is not the only solution; another comes from choosing b=6 and d=2, which would have made

A = \begin{bmatrix} -18 & 6 \\ -6 & 2 \end{bmatrix}
In both (and in the infinitely many other) cases, \begin{bmatrix} 3 \\ 1 \end{bmatrix} ∈ colsp(A) and \begin{bmatrix} 1 \\ 3 \end{bmatrix} ∈ nullsp(A).
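A quick numerical check of the first solution (a sketch; the matrix is the one constructed above):

```python
import numpy as np

A = np.array([[-9, 3],
              [-3, 1]])

print(A @ np.array([1, 3]))  # [0 0]: the vector [1, 3] is in nullsp(A)
# [3, 1] is literally the second column of A, so it's in colsp(A):
print(A[:, 1])               # [3 1]
```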
Suppose A is an n×d matrix, and B is a d×p matrix.
Explain why the column space of AB is a subset of the column space of A, and the row space of AB is a subset of the row space of B. What does this imply about the rank of AB?
Solution
To show that colsp(AB) is a subset of colsp(A), i.e.
colsp(AB)⊆colsp(A)
we need to show that any vector in colsp(AB) is also in colsp(A). AB is a matrix of shape n×p, so to multiply it by a vector x, x must be in Rp.
Suppose y∈Rn is in colsp(AB). Then, y can be written as
y=ABx
for some x∈Rp.
But,
y = ABx = A(Bx), where Bx is a vector in Rd
meaning that y is also a linear combination of the columns of A (since we wrote it as Az, for some vector z∈Rd).
So, we’ve shown that if y is in colsp(AB), then y is also in colsp(A). Therefore, colsp(AB)⊆colsp(A). This tells us that rank(AB)≤rank(A), since the rank of a matrix is the dimension of its column space.
Using similar logic, any vector in rowsp(AB)=colsp((AB)T)=colsp(BTAT) is of the form
BTATx
But, BTATx is of the form BTy for some y∈Rd, meaning that BTATx is in the column space of BT or row space of B. So, rowsp(AB)⊆rowsp(B), meaning rank(AB)≤rank(B).
Putting these two results together, we have that
rank(AB)≤rank(A) and rank(AB)≤rank(B)
But, since both must be true, then
rank(AB)≤min(rank(A),rank(B))
So intuitively, when we multiply two matrices, the rank of the resulting matrix can’t be greater than the rank of either of the two matrices we started with, but it can “drop”.
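We can see the rank "drop" numerically. This is a sketch with hypothetical matrices of my own choosing: A is a generic random 4×3 matrix (rank 3, almost surely), and B is built to have rank 2:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # a generic 4x3 matrix: rank 3
B = np.array([[1, 2, 4, 8],
              [0, 0, 0, 0],
              [1, 1, 1, 1]])     # rank 2: only two independent rows

print(np.linalg.matrix_rank(A))      # 3
print(np.linalg.matrix_rank(B))      # 2
print(np.linalg.matrix_rank(A @ B))  # 2, which is min(3, 2)
```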
Suppose r∈colsp(AT) and n∈nullsp(A), meaning r is in the row space of A and n is in the null space of A.
Prove that r and n must be orthogonal.
Solution
The row space of A, colsp(AT), is the set of all vectors r where r=ATy for some y∈Rn. Note that if A is an n×d matrix, then AT is a d×n matrix, and r is in Rd.
The null space of A, nullsp(A), is the set of all vectors n where An=0. Note that if A is an n×d matrix, then n is in Rd.
So, r and n are both in Rd, which means they exist in the same universe (they have the same number of components), and so we can ask if they’re orthogonal. (If they had different numbers of components, this question would be a non-starter.)
In order to show that they’re orthogonal, we need to show that their dot product is 0.
r ⋅ n = (ATy) ⋅ n = (ATy)Tn = yTAn = yT0 = 0

Here, the second equality uses the fact that u ⋅ v = uTv, the third uses (ATy)T = yTA, and the fourth uses An = 0.
So, every vector in the row space of A is orthogonal to every vector in the null space of A!
Prove that rank(XTX)=rank(X) for any n×d matrix X.
The matrix XTX is hugely important for our regression problem, and you’ll also see in a homework that it helps define the covariance matrix of our data.
Solution
First, let’s think about the shape of XTX. If X is an n×d matrix, then XT is a d×n matrix, and XTX is a d×d matrix. So, X and XTX have the same number of columns (d), but XTX is also square, which X doesn’t have to be.
The way we’ll proceed is to show that both matrices have the same null space. If we can show they both have the same null space, then the dimensions of both null spaces must be the same. Since the rank-nullity theorem tells us that
rank(A)+dim(nullsp(A))=number of columns of A
if we can show that dim(nullsp(XTX))=dim(nullsp(X)), then we’ll have shown that rank(XTX)=rank(X), since both X and XTX have the same number of columns.
To show that both X and XTX have the same null space, we need to show that any vector x in the null space of X is also in the null space of XTX, and vice versa.
Part 1: Show that v∈nullsp(X)⟹v∈nullsp(XTX)
Suppose v∈nullsp(X). Then, Xv=0. Multiplying both sides on the left by XT, we get
XTXv=XT0=0
So, v is also in the null space of XTX.
(This was the easier part.)
Part 2: Show that v∈nullsp(XTX)⟹v∈nullsp(X)
Suppose v∈nullsp(XTX). Then, XTXv=0. It’s not immediately clear how to use this to show that v is in the null space of X, so let’s try something different.
What if we left-multiply both sides of the equation by vT? This is equivalent to taking the dot product of both sides with v.
XTXv = 0 ⟹ vTXTXv = vT0 = 0
Okay, so vTXTXv = 0. What does this tell us? If we look closely, the left-hand side is really just (Xv)T(Xv), which is also just (Xv) ⋅ (Xv), which we also know is equal to ∥Xv∥².
So, we have
∥Xv∥² = 0
But, the only vector with a norm of 0 is the zero vector. So, Xv = 0. We’ve now shown that if XTXv = 0, then it has to be the case that Xv = 0 as well.
Now that we’ve shown both directions, we’ve shown that nullsp(XTX)=nullsp(X), because any vector in one of these sets is also in the other set, and so the two sets must be equal.
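A numerical spot-check of this result (a sketch; X is a random 6×4 matrix of my choosing, with one column forced to be a combination of the others so that rank(X) = 3):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((6, 4))
X[:, 3] = X[:, 0] + X[:, 1]  # force a dependent column, so rank(X) = 3

print(np.linalg.matrix_rank(X))        # 3
print(np.linalg.matrix_rank(X.T @ X))  # also 3, as the proof predicts
```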
In the coming chapters, we’ll spend a lot of our time decomposing matrices into smaller pieces, each of which gives us some insight into the data stored in the matrix. This is not a new concept: in earlier math courses, you’ve learned to write a number as a product of prime factors, and to factor quadratics like x2−5x+6 into (x−2)(x−3).
The “ultimate” decomposition for the purposes of machine learning is the singular value decomposition (SVD), which decomposes a matrix into the product of three other matrices.
X=UΣVT
where U and V are orthogonal matrices and Σ is a diagonal matrix. This decomposition will allow us to solve the dimensionality reduction problem we first alluded to in Chapter 1.1.
We’re not yet ready for that; the SVD will be the final topic we study in this course. For now, we’ll introduce a decomposition that ties together ideas from this section, and allows us to understand why the number of linearly independent columns of A is equal to the number of linearly independent rows of A, i.e. that rank(A)=rank(AT).
Suppose A is an n×d matrix with rank r. This tells us that A has r linearly independent columns, and the remaining d−r columns can be written as linear combinations of the r “good” columns.
Let’s continue thinking about
A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}
Recall, rank(A)=2, and
column 3=column 1−column 2
Define the matrix C as containing just the linearly independent columns of A when taken from left to right, i.e. the columns that are obtained by running the algorithm in Chapter 4.2.
given v_1, v_2, ..., v_d  # columns of A
initialize linearly independent set S = {v_1}
for i = 2 to d:
    if v_i is not a linear combination of S:
        add v_i to S
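Here's one way the pseudocode above could be sketched in numpy. The helper name and the rank-based independence test are my own choices, not anything prescribed in Chapter 4.2:

```python
import numpy as np

def independent_columns(A, tol=1e-10):
    """Scan columns left to right, keeping each one that is not a
    linear combination of the columns kept so far."""
    kept = []
    for i in range(A.shape[1]):
        candidate = kept + [A[:, i]]
        # The new column is independent of S exactly when stacking it
        # next to S's columns keeps the stacked matrix at full rank.
        if np.linalg.matrix_rank(np.column_stack(candidate), tol=tol) == len(candidate):
            kept.append(A[:, i])
    return np.column_stack(kept)

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

print(independent_columns(A))  # the first two columns of A
```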
This means that C’s columns are a basis for colsp(A). Notice that C is a 5×2 matrix; its number of columns is equal to the rank of A.
C = \begin{bmatrix} 5 & 3 \\ 0 & -1 \\ 3 & 4 \\ 6 & 2 \\ 1 & 0 \end{bmatrix}
I’d like to produce a matrix R such that A=CR. R will tell us how to “mix” the columns of C (which are linearly independent) to produce the columns of A. Since A is 5×3 and C is 5×2, R must be 2×3 in order for the dimensions to work out.
Column 1 of A is just 1(column 1 of C)
Column 2 of A is just 1(column 2 of C)
Column 3 of A is 1(column 1 of C) − 1(column 2 of C)

Placing these coefficients in the columns of R gives

R = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}
We defined C such that its columns are linearly independent and a basis for colsp(A). But, observe: R’s rows are linearly independent too!
This is because the first two columns of R form \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2, the 2×2 identity matrix. It’s impossible for any scalar multiple of the first row of R to produce the second row of R, because the second row has a non-zero value in a position where the first row has a zero value.
Not only are R’s rows linearly independent, they’re also a basis for the row space of A, colsp(AT). Every row of A can be written as a linear combination of R’s rows. The way to see why this is the case is to transpose A=CR to get AT=(CR)T=RTCT.
The rows of R are the columns of RT. The above decomposition of AT=RTCT tells us how to mix the rows of R to make the rows of A! For instance, it tells us that the first row of A is
5 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix} \quad (the first row of A, written as a column)
To me, this is a bit magical: we started by picking off linearly independent columns and placing them in C, but the R we multiplied C by to recreate A ended up having linearly independent rows, which span the row space.
Sitting in front of us is an argument that the number of linearly independent columns of A is equal to the number of linearly independent rows of A! In other words, we’ve just argued that
rank(A)=rank(AT)
The gist of it was that if A=CR is a CR decomposition of A, then AT=(CR)T=RTCT is a CR decomposition of AT, and both CR decompositions use the same rank, r.
I never formally defined the CR decomposition; I started with an example.
In the first example we saw above, we constructed C by taking the first r linearly independent columns of A, and placing the coefficients needed to write each column of A as a linear combination of the columns of C in R. When constructed this way, C’s columns are automatically linearly independent, and R’s rows are linearly independent too (by the earlier argument about the identity matrix and the positions of its 0s).
A matrix has infinitely many CR decompositions, because we can choose different bases for colsp(A). Both of the following are CR decompositions of our A.

import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

# Decomposition 1: C holds columns 1 and 2 of A (the left-to-right choice).
C1 = np.array([[5, 3], [0, -1], [3, 4], [6, 2], [1, 0]])
R1 = np.array([[1, 0, 1], [0, 1, -1]])

# Decomposition 2: C holds columns 1 and 3 of A.
C2 = np.array([[5, 2], [0, 1], [3, -1], [6, 4], [1, 1]])
R2 = np.array([[1, 1, 0], [0, -1, 1]])

C1 @ R1  # equals A
C2 @ R2  # also equals A
You can think of r=rank(A) as being the minimum possible number of columns in C to make A=CR possible, where C is n×r and R is r×d.
Here, since rank(A)=2, the minimum possible number of columns in C is 2. No C with just a single column would allow for A=CR to work, and while a C with 3 columns would work, 2 is the minimum number of possible columns (and if C had more than 2 columns, C’s columns would no longer be linearly independent).
Why did I introduce the CR decomposition? In truth, it’s not used to solve practical problems. Instead, I think it gives us a nice way to understand what the rank of a matrix really means. For one, it gave us one way to see why rank(A)=rank(AT).
Let me try and get you excited about the CR decomposition with a more practical example. Suppose we have a large 1000×50 matrix with rank r=8. Storing the entire matrix requires storing 1000×50=50000 numbers. But, since its rank is 8, there exists a CR decomposition of A into A=CR where C is 1000×8 and R is 8×50. Storing C and R requires storing
1000 × 8 (size of C) + 8 × 50 (size of R) = 8000 + 400 = 8400
numbers, far fewer than the 50,000 numbers required to store the entire matrix.
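To make this concrete, here's a sketch in numpy with a hypothetical rank-8 matrix, built by multiplying random factors (any 1000×50 matrix of rank 8 would do):

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((1000, 8))
R = rng.standard_normal((8, 50))
A = C @ R                        # a 1000x50 matrix with rank 8

print(A.size)                    # 50000 numbers to store A directly
print(C.size + R.size)           # 8400 numbers to store the factors
print(np.linalg.matrix_rank(A))  # 8
```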
So, the CR decomposition is one way to compress a matrix; here, it relied on the fact that the rank of the matrix was much smaller than its number of columns. Hopefully this gives you some appreciation for why the rank of a matrix is such a fundamental property. Compression – of images, say – will be a recurring theme in Chapter 10, once we eventually make our way to the singular value decomposition.