In Chapter 2.7, I encouraged you to think of a matrix as a collection of vectors stacked together.
A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}
Soon, these matrices will be made up of observations and features from some dataset that we’d like to build a regression model with. The content of this section will be a crucial building block for helping us get there.
A is a 5×3 matrix. It can be thought of as either:
3 vectors in R^5 (the columns of A)
5 vectors in R^3 (the rows of A)
Let’s start with the column perspective. We saw in Chapter 2.7 that if x ∈ R^3, then Ax is a new vector in R^5 that is a linear combination of the columns of A. For instance, if we take x = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}, then

Ax = 2\begin{bmatrix} 5 \\ 0 \\ 3 \\ 6 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 3 \\ -1 \\ 4 \\ 2 \\ 0 \end{bmatrix} - 1\begin{bmatrix} 2 \\ 1 \\ -1 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ -1 \\ 7 \\ 8 \\ 1 \end{bmatrix}
By definition, the vector Ax is in the span of the columns of A, since it’s just a linear combination of A’s columns.
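To make this concrete, here’s a quick numerical check, a minimal sketch using numpy (the array values are just the running example A and x from above):

import numpy as np

# The running example: A's rows stacked as a 5x3 array
A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

x = np.array([2, 0, -1])

# A @ x computes the matrix-vector product...
print(A @ x)                                    # [ 8 -1  7  8  1]
# ...which matches the linear combination of A's columns done by hand
print(2 * A[:, 0] + 0 * A[:, 1] - 1 * A[:, 2])  # [ 8 -1  7  8  1]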
Given what we’ve learned in Chapter 2.4 and Chapter 2.6, it’s natural to try to describe the span of A’s columns.
Notice that I’ve intentionally chosen not to use subscripts to refer to columns; this is so that when we switch back to focusing on datasets and machine learning, we keep consistent the fact that subscripts refer to different rows/data points, not columns/features.
“Column space” is just a new term for a concept we’re already familiar with: the span of a set of vectors. In the example A we’ve been working with, the column space is

colsp(A) = span\left( \left\{ \begin{bmatrix} 5 \\ 0 \\ 3 \\ 6 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 4 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ -1 \\ 4 \\ 1 \end{bmatrix} \right\} \right)
This column space is a 2-dimensional subspace of R5. Why is it 2-dimensional? The last column is a linear combination of the first two columns. Specifically,
column 3 = column 1 − column 2
Remember that:
the dimension of a subspace is the number of vectors in a basis for the subspace, and
a basis for a subspace is a linearly independent set of vectors that spans the entire subspace.
The first two columns of A alone span the column space, and are linearly independent, and so dim(colsp(A))=2. This number, 2, is the most important number associated with the matrix A, so much so that we give it a special name.
The rank of a matrix tells us how “large” the space of possible linear combinations of the columns of A is. We care about this because ultimately, our predictions in regression are just linear combinations of the columns of some data matrix.
To get a feel for the idea of rank, let’s work through some examples.
Find three 3×4 matrices: one with rank 1, one with rank 2, and one with rank 3. Is it possible to have a 3×4 matrix with rank 4?
Solution
To create a 3×4 matrix with rank 1, we need there to only be one linearly independent column. For instance,
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{bmatrix}
has rank 1 because all columns are multiples of the first column.
To create a 3×4 matrix with rank 2, we need there to be two linearly independent columns. One way to construct such a matrix is to make the first two columns linearly independent, and make the last two columns linear combinations of the first two. For instance, in
A = \begin{bmatrix} 1 & 1 & 2 & 2 \\ 1 & 2 & 2 & 3 \\ 1 & 3 & 2 & 4 \end{bmatrix}
column 3 is a scalar multiple of column 1, and column 4 is column 1 + column 2.
To create a 3×4 matrix with rank 3, we need there to be three linearly independent columns, and the fourth column can be anything. One solution is
A = \begin{bmatrix} 1 & 0 & 0 & 9 \\ 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & 7 \end{bmatrix}
A 3×4 matrix cannot have rank 4, because it’s impossible to have four linearly independent vectors in R3. Any three linearly independent vectors in R3 span all of R3, so a fourth vector in R3 would have to be a linear combination of the first three vectors.
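If you want to sanity-check constructions like these, numpy’s np.linalg.matrix_rank computes the rank numerically. A quick sketch using the three matrices above:

import numpy as np

A1 = np.array([[1, 2, 3, 4],
               [1, 2, 3, 4],
               [1, 2, 3, 4]])
A2 = np.array([[1, 1, 2, 2],
               [1, 2, 2, 3],
               [1, 3, 2, 4]])
A3 = np.array([[1, 0, 0, 9],
               [0, 1, 0, 8],
               [0, 0, 1, 7]])

for M in (A1, A2, A3):
    print(np.linalg.matrix_rank(M))  # 1, then 2, then 3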
Consider the 2×2 matrix

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

Find a condition on a, b, c, d that ensures rank(A) = 2.
Solution
In order for rank(A)=2, the columns of A must be linearly independent. This means that the second column cannot be a scalar multiple of the first column.
Assume for the moment that a ≠ 0. Then b/a is the number we multiply a by to get b, so the second column is a scalar multiple of the first exactly when (b/a) · c equals d; we just need (b/a) · c to be different from d.

\frac{b}{a} \cdot c = d \implies ad - bc = 0

So, if ad − bc = 0, then rank(A) ≤ 1, and otherwise, rank(A) = 2. (The a = 0 case can be checked separately, and the same condition, ad − bc ≠ 0, still characterizes rank 2.)
This expression, ad−bc, is called the determinant of A. We’ll learn more about determinants in Chapter 2.9.
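Here’s a small numerical illustration, a sketch with made-up 2×2 matrices, of how ad − bc tracks the rank:

import numpy as np

# ad - bc = 1*8 - 2*4 = 0, so column 2 is a multiple of column 1
A = np.array([[1, 2],
              [4, 8]])
print(np.linalg.det(A), np.linalg.matrix_rank(A))  # ~0.0, 1

# ad - bc = 1*9 - 2*4 = 1, so the columns are independent
B = np.array([[1, 2],
              [4, 9]])
print(np.linalg.det(B), np.linalg.matrix_rank(B))  # ~1.0, 2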
Suppose d_1, d_2, …, d_n are real numbers. What is the rank of the n×n diagonal matrix

D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}
if
d_i = i, for i = 1, 2, …, n?
d_1 = d_2 = ⋯ = d_n = 0?
k of the d_i are equal to 0, and the rest are positive?
Solution
rank(D) = n: If d_i = i for i = 1, 2, …, n, meaning d_1 = 1, d_2 = 2, …, d_n = n, then the rank is n, because all columns are linearly independent. None can be written as a linear combination of any others, since they all have their sole non-zero entry in a different row.
rank(D) = 0: If d_1 = d_2 = ⋯ = d_n = 0, then the rank is 0, because all columns are the zero vector.
rank(D) = n − k: If k of the d_i are equal to 0, and the rest are positive, then the rank is n − k, because we can throw out the zero columns and the remaining columns are linearly independent.
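A quick sketch of the last case, using a hypothetical set of diagonal entries:

import numpy as np

# n = 5 diagonal entries, k = 2 of them zero
d = np.array([3, 0, 7, 0, 1])
D = np.diag(d)
print(np.linalg.matrix_rank(D))  # 3, i.e. n - k = 5 - 2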
For example, let u = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} and v = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}. As we’ve seen before, the dot product u · v = u^Tv is a scalar, equal to −17 here.
The outer product of u and v is the matrix
uv^T = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} \begin{bmatrix} 2 & 5 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 5 & -1 \\ -6 & -15 & 3 \\ 8 & 20 & -4 \end{bmatrix}
In general, for any two non-zero vectors u, v ∈ R^n, what is the rank of uv^T?
Solution
The rank of uv^T is 1, since each column of uv^T is a scalar multiple of u. In the example above, column 2 is 5/2 times column 1, and column 3 is −1/2 times column 1.
In Chapter 5, we’ll see that any rank r matrix can be written as a sum of r rank 1 matrices, each of which is of the form uv^T. So, rank 1 matrices can be thought of as the building blocks of all matrices!
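Here’s the outer product example above, checked numerically (a sketch; np.outer computes uv^T):

import numpy as np

u = np.array([1, -3, 4])
v = np.array([2, 5, -1])

P = np.outer(u, v)               # the 3x3 outer product u v^T
print(P)
print(np.linalg.matrix_rank(P))  # 1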
Note that we’ve already seen plenty of problems of this form in earlier homeworks. It’s just that there, the spanning set of vectors was given to you directly, and here, they’re stored as columns in a matrix. The idea is the same.
Solution
Column 2 is a multiple of column 1.
Column 3 is linearly independent of column 1.
Column 4 is 3 times column 1 minus 5 times column 3.
Column 5 is just column 1.
A only has two linearly independent columns (columns 1 and 3), and so rank(A) = 2, and a basis for the column space is the set containing columns 1 and 3.
Shortly, we’ll learn a new technique for finding the rank of a matrix by hand. But for the most part, we won’t need to do this, and can instead use the power of Python to help us.
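For instance, here’s a minimal sketch that checks the rank of our running example A with numpy:

import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

# matrix_rank counts the numerically non-zero singular values of A,
# which equals the number of linearly independent columns
print(np.linalg.matrix_rank(A))  # 2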
So far, we’ve focused on thinking of a matrix as a collection of “column” vectors written next to each other. This is the more common perspective, since – as I’ve harped on – Ax is a linear combination of the columns of A.
But we can also think of a matrix as a collection of “row” vectors written on top of each other.
A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}
A contains 5 vectors in its rows, each in R^3. These vectors also have a span, which in this case is a subspace of R^3.
Where did A^Ty come from? Remember, Ax is a linear combination of the columns of A. If we transpose A, then A^Ty is a linear combination of the columns of A^T, which are the rows of A.
A^Ty = \begin{bmatrix} 5 & 0 & 3 & 6 & 1 \\ 3 & -1 & 4 & 2 & 0 \\ 2 & 1 & -1 & 4 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix} = \underbrace{y_1 \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix} + y_2 \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} + y_3 \begin{bmatrix} 3 \\ 4 \\ -1 \end{bmatrix} + y_4 \begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix} + y_5 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_{\text{linear combination of rows of } A}
Remember from Chapter 2.7 that (A^Ty)^T = y^TA. The product y^TA is also a linear combination of the rows of A; it just returns a row vector with shape 1×d rather than a column vector with shape d×1.
y^TA = \begin{bmatrix} y_1 & y_2 & y_3 & y_4 & y_5 \end{bmatrix} \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix} = \underbrace{y_1 \begin{bmatrix} 5 & 3 & 2 \end{bmatrix} + y_2 \begin{bmatrix} 0 & -1 & 1 \end{bmatrix} + \cdots + y_5 \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}}_{\text{linear combination of rows of } A}
In y^TA, we left-multiplied A by a vector; in Ax, we right-multiplied A by a vector. These are distinct types of multiplication, as they involve vectors of different shapes.
Since the columns of A^T are the rows of A, the row space of A is the column space of A^T, meaning

rowsp(A) = colsp(A^T)

To avoid carrying around lots of notation, I’ll often just use colsp(A^T) to refer to the row space of A.
What is the dimension of the row space of A? Other ways of phrasing this question are:
How many linearly independent rows does A have?
What is the rank of A^T?
We know the answer can’t be more than 3, since the rows of A are vectors in R3. We could use the algorithm first presented in Chapter 2.4 to find a linearly independent set of rows with the same span as all 5 rows.
An easy way to see that dim(colsp(A^T)) = 2 is to pick out two of the five vectors, row 2 = a_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} and row 5 = a_5 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, and show that the other three vectors can be written as linear combinations of them. I chose a_2 and a_5 just because they have the simplest numbers, and because they’re linearly independent from one another.
a_1 = \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix} = -3a_2 + 5a_5
a_3 = \begin{bmatrix} 3 \\ 4 \\ -1 \end{bmatrix} = -4a_2 + 3a_5
a_4 = \begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix} = -2a_2 + 6a_5
You might notice that the number of linearly independent rows of A and the number of linearly independent columns of A were both 2.
Sometimes, we say the dimension of colsp(A) is the column rank of A, and the dimension of colsp(A^T) is the row rank of A. For the example A we’ve been working with, both the column rank and row rank are 2.
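We can verify both of these claims numerically; a quick sketch on the running example:

import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

# a1 = -3*a2 + 5*a5, as claimed above
print(np.array_equal(A[0], -3 * A[1] + 5 * A[4]))  # True

# The column rank and the row rank agree
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))  # 2 2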
But, there’s no need for separate names.
It’s not immediately obvious why this is true, and honestly, most of the proofs of it I’ve found aren’t all that convincing for a first-time linear algebra student. Still, I’ll try to prove this fact for you in just a little bit.
So, in general, what is rank(A)? Since the rank is equal to both the number of linearly independent columns and the number of linearly independent rows, the largest the rank can be is the smaller of the number of rows and the number of columns.
For example, if A is a 7×9 matrix, then its rank is at most 7, since it cannot have more than 7 linearly independent rows. So in general, if A is an n×d matrix, then
0 ≤ rank(A) ≤ min(n, d)
\underbrace{\begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \end{bmatrix}}_{\text{if } n > d \text{, rows can't be independent}} \qquad \underbrace{\begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}}_{\text{if } n < d \text{, columns can't be independent}}
We say a matrix is full rank if it has the largest possible rank for a matrix of its shape, i.e. if rank(A) = min(n, d). We’ll mostly use this term when referring to square matrices, which we’ll focus more on in Chapter 2.9.
As we’ve seen, colsp(A) is a subspace of R^n that contains all possible results of Ax for x ∈ R^d. Think of colsp(A) as the “reach” of the columns of A.
If the columns of A are linearly independent, then the only way to create 0 through a linear combination of the columns of A is if all the coefficients in the linear combination are 0, i.e. if x = 0 in Ax.
But, if A’s columns aren’t all linearly independent, then there will be some non-zero vectors x where Ax = 0. This follows from the definition of linear independence, which says that if there’s a non-zero linear combination of a collection of vectors that produces 0, the vectors aren’t linearly independent.
It turns out that it’s worthwhile to study the set of vectors x that get sent to 0 when multiplied by A.
Sometimes, the null space is also called the kernel of A, though we will mostly avoid that term in our class, since kernel often means something different in the context of machine learning.
Important: nullsp(A) is a subspace of R^d, since it is made up of vectors that get multiplied by A, an n×d matrix.
Let’s return to the example A from earlier.
A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}
nullsp(A) is the set of all vectors x ∈ R^3 where Ax = 0. Since rank(A) = 2 is less than the number of columns, 3, there will be some non-zero vectors x in the null space. But what are they?
To find them, we need to find the general solution to

Ax = 0

Since column 3 = column 1 − column 2, one solution is x = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} (take 1 of column 1, −1 of column 2, and −1 of column 3), and so is any scalar multiple of it.
For example, \begin{bmatrix} -5 \\ 5 \\ 5 \end{bmatrix} ∈ nullsp(A). Remember that vectors in nullsp(A) are in R^3, since A is a 5×3 matrix, while vectors in colsp(A) are in R^5.
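Numerically, we can verify this and find the null space directly. A sketch, assuming SciPy is available (scipy.linalg.null_space returns an orthonormal basis for the null space):

import numpy as np
from scipy.linalg import null_space

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

# Verify that [-5, 5, 5] is sent to the zero vector
print(A @ np.array([-5, 5, 5]))  # [0 0 0 0 0]

# null_space returns an orthonormal basis for nullsp(A) as columns
N = null_space(A)
print(N.shape)  # (3, 1): the null space is 1-dimensional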
In this example, nullsp(A) is a 1-dimensional subspace of R3. We also know from earlier that rank(A)=2. And curiously, 1+2=3, the number of columns in A. This is not a coincidence, and sheds light on an important theorem.
The proof of this theorem is beyond the scope of our course. But, this is such an important theorem that it’s sometimes called the fundamental theorem of linear algebra. It tells us, for one, that the dimension of the null space is equal to the number of columns minus the rank. “Nullity” is just another word for the dimension of the null space.
Let’s see how it can be used in practice. Some of these examples are taken from Gilbert Strang’s book.
Describe the column space, row space, and null space of the matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}
Solution
A only has one linearly independent column; columns 2 and 3 are both just multiples of column 1. So, rank(A)=1.
The column space of A is the span of the first column, i.e.
colsp(A) = span\left( \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\} \right)
This is a 1-dimensional subspace of R2; 1 comes from the rank of A, and 2 comes from the fact that each column of A is a vector in R2.
The row space of A is the span of the first row, i.e.
colsp(A^T) = span\left( \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\} \right)
This is a 1-dimensional subspace of R3. Remember that the number of linearly independent columns and rows of a matrix are always the same, which is why we know the dimensions of the column space and row space are the same.
The null space of A is the set of all vectors x ∈ R^3 where Ax = 0. This is a 2-dimensional subspace of R^3. The 2 came from the rank-nullity theorem, which says that the dimension of the null space is equal to the number of columns minus the rank, or 3 − 1 = 2 here.
Can we say more about the null space? It is the set of all vectors x ∈ R^3 where Ax = 0. This is equivalent to the set of all vectors x = \begin{bmatrix} x \\ y \\ z \end{bmatrix} where

x + 2y + 3z = 0
2x + 4y + 6z = 0

The two equations above are equivalent. So, the null space is the set of all vectors x = \begin{bmatrix} x \\ y \\ z \end{bmatrix} where x + 2y + 3z = 0. This is a plane in R^3, as we’d expect from a 2-dimensional subspace.
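As a quick numerical check (again a sketch, assuming SciPy is available):

import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 2, 3],
              [2, 4, 6]])

N = null_space(A)
print(N.shape)                # (3, 2): a 2-dimensional null space, i.e. a plane
print(np.allclose(A @ N, 0))  # True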
Suppose A is a 3×4 matrix with rank 3. Describe colsp(A) and nullsp(A^T). The latter is the null space of A^T, and is sometimes called the left null space of A, as its vectors multiply A on the left, like in y^TA (which performs the same calculation as A^Ty; the results are just transposed).
Solution
When faced with a problem like this, I like drawing out a rectangle that roughly shows me the dimensions of the matrix.
A = \begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}
The rank tells us that there are 3 linearly independent columns. Since the columns themselves are in R^3, this must mean that

colsp(A) = all of R^3

since any 3 linearly independent vectors in R^3 will span the entire space. (The existence of the 4th linearly dependent column doesn’t change this fact, it just means that the linear combinations of the four columns won’t be unique.)
The null space of A^T is the set of all vectors y where

A^Ty = 0
nullsp(A^T) is a collection of vectors in R^3. What is the dimension of this space? Rank-nullity tells us that

rank(A) + dim(nullsp(A)) = number of columns of A

We can’t use this directly, since the matrix we’re dealing with is A^T, not A. So, replacing A with A^T in the equation above, we get

rank(A^T) + dim(nullsp(A^T)) = number of columns of A^T

But, rank(A^T) = rank(A), and the number of columns of A^T is the same as the number of rows of A. So,

rank(A) + dim(nullsp(A^T)) = number of rows of A
dim(nullsp(A^T)) = number of rows of A − rank(A)
But since A has 3 rows and a rank of 3, we know that
dim(nullsp(A^T)) = 3 − 3 = 0
So, nullsp(A^T) is a 0-dimensional subspace of R^3, meaning it only contains the zero vector.

nullsp(A^T) = {0}

More intuitively, using the results of the previous example, we know that A^T’s columns are linearly independent (since there are 3 of them and rank(A^T) = 3). So, the only vector in nullsp(A^T) is the zero vector.
Suppose r ∈ colsp(A^T) and n ∈ nullsp(A), meaning r is in the row space of A and n is in the null space of A.
Prove that r and n must be orthogonal. Often, this is phrased as “the row space and null space are orthogonal complements”.
Solution
The row space of A, colsp(A^T), is the set of all vectors r where r = A^Ty for some y ∈ R^n. Note that if A is an n×d matrix, then A^T is a d×n matrix, and r is in R^d.
The null space of A, nullsp(A), is the set of all vectors n where An = 0. Note that if A is an n×d matrix, then n is in R^d.
So, r and n are both in R^d, which means they exist in the same universe (they have the same number of components), and so we can ask if they’re orthogonal. (If they had different numbers of components, this question would be a non-starter.)
In order to show that they’re orthogonal, we need to show that their dot product is 0.
r \cdot n = (A^Ty) \cdot n \overset{u \cdot v \, = \, u^Tv}{=} (A^Ty)^Tn = y^T\underbrace{An}_{= \, 0} = y^T0 = 0
So, every vector in the row space of A is orthogonal to every vector in the null space of A!
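Here’s a numerical sanity check of this fact on our running example, a sketch assuming SciPy for the null space basis:

import numpy as np
from scipy.linalg import null_space

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

rng = np.random.default_rng(42)
r = A.T @ rng.standard_normal(5)  # a random vector in the row space of A
n = null_space(A)[:, 0]           # a vector in the null space of A

print(np.isclose(r @ n, 0))       # True: the two are orthogonal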
Suppose A is an n×d matrix, and B is a d×p matrix.
Explain why the column space of AB is a subset of the column space of A, and the row space of AB is a subset of the row space of B. What does this imply about the rank of AB?
Solution
To show that colsp(AB) is a subset of colsp(A), i.e.
colsp(AB)⊆colsp(A)
we need to show that any vector in colsp(AB) is also in colsp(A). AB is a matrix of shape n×p, so to multiply it by a vector x, x must be in R^p.
Suppose y ∈ R^n is in colsp(AB). Then, y can be written as
y = ABx
for some x ∈ R^p.
But,
y = ABx = A\underbrace{(Bx)}_{\text{vector in } \mathbb{R}^d}

meaning that y is also a linear combination of the columns of A (since we wrote it as Az, for some vector z ∈ R^d).
So, we’ve shown that if y is in colsp(AB), then y is also in colsp(A). Therefore, colsp(AB)⊆colsp(A). This tells us that rank(AB)≤rank(A), since the rank of a matrix is the dimension of its column space.
Using similar logic, any vector in rowsp(AB) = colsp((AB)^T) = colsp(B^TA^T) is of the form
B^TA^Tx
But, B^TA^Tx is of the form B^Ty for some y ∈ R^d (namely y = A^Tx), meaning that B^TA^Tx is in the column space of B^T, i.e. the row space of B. So, rowsp(AB) ⊆ rowsp(B), meaning rank(AB) ≤ rank(B).
Putting these two results together, we have that
rank(AB)≤rank(A) and rank(AB)≤rank(B)
But, since both must be true, then
rank(AB)≤min(rank(A),rank(B))
So intuitively, when we multiply two matrices, the rank of the resulting matrix can’t be greater than the rank of either of the two matrices we started with, but it can “drop”.
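To see the rank “drop” in action, here’s a small sketch; the specific matrices are made up for illustration:

import numpy as np

A = np.array([[1, 0],
              [0, 1],
              [1, 1]])      # 3x2 with rank 2

B = np.array([[1, 2, 3],
              [2, 4, 6]])   # 2x3 with rank 1

print(np.linalg.matrix_rank(A))      # 2
print(np.linalg.matrix_rank(B))      # 1
print(np.linalg.matrix_rank(A @ B))  # 1: capped by min(rank(A), rank(B))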
Prove that rank(X^TX) = rank(X) for any n×d matrix X.
The matrix X^TX is hugely important for our regression problem, and you’ll also see in Homework 5 that it helps define the covariance matrix of our data.
Solution
First, let’s think about the shape of X^TX. If X is an n×d matrix, then X^T is a d×n matrix, and X^TX is a d×d matrix. So, X and X^TX have the same number of columns (d), but X^TX is also square, which X doesn’t have to be.
The way we’ll proceed is to show that both matrices have the same null space. If they do, then the dimensions of both null spaces must be the same. Since the rank-nullity theorem tells us that
rank(A) + dim(nullsp(A)) = number of columns of A
if we can show that dim(nullsp(X^TX)) = dim(nullsp(X)), then we’ll have shown that rank(X^TX) = rank(X), since both X and X^TX have the same number of columns.
To show that X and X^TX have the same null space, we need to show that any vector v in the null space of X is also in the null space of X^TX, and vice versa.
Part 1: Show that v ∈ nullsp(X) ⟹ v ∈ nullsp(X^TX)
Suppose v ∈ nullsp(X). Then, Xv = 0. Multiplying both sides on the left by X^T, we get
X^TXv = X^T0 = 0
So, v is also in the null space of X^TX.
(This was the easier part.)
Part 2: Show that v ∈ nullsp(X^TX) ⟹ v ∈ nullsp(X)
Suppose v ∈ nullsp(X^TX). Then, X^TXv = 0. It’s not immediately clear how to use this to show that v is in the null space of X, so let’s try something different.
What if we left-multiply both sides of the equation by v^T? This is equivalent to taking the dot product of both sides with v.
X^TXv = 0 ⟹ v^TX^TXv = v^T0 = 0
Okay, so v^TX^TXv = 0. What does this tell us? If we look closely, the left-hand side is really just (Xv)^T(Xv), which is also just (Xv) · (Xv), which we know is equal to ‖Xv‖².
So, we have
‖Xv‖² = 0
But, the only vector with a norm of 0 is the zero vector. So, Xv = 0, and we’ve now shown that if X^TXv = 0, then it must be the case that Xv = 0 as well.
Now that we’ve shown both directions, we’ve shown that nullsp(X^TX) = nullsp(X): any vector in one of these sets is also in the other, and so the two sets must be equal.
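A quick numerical spot-check of this result, a sketch on a randomly generated matrix with one deliberately dependent column:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
X[:, 3] = X[:, 0] + X[:, 1]     # force a dependent column, so rank(X) = 3

print(np.linalg.matrix_rank(X))        # 3
print(np.linalg.matrix_rank(X.T @ X))  # 3, matching rank(X)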
In the coming chapters, we’ll spend a lot of our time decomposing matrices into smaller pieces, each of which gives us some insight into the data stored in the matrix. This is not a new concept: in earlier math courses, you’ve learned to write a number as a product of prime factors, and to factor quadratics like x² − 5x + 6 into (x − 2)(x − 3).
The “ultimate” decomposition for the purposes of machine learning is the singular value decomposition (SVD), which decomposes a matrix into the product of three other matrices.
X = UΣV^T
where U and V are orthogonal matrices and Σ is a diagonal matrix. This decomposition will allow us to solve the dimensionality reduction problem we first alluded to in Chapter 1.1.
We’re not yet ready for that. For now, we’ll introduce a decomposition that ties together ideas from this section, and allows us to prove the fact that the number of linearly independent columns of A is equal to the number of linearly independent rows of A, i.e. that rank(A) = rank(A^T).
Suppose A is an n×d matrix with rank r. This tells us that A has r linearly independent columns (and rows), and the remaining d−r columns can be written as linear combinations of the r “good” columns.
Let’s continue thinking about
A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}
Recall, rank(A)=2, and
column 3 = column 1 − column 2
Define the matrix C as containing just the linearly independent columns of A, i.e. C’s columns are a basis for colsp(A). Notice that C is a 5×2 matrix; its number of columns is equal to the rank of A.
C = \begin{bmatrix} 5 & 3 \\ 0 & -1 \\ 3 & 4 \\ 6 & 2 \\ 1 & 0 \end{bmatrix}
I’d like to produce a “mixing” matrix R such that A=CR. Here, R will tell us how to “mix” the columns of C (which are linearly independent) to produce the columns of A. Since A is 5×3 and C is 5×2, R must be 2×3 in order for the dimensions to work out.
Column 1 of A is just 1(column 1 of C).
Column 2 of A is just 1(column 2 of C).
Column 3 of A is 1(column 1 of C) − 1(column 2 of C).
Packaging these coefficients into columns gives us

R = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}
By definition, C’s columns are a basis for the column space of A, colsp(A), since C’s columns are linearly independent and span colsp(A).
But, there’s another fact hiding in plain sight: R’s rows are a basis for the row space of A, colsp(A^T)! The row space of A is the set of all linear combinations of A’s rows, which is a subspace of R^3, and any vector in that subspace can be written as a linear combination of R’s two rows, \begin{bmatrix} 1 & 0 & 1 \end{bmatrix} and \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}. (In fact, the first row of R is exactly row 5 of A, and the second is −1 times row 2 of A, though this sort of coincidence is not always guaranteed.)
To be precise, a CR decomposition of A consists of a matrix C of shape n×r that contains A’s linearly independent columns, and a matrix R of shape r×d that contains the coefficients needed to write each column of A as a linear combination of the columns of C. As we’re seeing above, in addition to C’s columns being linearly independent, R’s rows are also linearly independent. I won’t prove this fact here, but it’s a good thought exercise to think about why it must be true.
Notice that I’ve said a CR decomposition, not the CR decomposition, because this decomposition is not unique. Think about why the following is also a CR decomposition of A.
import numpy as np

# A different valid choice of C: its columns (A's columns 1 and 3) are
# still a basis for colsp(A), just a different one than before.
C = np.array([[5, 2],
              [0, 1],
              [3, -1],
              [6, 4],
              [1, 1]])

# The matching mixing matrix R: column j of R holds the coefficients
# needed to rebuild column j of A from the columns of C.
R = np.array([[1, 1, 0],
              [0, -1, 1]])

C @ R  # reproduces A exactly
You can think of rank(A) as being the minimum possible number of columns in C to make A=CR possible.
Here, since rank(A) = 2, the minimum possible number of columns in C is 2: no C with just a single column would allow A = CR to work, and while a C with 3 columns would work, it wouldn’t be minimal.
The reason I’ve introduced the CR decomposition is that it will allow us to see why rank(A) = rank(A^T) for any matrix A, i.e. that the number of linearly independent columns of A is equal to the number of linearly independent rows of A.
For now, let’s suppose we don’t know that this is true, and just interpret rank(A) as the number of linearly independent columns of A.
Suppose, as we typically do, that A is an n×d matrix with rank r, and that A=CR is a CR decomposition of A into matrices
C (with shape n×r), and
R (with shape r×d)
As mentioned above, C’s columns are linearly independent, and R’s rows are linearly independent.
What happens if we transpose A=CR? We get
A^T = (CR)^T = R^TC^T
This tells us that A^T can be written as the product of R^T (shape d×r) and C^T (shape r×n).
Key insight: this is just a CR decomposition of A^T, using R^T and C^T instead of C and R!
The columns of A^T are just the rows of A, and the columns of R^T are just the rows of R. Since we knew that the rows of R were linearly independent (using my earlier assumption), we know that the columns of R^T are also linearly independent, which makes this a CR decomposition of A^T.
Because A^T has a CR decomposition using a matrix with r linearly independent columns (R^T), it must have a rank of r. But since rank(A) = r, we must have

rank(A) = rank(A^T)
I skipped some minor steps in this proof, but the main idea is that the CR decomposition A=CR gives us two things:
A basis for the column space of A, stored in C’s columns
A basis for the row space of A, stored in R’s rows
Remember that C is carefully constructed to contain A’s linearly independent columns; any arbitrary decomposition of A=XY might not have these properties.