
2.8. Rank

In Chapter 2.7, I encouraged you to think of a matrix as a collection of vectors stacked together.

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

Soon, these matrices will be made up of observations and features from some dataset that we’d like to build a regression model with. The content of this section will be a crucial building block for helping us get there.


Column Space and Rank

Let’s keep working with the matrix A from above.

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

A is a 5 \times 3 matrix. It can be thought of as either:

  • 3 vectors in \mathbb{R}^5 (the columns of A)

  • 5 vectors in \mathbb{R}^3 (the rows of A)

Let’s start with the column perspective. We saw in Chapter 2.7 that if \vec x \in \mathbb{R}^3, then A \vec x is a new vector in \mathbb{R}^5 that is a linear combination of the columns of A. For instance, if we take {\color{orange} \vec x = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}}, then

A {\color{orange} \vec x} = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix} {\color{orange} \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}} = {\color{orange} 2} \begin{bmatrix} 5 \\ 0 \\ 3 \\ 6 \\ 1 \end{bmatrix} + {\color{orange} 0} \begin{bmatrix} 3 \\ -1 \\ 4 \\ 2 \\ 0 \end{bmatrix} - {\color{orange} 1} \begin{bmatrix} 2 \\ 1 \\ -1 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ -1 \\ 7 \\ 8 \\ 1 \end{bmatrix}

By definition, the vector A \vec x is in the span of the columns of A, since it’s just a linear combination of A’s columns.
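
As a quick sanity check, we can recompute A \vec x with NumPy, which we’ll lean on more heavily later in this section. (The array names here are my own choices.)

import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])
x = np.array([2, 0, -1])

# A @ x computes the linear combination 2(column 1) + 0(column 2) - 1(column 3)
A @ x
array([ 8, -1,  7,  8,  1])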

Given what we’ve learned in Chapter 2.4 and Chapter 2.6, it’s natural to try to describe the span of A’s columns.

Notice that I’ve intentionally chosen not to use subscripts to refer to columns; this way, when we switch back to focusing on datasets and machine learning, subscripts can consistently refer to different rows/data points, not columns/features.

“Column space” is just a new term for a concept we’re already familiar with: the column space of A, written \text{colsp}(A), is the span of A’s columns. In the example A we’ve been working with, the column space is

\text{colsp}(A) = \text{span}\left( \left\{ \begin{bmatrix} 5 \\ 0 \\ 3 \\ 6 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 4 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ -1 \\ 4 \\ 1 \end{bmatrix} \right\} \right)

This column space is a 2-dimensional subspace of \mathbb{R}^5. Why is it 2-dimensional? The last column is a linear combination of the first two columns. Specifically,

\text{column 3} = \text{column 1} - \text{column 2}

Remember that:

  • the dimension of a subspace is the number of vectors in a basis for the subspace, and

  • a basis for a subspace is a linearly independent set of vectors that spans the entire subspace.

The first two columns of A alone span the column space and are linearly independent, so \text{dim}(\text{colsp}(A)) = 2. This number, 2, is the most important number associated with the matrix A, so much so that we give it a special name.
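
We can verify the dependency between the columns numerically; here, A is the same array defined in the snippet above.

# column 3 is column 1 minus column 2
np.allclose(A[:, 2], A[:, 0] - A[:, 1])
True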

The rank of a matrix A, written \text{rank}(A), is the dimension of its column space: \text{rank}(A) = \text{dim}(\text{colsp}(A)), i.e. the number of linearly independent columns of A. The rank tells us how “large” the space of possible linear combinations of the columns of A is. We care about this because ultimately, our predictions in regression are just linear combinations of the columns of some data matrix.

To get a feel for the idea of rank, let’s work through some examples.

Example: Creating Matrices

Find three 3 \times 4 matrices: one with rank 1, one with rank 2, and one with rank 3. Is it possible to have a 3 \times 4 matrix with rank 4?
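
One way to start is to experiment in NumPy: build a candidate matrix and check its rank. Here’s a sketch for the rank 1 case (the name M1 is arbitrary); the rank 2 and rank 3 cases, and the last question, are left to you.

M1 = np.array([[1, 2, 3, 4],
               [2, 4, 6, 8],
               [3, 6, 9, 12]])  # every row is a multiple of the first row

np.linalg.matrix_rank(M1)
1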

Example: 2 \times 2 Matrices

Let

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

Find a condition on a, b, c, d that ensures \text{rank}(A) = 2.

Example: Diagonal Matrices

Suppose d_1, d_2, \ldots, d_n are real numbers. What is the rank of the n \times n diagonal matrix

D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}

if

  1. d_i = i, for i = 1, 2, \ldots, n?

  2. d_1 = d_2 = \cdots = d_n = 0?

  3. k of the d_i are equal to 0, and the rest are positive?

Example: Vector Outer Product

Let \vec u = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} and \vec v = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}.

As we’ve seen before, the dot product \vec u \cdot \vec v = \vec u^T \vec v is a scalar, equal to -17 here.

The outer product of \vec u and \vec v is the matrix

\vec u \vec v^T = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} \begin{bmatrix} 2 & 5 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 5 & -1 \\ -6 & -15 & 3 \\ 8 & 20 & -4 \end{bmatrix}

In general, for any two vectors \vec u, \vec v \in \mathbb{R}^n, what is the rank of \vec u \vec v^T?
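
To build intuition before answering in general, we can compute this specific outer product with np.outer and ask NumPy for its rank.

u = np.array([1, -3, 4])
v = np.array([2, 5, -1])

# every row of the outer product is a multiple of v
np.outer(u, v)
array([[  2,   5,  -1],
       [ -6, -15,   3],
       [  8,  20,  -4]])

np.linalg.matrix_rank(np.outer(u, v))
1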

Example: Basis for Column Space

Find a basis for the column space of

A = \begin{bmatrix} 3 & 6 & 0 & 9 & 3 \\ 2 & 4 & 0 & 6 & 2 \\ 0 & 0 & 1 & -5 & 0 \\ 1 & 2 & 0 & 3 & 1 \end{bmatrix}

Note that we’ve already seen plenty of problems of this form in earlier homeworks. It’s just that there, the spanning set of vectors was given to you directly, and here, they’re stored as columns in a matrix. The idea is the same.
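
As a hint, it helps to know how many vectors your basis needs before hunting for them; NumPy can tell us the rank directly (the name B here is arbitrary).

B = np.array([[3, 6, 0, 9, 3],
              [2, 4, 0, 6, 2],
              [0, 0, 1, -5, 0],
              [1, 2, 0, 3, 1]])

# the number of vectors in any basis for colsp(B)
np.linalg.matrix_rank(B)
2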

Finding the Rank Using Python

Shortly, we’ll learn a new technique for finding the rank of a matrix by hand. But for the most part, we won’t need to do this, and can instead use the power of Python to help us.

Returning to

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

we have

import numpy as np

A = np.array([[5, 3, 2], 
              [0, -1, 1], 
              [3, 4, -1], 
              [6, 2, 4], 
              [1, 0, 1]])
              
np.linalg.matrix_rank(A)
2

Python makes it easy to experiment with how different operations affect the rank of a matrix.

For instance, later in this section, we’ll prove that the matrix A^TA has the same rank as A, for any n \times d matrix A.

A.T @ A
array([[71, 39, 32],
       [39, 30,  9],
       [32,  9, 23]])

# Same as rank of A from above!
np.linalg.matrix_rank(A.T @ A)
2

Row Space

So far, we’ve focused on thinking of a matrix as a collection of “column” vectors written next to each other. This is the more common perspective, since – as I’ve harped on – A \vec x is a linear combination of the columns of A.

But we can also think of a matrix as a collection of “row” vectors written on top of each other.

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

A contains 5 vectors in \mathbb{R}^3 in its rows. These vectors also have a span, which in this case is a subspace of \mathbb{R}^3. This span is called the row space of A, and one way to describe it is as the set of all vectors of the form A^T \vec y, for \vec y \in \mathbb{R}^5.

Where did A^T \vec y come from? Remember, A \vec x is a linear combination of the columns of A. If we transpose A, then A^T \vec y is a linear combination of the columns of A^T, which are the rows of A.

\begin{align*} A^T {\color{orange} \vec y} &= \begin{bmatrix} 5 & 0 & 3 & 6 & 1 \\ 3 & -1 & 4 & 2 & 0 \\ 2 & 1 & -1 & 4 & 1 \end{bmatrix} {\color{orange} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}} \\ &= \underbrace{{\color{orange} y_1} \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix} + {\color{orange} y_2} \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} + {\color{orange} y_3} \begin{bmatrix} 3 \\ 4 \\ -1 \end{bmatrix} + {\color{orange} y_4} \begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix} + {\color{orange} y_5} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_\text{linear combination of rows of A} \end{align*}

Remember from Chapter 2.7 that (A^T \vec y)^T = \vec y^T A. The product \vec y^T A is also a linear combination of the rows of A; it just returns a row vector with shape 1 \times d rather than a vector with shape d \times 1.

\begin{align*} {\color{orange} \vec y}^T A &= {\color{orange} \begin{bmatrix} y_1 & y_2 & y_3 & y_4 & y_5 \end{bmatrix}} \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix} \\ &= \underbrace{{\color{orange} y_1} \begin{bmatrix} 5 & 3 & 2 \end{bmatrix} + {\color{orange} y_2} \begin{bmatrix} 0 & -1 & 1 \end{bmatrix} + ... + {\color{orange} y_5} \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}}_\text{linear combination of rows of A} \end{align*}

In \vec y^T A, we left-multiplied A by a vector; in A \vec x, we right-multiplied A by a vector. These are distinct types of multiplication, as they involve vectors of different shapes.
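
NumPy blurs this distinction slightly: a 1-D array can sit on either side of @. A small sketch, reusing the A array from before, where \vec y picks out rows 1 and 5:

y = np.array([1, 0, 0, 0, 1])

# y^T A: a linear combination of A's rows (row 1 plus row 5)
y @ A
array([6, 3, 3])

# A^T y gives the same numbers, since (A^T y)^T = y^T A
A.T @ y
array([6, 3, 3])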

Since the columns of A^T are the rows of A, the row space of A is the column space of A^T, meaning

\text{rowsp}(A) = \text{colsp}(A^T)

To avoid carrying around lots of notation, I’ll often just use \text{colsp}(A^T) to refer to the row space of A.

What is the dimension of the row space of A? Other ways of phrasing this question are:

  • How many linearly independent rows does A have?

  • What is the rank of A^T?

We know the answer can’t be more than 3, since the rows of A are vectors in \mathbb{R}^3. We could use the algorithm first presented in Chapter 2.4 to find a linearly independent set of rows with the same span as all 5 rows.

An easy way to see that \text{dim}(\text{colsp}(A^T)) = 2 is to pick out two of the five vectors, \text{row 2} = \vec a_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} and \text{row 5} = \vec a_5 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, and show that the other three vectors can be written as linear combinations of them. I chose \vec a_2 and \vec a_5 just because they have the simplest numbers, and because they’re linearly independent from one another.

  • \vec a_1 = \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix} = -3 \vec a_2 + 5 \vec a_5

  • \vec a_3 = \begin{bmatrix} 3 \\ 4 \\ -1 \end{bmatrix} = -4 \vec a_2 + 3 \vec a_5

  • \vec a_4 = \begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix} = -2 \vec a_2 + 6 \vec a_5

You might notice that the number of linearly independent rows of A and the number of linearly independent columns of A were both 2.
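
Both observations are easy to confirm numerically with the A array from earlier.

a2, a5 = A[1], A[4]  # rows 2 and 5 of A

# rows 1, 3, and 4 are the claimed combinations of rows 2 and 5
np.allclose(A[0], -3*a2 + 5*a5)
True
np.allclose(A[2], -4*a2 + 3*a5)
True
np.allclose(A[3], -2*a2 + 6*a5)
True

# the row rank of A matches its column rank of 2
np.linalg.matrix_rank(A.T)
2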

Equivalence of Column Rank and Row Rank

Sometimes, we say the dimension of \text{colsp}(A) is the column rank of A, and the dimension of \text{colsp}(A^T) is the row rank of A. For the example A we’ve been working with, both the column rank and row rank are 2.

But, there’s no need for separate names: for any matrix A, the column rank and the row rank are equal, so we refer to both as just the rank of A.

It’s not immediately obvious why this is true, and honestly, most of the proofs of it I’ve found aren’t all that convincing for a first-time linear algebra student. Still, I’ll try to prove this fact for you in just a little bit.

So, in general, what is \text{rank}(A)? Since the rank is equal to both the number of linearly independent columns and the number of linearly independent rows, the largest the rank can be is the smaller of the number of rows and columns.

For example, if A is a 7 \times 9 matrix, then its rank is at most 7, since it cannot have more than 7 linearly independent rows. So in general, if A is an n \times d matrix, then

0 \leq \text{rank}(A) \leq \min(n, d)
\underbrace{\begin{bmatrix} 5 & 3 \\ 2 & 1 \\ -1 & \frac{1}{3} \\ 3 & 6 \\ 0 & 1 \end{bmatrix}}_{\text{if } n > d, \text{ rows can't be independent}} \qquad \underbrace{\begin{bmatrix} 1 & 0 & 3 & 2 & -1 \\ \frac{1}{9} & 3 & 0 & 0 & 2 \\ 9 & 0 & 0 & 6 & -3 \end{bmatrix}}_{\text{if } n < d, \text{ columns can't be independent}}

We say a matrix is full rank if it has the largest possible rank for a matrix of its shape, i.e. if \text{rank}(A) = \min(n, d). We’ll mostly use this term when referring to square matrices, which we’ll focus more on in Chapter 2.9.


Null Space

As we’ve seen, \text{colsp}(A) is a subspace of \mathbb{R}^n that contains all possible results of A \vec x for \vec x \in \mathbb{R}^d. Think of \text{colsp}(A) as the “reach” of the columns of A.

If the columns of A are linearly independent, then the only way to create \vec 0 through a linear combination of the columns of A is if all the coefficients in the linear combination are 0, i.e. if \vec x = \vec 0 in A \vec x.

But, if A’s columns aren’t all linearly independent, then there will be some non-zero vectors \vec x where A \vec x = \vec 0. This follows from the definition of linear independence, which says that if there’s a non-zero linear combination of a collection of vectors that produces \vec 0, the vectors aren’t linearly independent.

It turns out that it’s worthwhile to study the set of vectors \vec x that get sent to \vec 0 when multiplied by A. This set is called the null space of A:

\text{nullsp}(A) = \{ \vec x \in \mathbb{R}^d : A \vec x = \vec 0 \}

Sometimes, the null space is also called the kernel of A, though we will mostly avoid that term in our class, since kernel often means something different in the context of machine learning.

Important: \text{nullsp}(A) is a subspace of \mathbb{R}^d, since it is made up of vectors that get multiplied by A, an n \times d matrix.

Let’s return to the example A from earlier.

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

\text{nullsp}(A) is the set of all vectors \vec x \in \mathbb{R}^3 where A \vec x = \vec 0. Since \text{rank}(A) = 2 < 3, there will be some non-zero vectors \vec x in the null space. But what are they?

To find them, we need the general solution to

\underbrace{\begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\vec x} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\vec 0}

We can write this as the system of equations

\begin{align} 5x_1 + 3x_2 + 2x_3 &= 0 \tag{1} \\ -x_2 + x_3 &= 0 \tag{2} \\ 3x_1 + 4x_2 - x_3 &= 0 \tag{3} \\ 6x_1 + 2x_2 + 4x_3 &= 0 \tag{4} \\ x_1 + x_3 &= 0 \tag{5} \end{align}

Equation (2) tells us x_2 = x_3, and equation (5) tells us x_1 = -x_3. (You can check that equations (1), (3), and (4) are then automatically satisfied.) So, any vector in \text{nullsp}(A) is of the form

\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -x_3 \\ x_3 \\ x_3 \end{bmatrix} = x_3 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \quad x_3 \in \mathbb{R}

So,

\text{nullsp}(A) = \text{span}\left( \left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\} \right)

For example, \begin{bmatrix} -5 \\ 5 \\ 5 \end{bmatrix} \in \text{nullsp}(A). Remember that vectors in \text{nullsp}(A) are in \mathbb{R}^3, since A is a 5 \times 3 matrix, while vectors in \text{colsp}(A) are in \mathbb{R}^5.
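
And, as always, we can check our by-hand work in code: the basis vector we found really is sent to \vec 0.

A @ np.array([-1, 1, 1])
array([0, 0, 0, 0, 0])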

In this example, \text{nullsp}(A) is a 1-dimensional subspace of \mathbb{R}^3. We also know from earlier that \text{rank}(A) = 2. And curiously, 1 + 2 = 3, the number of columns in A. This is not a coincidence, and sheds light on an important theorem.

This theorem, known as the rank-nullity theorem, says that for any n \times d matrix A,

\text{rank}(A) + \text{dim}(\text{nullsp}(A)) = d

The proof of this theorem is beyond the scope of our course. But, this is such an important theorem that it’s sometimes called the fundamental theorem of linear algebra. It tells us, for one, that the dimension of the null space is equal to the number of columns minus the rank. “Nullity” is just another word for the dimension of the null space.

Let’s see how it can be used in practice. Some of these examples are taken from Gilbert Strang’s book.

Example: Linearly Independent Columns

Let A = \begin{bmatrix} 3 & 1 \\ 9 & -3 \\ 0 & 6 \\ 3 & 2 \\ 1 & 1 \end{bmatrix}.

What is \text{nullsp}(A)?

Example: Describing Spaces

Describe the column space, row space, and null space of the matrix

A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}

Example: Thinking Abstractly

Suppose A is a 3 \times 4 matrix with rank 3. Describe \text{colsp}(A) and \text{nullsp}(A^T). The latter is the null space of A^T, and is sometimes called the left null space of A, as it’s a null space of A when multiplied by vectors on the left, like in \vec y^T A (which performs the same calculation as A^T \vec y; the results are just transposed).

Example: Thinking Even More Abstractly

If A is a 7 \times 9 matrix with rank 5, find the dimensions of each of the following.

  1. \text{colsp}(A)

  2. \text{colsp}(A^T)

  3. \text{nullsp}(A)

  4. \text{nullsp}(A^T)

Example: Row Space and Null Space

Suppose \vec r \in \text{colsp}(A^T) and \vec n \in \text{nullsp}(A), meaning \vec r is in the row space of A and \vec n is in the null space of A.

Prove that \vec r and \vec n must be orthogonal. Often, this is phrased as “the row space and null space are orthogonal complements”.

Example: Rank of AB vs. Rank of A or B

Suppose A is an n \times d matrix, and B is a d \times p matrix. Explain why the column space of AB is a subset of the column space of A, and the row space of AB is a subset of the row space of B. What does this imply about the rank of AB?
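
If you’d like numerical evidence before constructing an argument, here’s a small randomized sanity check; the shapes, entry range, and seed are arbitrary choices of mine.

rng = np.random.default_rng(23)

for _ in range(1_000):
    A_rand = rng.integers(-3, 4, size=(4, 6))
    B_rand = rng.integers(-3, 4, size=(6, 5))
    # rank(AB) never exceeds the rank of either factor
    assert np.linalg.matrix_rank(A_rand @ B_rand) <= min(
        np.linalg.matrix_rank(A_rand), np.linalg.matrix_rank(B_rand)
    )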

Example: Rank of X^TX

Prove that \text{rank}(X^T X) = \text{rank}(X) for any n \times d matrix X.

The matrix X^TX is hugely important for our regression problem, and you’ll also see in Homework 5 that it helps define the covariance matrix of our data.

Summary

  • The column space \text{colsp}(A) is the span of A’s columns; the row space is \text{colsp}(A^T), the span of A’s rows.

  • \text{rank}(A) = \text{dim}(\text{colsp}(A)) = \text{dim}(\text{colsp}(A^T)): the number of linearly independent columns equals the number of linearly independent rows.

  • If A is n \times d, then 0 \leq \text{rank}(A) \leq \min(n, d).

  • \text{nullsp}(A) is the set of all \vec x \in \mathbb{R}^d with A \vec x = \vec 0, and \text{rank}(A) + \text{dim}(\text{nullsp}(A)) = d.


Matrix Decompositions

In the coming chapters, we’ll spend a lot of our time decomposing matrices into smaller pieces, each of which gives us some insight into the data stored in the matrix. This is not a new concept: in earlier math courses, you’ve learned to write a number as a product of prime factors, and to factor quadratics like x^2 - 5x + 6 into (x-2)(x-3).

The “ultimate” decomposition for the purposes of machine learning is the singular value decomposition (SVD), which decomposes a matrix into the product of three other matrices.

X = U \Sigma V^T

where U and V are orthogonal matrices and \Sigma is a diagonal matrix. This decomposition will allow us to solve the dimensionality reduction problem we first alluded to in Chapter 1.1.

We’re not yet ready for that. For now, we’ll introduce a decomposition that ties together ideas from this section, and allows us to prove the fact that the number of linearly independent columns of A is equal to the number of linearly independent rows of A, i.e. that \text{rank}(A) = \text{rank}(A^T).

CR Decomposition

Suppose A is an n \times d matrix with rank r. This tells us that A has r linearly independent columns (and rows), and the remaining d - r columns can be written as linear combinations of the r “good” columns.

Let’s continue thinking about

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

Recall, \text{rank}(A) = 2, and

\text{column 3} = \text{column 1} - \text{column 2}

Define the matrix C as containing just the linearly independent columns of A, i.e. C’s columns are a basis for \text{colsp}(A). Notice that C is a 5 \times 2 matrix; its number of columns is equal to the rank of A.

C = \begin{bmatrix} 5 & 3 \\ 0 & -1 \\ 3 & 4 \\ 6 & 2 \\ 1 & 0 \end{bmatrix}

I’d like to produce a “mixing” matrix R such that A = CR. Here, R will tell us how to “mix” the columns of C (which are linearly independent) to produce the columns of A. Since A is 5 \times 3 and C is 5 \times 2, R must be 2 \times 3 in order for the dimensions to work out.

  • Column 1 of A is just 1 (\text{column 1 of } C)

  • Column 2 of A is just 1 (\text{column 2 of } C)

  • Column 3 of A is {\color{orange}1} (\text{column 1 of } C) - {\color{orange}1} (\text{column 2 of } C)

So, R must be

R = \begin{bmatrix} 1 & 0 & {\color{orange}1} \\ 0 & 1 & {\color{orange}-1} \end{bmatrix}

So, a CR decomposition of A is

\underbrace{\begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 5 & 3 \\ 0 & -1 \\ 3 & 4 \\ 6 & 2 \\ 1 & 0 \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}}_{R}

Python can tell us quickly that we did the decomposition correctly.

C = np.array([[5, 3],
              [0, -1],
              [3, 4],
              [6, 2],
              [1, 0]])

R = np.array([[1, 0, 1],
              [0, 1, -1]])

C @ R
array([[ 5,  3,  2],
       [ 0, -1,  1],
       [ 3,  4, -1],
       [ 6,  2,  4],
       [ 1,  0,  1]])

By definition, C’s columns are a basis for the column space of A, \text{colsp}(A), since C’s columns are linearly independent and span \text{colsp}(A).

But, there’s another fact hiding in plain sight: R’s rows are a basis for the row space of A, \text{colsp}(A^T)! The row space of A is the set of all linear combinations of A’s rows, which is a subspace of \mathbb{R}^3, and any vector in that subspace can be written as a linear combination of R’s two rows, \begin{bmatrix} 1 & 0 & 1 \end{bmatrix} and \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}. (In fact, the first row of R is exactly row 5 of A, and the second row is -1 times row 2 of A, though R’s rows aren’t guaranteed to be related to A’s rows this directly.)

To be precise, a CR decomposition of A consists of a matrix C of shape n \times r that contains A’s linearly independent columns, and a matrix R of shape r \times d that contains the coefficients needed to write each column of A as a linear combination of the columns of C. As we’re seeing above, in addition to C’s columns being linearly independent, R’s rows are also linearly independent. I won’t prove this fact here, but it’s a good thought exercise to think about why it must be true.

Notice that I’ve said a CR decomposition, not the CR decomposition, because this decomposition is not unique. Think about why the following is also a CR decomposition of A.

C = np.array([[5, 2],
              [0, 1],
              [3, -1],
              [6, 4],
              [1, 1]])

R = np.array([[1, 1, 0],
              [0, -1, 1]])

C @ R
array([[ 5,  3,  2],
       [ 0, -1,  1],
       [ 3,  4, -1],
       [ 6,  2,  4],
       [ 1,  0,  1]])

You can think of \text{rank}(A) as being the minimum possible number of columns in C needed to make A = CR possible.

Here, since \text{rank}(A) = 2, the minimum possible number of columns in C is 2; no C with just a single column would allow A = CR to work, and while a C with 3 columns would work, 2 is the minimum.

Why is \text{rank}(A) = \text{rank}(A^T)?

The reason I’ve introduced the CR decomposition is that it will allow us to see why \text{rank}(A) = \text{rank}(A^T) for any matrix A, i.e. that the number of linearly independent columns of A is equal to the number of linearly independent rows of A.

For now, let’s suppose we don’t know that this is true, and just interpret \text{rank}(A) as the number of linearly independent columns of A.

Suppose, as we typically do, that A is an n \times d matrix with rank r, and that A = CR is a CR decomposition of A into matrices

  • C (with shape n \times r), and

  • R (with shape r \times d)

As mentioned above, C’s columns are linearly independent, and R’s rows are linearly independent.

What happens if we transpose A = CR? We get

A^T = (CR)^T = R^T C^T

This tells us that A^T can be written as the product of R^T (shape d \times r) and C^T (shape r \times n).

Key insight: this is just a CR decomposition of A^T, using R^T and C^T instead of C and R!

\underbrace{\begin{bmatrix} 5 & 0 & 3 & 6 & 1 \\ 3 & -1 & 4 & 2 & 0 \\ 2 & 1 & -1 & 4 & 1 \end{bmatrix}}_{A^T} = \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix}}_{R^T} \underbrace{\begin{bmatrix} 5 & 0 & 3 & 6 & 1 \\ 3 & -1 & 4 & 2 & 0 \end{bmatrix}}_{C^T}

The columns of A^T are just the rows of A, and the columns of R^T are just the rows of R. Since we know that the rows of R are linearly independent (the fact I mentioned but didn’t prove earlier), the columns of R^T are also linearly independent, which makes this a CR decomposition of A^T.

Because A^T has a CR decomposition using a matrix with r linearly independent columns (R^T), it must have a rank of r. But since \text{rank}(A) = r, we must have

\text{rank}(A) = \text{rank}(A^T)
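
We can see the transposed decomposition concretely in code, reusing the C and R arrays defined above (either pair works, since both satisfy A = CR).

R.T @ C.T
array([[ 5,  0,  3,  6,  1],
       [ 3, -1,  4,  2,  0],
       [ 2,  1, -1,  4,  1]])

# R^T C^T really is A^T
np.allclose(R.T @ C.T, A.T)
True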

I skipped some minor steps in this proof, but the main idea is that the CR decomposition A = CR gives us two things:

  1. A basis for the column space of A, stored in C’s columns

  2. A basis for the row space of A, stored in R’s rows

Remember that C is carefully constructed to contain A’s linearly independent columns; any arbitrary decomposition of A = XY might not have these properties.