
5.3. Rank and Column Space

In Chapter 5.1, I encouraged you to think of a matrix as a collection of vectors stacked together.

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}$$

Soon, these matrices will be made up of observations and features from some dataset that we’d like to build a regression model with. The content of this section will be a crucial building block for helping us get there.


Column Space and Rank

Let’s keep working with the matrix $A$ from above.

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}$$

$A$ is a $5 \times 3$ matrix. It can be thought of as either:

  • 3 vectors in $\mathbb{R}^5$ (the columns of $A$)

  • 5 vectors in $\mathbb{R}^3$ (the rows of $A$)

Let’s start with the column perspective. We saw in Chapter 5.1 that if $\vec x \in \mathbb{R}^3$, then $A \vec x$ is a new vector in $\mathbb{R}^5$ that is a linear combination of the columns of $A$. For instance, if we take ${\color{orange} \vec x = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}}$, then

$$A {\color{orange} \vec x} = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix} {\color{orange} \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}} = {\color{orange} 2} \begin{bmatrix} 5 \\ 0 \\ 3 \\ 6 \\ 1 \end{bmatrix} + {\color{orange} 0} \begin{bmatrix} 3 \\ -1 \\ 4 \\ 2 \\ 0 \end{bmatrix} - {\color{orange} 1} \begin{bmatrix} 2 \\ 1 \\ -1 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ -1 \\ 7 \\ 8 \\ 1 \end{bmatrix}$$

By definition, the vector $A \vec x$ is in the span of the columns of $A$, since it’s just a linear combination of $A$’s columns.
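The computation above is quick to reproduce in NumPy, where `@` performs matrix-vector multiplication:

```python
import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])
x = np.array([2, 0, -1])

# A @ x is the linear combination 2(column 1) + 0(column 2) - 1(column 3)
A @ x
# array([ 8, -1,  7,  8,  1])
```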

Given what we’ve learned in Chapter 4.1 and Chapter 4.3, it’s natural to try to describe the span of $A$’s columns.

Notice that I’ve intentionally chosen not to use subscripts to refer to columns; this way, when we switch back to focusing on datasets and machine learning, subscripts consistently refer to different rows/data points, not columns/features.

“Column space” is just a new term for a concept we’re already familiar with: the span of a set of vectors. In the example $A$ we’ve been working with, the column space is

$$\text{colsp}(A) = \text{span}\left( \left\{ \begin{bmatrix} 5 \\ 0 \\ 3 \\ 6 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 4 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ -1 \\ 4 \\ 1 \end{bmatrix} \right\} \right)$$

This column space is a 2-dimensional subspace of $\mathbb{R}^5$. Why is it 2-dimensional? The last column is a linear combination of the first two columns. Specifically,

$$\text{column 3} = \text{column 1} - \text{column 2}$$

Remember that:

  • the dimension of a subspace is the number of vectors in a basis for the subspace, and

  • a basis for a subspace is a linearly independent set of vectors that spans the entire subspace.

The first two columns of $A$ alone span the column space and are linearly independent, so $\text{dim}(\text{colsp}(A)) = 2$. This number, 2, is the most important number associated with the matrix $A$, so much so that we give it a special name.
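Both claims here are easy to verify numerically: the third column equals the difference of the first two, and the first two columns on their own form a linearly independent set:

```python
import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

# column 3 = column 1 - column 2
print(np.array_equal(A[:, 2], A[:, 0] - A[:, 1]))  # True

# the first two columns alone are linearly independent
print(np.linalg.matrix_rank(A[:, :2]))             # 2
```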

The rank of a matrix $A$, written $\text{rank}(A)$, is the dimension of its column space. It tells us how “large” the space of possible linear combinations of the columns of $A$ is. We care about this because ultimately, our predictions in regression are just linear combinations of the columns of some data matrix.

To get a feel for the idea of rank, let’s work through some examples.

Example: Creating Matrices

Find three $3 \times 4$ matrices: one with rank 1, one with rank 2, and one with rank 3. Is it possible to have a $3 \times 4$ matrix with rank 4?
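If you want to check your attempts, `np.linalg.matrix_rank` makes it easy. Below is one possible set of answers (these specific matrices are just one choice among many), built so that the rows are, or are not, combinations of each other:

```python
import numpy as np

R1 = np.array([[1, 2, 3, 4],
               [2, 4, 6, 8],        # 2 x row 1
               [-1, -2, -3, -4]])   # -1 x row 1

R2 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [1, 1, 0, 0]])       # row 1 + row 2

R3 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0]])       # no row is a combination of the others

print(np.linalg.matrix_rank(R1),
      np.linalg.matrix_rank(R2),
      np.linalg.matrix_rank(R3))
# 1 2 3
```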

Example: $2 \times 2$ Matrices

Let

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

Find a condition on $a, b, c, d$ that ensures $\text{rank}(A) = 2$.
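To experiment with this exercise numerically, test candidate conditions against `matrix_rank` on small examples. For instance, the second matrix below has columns that are multiples of one another, so its rank is only 1:

```python
import numpy as np

print(np.linalg.matrix_rank(np.array([[1, 2],
                                      [3, 4]])))  # 2
print(np.linalg.matrix_rank(np.array([[1, 2],
                                      [2, 4]])))  # 1: column 2 = 2 x column 1
```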

Example: Diagonal Matrices

Suppose $d_1, d_2, \ldots, d_n$ are real numbers. What is the rank of the $n \times n$ diagonal matrix

$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}$$

if

  1. $d_i = i$, for $i = 1, 2, \ldots, n$?

  2. $d_1 = d_2 = \cdots = d_n = 0$?

  3. $k$ of the $d_i$ are equal to 0, and the rest are positive?
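`np.diag` builds a diagonal matrix from a vector, so you can check your answers for small $n$; here $n = 4$, and for the third case, $k = 2$ of the diagonal entries are zero:

```python
import numpy as np

print(np.linalg.matrix_rank(np.diag([1, 2, 3, 4])))  # case 1
print(np.linalg.matrix_rank(np.diag([0, 0, 0, 0])))  # case 2
print(np.linalg.matrix_rank(np.diag([0, 5, 0, 2])))  # case 3, with k = 2
```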

Example: Vector Outer Product

Let $\vec u = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix}$ and $\vec v = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}$.

As we’ve seen before, the dot product $\vec u \cdot \vec v = \vec u^T \vec v$ is a scalar, equal to $-17$ here.

The outer product of $\vec u$ and $\vec v$ is the matrix

$$\vec u \vec v^T = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} \begin{bmatrix} 2 & 5 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 5 & -1 \\ -6 & -15 & 3 \\ 8 & 20 & -4 \end{bmatrix}$$

In general, for any two vectors $\vec u, \vec v \in \mathbb{R}^n$, what is the rank of $\vec u \vec v^T$?
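`np.outer` computes $\vec u \vec v^T$ directly. Notice that every row of the result is a multiple of $\vec v^T$, which is a big hint for the general question:

```python
import numpy as np

u = np.array([1, -3, 4])
v = np.array([2, 5, -1])

print(u @ v)                      # dot product: -17
M = np.outer(u, v)                # the outer product u v^T
print(M)
print(np.linalg.matrix_rank(M))   # 1
```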

Example: Basis for Column Space

Find a basis for the column space of

$$A = \begin{bmatrix} 3 & 6 & 0 & 9 & 3 \\ 2 & 4 & 0 & 6 & 2 \\ 0 & 0 & 1 & -5 & 0 \\ 1 & 2 & 0 & 3 & 1 \end{bmatrix}$$

Note that we’ve already seen plenty of problems of this form in earlier homeworks. It’s just that there, the spanning set of vectors was given to you directly, and here, they’re stored as columns in a matrix. The idea is the same.
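`matrix_rank` won’t hand you a basis, but it does tell you how many vectors a basis for the column space must contain, which is a useful sanity check on your by-hand answer:

```python
import numpy as np

A = np.array([[3, 6, 0, 9, 3],
              [2, 4, 0, 6, 2],
              [0, 0, 1, -5, 0],
              [1, 2, 0, 3, 1]])

# the number of vectors in any basis for colsp(A)
print(np.linalg.matrix_rank(A))  # 2
```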

Finding the Rank Using Python

Shortly, we’ll learn a new technique for finding the rank of a matrix by hand. But for the most part, we won’t need to do this, and can instead use the power of Python to help us.

Returning to

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}$$

we have

```python
import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

np.linalg.matrix_rank(A)
# 2
```

Python makes it easy to experiment with how different operations affect the rank of a matrix.

For instance, later in this section, we’ll prove that the matrix $A^TA$ has the same rank as $A$, for any $n \times d$ matrix $A$.

```python
A.T @ A
# array([[71, 39, 32],
#        [39, 30,  9],
#        [32,  9, 23]])

# same as the rank of A from above!
np.linalg.matrix_rank(A.T @ A)
# 2
```

Row Space

So far, we’ve focused on thinking of a matrix as a collection of “column” vectors written next to each other. This is the more common perspective, since, as I’ve harped on, $A \vec x$ is a linear combination of the columns of $A$.

But we can also think of a matrix as a collection of “row” vectors written on top of each other.

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}$$

$A$ contains 5 vectors in its rows, each in $\mathbb{R}^3$. These vectors also have a span, which in this case is a subspace of $\mathbb{R}^3$.

The row space of $A$ is the span of $A$’s rows, i.e. the set of all vectors of the form $A^T \vec y$, where $\vec y \in \mathbb{R}^5$. Where did $A^T \vec y$ come from? Remember, $A \vec x$ is a linear combination of the columns of $A$. If we transpose $A$, then $A^T \vec y$ is a linear combination of the columns of $A^T$, which are the rows of $A$.

$$\begin{align*} A^T {\color{orange} \vec y} &= \begin{bmatrix} 5 & 0 & 3 & 6 & 1 \\ 3 & -1 & 4 & 2 & 0 \\ 2 & 1 & -1 & 4 & 1 \end{bmatrix} {\color{orange} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}} \\ &= \underbrace{{\color{orange} y_1} \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix} + {\color{orange} y_2} \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} + {\color{orange} y_3} \begin{bmatrix} 3 \\ 4 \\ -1 \end{bmatrix} + {\color{orange} y_4} \begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix} + {\color{orange} y_5} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_\text{linear combination of rows of A} \end{align*}$$

Remember from Chapter 5.1 that $(A^T \vec y)^T = \vec y^T A$. The product $\vec y^T A$ is also a linear combination of the rows of $A$; it just returns a row vector with shape $1 \times d$ rather than a column vector with shape $d \times 1$.

$$\begin{align*} {\color{orange} \vec y}^T A &= {\color{orange} \begin{bmatrix} y_1 & y_2 & y_3 & y_4 & y_5 \end{bmatrix}} \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix} \\ &= \underbrace{{\color{orange} y_1} \begin{bmatrix} 5 & 3 & 2 \end{bmatrix} + {\color{orange} y_2} \begin{bmatrix} 0 & -1 & 1 \end{bmatrix} + \ldots + {\color{orange} y_5} \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}}_\text{linear combination of rows of A} \end{align*}$$

In $\vec y^T A$, we left-multiplied $A$ by a vector; in $A \vec x$, we right-multiplied $A$ by a vector. These are distinct types of multiplication, as they involve vectors of different shapes.

Since the columns of $A^T$ are the rows of $A$, the row space of $A$ is the column space of $A^T$, meaning

$$\text{rowsp}(A) = \text{colsp}(A^T)$$

To avoid carrying around lots of notation, I’ll often just use $\text{colsp}(A^T)$ to refer to the row space of $A$.

What is the dimension of the row space of $A$? Other ways of phrasing this question are:

  • How many linearly independent rows does $A$ have?

  • What is the rank of $A^T$?

We know the answer can’t be more than 3, since the rows of $A$ are vectors in $\mathbb{R}^3$. We could use the algorithm first presented in Chapter 4.1 to find a linearly independent set of rows with the same span as all 5 rows.

An easy way to see that $\text{dim}(\text{colsp}(A^T)) = 2$ is to pick out two of the five vectors, $\text{row 2} = \vec a_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$ and $\text{row 5} = \vec a_5 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, and show that the other three vectors can be written as linear combinations of them. I chose $\vec a_2$ and $\vec a_5$ just because they have the simplest numbers, and because they’re linearly independent from one another.

  • $\vec a_1 = \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix} = -3 \vec a_2 + 5 \vec a_5$

  • $\vec a_3 = \begin{bmatrix} 3 \\ 4 \\ -1 \end{bmatrix} = -4 \vec a_2 + 3 \vec a_5$

  • $\vec a_4 = \begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix} = -2 \vec a_2 + 6 \vec a_5$
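These three decompositions are quick to verify in NumPy:

```python
import numpy as np

a2 = np.array([0, -1, 1])
a5 = np.array([1, 0, 1])

# rows 1, 3, and 4 of A as combinations of rows 2 and 5
print(np.array_equal(-3 * a2 + 5 * a5, np.array([5, 3, 2])))   # True
print(np.array_equal(-4 * a2 + 3 * a5, np.array([3, 4, -1])))  # True
print(np.array_equal(-2 * a2 + 6 * a5, np.array([6, 2, 4])))   # True
```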

You might notice that the number of linearly independent rows of $A$ and the number of linearly independent columns of $A$ were both 2.
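In code, this observation says that $A$ and $A^T$ have the same rank:

```python
import numpy as np

A = np.array([[5, 3, 2],
              [0, -1, 1],
              [3, 4, -1],
              [6, 2, 4],
              [1, 0, 1]])

# rank of A (independent columns) vs. rank of A^T (independent rows)
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))
# 2 2
```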

Equivalence of Column Rank and Row Rank

Sometimes, we say the dimension of $\text{colsp}(A)$ is the column rank of $A$, and the dimension of $\text{colsp}(A^T)$ is the row rank of $A$. For the example $A$ we’ve been working with, both the column rank and row rank are 2.

But there’s no need for separate names: for any matrix, the column rank and the row rank are always equal, so we call their common value simply the rank.

It’s not immediately obvious why this is true, and honestly, most of the proofs of it I’ve found aren’t all that convincing for a first-time linear algebra student. Still, I’ll try to prove this fact for you in just a little bit.

So, in general, what is $\text{rank}(A)$? Since the rank is equal to both the number of linearly independent columns and the number of linearly independent rows, the largest the rank can be is the smaller of the number of rows and the number of columns.

For example, if $A$ is a $7 \times 9$ matrix, then its rank is at most 7, since it cannot have more than 7 linearly independent rows. So in general, if $A$ is an $n \times d$ matrix, then

$$0 \leq \text{rank}(A) \leq \min(n, d)$$

$$\underbrace{\begin{bmatrix} 5 & 3 \\ 2 & 1 \\ -1 & \frac{1}{3} \\ 3 & 6 \\ 0 & 1 \end{bmatrix}}_{\text{if } n > d, \text{ rows can't be independent}} \qquad \underbrace{\begin{bmatrix} 1 & 0 & 3 & 2 & -1 \\ \frac{1}{9} & 3 & 0 & 0 & 2 \\ 9 & 0 & 0 & 6 & -3 \end{bmatrix}}_{\text{if } n < d, \text{ columns can't be independent}}$$
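The two example matrices above illustrate the bound; both happen to hit it, i.e. their ranks equal $\min(n, d)$ for their shapes:

```python
import numpy as np

tall = np.array([[5, 3],
                 [2, 1],
                 [-1, 1/3],
                 [3, 6],
                 [0, 1]])            # 5 x 2: at most 2 independent rows... no, columns

wide = np.array([[1, 0, 3, 2, -1],
                 [1/9, 3, 0, 0, 2],
                 [9, 0, 0, 6, -3]])  # 3 x 5: at most 3 independent columns

print(np.linalg.matrix_rank(tall))  # 2 = min(5, 2)
print(np.linalg.matrix_rank(wide))  # 3 = min(3, 5)
```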

We say a matrix is full rank if it has the largest possible rank for a matrix of its shape, i.e. if $\text{rank}(A) = \min(n, d)$. We’ll mostly use this term when referring to square matrices, which we’ll focus more on in Chapter 6.2.

Next, we introduce the null space and connect it to rank via the rank-nullity theorem.