In Chapter 5.1, I encouraged you to think of a matrix as a collection of vectors stacked together.
A=⎣⎡503613−142021−141⎦⎤
Soon, these matrices will be made up of observations and features from some dataset that we’d like to build a regression model with. The content of this section will be a crucial building block for helping us get there.
A is a 5×3 matrix. It can be thought of as either:
3 vectors in R5 (the columns of A)
5 vectors in R3 (the rows of A)
Let’s start with the column perspective. We saw in Chapter 5.1 that if x∈R3, then Ax is a new vector in R5 that is a linear combination of the columns of A. For instance, if we take x=⎣⎡20−1⎦⎤, then
By definition, the vector Ax is in the span of the columns of A, since it’s just a linear combination of A’s columns.
Given what we’ve learned in Chapter 4.1 and Chapter 4.3, it’s natural to try and describe the span of A’s columns.
Notice that I’ve intentionally chosen not to use subscripts to refer to columns; this is so that when we switch back to focusing on datasets and machine learning, we keep consistent the fact that subscripts refer to different rows/data points, not columns/features.
“Column space” is just a new term for a concept we’re already familiar with: the span of a set of vectors. In the example A we’ve been working with, the column space is
This column space is a 2-dimensional subspace of R5. Why is it 2-dimensional? The last column is a linear combination of the first two columns. Specifically,
column 3=column 1−column 2
Remember that:
the dimension of a subspace is the number of vectors in a basis for the subspace, and
a basis for a subspace is a linearly independent set of vectors that spans the entire subspace.
The first two columns of A alone span the column space, and are linearly independent, and so dim(colsp(A))=2. This number, 2, is the most important number associated with the matrix A, so much so that we give it a special name.
The rank of a matrix tells us how “large” the space of possible linear combinations of the columns of A is. We care about this because ultimately, our predictions in regression are just linear combinations of the columns of some data matrix.
To get a feel for the idea of rank, let’s work through some examples.
Find three 3×4 matrices: one with rank 1, one with rank 2, and one with rank 3. Is it possible to have a 3×4 matrix with rank 4?
Solution
To create a 3×4 matrix with rank 1, we need there to only be one linearly independent column. For instance,
A=⎣⎡111222333444⎦⎤
has rank 1 because all columns are multiples of the first column.
To create a 3×4 matrix with rank 2, we need there to be two linearly independent columns. One way to construct such a matrix is to make the first two columns linearly independent, and make the last two columns linear combinations of the first two. For instance, in
A=⎣⎡111123222234⎦⎤
column 3 is a scalar multiple of column 1, and column 4 is column 1 + column 2.
To create a 3×4 matrix with rank 3, we need there to be three linearly independent columns, and the fourth column can be anything. One solution is
A=⎣⎡100010001987⎦⎤
A 3×4 matrix cannot have rank 4, because it’s impossible to have four linearly independent vectors in R3. Any three linearly independent vectors in R3 span all of R3, so a fourth vector in R3 would have to be a linear combination of the first three vectors.
Find a condition on a,b,c,d that ensures rank(A)=2.
Solution
In order for rank(A)=2, the columns of A must be linearly independent. This means that the second column cannot be a scalar multiple of the first column.
ab is the number we multiply a by to get b. So, we just need ab⋅c to be different from d.
ab⋅c=d⟹ad−bc=0
So, if ad−bc=0, then rank(A)=1, and otherwise, rank(A)=2.
This expression, ad−bc, is called the determinant of A. We’ll learn more about determinants in Chapter 6.2.
Suppose d1,d2,…,dn are real numbers. What is the rank of the n×ndiagonal matrix
D=⎣⎡d10⋮00d2⋮0⋯⋯⋱⋯00⋮dn⎦⎤
if
di=i, for i=1,2,…,n?
d1=d2=⋯=dn=0?
k of the di are equal to 0, and the rest are positive?
Solution
rank(D)=n: If di=i, for i=1,2,…,n, meaning d1=1,d2=2,…,dn=n, then the rank is n, because all columns are linearly independent. None can be written as a linear combination of any others, since they all have their sole non-zero entry in a different row.
rank(D)=0: If d1=d2=⋯=dn=0, then the rank is 0, because all columns are the zero vector.
rank(D)=n−k: If k of the di are equal to 0, and the rest are positive, then the rank is n−k, because we can throw out the zero columns and the remaining columns are linearly independent.
As we’ve seen before, the dot product u⋅v=uTv is a scalar, equal to -17 here.
The outer product of u and v is the matrix
uvT=⎣⎡1−34⎦⎤[25−1]=⎣⎡2−685−1520−13−4⎦⎤
In general, for any two vectors u,v∈Rn, what is the rank of uvT?
Solution
The rank of uvT is 1, since each column in uvT is a scalar multiple of u. In the example above, column 2 is 25 times column 1, and column 3 is −21 times column 1.
In Chapter 10, we’ll see that any rank r matrix can be written as a sum of r rank 1 matrices, each of which is of the form uvT. So, rank 1 matrices can be thought of as the building blocks of all matrices!
Note that we’ve already seen plenty of problems of this form in earlier homeworks. It’s just that there, the spanning set of vectors was given to you directly, and here, they’re stored as columns in a matrix. The idea is the same.
Solution
Column 2 is a multiple of column 1.
Column 3 is independent from column 1.
Column 4 is 3 times column 1 plus -5 times column 3.
Column 5 is just column 1.
A only has two linearly independent columns - columns 1 and 3 - and so rank(A)=2, and a basis for the column space is
Shortly, we’ll learn a new technique for finding the rank of matrix by hand. But for the most part, we’ll not need to do this, and instead can use the power of Python to help us.
So far, we’ve focused on thinking of a matrix as a collection of “column” vectors written next to each other. This is the more common perspective, since - as I’ve harped on - Ax is a linear combination of the columns of A.
But we can also think of a matrix as a collection of “row” vectors written on top of each other.
A=⎣⎡503613−142021−141⎦⎤
A contains 5 vectors in R3 in its rows, each in R3. These vectors also have a span, which in this case is a subspace of R3.
Where did ATy come from? Remember, Ax is a linear combination of the columns of A. If we transpose A, then ATy is a linear combination of the columns of AT, which are the rows of A.
ATy=⎣⎡5320−1134−1624101⎦⎤⎣⎡y1y2y3y4y5⎦⎤=linear combination of rows of Ay1⎣⎡532⎦⎤+y2⎣⎡0−11⎦⎤+y3⎣⎡34−1⎦⎤+y4⎣⎡624⎦⎤+y5⎣⎡101⎦⎤
Remember from Chapter 5.1 that (ATy)T=yTA. The product yTA is also a linear combination of the rows of A; it just returns a row vector with shape 1×d rather than a vector with shape d×1.
yTA=[y1y2y3y4y5]⎣⎡503613−142021−141⎦⎤=linear combination of rows of Ay1[532]+y2[0−11]+...+y5[101]
In yTA, we left-multipliedA by a vector; in Ax, we right-multipliedA by a vector. These are distinct types of multiplication, as they involve vectors of different shapes.
Since the columns of AT are the rows of A, the row space of A is the column space of AT, meaning
rowsp(A)=colsp(AT)
To avoid carrying around lots of notation, I’ll often just use colsp(AT) to refer to the row space of A.
What is the dimension of the row space of A? Other ways of phrasing this question are:
How many linearly independent rows does A have?
What is the rank of AT?
We know the answer can’t be more than 3, since the rows of A are vectors in R3. We could use the algorithm first presented in Chapter 4.1 to find a linearly independent set of rows with the same span as all 5 rows.
An easy way to see that dim(colsp(AT))=2 is to pick out two of the five vectors, row 2=a2=⎣⎡0−11⎦⎤ and row 5=a5=⎣⎡101⎦⎤, and show that the other three vectors can be written as linear combinations of them. I chose a2 and a5 just because they have the simplest numbers, and because they’re linearly independent from one another.
a1=⎣⎡532⎦⎤=−3a2+5a5
a3=⎣⎡34−1⎦⎤=−4a2+3a5
a4=⎣⎡624⎦⎤=−2a2+6a5
You might notice that the number of linearly independent rows of A and the number of linearly independent columns of A were both 2.
Sometimes, we say the dimension of colsp(A) is the column rank of A, and the dimension of colsp(AT) is the row rank of A. For the example A we’ve been working with, both the column rank and row rank are 2.
But, there’s no need for separate names.
It’s not immediately obvious why this is true, and honestly, most of the proofs of it I’ve found aren’t all that convincing for a first-time linear algebra student. Still, I’ll try to prove this fact for you in just a little bit.
So, in general, what is rank(A)? Since the rank is equal to both the number of linearly independent columns and the number of linearly independent rows, then the largest the rank can be is the smaller of the number of rows and columns.
For example, if A is a 7×9 matrix, then its rank is at most 7, since it cannot have more than 7 linearly independent rows. So in general, if A is an n×d matrix, then
0≤rank(A)≤min(n,d)
if n>d, rows can’t be independent⎣⎡52−130313161⎦⎤if n<d, columns can’t be independent⎣⎡1919030300206−12−3⎦⎤
We say a matrix is full rank if it has the largest possible rank for a matrix of its shape, i.e. ifrank(A)=min(n,d). We’ll mostly use this term when refering to square matrices, which we’ll focus more on in Chapter 6.2.
Next, we introduce the null space and connect it to rank via the rank-nullity theorem.