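In Chapter 6.3, we explored the idea of projecting a vector $\vec y \in \mathbb{R}^n$ onto the column space of an $n \times d$ matrix $X$. If $X$ has linearly independent columns, then the vector in $\text{colsp}(X)$ that is closest to $\vec y$ is the vector $\vec p = X\vec w^*$, where $\vec w^*$ is the unique solution to the normal equation,
$$X^TX\vec w^* = X^T\vec y$$
When $X$ has linearly independent columns, $X^TX$ is invertible, so we can solve for $\vec w^*$ uniquely: $\vec w^* = (X^TX)^{-1}X^T\vec y$.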
What if $X$’s Columns are Linearly Dependent?
In the case where $X$’s columns are linearly dependent, $X^TX$ is not invertible, so we can’t invert it to solve for $\vec w^*$. This means that
$$X^TX\vec w^* = X^T\vec y$$
has infinitely many solutions. Let’s give more thought to what these solutions actually are.
First, note that all of these solutions for $\vec w^*$ correspond to the same projection, $\vec p = X\vec w^*$. The “best approximation” of $\vec y$ in $\text{colsp}(X)$ is always just one vector; if there are infinitely many $\vec w^*$’s, that just means there are infinitely many ways of describing that one best approximation. Remember, if vectors are linearly independent, then any of their linear combinations can only be expressed in one way; if they are linearly dependent, then their linear combinations can be expressed in infinitely many ways.
In other words, if $X$ has linearly dependent columns, then there are infinitely many $\vec w^*$’s that satisfy the normal equation, but they all correspond to the same projection $\vec p$ in the figure below.
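Let me drive this point home further. Let’s suppose both $\vec w_1^*$ and $\vec w_2^*$ satisfy the normal equation,
$$X^TX\vec w_1^* = X^T\vec y \qquad \text{and} \qquad X^TX\vec w_2^* = X^T\vec y$$
Then,
$$X^TX\vec w_1^* = X^TX\vec w_2^*$$
which means that
$$X^TX\vec w_1^* - X^TX\vec w_2^* = X^TX(\vec w_1^* - \vec w_2^*) = \vec 0$$
i.e. the difference between the two vectors, $\vec w_1^* - \vec w_2^*$, is in $\text{nullsp}(X^TX)$. But, back in Chapter 5.3, we proved that $X$ and $X^TX$ have the same null space, meaning any vector that gets sent to $\vec 0$ by $X^TX$ also gets sent to $\vec 0$ by $X$, and vice versa.
So,
$$X(\vec w_1^* - \vec w_2^*) = \vec 0$$
too, but that just means
$$X\vec w_1^* = X\vec w_2^*$$
meaning that even though $\vec w_1^*$ and $\vec w_2^*$ are different-looking coefficient vectors, they both still correspond to the same linear combination of $X$’s columns!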
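To see this numerically, here’s a minimal NumPy sketch. The matrix `X` and vector `y` are hypothetical. They were chosen only so that column 1 is three times column 2, and they are not the example used below. The sketch gets one solution of the normal equation from `np.linalg.lstsq`, builds a second one by adding a null space vector, and checks that both give the same projection.

```python
import numpy as np

# Hypothetical data: column 1 is 3 times column 2, so X's columns are dependent.
X = np.array([[3.0, 1.0, 0.0],
              [6.0, 2.0, 1.0],
              [0.0, 0.0, 1.0],
              [3.0, 1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# One solution of X^T X w = X^T y (lstsq returns the minimum-norm least squares solution).
w1, *_ = np.linalg.lstsq(X, y, rcond=None)

# A second, different solution: shift w1 by a vector in nullsp(X).
# 1 * (column 1) - 3 * (column 2) + 0 * (column 3) = 0, so [1, -3, 0] is in nullsp(X).
w2 = w1 + 2.5 * np.array([1.0, -3.0, 0.0])

print(np.allclose(X.T @ X @ w1, X.T @ y))  # w1 satisfies the normal equation
print(np.allclose(X.T @ X @ w2, X.T @ y))  # so does w2
print(np.allclose(w1, w2))                 # the coefficient vectors differ...
print(np.allclose(X @ w1, X @ w2))         # ...but they describe the same projection
```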
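Let’s see how we can apply this to an example. Consider a matrix $X$ with three columns, where column 1 is three times column 2, along with some vector $\vec y$. This is an example of a matrix with linearly dependent columns, so there’s no unique $\vec w^*$ that satisfies the normal equations.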
Finding One Solution
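One way to find a possible $\vec w^*$ vector is to solve the normal equations. $X^TX$ is not invertible, so we can’t solve for $\vec w^*$ uniquely, but we can still try to find a solution.
Here’s one approach: let’s just toss out the linearly dependent columns of $X$ (that is, drop columns until the remaining ones are linearly independent) and solve for $\vec w^*$ using the remaining columns. Then, for the full $X$, we can use the same coefficients for the linearly independent columns, but 0s for the dropped ones. Removing the linearly dependent columns does not change $\text{colsp}(X)$ (i.e. the set of all linear combinations of $X$’s columns), so the projection is the same.
The easy choice is to keep columns 2 and 3, since their numbers are smallest; dropping column 1 loses nothing, because column 1 is three times column 2. So, for now, let $X_\text{ind}$ be the matrix made up of just columns 2 and 3 of $X$.
Here, $X_\text{ind}$ has linearly independent columns, so $X_\text{ind}^TX_\text{ind}$ is invertible and $\vec w_\text{ind}^* = (X_\text{ind}^TX_\text{ind})^{-1}X_\text{ind}^T\vec y$. I won’t bore you with the calculations; you can verify them yourself.
Now, one possible $\vec w^*$ for the full $X$ is
$$\vec w^* = \begin{bmatrix} 0 \\ \vec w_\text{ind}^* \end{bmatrix}$$
which keeps the same coefficients on columns 2 and 3 as in $\vec w_\text{ind}^*$, but 0 for the column we didn’t use.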
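Here’s a minimal sketch of the “drop the dependent column” approach in NumPy. It assumes hypothetical data in which column 1 is three times column 2; the entries are made up for illustration and are not the ones from this example. The names `X_ind` (the kept columns) and `w_ind` (their coefficients) are also just illustrative.

```python
import numpy as np

# Hypothetical data with the same structure as the example: column 1 = 3 * column 2.
X = np.array([[3.0, 1.0, 0.0],
              [6.0, 2.0, 1.0],
              [0.0, 0.0, 1.0],
              [3.0, 1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Keep only columns 2 and 3 (NumPy indices 1 and 2), which are linearly independent.
X_ind = X[:, 1:]

# X_ind^T X_ind is invertible, so the reduced normal equation has a unique solution.
w_ind = np.linalg.solve(X_ind.T @ X_ind, X_ind.T @ y)

# Reuse those coefficients for columns 2 and 3, and put 0 on the dropped column 1.
w_star = np.concatenate(([0.0], w_ind))

print(w_star)
print(np.allclose(X.T @ X @ w_star, X.T @ y))  # solves the full normal equation
print(np.allclose(X @ w_star, X_ind @ w_ind))  # same projection as the reduced problem
```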
Finding All Solutions
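As I mentioned above, if there are infinitely many solutions to the normal equation, then the difference between any two solutions is in $\text{nullsp}(X^TX)$, which is also $\text{nullsp}(X)$. Put another way, if $\vec w^*$ satisfies the normal equations, then so does $\vec w^* + \vec n$ for any $\vec n \in \text{nullsp}(X)$, since $X^TX(\vec w^* + \vec n) = X^TX\vec w^* + \vec 0 = X^T\vec y$.
So, once we have one $\vec w^*$, to get the rest, just add any vector in $\text{nullsp}(X)$ or $\text{nullsp}(X^TX)$ (since those are the same subspaces).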
What is $\text{nullsp}(X)$? It’s the set of vectors $\vec n$ such that $X\vec n = \vec 0$.
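In our particular example, column 1 is a multiple of column 2, while columns 2 and 3 are linearly independent, so $\text{rank}(X) = 2$. Since $X$ has 3 columns, $\text{nullsp}(X)$ has a dimension of $3 - 2 = 1$ (by the rank-nullity theorem), so it’s going to be the span of a single vector. All we need to do now is find one nonzero vector in $\text{nullsp}(X)$, and we will know that the null space is the set of scalar multiples of that vector.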
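Since column 1 is three times column 2, $1 \cdot (\text{column 1}) - 3 \cdot (\text{column 2}) + 0 \cdot (\text{column 3}) = \vec 0$, so the vector $\begin{bmatrix} 1 \\ -3 \\ 0 \end{bmatrix}$ must be in $\text{nullsp}(X)$.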
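So, since the $\vec w^*$ we found above is one solution, we know that the set of all possible solutions is
$$\left\{ \vec w^* + t\begin{bmatrix} 1 \\ -3 \\ 0 \end{bmatrix} \,:\, t \in \mathbb{R} \right\}$$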
This is not a subspace, since it doesn’t contain the zero vector.
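To double-check the null space reasoning numerically, here’s a sketch on hypothetical data with the same structure as the example (column 1 is three times column 2; the entries themselves are made up). It computes the rank and nullity with `np.linalg.matrix_rank` and recovers a basis for $\text{nullsp}(X)$ from the SVD. The SVD is a standard technique swapped in here; the notes find the null vector by inspection. Finally, it checks that shifting one solution by any multiple of $[1, -3, 0]^T$ gives another solution with the same projection.

```python
import numpy as np

# Hypothetical data (same structure as the example): column 1 = 3 * column 2.
X = np.array([[3.0, 1.0, 0.0],
              [6.0, 2.0, 1.0],
              [0.0, 0.0, 1.0],
              [3.0, 1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
d = X.shape[1]

# Rank-nullity: dim nullsp(X) = d - rank(X), which here is 3 - 2 = 1.
rank = np.linalg.matrix_rank(X)
print("rank:", rank, "nullity:", d - rank)

# The last (d - rank) right singular vectors form a basis for nullsp(X).
_, _, Vt = np.linalg.svd(X)
null_basis = Vt[rank:].T  # shape (d, d - rank)

# That basis vector should be a scalar multiple of [1, -3, 0].
n = np.array([1.0, -3.0, 0.0])
print(np.allclose(X @ n, 0), np.allclose(np.cross(null_basis[:, 0], n), 0))

# One particular solution: drop column 1, solve, then put a 0 back in front.
X_ind = X[:, 1:]
w_star = np.concatenate(([0.0], np.linalg.solve(X_ind.T @ X_ind, X_ind.T @ y)))

# Every w_star + t * n solves the normal equation and gives the same projection.
for t in [-2.0, 0.0, 0.5, 10.0]:
    w = w_star + t * n
    assert np.allclose(X.T @ X @ w, X.T @ y)
    assert np.allclose(X @ w, X @ w_star)
```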
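There’s another way to arrive at this set of possible $\vec w^*$’s: we can solve the normal equations directly. I wouldn’t recommend this second approach since it’s much longer, but I’ll add it here for completeness. First, we compute $X^TX$ and $X^T\vec y$.
Then, the normal equations $X^TX\vec w^* = X^T\vec y$ give us a system of three equations in the three unknowns $w_1^*$, $w_2^*$, and $w_3^*$.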
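The first and second equations are just scalar multiples of each other (since column 1 of $X$ is three times column 2, the first equation is three times the second), so we can disregard one of them and solve for a form where we can use one unknown as a parameter for the other two. To illustrate, let’s pick $w_1^* = t$ as the parameter.
Because column 1 is three times column 2, every equation involves $w_1^*$ and $w_2^*$ only through the combination $3w_1^* + w_2^*$, so eliminating that combination between the second and third equations pins down $w_3^*$ to a single value. Plugging this value into the first two equations gives us two equations in just $w_1^*$ and $w_2^*$.
These are now both the same equation; the first one is just 3 times the second. So, we can solve for $w_2^*$ in terms of $t$, which gives $w_2^* = c - 3t$ for some constant $c$,
which gives us the complete solution
$$\vec w^* = \begin{bmatrix} t \\ c - 3t \\ w_3^* \end{bmatrix} = \begin{bmatrix} 0 \\ c \\ w_3^* \end{bmatrix} + t\begin{bmatrix} 1 \\ -3 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R}$$
This is the exact same line as using the null space approach! Plug in $t = 0$ to get the solution we found by dropping column 1, for example.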
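As an independent numerical cross-check, here’s a sketch on hypothetical data where column 1 is three times column 2. It computes a different solution with NumPy’s pseudoinverse, `np.linalg.pinv`, which gives the minimum-norm least squares solution. It then confirms that this solution differs from the “drop column 1” solution by a multiple of $[1, -3, 0]^T$, i.e. both lie on the same line. The pseudoinverse isn’t used in these notes; it’s swapped in only as a check.

```python
import numpy as np

# Hypothetical data: column 1 = 3 * column 2.
X = np.array([[3.0, 1.0, 0.0],
              [6.0, 2.0, 1.0],
              [0.0, 0.0, 1.0],
              [3.0, 1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
n = np.array([1.0, -3.0, 0.0])  # spans nullsp(X)

# Solution from dropping column 1 (its first component is 0).
X_ind = X[:, 1:]
w_star = np.concatenate(([0.0], np.linalg.solve(X_ind.T @ X_ind, X_ind.T @ y)))

# A different solution: pinv(X) @ y is the minimum-norm least squares solution.
w_pinv = np.linalg.pinv(X) @ y

# Their difference lies in nullsp(X), so it should be t * n for some t.
diff = w_pinv - w_star
t = diff[0]  # n has first component 1 and w_star has first component 0
print("t =", t)
print(np.allclose(diff, t * n))
```

Among all the points on the line, the pseudoinverse picks the one with the smallest norm, so it generally won’t coincide with the $t = 0$ solution.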
The Projection Matrix
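So far, we’ve established that the vector in $\text{colsp}(X)$ that is closest to $\vec y$ is the vector $X\vec w^*$, where $\vec w^*$ is the solution to the normal equations,
$$X^TX\vec w^* = X^T\vec y$$
If $X^TX$ is invertible, then $\vec w^*$ is the unique vector
$$\vec w^* = (X^TX)^{-1}X^T\vec y$$
meaning that the vector in $\text{colsp}(X)$ that is closest to $\vec y$ is
$$\vec p = X\vec w^* = X(X^TX)^{-1}X^T\vec y$$
You’ll notice that the above expression also looks like a linear transformation applied to $\vec y$, where $\vec y$ is being multiplied by the matrix
$$P = X(X^TX)^{-1}X^T$$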
The matrix $P$ is called the projection matrix. In other classes, it is called the “hat matrix”, because they might use $\hat{\vec y}$ instead of $\vec p$ and $H$ instead of $P$, and in that notation, $\hat{\vec y} = H\vec y$, so $H$ puts a “hat” on $\vec y$. (I don’t use hat notation in this class because drawing a hat on top of a vector is awkward. Doesn’t $\hat{\vec y}$ look strange?)
So,
$$\vec p = X\vec w^* = P\vec y$$
shows us that there are two ways to interpret the act of projecting $\vec y$ onto $\text{colsp}(X)$:
The resulting vector is some optimal linear combination of $X$’s columns.
The resulting vector is the result of applying the linear transformation $P$ to $\vec y$.
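Here’s a minimal sketch checking that the two interpretations agree. It uses the $X$ from the example that follows and the vector $\vec y = [1, 2, 3]^T$ that the example also projects. It computes $\vec w^*$ with `np.linalg.solve` rather than forming an inverse explicitly. That is a common numerical preference, not something the notes require.

```python
import numpy as np

X = np.array([[3, 0],
              [0, 154],
              [6, 0]])
y = np.array([1, 2, 3])

# Interpretation 1: solve the normal equation for w*, then form the combination X w*.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
p_from_w = X @ w_star

# Interpretation 2: build P = X (X^T X)^{-1} X^T once, then apply it to y.
P = X @ np.linalg.solve(X.T @ X, X.T)
p_from_P = P @ y

print(p_from_w)
print(p_from_P)
```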
Let’s work out an example. Suppose
$$X = \begin{bmatrix} 3 & 0 \\ 0 & 154 \\ 6 & 0 \end{bmatrix}$$
$X$’s columns are linearly independent, so $X^TX$ is invertible, and
$$P = X(X^TX)^{-1}X^T$$
is well-defined.
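```python
import numpy as np

X = np.array([[3, 0],
              [0, 154],
              [6, 0]])
# P = X (X^T X)^{-1} X^T
P = X @ np.linalg.inv(X.T @ X) @ X.T
P
```
```
array([[0.2, 0. , 0.4],
       [0. , 1. , 0. ],
       [0.4, 0. , 0.8]])
```
```python
# Project y = [1, 2, 3] onto colsp(X)
P @ np.array([1, 2, 3])
```
```
array([1.4, 2. , 2.8])
```
$P$ contains the information we need to project $\vec y$ onto $\text{colsp}(X)$. Each row of $P$ tells us the right mixture of $\vec y$’s components we need to construct the projection.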
Notice that $P$’s second row is $\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$. This came from the fact that $X$’s first column had a second component of 0, while its second column had a non-zero second component but zeros in the other two components, meaning that we can scale $X$’s second column to exactly match $\vec y$’s second component. Change the 154 in $X$ to any other non-zero value and $P$ won’t change!
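To check the claim about the 154, here’s a short sketch that rebuilds $P$ with a few other non-zero values in that position (the values are arbitrary). The helper name `projection_matrix` is hypothetical, and it assumes $X$ has linearly independent columns.

```python
import numpy as np

def projection_matrix(X):
    """Return X (X^T X)^{-1} X^T, assuming X has linearly independent columns."""
    return X @ np.linalg.solve(X.T @ X, X.T)

expected = np.array([[0.2, 0.0, 0.4],
                     [0.0, 1.0, 0.0],
                     [0.4, 0.0, 0.8]])

# Replace the 154 with other non-zero values; P should stay the same.
for c in [154, 1, -2.5, 1000]:
    X = np.array([[3, 0],
                  [0, c],
                  [6, 0]], dtype=float)
    print(c, np.allclose(projection_matrix(X), expected))
```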
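Additionally, if we consider some $\vec y$ that is already in $\text{colsp}(X)$, then multiplying it by $P$ doesn’t change it! For example, if we set $\vec y = \begin{bmatrix} 3 \\ 154 \\ 6 \end{bmatrix}$ (the sum of $X$’s columns), then $P\vec y = \vec y$.
```python
P @ np.array([3, 154, 6])
```
```
array([  3., 154.,   6.])
```
Let’s work through some examples that develop our intuition for $P$.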
Example: Is $P$ invertible?
Suppose $P = X(X^TX)^{-1}X^T$ exists, meaning $X^TX$ is invertible. Is $P$ invertible? If so, what is its inverse?
Solution
Before we do any calculations, intuitively the answer should be no. Once we’ve projected $\vec y$ onto $\text{colsp}(X)$, we’ve lost information, since we went from an arbitrary vector in $\mathbb{R}^n$ to a vector in a smaller subspace, so it shouldn’t be possible to reverse the projection. Put another way, two different vectors in $\mathbb{R}^n$ might have the same “shadow” onto $\text{colsp}(X)$.
Even in the most recent example, $P$ is not invertible, since column 3 is a multiple of column 1 (it’s exactly twice column 1).
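Let’s think about this a bit more from the perspective of ranks. It turns out that $\text{rank}(P) = \text{rank}(X)$; I’ve provided a proof of this at the bottom of the solutions box, but you might want to attempt it on your own for practice.
Remember that $X$ is an $n \times d$ matrix, meaning $\text{rank}(X) \leq \min(n, d)$. $X$ doesn’t need to have a rank of $n$ for $X^TX$ to be invertible; it just needs to have a rank of $d$.
Since $P$ is an $n \times n$ matrix, in general it won’t be the case that $\text{rank}(P) = n$. To illustrate, in the example above where $X = \begin{bmatrix} 3 & 0 \\ 0 & 154 \\ 6 & 0 \end{bmatrix}$, $P$ was a $3 \times 3$ matrix with rank 2.
The only case in which $\text{rank}(P) = n$ is when $\text{rank}(X) = n$. Since $X^TX$ being invertible already forces $\text{rank}(X) = d$, this only happens when $n = d$, i.e. when $X$ is an $n \times n$ square matrix that is also invertible. In such a case, $\text{colsp}(X) = \mathbb{R}^n$, and so we probably wouldn’t set out to project onto $\text{colsp}(X)$ in the first place, since any vector in $\mathbb{R}^n$ is already a linear combination of $X$’s columns.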
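Here’s a small NumPy sketch of the ideas in this solution, using the $X$ from the earlier example. It checks that $\text{rank}(P) = 2 < 3$. It also shows two different vectors with the same “shadow”: the second adds $[2, 0, -1]^T$, which is orthogonal to both columns of $X$. Finally, for a hypothetical square invertible matrix `X_sq`, it confirms that $P$ works out to be the identity.

```python
import numpy as np

X = np.array([[3, 0],
              [0, 154],
              [6, 0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T

# rank(P) equals rank(X) = 2, which is less than n = 3, so P is not invertible.
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(P))

# Two different vectors with the same "shadow": [2, 0, -1] is orthogonal to
# both columns of X, so P sends it to the zero vector.
y1 = np.array([1, 2, 3])
y2 = y1 + np.array([2, 0, -1])
print(P @ y1, P @ y2)

# Hypothetical square, invertible X: colsp(X_sq) is all of R^2, and P is the identity.
X_sq = np.array([[2.0, 1.0],
                 [1.0, 3.0]])
P_sq = X_sq @ np.linalg.inv(X_sq.T @ X_sq) @ X_sq.T
print(np.allclose(P_sq, np.eye(2)))
```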
Extra: Proof that $\text{rank}(P) = \text{rank}(X)$
We can show that this is the case by showing that $P$ and $X$ both have the same column spaces. This proof also helps explain why the normal equation, $X^TX\vec w^* = X^T\vec y$, always has at least one solution for $\vec w^*$, even when $X^TX$ isn’t invertible.
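Show $\text{colsp}(P) \subseteq \text{colsp}(X)$, i.e. show that any vector in the column space of $P$ is also in the column space of $X$.
If $\vec b \in \text{colsp}(P)$, then $\vec b$ can be written as a linear combination of $P$’s columns. Say $\vec b = P\vec u$ for some $\vec u \in \mathbb{R}^n$. Then,
$$\vec b = P\vec u = X(X^TX)^{-1}X^T\vec u = X\left((X^TX)^{-1}X^T\vec u\right)$$
Here, we see that if $\vec b \in \text{colsp}(P)$, then $\vec b$ is a linear combination of $X$’s columns, with coefficients $(X^TX)^{-1}X^T\vec u$, so $\vec b$ is also in $X$’s column space. So,
$$\text{colsp}(P) \subseteq \text{colsp}(X)$$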
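Show $\text{colsp}(X) \subseteq \text{colsp}(P)$, i.e. show that any vector in the column space of $X$ is also in the column space of $P$.
This direction is a bit more involved. Let’s start by considering some vector $\vec b \in \text{colsp}(X)$. Then $\vec b$ can be written as a linear combination of $X$’s columns, meaning
$$\vec b = X\vec u$$
for some $\vec u \in \mathbb{R}^d$. What happens if we multiply both sides by $P$?
$$P\vec b = X(X^TX)^{-1}X^TX\vec u = X\underbrace{(X^TX)^{-1}(X^TX)}_{I}\vec u = X\vec u = \vec b$$
So, if $\vec b \in \text{colsp}(X)$, then $\vec b = P\vec b$, meaning that $\vec b$ is also in $P$’s column space if it’s in $X$’s column space. Intuitively, $P\vec b = \vec b$ means that if $\vec b$ is already in the span of $X$’s columns, then projecting it onto $\text{colsp}(X)$ doesn’t change it.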
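Now that we’ve shown that $\text{colsp}(P) \subseteq \text{colsp}(X)$ and $\text{colsp}(X) \subseteq \text{colsp}(P)$, we can conclude that $\text{colsp}(P) = \text{colsp}(X)$, since two sets that are subsets of each other must be equal. Equal column spaces have the same dimension, so $\text{rank}(P) = \text{rank}(X)$.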
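Here’s a quick numerical sanity check of this proof, using the $X$ from the earlier example. If $\text{colsp}(P) = \text{colsp}(X)$, then placing the columns of $X$ and $P$ side by side shouldn’t increase the rank. Also, $P$ should leave any $\vec b = X\vec u$ unchanged. The coefficient vector `u` is an arbitrary, hypothetical choice.

```python
import numpy as np

X = np.array([[3, 0],
              [0, 154],
              [6, 0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T

r_X = np.linalg.matrix_rank(X)
r_P = np.linalg.matrix_rank(P)
# Equal column spaces: stacking the columns of X and P adds no new directions.
r_both = np.linalg.matrix_rank(np.hstack([X, P]))
print(r_X, r_P, r_both)

# Second half of the proof: for b = X u, we get P b = b.
u = np.array([1.0, -2.0])
b = X @ u
print(np.allclose(P @ b, b))
```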
Example: Is $P$ orthogonal?
Is $P$ orthogonal?
Solution
No. Orthogonal matrices $Q$ have the property that $Q^TQ = QQ^T = I$, meaning that
$$Q^{-1} = Q^T$$
But, as we saw, $P$ is not invertible in general, so it can’t satisfy this property. This tells us that $P$ does not perform a rotation; projections are not rotations. Rotations can be undone, but projections can’t.
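Here’s a short check with the $X$ and the vector $[1, 2, 3]^T$ from the earlier example. It shows that $P^TP$ is not the identity. It also shows that, unlike an orthogonal matrix, $P$ does not preserve length.

```python
import numpy as np

X = np.array([[3, 0],
              [0, 154],
              [6, 0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T

# An orthogonal matrix Q satisfies Q^T Q = I; P does not.
print(np.allclose(P.T @ P, np.eye(3)))

# Orthogonal matrices preserve lengths, but P shrinks y = [1, 2, 3].
y = np.array([1.0, 2.0, 3.0])
print(np.linalg.norm(y), np.linalg.norm(P @ y))
```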
Example: Is $P$ symmetric?
Is $P$ symmetric?
Solution
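Yes. Symmetric matrices have the property that $A^T = A$. We can show that $P$ satisfies this property; to do so, we’ll need to use the fact that $(AB)^T = B^TA^T$, along with the fact that $(A^{-1})^T = (A^T)^{-1}$ for any invertible $A$.
$$\begin{aligned} P^T &= \left(X(X^TX)^{-1}X^T\right)^T \\ &= (X^T)^T\left((X^TX)^{-1}\right)^TX^T \\ &= X(X^TX)^{-1}X^T \\ &= P \end{aligned}$$
Going from the second to the third line, we used the fact that $X^TX$ is symmetric, and so is its inverse. Remember that $X^TX$ is a square matrix consisting of the dot products of the columns of $X$ with one another: its $(i, j)$ entry is the dot product of columns $i$ and $j$, which equals its $(j, i)$ entry.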
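Here’s a minimal numerical check of the symmetry argument on a hypothetical random $6 \times 3$ matrix. Random Gaussian columns are linearly independent with probability 1. Up to floating-point error, $X^TX$, its inverse, and $P$ should all equal their own transposes.

```python
import numpy as np

# Hypothetical tall matrix with (almost surely) linearly independent columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

G = X.T @ X           # entry (i, j) is the dot product of columns i and j
G_inv = np.linalg.inv(G)
P = X @ G_inv @ X.T

print(np.allclose(G, G.T))          # X^T X is symmetric
print(np.allclose(G_inv, G_inv.T))  # so is its inverse
print(np.allclose(P, P.T))          # hence P^T = P
```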
Example: Is $P$ idempotent?
Recall, an idempotent matrix $A$ satisfies $A^2 = A$. Is $P$ idempotent?
Solution
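Yes.
$$\begin{aligned} P^2 &= X(X^TX)^{-1}X^T\,X(X^TX)^{-1}X^T \\ &= X(X^TX)^{-1}\underbrace{(X^TX)(X^TX)^{-1}}_{I}X^T \\ &= X(X^TX)^{-1}X^T \\ &= P \end{aligned}$$
Intuitively, this means that $P(P\vec y)$ is the same as $P\vec y$: once we’ve projected $\vec y$ onto $\text{colsp}(X)$ to get $\vec p = P\vec y$, projecting $\vec p$ again onto $\text{colsp}(X)$ gives us back the same $\vec p$, since $\vec p$ is already in $\text{colsp}(X)$.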
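Here’s a minimal check of idempotence on hypothetical random data: applying $P$ twice should give the same result as applying it once.

```python
import numpy as np

# Hypothetical data; random Gaussian columns are (almost surely) linearly independent.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
y = rng.normal(size=5)

P = X @ np.linalg.solve(X.T @ X, X.T)
p = P @ y                     # project once

print(np.allclose(P @ P, P))  # P^2 = P
print(np.allclose(P @ p, p))  # projecting p again leaves it unchanged
```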
Example: What is $PX$, and why?
What is $PX$? What does the result mean?
Solution
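Interpret $X$ as a matrix made up of $\vec x^{(1)}, \vec x^{(2)}, \ldots, \vec x^{(d)}$ as its columns. $P\vec x^{(i)}$ is the projection of $\vec x^{(i)}$ onto $\text{colsp}(X)$, but since $\vec x^{(i)}$ is already in $\text{colsp}(X)$, projecting it onto $\text{colsp}(X)$ gives us back the same $\vec x^{(i)}$. So, $PX$ should just be $X$ again. Algebraically, $PX = X(X^TX)^{-1}X^TX = X$.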
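Here’s a minimal check with the $X$ from the earlier example: $PX$ should equal $X$, both as a whole and column by column.

```python
import numpy as np

X = np.array([[3, 0],
              [0, 154],
              [6, 0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(P @ X, X))

# Column by column: each column of X is already in colsp(X), so P leaves it alone.
for i in range(X.shape[1]):
    print(i, np.allclose(P @ X[:, i], X[:, i]))
```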
Example: Rotations, Reflections, and Projections
Suppose $A$ is an arbitrary $n \times n$ matrix. Describe the conditions on $A$ that make the corresponding linear transformation $T(\vec x) = A\vec x$ a...
Rotation
Reflection
Projection
Summary
Let’s take a step back and walk through our logic from Chapter 6.3 and here in Chapter 6.4 once more, since it’s that important.
Suppose $X$ is an $n \times d$ matrix and $\vec y$ is some vector in $\mathbb{R}^n$.
Orthogonal Projections
Our goal is to find the linear combination of $X$’s columns that is closest to $\vec y$.
This boils down to finding the vector $\vec w^*$ that minimizes $\lVert \vec y - X\vec w \rVert^2$.
The vector $\vec w^*$ that minimizes $\lVert \vec y - X\vec w \rVert^2$ makes the resulting error vector,
$$\vec e = \vec y - X\vec w^*$$
orthogonal to the columns of $X$.
The $\vec w^*$ that makes the error vector orthogonal to the columns of $X$ is the one that satisfies the normal equation,
$$X^TX\vec w^* = X^T\vec y$$
If $X^TX$ is invertible, which happens if and only if $X$’s columns are linearly independent, then $\vec w^*$ is the unique vector
$$\vec w^* = (X^TX)^{-1}X^T\vec y$$
Otherwise, there are infinitely many solutions to the normal equation. All of these infinitely many solutions correspond to the same projection, $\vec p = X\vec w^*$. If $\vec w^*$ is one solution (which can be found by removing the linearly dependent columns of $X$), then all other solutions are of the form $\vec w^* + \vec n$, where $\vec n$ is any vector in $\text{nullsp}(X) = \text{nullsp}(X^TX)$.
The Projection Matrix
Assuming $X$ has linearly independent columns, the projection matrix is
$$P = X(X^TX)^{-1}X^T$$
$P$ is defined such that $P\vec y$ is the vector in $\text{colsp}(X)$ that is closest to $\vec y$. $P$ is symmetric and idempotent, but in general it is neither invertible nor orthogonal.
We’re now finally ready to head back to the land of machine learning.