We’re almost ready to return to our original motivation for studying linear algebra, which was to perform linear regression using multiple input variables. This section outlines the final piece of the puzzle.
Approximating using a Single Vector
In Chapter 2.3, we introduced the approximation problem, which asked:
Among all vectors of the form $w \vec{x}$, which one is closest to $\vec{y}$?
We now know the answer is the vector $w^* \vec{x}$, where

$$w^* = \frac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}$$

$w^* \vec{x}$ is called the orthogonal projection of $\vec{y}$ onto $\vec{x}$.

As we've studied, the resulting error vector,

$$\vec{e} = \vec{y} - w^* \vec{x}$$

is orthogonal to $\vec{x}$.
In our original look at the approximation problem, we were approximating $\vec{y}$ using a scalar multiple of just a single vector, $\vec{x}$. The set of all scalar multiples of $\vec{x}$, denoted by $\text{span}(\vec{x})$, is a line in $\mathbb{R}^n$.
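To make this concrete, here's a minimal numerical sketch in numpy. The vectors below are made-up example data, not anything from the text; the sketch computes $w^*$ using the dot product formula above and checks that the error vector really is orthogonal to $\vec{x}$.

```python
import numpy as np

# Made-up example vectors in R^4.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Optimal scalar: w* = (x . y) / (x . x).
w_star = (x @ y) / (x @ x)

# Orthogonal projection of y onto x, and the resulting error vector.
proj = w_star * x
e = y - proj

print(w_star)        # the best scalar multiple of x
print(np.dot(e, x))  # ~0: the error vector is orthogonal to x
```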
Key idea: instead of projecting onto the subspace spanned by just a single vector, how might we project onto the subspace spanned by multiple vectors?
Approximating using Multiple Vectors
Equipped with our understanding of linear independence, spans, subspaces, and column spaces, we’re ready to tackle a more advanced version of the approximation problem. I’m going to use slightly different variables to pose the problem than I did in the single vector case, to make the transition back into data, models, and loss functions a little smoother.
All three statements at the bottom of the box above are asking the exact same question; I've presented all three forms so that you see more clearly how the ideas of spans, column spaces, and matrix-vector multiplication fit together. I will tend to refer to the latter two versions of the problem the most. In what follows, suppose $X$ is an $n \times d$ matrix whose columns $\vec{x}^{(1)}, \vec{x}^{(2)}, ..., \vec{x}^{(d)}$ are the building blocks we want to approximate $\vec{y}$ with.
First, let’s get the trivial case out of the way. If $\vec{y} \in \text{colspace}(X)$, then the vector in $\text{colspace}(X)$ that is closest to $\vec{y}$ is just $\vec{y}$ itself. If that’s the case, there exists some $\vec{w}$ such that $X\vec{w} = \vec{y}$ exactly. This $\vec{w}$ is unique only if $X$'s columns are linearly independent; otherwise, there will be infinitely many good $\vec{w}$’s.
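Here's a quick sanity check of this trivial case, with made-up numbers: we build $\vec{y}$ as an exact linear combination of $X$'s columns, then recover a $\vec{w}$ with $X\vec{w} = \vec{y}$ and confirm the error is zero.

```python
import numpy as np

# Made-up 4x2 matrix X with linearly independent columns.
X = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0],
              [1.0, 1.0]])

# Construct y so that it lies in colspace(X) by design.
w_true = np.array([2.0, -1.0])
y = X @ w_true

# Find a w with X w = y; since y is in colspace(X) and X's columns
# are linearly independent, this w is unique and reproduces y exactly.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)                      # matches w_true
print(np.allclose(X @ w, y))  # True: X w equals y exactly, zero error
```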
But, that’s not the case I’m really interested in. I care more about when $\vec{y}$ is not in $\text{colspace}(X)$. (Remember, this is the case we’re interested in when we’re doing linear regression: usually, it’s not possible to make our predictions 100% correct, and we’ll have to settle for some error.)
Then what?
In general, $\text{colspace}(X)$ is a $d$-dimensional subspace of $\mathbb{R}^n$ (assuming $X$'s columns are linearly independent), where $d < n$. In the diagram below, I’ve used a plane to represent $\text{colspace}(X)$; just remember that $X$ may have more than 3 rows or columns.
We’re searching for the vector in $\text{colspace}(X)$ that is closest to $\vec{y}$, i.e. whose distance from $\vec{y}$ is minimal. Remember that $\text{colspace}(X)$ is the set of all linear combinations of $X$'s columns, so it’s the set of all vectors of the form $X\vec{w}$ for some $\vec{w} \in \mathbb{R}^d$.
So, our problem boils down to finding the $\vec{w}$ that minimizes the distance between $X\vec{w}$ and $\vec{y}$, i.e. the $\vec{w}$ that minimizes the norm of the error vector $\vec{e} = \vec{y} - X\vec{w}$.
$\lVert \vec{e} \rVert = \lVert \vec{y} - X\vec{w} \rVert$ is a function of $\vec{w}$ only; $X$ and $\vec{y}$ should be thought of as fixed. There are two ways we’ll minimize $\lVert \vec{e} \rVert$:
Using a geometric argument involving orthogonality, as we did in the single vector case.
Using calculus. This is more involved than before, since the input variable $\vec{w}$ is a vector, not a scalar, but it can be done, as we’ll see in Chapter 4. (The sketch below gives a quick numerical preview of this approach.)
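As a hedged preview, and not the derivation we'll actually do in Chapter 4, the sketch below treats $\lVert \vec{y} - X\vec{w} \rVert$ as a function of $\vec{w}$ alone and hands it to a generic numerical minimizer. The data is made up, and `scipy.optimize.minimize` is just one convenient off-the-shelf optimizer, not the method this text develops.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up example data: X is 5x2, and y is not in colspace(X).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# ||e|| = ||y - Xw||, viewed as a function of w only (X and y are fixed).
def error_norm(w):
    return np.linalg.norm(y - X @ w)

# Minimize numerically, starting from an arbitrary initial guess.
result = minimize(error_norm, x0=np.zeros(X.shape[1]))

print(result.x)              # the minimizing w
print(error_norm(result.x))  # the smallest achievable ||e||, which is > 0 here
```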
What does our intuition tell us? Extending the single vector case, we expect the vector in $\text{colspace}(X)$ that is closest to $\vec{y}$ to be the orthogonal projection of $\vec{y}$ onto $\text{colspace}(X)$.
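We can spot-check this intuition numerically. In the sketch below (same made-up data as the previous sketch, with `np.linalg.lstsq` used only as a black box that finds a minimizing $\vec{w}$), the error vector at the minimizer is orthogonal to every column of $X$, and hence to everything in $\text{colspace}(X)$.

```python
import numpy as np

# Same made-up example data as in the previous sketch.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Black box: find a w that minimizes ||y - Xw||.
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

# Error vector at the minimizer.
e = y - X @ w_star

# e is orthogonal to each column of X, i.e. X^T e ~ 0, which is exactly
# what "the closest vector is the orthogonal projection" predicts.
print(X.T @ e)                  # both entries ~0
print(np.allclose(X.T @ e, 0))  # True, up to floating-point error
```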
More to come!