
6.5. The Gram-Schmidt Process

This section can be thought of as a detour from the main storyline of the course. Chapter 7.1 is where I connect the idea of projecting $\vec y$ onto the column space of $X$ to that of linear regression.

Instead, here I’ll introduce a new algorithm that can be used to turn a collection of vectors into a more convenient form, and see how this can make the act of projecting $\vec y$ onto the column space of $X$ much easier.


Why Orthogonalize?

Recall, a set of vectors $\vec q_1, \vec q_2, \ldots, \vec q_d \in \mathbb{R}^n$ is orthonormal if the vectors are both:

  1. pairwise orthogonal, meaning $\vec q_i \cdot \vec q_j = 0$ for all $i \neq j$, and

  2. unit vectors, meaning $\lVert \vec q_i \rVert = 1$ for all $i$

Orthonormal vectors (and matrices containing them) are convenient to work with. For example, if $Q$’s columns are the vectors $\vec q_1, \vec q_2, \ldots, \vec q_d$, then $Q^TQ$, the matrix containing the dot products of the columns of $Q$, is the identity matrix.

$$
Q^TQ = \begin{bmatrix} \vec q_1 \cdot \vec q_1 & \vec q_1 \cdot \vec q_2 & \cdots & \vec q_1 \cdot \vec q_d \\ \vec q_2 \cdot \vec q_1 & \vec q_2 \cdot \vec q_2 & \cdots & \vec q_2 \cdot \vec q_d \\ \vdots & \vdots & \ddots & \vdots \\ \vec q_d \cdot \vec q_1 & \vec q_d \cdot \vec q_2 & \cdots & \vec q_d \cdot \vec q_d \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I
$$
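As a quick sanity check, here’s a small NumPy sketch (the specific matrix is my own illustration, not from the course materials) verifying that a matrix with orthonormal columns satisfies $Q^TQ = I$:

```python
import numpy as np

# The columns of a 2D rotation matrix are orthonormal:
# each is a unit vector, and the two are orthogonal.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Q^T Q collects all pairwise dot products of the columns,
# so for orthonormal columns it should be the identity.
print(np.allclose(Q.T @ Q, np.eye(2)))  # True
```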

(I know that I promised to use superscripts like $\vec q^{(1)}$ to denote columns of a matrix, but I have my reasons for doing it this way in this section.)

As we’ve seen in Chapter 6.3, when projecting $\vec y$ onto the column space of $X$, the matrix $X^TX$ – and its inverse – plays a big role in finding the optimal coefficients $\vec w^*$ to multiply each column of $X$ by. Most matrices, by default, don’t have orthonormal columns. But if they did, then some of these calculations would be much, much simpler!

So, the goal here is to learn how to turn a linearly independent set of vectors into an orthonormal set of vectors with the same span, i.e. how to “orthogonalize” a set of vectors.

$$
\underset{\text{linearly independent set of vectors}}{\vec v_1, \vec v_2, \ldots, \vec v_d} \to \underset{\text{orthonormal set of vectors with the same span}}{\vec q_1, \vec q_2, \ldots, \vec q_d}
$$

The Algorithm

The algorithm that produces this orthonormal set of vectors is called the Gram-Schmidt process. It exploits the fact that when you project $\vec y$ onto $\vec x$, producing the projection $\vec p$, the error vector

$$
\vec e = \vec y - \vec p
$$

is orthogonal to $\vec x$.

*(Figure: $\vec y$ projected onto $\vec x$, with the error vector $\vec e$ orthogonal to $\vec x$. Image produced in Jupyter.)*

If you look at the figure above, the vectors ${\color{#3d81f6} \vec x}$ and ${\color{#d81b60} \vec e}$ are orthogonal, and have the same span as ${\color{orange} \vec y}$ and ${\color{#3d81f6} \vec x}$. The key takeaway is that if you’d like to “invent” vectors that are orthogonal to each other, you can construct them by iteratively projecting!

To illustrate how the algorithm works, let’s use as an example the vectors

$$
\vec v_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \vec v_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec v_3 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
$$

These are three linearly independent vectors in $\mathbb{R}^3$, though they are not orthogonal. These vectors span some subspace $S$. Our goal is to find an orthonormal set of vectors that spans the same $S$. (In this case, $S$ is all of $\mathbb{R}^3$, but in general this process works even if $d < n$.)

In what follows, let $\text{proj}_{\vec x}(\vec y)$ be the projection of $\vec y$ onto $\vec x$, i.e. $\text{proj}_{\vec x}(\vec y) = \frac{\vec y \cdot \vec x}{\vec x \cdot \vec x} \vec x$.
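In code, this projection is a one-liner. Here’s a small NumPy helper (a sketch of my own, matching the formula above), checked on $\vec v_1$ and $\vec v_2$ from our example:

```python
import numpy as np

def proj(x, y):
    """Projection of y onto x: (y . x / x . x) x."""
    return (y @ x) / (x @ x) * x

v1 = np.array([1., -1., 1.])
v2 = np.array([1., 0., 1.])

p = proj(v1, v2)   # equals (2/3) * v1
e = v2 - p         # the error vector

# The error vector is orthogonal to the vector we projected onto.
print(np.isclose(e @ v1, 0.0))  # True
```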

  • Iteration 1: Set ${\color{#3d81f6} \vec Q_1} = \vec v_1$.

    In the first iteration, we simply take the first vector $\vec v_1$ and copy it to ${\color{#3d81f6} \vec Q_1}$. From now on, each new vector will be constructed to be orthogonal to all previously constructed $\vec Q_i$’s.

    $$
    {\color{#3d81f6} \vec Q_1} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
    $$
  • Iteration 2: Set ${\color{orange} \vec Q_2} = \vec v_2 - \text{proj}_{\color{#3d81f6}\vec Q_1}(\vec v_2)$.

    ${\color{orange} \vec Q_2}$ is the error vector from projecting $\vec v_2$ onto ${\color{#3d81f6} \vec Q_1}$, which we know is orthogonal to ${\color{#3d81f6} \vec Q_1}$.

    Even without actually calculating ${\color{orange} \vec Q_2}$, we can see that it, by definition, must be orthogonal to ${\color{#3d81f6} \vec Q_1}$.

    $$
    \begin{align*} {\color{orange} \vec Q_2} \cdot {\color{#3d81f6} \vec Q_1} &= \left(\vec v_2 - \text{proj}_{\color{#3d81f6}\vec Q_1}(\vec v_2)\right) \cdot {\color{#3d81f6} \vec Q_1} \\ &= \left( \vec v_2 - \frac{\vec v_2 \cdot {\color{#3d81f6} \vec Q_1}}{{\color{#3d81f6} \vec Q_1} \cdot {\color{#3d81f6} \vec Q_1}} {\color{#3d81f6} \vec Q_1} \right) \cdot {\color{#3d81f6} \vec Q_1} \\ &= \vec v_2 \cdot {\color{#3d81f6} \vec Q_1} - \frac{\vec v_2 \cdot {\color{#3d81f6} \vec Q_1}}{{\color{#3d81f6} \vec Q_1} \cdot {\color{#3d81f6} \vec Q_1}} \, {\color{#3d81f6} \vec Q_1} \cdot {\color{#3d81f6} \vec Q_1} \\ &= \vec v_2 \cdot {\color{#3d81f6} \vec Q_1} - \vec v_2 \cdot {\color{#3d81f6} \vec Q_1} \\ &= 0 \end{align*}
    $$

    With that understanding in mind, let’s evaluate ${\color{orange} \vec Q_2}$ explicitly.

    $$
    {\color{orange} \vec Q_2} = \underbrace{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_{\vec v_2} - \underbrace{\frac{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \cdot {\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}}}{{\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}} \cdot {\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}}} {\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}}}_{\text{proj}_{\color{#3d81f6}\vec Q_1}(\vec v_2)} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} - \frac{2}{3} {\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}} = {\color{orange} \begin{bmatrix} 1/3 \\ 2/3 \\ 1/3 \end{bmatrix}}
    $$
  • Iteration 3: Set ${\color{#d81b60} \vec Q_3} = \vec v_3 - \text{proj}_{\color{#3d81f6}\vec Q_1}(\vec v_3) - \text{proj}_{\color{orange}\vec Q_2}(\vec v_3)$.

    When constructed this way, ${\color{#d81b60} \vec Q_3}$ is orthogonal to both ${\color{#3d81f6} \vec Q_1}$ and ${\color{orange} \vec Q_2}$. Think of it this way: $\text{span}(\{{\color{#3d81f6} \vec Q_1}, {\color{orange} \vec Q_2}\})$ is a plane in $\mathbb{R}^3$; after projecting $\vec v_3$ onto this plane, the remaining part of $\vec v_3$, which is orthogonal to the plane, is ${\color{#d81b60} \vec Q_3}$. If that doesn’t make sense, expand ${\color{#d81b60} \vec Q_3} \cdot {\color{#3d81f6} \vec Q_1}$ and ${\color{#d81b60} \vec Q_3} \cdot {\color{orange} \vec Q_2}$ using the same general steps I followed in Iteration 2.

    $$
    \begin{align*} {\color{#d81b60} \vec Q_3} &= \vec v_3 - \text{proj}_{\color{#3d81f6}\vec Q_1}(\vec v_3) - \text{proj}_{\color{orange}\vec Q_2}(\vec v_3) \\ &= \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} - \underbrace{\frac{\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \cdot {\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}}}{{\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}} \cdot {\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}}} {\color{#3d81f6} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}}}_{\text{proj}_{\color{#3d81f6}\vec Q_1}(\vec v_3)} - \underbrace{\frac{\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \cdot {\color{orange} \begin{bmatrix} 1/3 \\ 2/3 \\ 1/3 \end{bmatrix}}}{{\color{orange} \begin{bmatrix} 1/3 \\ 2/3 \\ 1/3 \end{bmatrix}} \cdot {\color{orange} \begin{bmatrix} 1/3 \\ 2/3 \\ 1/3 \end{bmatrix}}} {\color{orange} \begin{bmatrix} 1/3 \\ 2/3 \\ 1/3 \end{bmatrix}}}_{\text{proj}_{\color{orange}\vec Q_2}(\vec v_3)} \\ &= {\color{#d81b60} \begin{bmatrix} -1/2 \\ 0 \\ 1/2 \end{bmatrix}} \end{align*}
    $$

If there were more $\vec v_i$’s, we’d continue this process, each time constructing a new $\vec Q_i$ that is orthogonal to all previously constructed $\vec Q_j$’s by “subtracting off” the parts we’ve already accounted for through the earlier $\vec Q_j$’s.

Now, $\vec Q_1, \vec Q_2, \vec Q_3$ are orthogonal to one another, but they are not yet unit vectors. To make them unit vectors, we simply need to divide each by its length.

$$
{\color{#3d81f6} \vec q_1 = \frac{\vec Q_1}{\lVert \vec Q_1 \rVert} = \boxed{\begin{bmatrix} 1/\sqrt{3} \\ -1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}}} \qquad {\color{orange} \vec q_2 = \frac{\vec Q_2}{\lVert \vec Q_2 \rVert} = \boxed{\begin{bmatrix} 1/\sqrt{6} \\ 2/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}}} \qquad {\color{#d81b60} \vec q_3 = \frac{\vec Q_3}{\lVert \vec Q_3 \rVert} = \boxed{\begin{bmatrix} -1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}}}
$$

Now, the vectors $\vec q_1, \vec q_2, \vec q_3$ are orthonormal to one another, and they span the same subspace $S$ as the vectors $\vec v_1, \vec v_2, \vec v_3$!
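The whole process translates directly into a short loop. Here’s an illustrative NumPy sketch of my own (not course-provided code); it subtracts each projection from the running vector rather than from the original one, which is the slightly more numerically stable “modified” variant, but on paper it computes the same $\vec q_i$’s:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    qs = []
    for v in vectors:
        Q = v.astype(float)
        for q in qs:
            Q = Q - (Q @ q) * q          # subtract off the part along each earlier q
        qs.append(Q / np.linalg.norm(Q)) # normalize to a unit vector
    return qs

v1 = np.array([1, -1, 1])
v2 = np.array([1, 0, 1])
v3 = np.array([1, 1, 2])

q1, q2, q3 = gram_schmidt([v1, v2, v3])
# q1 = [1, -1, 1]/sqrt(3), q2 = [1, 2, 1]/sqrt(6), q3 = [-1, 0, 1]/sqrt(2),
# matching the boxed vectors above.
```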

A quick note on signs: you could multiply any of the $\vec q_i$’s by $-1$ without changing the span of the collection, so if you use a computer to compute the $\vec q_i$’s, you might see them come out with different signs.
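You can see this with NumPy’s built-in QR factorization, which orthonormalizes the columns of a matrix: its output spans the same subspaces as our hand-computed $\vec q_i$’s, but each column may come out flipped in sign.

```python
import numpy as np

# Columns of V are v1, v2, v3 from the example above.
V = np.array([[ 1., 1., 1.],
              [-1., 0., 1.],
              [ 1., 1., 2.]])

Q, R = np.linalg.qr(V)

# Q's columns are orthonormal...
print(np.allclose(Q.T @ Q, np.eye(3)))  # True

# ...and the first column matches our q1 up to a possible sign flip.
q1 = np.array([1., -1., 1.]) / np.sqrt(3)
print(np.allclose(Q[:, 0], q1) or np.allclose(Q[:, 0], -q1))  # True
```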

*(Figure: the three $\vec v_i$’s (top) and the three $\vec q_i$’s (bottom). Image produced in Jupyter.)*

Notice above that ${\color{#3d81f6} \vec v_1}$ in the top figure and ${\color{#3d81f6} \vec q_1}$ in the bottom figure are parallel; it’s just that ${\color{#3d81f6} \vec q_1}$ is a unit vector. The three $\vec v_i$’s in the top figure are linearly independent but not orthogonal; the three $\vec q_i$’s in the bottom figure are orthonormal.


Gram-Schmidt and Projection

In general, we have a vector $\vec y$ that we’d like to approximate as a linear combination of matrix $X$’s columns. Generally, $X$’s columns are not orthonormal – they may not even be linearly independent.

If we:

  1. Remove the linearly dependent columns from $X$, and

  2. Use the Gram-Schmidt process to orthonormalize the remaining columns, and store them in the columns of $Q$,

then the matrix $Q$ that results has the same column space as $X$, but solving the normal equations for $Q$ and $\vec y$ is much simpler.

Our problem now is slightly different: we’re trying to find the best linear combination of the columns of $Q$ (not the columns of $X$) that approximates $\vec y$, i.e. we’re projecting $\vec y$ onto the column space of $Q$. If we adjust our objective to this goal, then the best $\vec w^*$ – the one that minimizes $\lVert \vec y - Q \vec w \rVert^2$ – satisfies

$$
Q^TQ \vec w^* = Q^T \vec y
$$

But, $Q^TQ = I$, as I showed you at the start of this section, so this just reduces to

$$
\vec w^* = Q^T \vec y
$$

That’s it! No inversion required: we can compute $\vec w^*$ with just a single matrix-vector multiplication.
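To see the payoff concretely, here’s a small example using the $Q$ we built above and a made-up $\vec y$ (chosen purely for illustration): the $\vec w^*$ from $Q^T \vec y$ agrees with the one from solving the normal equations directly.

```python
import numpy as np

# Q holds the orthonormal q1, q2, q3 from earlier as its columns.
Q = np.column_stack([
    np.array([ 1., -1., 1.]) / np.sqrt(3),
    np.array([ 1.,  2., 1.]) / np.sqrt(6),
    np.array([-1.,  0., 1.]) / np.sqrt(2),
])
y = np.array([2., 1., 3.])   # an arbitrary vector to project

w_star = Q.T @ y  # one matrix-vector multiplication, no inversion

# Same answer as solving the normal equations (Q^T Q) w = Q^T y.
w_normal = np.linalg.solve(Q.T @ Q, Q.T @ y)
print(np.allclose(w_star, w_normal))  # True
```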