
5.3. Singular Value Decomposition

Eigenvalues and eigenvectors are powerful concepts, but unfortunately, they only apply to square matrices. It would be nice if we could extend some of their utility to non-square matrices, like matrices containing real data, which typically have many more rows than columns.

This is precisely where the singular value decomposition (SVD) comes in. You should think of it as a generalization of the eigenvalue decomposition, $A = V \Lambda V^{-1}$, to non-square matrices. When we first introduced eigenvalues and eigenvectors, we said that an eigenvector $\vec v$ of $A$ is a vector whose direction is unchanged when multiplied by $A$ – all that happens to $\vec v$ is that it’s scaled by a factor of $\lambda$.

A \vec v = \lambda \vec v

But now, suppose $X$ is some $n \times d$ matrix. For any vector $\vec v \in \mathbb{R}^d$, the vector $X \vec v$ is in $\mathbb{R}^n$, not $\mathbb{R}^d$, meaning $\vec v$ and $X \vec v$ live in different universes. So, it can’t make sense to say that $X \vec v$ results from stretching $\vec v$.

Instead, we’ll find pairs of singular vectors, $\vec v \in \mathbb{R}^d$ and $\vec u \in \mathbb{R}^n$, such that

X \vec v = \sigma \vec u

where $\sigma$ is a singular value of $X$ (not a standard deviation!). Intuitively, this says that when $X$ is multiplied by $\vec v$, the result is a scaled version of $\vec u$. These singular values and vectors are the focus of Chapter 5.3. Applications of the singular value decomposition come in Chapter 5.4.


Introduction

I’ll start by giving you the definition of the SVD, and then together we’ll figure out where it came from.

First, the definition. The singular value decomposition of an $n \times d$ matrix $X$ is a factorization

X = U \Sigma V^T

where $U$ is an $n \times n$ orthogonal matrix, $\Sigma$ is an $n \times d$ diagonal matrix whose diagonal entries – the singular values of $X$ – are non-negative and arranged in decreasing order, and $V$ is a $d \times d$ orthogonal matrix.

Note that the SVD exists no matter what $X$ is: it can be non-square, and it doesn’t even need to be full rank. But unlike in the eigenvalue decomposition, where we decomposed $A$ using just one eigenvector matrix $V$ and a diagonal matrix $\Lambda$, here we need two singular vector matrices, $U$ and $V$, and a diagonal matrix $\Sigma$.

There’s a lot of notation and new concepts above. Let’s start with an example, and then try to understand where each piece in $X = U \Sigma V^T$ comes from and what it means. Suppose

X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}

$X$ is a $4 \times 3$ matrix with $\text{rank}(X) = 2$, since its third column is the sum of the first two. Its singular value decomposition is given by $X = U \Sigma V^T$:

X=[1613213231613213230223013260130]U[1500030000000]Σ[16162612120131313]VTX = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}

Two important observations:

  1. $U$ and $V$ are both orthogonal matrices, meaning $U^TU = UU^T = I_{4 \times 4}$ and $V^TV = VV^T = I_{3 \times 3}$.

  2. $\Sigma$ contains the singular values of $X$ on the diagonal, arranged in decreasing order. We have that $\sigma_1 = 15$, $\sigma_2 = 3$, and $\sigma_3 = 0$. $X$ has three singular values, but only two are non-zero. In general, the number of non-zero singular values is equal to the rank of $X$.
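Both observations are easy to check numerically. Here’s a quick sketch using numpy (np.linalg.svd is discussed in more detail at the end of this section):

import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

U, s, Vt = np.linalg.svd(X)   # full SVD: U is 4x4, s holds the singular values, Vt is 3x3

# Observation 1: U and V are orthogonal.
print(np.allclose(U.T @ U, np.eye(4)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True

# Observation 2: the singular values come back in decreasing order, and the
# number of non-zero singular values equals rank(X).
print(np.round(s, 6))                     # approximately [15.  3.  0.]
print(np.linalg.matrix_rank(X))           # 2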

Where did all of these numbers come from?


Discovering the SVD

The SVD of $X$ depends heavily on the matrices $X^TX$ and $XX^T$. While $X$ itself is $4 \times 3$,

  • $X^TX$ is a symmetric $3 \times 3$ matrix, containing the dot products of $X$’s columns

  • $XX^T$ is a symmetric $4 \times 4$ matrix, containing the dot products of $X$’s rows

Since $X^TX$ and $XX^T$ are both square matrices, they have eigenvalues and eigenvectors. And since they’re both symmetric, their eigenvectors for different eigenvalues are orthogonal to each other, as the spectral theorem $A = Q \Lambda Q^T$ guarantees for any symmetric matrix $A$.

Singular Values and Singular Vectors

The singular value decomposition involves creatively using the eigenvalues and eigenvectors of $X^TX$ and $XX^T$. Suppose $X = U \Sigma V^T$ is the SVD of $X$. Then, using the facts that $U^TU = I$ and $V^TV = I$, we have:

X^TX = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T \underbrace{U^T U}_{I} \Sigma V^T = \underbrace{V \Sigma^T \Sigma V^T}_{\text{looks like } Q \Lambda Q^T}
XX^T = (U \Sigma V^T) (U \Sigma V^T)^T = U \Sigma \underbrace{V^T V}_{I} \Sigma^T U^T = \underbrace{U \Sigma \Sigma^T U^T}_{\text{looks like } P \Lambda P^T}

This just looks like we diagonalized $X^TX$ and $XX^T$! The expressions above are saying that:

  • $V$’s columns are the eigenvectors of $X^TX$

  • $U$’s columns are the eigenvectors of $XX^T$

$X^TX$ and $XX^T$ usually have different sets of eigenvectors, which is why $U$ and $V$ are generally not the same matrix (they don’t even have the same shape).

The non-zero eigenvalues of $X^TX$ and $XX^T$ are the same, though: the eigenvalues are the diagonal entries of $\Sigma^T \Sigma$ and $\Sigma \Sigma^T$, respectively. Since $\Sigma$ is an $n \times d$ matrix, $\Sigma^T \Sigma$ is a $d \times d$ matrix and $\Sigma \Sigma^T$ is an $n \times n$ matrix. But, when you work out both products, you’ll notice that their non-zero entries are the same.

Suppose, for example, that $\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix}$. Then,

ΣTΣ=[σ10000σ20000σ30][σ1000σ2000σ3000]=[σ12000σ22000σ32]\Sigma^T \Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2\end{bmatrix}
ΣΣT=[σ1000σ2000σ3000][σ10000σ20000σ30]=[σ120000σ220000σ3200000]\Sigma \Sigma^T = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

But, these matrices with the squared terms are precisely the $\Lambda$’s in the spectral decompositions of $X^TX = Q \Lambda Q^T$ and $XX^T = P \Lambda P^T$. This means that

\sigma_i^2 = \lambda_i \implies \sigma_i = \sqrt{\lambda_i}

where $\sigma_i$ is a singular value of $X$ and $\lambda_i$ is an eigenvalue of $X^TX$ or $XX^T$.

The above derivation is enough to justify that the eigenvalues of $X^TX$ and $XX^T$ are never negative, but for another perspective, note that both $X^TX$ and $XX^T$ are positive semidefinite, meaning their eigenvalues are non-negative. (You’re wrestling with this fact in Lab 11 and Homework 10.)
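Here’s a quick numerical check of these claims – a sketch that reuses the example $X$ from above. np.linalg.eigvalsh is numpy’s eigenvalue routine for symmetric matrices; it returns the eigenvalues in increasing order.

import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

evals_XtX = np.linalg.eigvalsh(X.T @ X)   # 3 eigenvalues of X^T X
evals_XXt = np.linalg.eigvalsh(X @ X.T)   # 4 eigenvalues of X X^T

print(np.round(evals_XtX, 6))   # approximately [0, 9, 225]
print(np.round(evals_XXt, 6))   # approximately [0, 0, 9, 225] -- same non-zero values, one extra 0

# The singular values of X are the square roots of the shared, non-negative eigenvalues.
# Clip tiny negative values caused by floating point round-off before taking square roots.
print(np.sqrt(np.clip(evals_XtX, 0, None))[::-1])   # approximately [15, 3, 0]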

Computing the SVD

To find $U$, $\Sigma$, and $V^T$, we don’t actually need to compute both $X^TX$ and $XX^T$: all of these quantities can be uncovered with just one of them.

Let’s return to our example, $X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}$. Using what we’ve just learned, let’s find $U$, $\Sigma$, and $V^T$ ourselves. $X^TX$ has fewer entries than $XX^T$, so let’s start with it. I will delegate some number crunching to numpy.

import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])
X.T @ X
array([[ 42,  33,  75],
       [ 33,  42,  75],
       [ 75,  75, 150]])

$X^TX$ is a $3 \times 3$ matrix, but its rank is 2, meaning it will have an eigenvalue of 0. What are its other eigenvalues?

eigvals, eigvecs = np.linalg.eig(X.T @ X)
eigvals
array([225., 9., 0.])

The eigenvalues of $X^TX$ are 225, 9, and 0. This tells us that the singular values of $X$ are $\sqrt{225} = 15$, $\sqrt{9} = 3$, and 0. So far, we’ve discovered that $\Sigma = \begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$. Remember that $\Sigma$ always has the same shape as $X$ (both are $n \times d$), and all of its entries are 0 except for the singular values, which are arranged in decreasing order on the diagonal, starting with the largest singular value in the top left corner.
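In code, one way to assemble $\Sigma$ (a sketch, continuing from the cells above): sort the eigenvalues of $X^TX$ in decreasing order, take square roots, and place the results on the diagonal of an all-zeros matrix with the same shape as $X$.

eigvals, eigvecs = np.linalg.eig(X.T @ X)

order = np.argsort(eigvals)[::-1]                      # indices that sort the eigenvalues in decreasing order
sing_vals = np.sqrt(np.clip(eigvals[order], 0, None))  # clip tiny round-off negatives before taking square roots

Sigma = np.zeros(X.shape)                              # 4 x 3, same shape as X
np.fill_diagonal(Sigma, sing_vals)                     # places 15, 3, 0 on the diagonal
Sigma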

Let’s now find the eigenvectors of $X^TX$ – that is, the right singular vectors of $X$ – which we should store in $V$. We expect the eigenvectors $\vec v_i$ of $X^TX$ to be orthogonal, since $X^TX$ is symmetric.

X.T @ X
array([[ 42,  33,  75],
       [ 33,  42,  75],
       [ 75,  75, 150]])
  • For $\lambda_1 = 225$, one eigenvector is $\vec v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, since

[4233753342757575150]XTX[112]=[225225450]=225[112]\underbrace{\begin{bmatrix} 42 & 33 & 75 \\ 33 & 42 & 75 \\ 75 & 75 & 150 \end{bmatrix}}_{X^TX} \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 225 \\ 225 \\ 450 \end{bmatrix} = 225 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
  • For $\lambda_2 = 9$, one eigenvector is $\vec v_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$, since

[4233753342757575150][110]=[990]=9[110]\begin{bmatrix} 42 & 33 & 75 \\ 33 & 42 & 75 \\ 75 & 75 & 150 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 9 \\ -9 \\ 0 \end{bmatrix} = 9 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}
  • For $\lambda_3 = 0$, one eigenvector is $\vec v_3 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, since

[4233753342757575150][111]=[000]\begin{bmatrix} 42 & 33 & 75 \\ 33 & 42 & 75 \\ 75 & 75 & 150 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

Note that $\vec v_3$ is a basis for $\text{nullsp}(X)$, since $\dim(\text{nullsp}(X)) = 1$ and $X \vec v_3 = \vec 0$. Hold that thought for now.

To create $V$, all we need to do is turn $\vec v_1$, $\vec v_2$, and $\vec v_3$ into unit vectors.

[112][161626]v1,[110][12120]v2,[111][131313]v3\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}}_{\vec v_1}, \quad \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix}}_{\vec v_2}, \quad \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{bmatrix}}_{\vec v_3}

Stacking these unit vectors together gives us $V$:

V=[16121316121326013]V = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}} \end{bmatrix}

And indeed, since $X^TX$ is symmetric, its unit eigenvectors are orthonormal, so $V$ is orthogonal: $V^TV = VV^T = I_{3 \times 3}$.

Great! We’re almost done computing the SVD. So far, we have

[3252352205510]X=U[1500030000000]Σ[16162612120131313]VT\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = U \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}

$XV = U \Sigma$ and $X \vec v_i = \sigma_i \vec u_i$

Ideally, we can avoid having to compute the eigenvectors of $XX^T$ to stack into $U$. And we can. If we start with

X = U \Sigma V^T

and multiply both sides on the right by $V$, we uncover a relationship between the columns of $U$ and the columns of $V$.

XV = U \Sigma

Let’s unpack this. On the left, the matrix $XV$ is made up of multiplying $X$ by each column of $V$.

XV=X4×3[v1v2v3]3×3=[Xv1Xv2Xv3]4×3XV = X_{4 \times 3} \begin{bmatrix} | & | & | \\ \vec v_1 & \vec v_2 & \vec v_3 \\ | & | & | \end{bmatrix}_{3 \times 3} = \begin{bmatrix} | & | & | \\ X \vec v_1 & X \vec v_2 & X \vec v_3 \\ | & | & | \end{bmatrix}_{4 \times 3}

On the right, $U \Sigma$ is made up of stretching each column of $U$ by the corresponding singular value on the diagonal of $\Sigma$.

UΣ=[u1u2u3u4]4×4[σ1000σ2000σ3000]4×3=[σ1u1σ2u2σ3u3]4×3U \Sigma = \begin{bmatrix} | & | & | & | \\ \vec u_1 & \vec u_2 & \vec u_3 & \vec u_4 \\ | & | & | & | \end{bmatrix}_{4 \times 4} \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix}_{4 \times 3} = \begin{bmatrix} | & | & | \\ \sigma_1 \vec u_1 & \sigma_2 \vec u_2 & \sigma_3 \vec u_3 \\ | & | & | \end{bmatrix}_{4 \times 3}

But, since $XV = U \Sigma$, we have

[Xv1Xv2Xv3]=[σ1u1σ2u2σ3u3]\begin{bmatrix} | & | & | \\ X \vec v_1 & X \vec v_2 & X \vec v_3 \\ | & | & | \end{bmatrix} = \begin{bmatrix} | & | & | \\ \sigma_1 \vec u_1 & \sigma_2 \vec u_2 & \sigma_3 \vec u_3 \\ | & | & | \end{bmatrix}

Crucially, though, $X \vec v_i = \sigma_i \vec u_i$ only holds when we’ve arranged the singular values and vectors in $U$, $\Sigma$, and $V^T$ consistently. This is one reason why we always arrange the singular values in $\Sigma$ in decreasing order.

Back to our example. Again, we currently have

[3252352205510]X=U[1500030000000]Σ[16162612120131313]VT\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = U \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}

We know $X$, and we know each $\vec v_i$ and $\sigma_i$. Rearranging $X \vec v_i = \sigma_i \vec u_i$ gives us

\vec u_i = \frac{1}{\sigma_i} X \vec v_i

which is a recipe for computing $\vec u_1$, $\vec u_2$, and $\vec u_3$.

  1. u1=115X[161626]=[1616026]\vec u_1 = \frac{1}{15} X \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ 0 \\ \frac{2}{\sqrt{6}} \end{bmatrix}

  2. u2=13X[12120]=[1321322230]\vec u_2 = \frac{1}{3} X \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{3\sqrt{2}} \\ -\frac{1}{3\sqrt{2}} \\ \frac{2\sqrt{2}}{3} \\ 0 \end{bmatrix}

  3. u3=10X[131313]=...wait, what?\vec u_3 = \frac{1}{0} X \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{bmatrix} = ... \text{wait, what?}

Something is not quite right: $\sigma_3 = 0$, which means we can’t use $\vec u_i = \frac{1}{\sigma_i} X \vec v_i$ to compute $\vec u_3$. And, even if we could, this recipe tells us nothing about $\vec u_4$, which we also need to find, since $U$ is $4 \times 4$.
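Before we deal with that wrinkle, here’s what the recipe $\vec u_i = \frac{1}{\sigma_i} X \vec v_i$ looks like in code for the two non-zero singular values – a sketch that continues from the earlier cells. Keep in mind that np.linalg.eig already returns unit-length eigenvectors, and that it may return $-\vec v_i$ instead of $\vec v_i$, in which case the computed $\vec u_i$ comes out negated as well.

eigvals, eigvecs = np.linalg.eig(X.T @ X)   # eigvals came out as [225., 9., 0.]

v_1, v_2 = eigvecs[:, 0], eigvecs[:, 1]     # unit eigenvectors for eigenvalues 225 and 9
sigma_1, sigma_2 = np.sqrt(eigvals[0]), np.sqrt(eigvals[1])

u_1 = X @ v_1 / sigma_1
u_2 = X @ v_2 / sigma_2

print(np.round(u_1, 4))   # ±(1/√6, 1/√6, 0, 2/√6)        ≈ ±[0.4082, 0.4082, 0, 0.8165]
print(np.round(u_2, 4))   # ±(1/(3√2), -1/(3√2), 2√2/3, 0) ≈ ±[0.2357, -0.2357, 0.9428, 0]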

Null Spaces Return

So far, we’ve found two of the four columns of $U$.

U=[16132??16132??0223??260??]U = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & ? & ? \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & ? & ? \\ 0 & \frac{2\sqrt{2}}{3} & ? & ? \\ \frac{2}{\sqrt{6}} & 0 & ? & ? \end{bmatrix}

The issue is that we don’t have a recipe for what $\vec u_3$ and $\vec u_4$ should be. This problem stems from the fact that $\text{rank}(X) = 2$.

X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}

If we were to compute $XX^T$, whose eigenvectors are the columns of $U$, we would see that it has an eigenvalue of 0 with geometric (and algebraic) multiplicity 2.

X @ X.T
array([[ 38,  37,   2,  75],
       [ 37,  38,  -2,  75],
       [  2,  -2,   8,   0],
       [ 75,  75,   0, 150]])
eigvals, eigvecs = np.linalg.eig(X @ X.T)
eigvals
array([225., 9., -0., 0.])

So, all we need to do is fill $\vec u_3$ and $\vec u_4$ with two orthonormal vectors that span the eigenspace of $XX^T$ for eigenvalue 0. $\vec u_1$ is already an eigenvector for eigenvalue 225, and $\vec u_2$ is already an eigenvector for eigenvalue 9. Bringing in $\vec u_3$ and $\vec u_4$ will complete $U$, making its columns a basis for $\mathbb{R}^4$, which they need to be for $U$ to be an invertible square matrix. (Remember, $U$ must satisfy $U^TU = UU^T = I$, which means $U$ is invertible.)

But, the eigenspace of $XX^T$ for eigenvalue 0 is $\text{nullsp}(XX^T)$!

Since $XX^T$ is symmetric, $\vec u_3$ and $\vec u_4$ will be orthogonal to $\vec u_1$ and $\vec u_2$ no matter what, because the eigenvectors for different eigenvalues of a symmetric matrix are always orthogonal. (Another perspective on why this is true: $\vec u_3$ and $\vec u_4$ will be in $\text{nullsp}(XX^T)$, which is equivalent to $\text{nullsp}(X^T)$, and any vector in $\text{nullsp}(X^T)$ is orthogonal to any vector in $\text{colsp}(X)$, which is where $\vec u_1$ and $\vec u_2$ live. There’s a long chain of reasoning that leads to this conclusion: make sure you’re familiar with it.)

Observe that

  • [1101]\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix} is in nullsp(XXT)\text{nullsp}(XX^T), since [38372753738275228075750150][1101]=[0000]\begin{bmatrix} 38 & 37 & 2 & 75 \\ 37 & 38 & -2 & 75 \\ 2 & -2 & 8 & 0 \\ 75 & 75 & 0 & 150 \\\end{bmatrix} \begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}

  • [2210]\begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix} is in nullsp(XXT)\text{nullsp}(XX^T), since [38372753738275228075750150][2210]=[0000]\begin{bmatrix} 38 & 37 & 2 & 75 \\ 37 & 38 & -2 & 75 \\ 2 & -2 & 8 & 0 \\ 75 & 75 & 0 & 150 \\\end{bmatrix} \begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}

The vectors $\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix}$ are orthogonal to each other, so they’re good candidates for $\vec u_3$ and $\vec u_4$; we just need to normalize them first.

[1101][1313013]u3,[2210][2323130]u4\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} -\frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ 0 \\ \frac{1}{\sqrt{3}} \end{bmatrix}}_{\vec u_3}, \quad \begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \\ 0 \end{bmatrix}}_{\vec u_4}
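If you’d rather not hunt for null space vectors by hand, here’s a programmatic sketch. It uses np.linalg.eigh, numpy’s eigendecomposition routine for symmetric matrices, which returns orthonormal eigenvectors even for repeated eigenvalues; the two eigenvalue-0 eigenvectors it finds are a perfectly good (though possibly different) choice for $\vec u_3$ and $\vec u_4$.

evals, evecs = np.linalg.eigh(X @ X.T)     # eigh: orthonormal eigenvectors, eigenvalues in increasing order

null_mask = np.isclose(evals, 0)           # the eigenvalue-0 eigenvectors span nullsp(X X^T)
U_null = evecs[:, null_mask]               # two orthonormal columns: candidates for u_3 and u_4

print(U_null.shape)                              # (4, 2)
print(np.allclose(X.T @ U_null, 0))              # True: both columns are in nullsp(X^T)
print(np.allclose(U_null.T @ U_null, np.eye(2))) # True: the columns are orthonormal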

Before we place $\vec u_3$ and $\vec u_4$ in $U$, it’s worth noticing that $\vec u_3$ and $\vec u_4$ are both in $\text{nullsp}(X^T)$ too.

XTu3=[3225232555010][1313013]=[000]X^T \vec u_3 = \begin{bmatrix} 3 & 2 & 2 & 5 \\ 2 & 3 & -2 & 5 \\ 5 & 5 & 0 & 10 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ 0 \\ \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
XTu4=[3225232555010][2323130]=[000]X^T \vec u_4 = \begin{bmatrix} 3 & 2 & 2 & 5 \\ 2 & 3 & -2 & 5 \\ 5 & 5 & 0 & 10 \end{bmatrix} \begin{bmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

In fact, remember from Chapter 2.8 that

\text{nullsp}(X^T) = \text{nullsp}(XX^T) \\ \text{nullsp}(X) = \text{nullsp}(X^TX)

So, we have

U=[1613213231613213230223013260130]U = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}

And finally, we have computed the SVD of $X$!

[3252352205510]X=[1613213231613213230223013260130]U[1500030000000]Σ[16162612120131313]VT\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}

For the most part, we will use numpy to compute the SVD of a matrix. But, it’s important to understand the steps we took to compute the SVD manually.
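As a final sanity check on our manual work, here’s a sketch that types in the $U$, $\Sigma$, and $V^T$ we found by hand and confirms that their product really is $X$, and that $U$ and $V$ really are orthogonal.

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)

U = np.array([[1/s6,  1/(3*s2), -1/s3, -2/3],
              [1/s6, -1/(3*s2), -1/s3,  2/3],
              [0,     2*s2/3,    0,     1/3],
              [2/s6,  0,         1/s3,  0  ]])

Sigma = np.array([[15, 0, 0],
                  [ 0, 3, 0],
                  [ 0, 0, 0],
                  [ 0, 0, 0]])

Vt = np.array([[1/s6,  1/s6,  2/s6],
               [1/s2, -1/s2,  0   ],
               [1/s3,  1/s3, -1/s3]])

print(np.allclose(U @ Sigma @ Vt, X))      # True: U Σ V^T reconstructs X
print(np.allclose(U.T @ U, np.eye(4)))     # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3)))   # True: V is orthogonal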


Examples

Let’s work through a few computational problems. Most of our focus on the SVD in this class is conceptual, but it’s important to have some baseline computational fluency.

Example: SVD of a $2 \times 2$ Matrix

We developed the SVD to decompose non-square matrices. But it still works for square matrices too. Find the SVD of $X = \begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}$.

Example: SVD of a Wide Matrix

Find the SVD of $X = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$.

Example: SVD of a Positive Semidefinite Matrix

Recall, a positive semidefinite matrix is a square, symmetric matrix whose eigenvalues are non-negative.

Consider the positive semidefinite matrix $X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$. Find the SVD of $X$. How does it compare to the spectral decomposition of $X$ into $Q \Lambda Q^T$?

Example: SVD of a Symmetric Matrix with Negative Eigenvalues

Let $X = \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}$. Find the SVD of $X$. How do its singular values compare to its eigenvalues?

(Remember that only square matrices have eigenvalues, but all matrices have singular values.)

Example: SVD of an Orthogonal Matrix

Suppose $X$ itself is an $n \times n$ orthogonal matrix, meaning $X^TX = XX^T = I$. Why are all of $X$’s singular values 1?
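The “why” is for you to reason about, but the claim is easy to see numerically. Here’s a sketch using a $2 \times 2$ rotation matrix, one concrete example of an orthogonal matrix.

theta = np.pi / 5                              # any angle works; rotation matrices are orthogonal
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(R.T @ R, np.eye(2)))         # True: R is orthogonal
print(np.linalg.svd(R)[1])                     # approximately [1., 1.] -- every singular value is 1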


Visualizing the SVD

In Chapter 5.2, we visualized the spectral theorem, which decomposes a symmetric matrix as $A = Q \Lambda Q^T$ – a rotation, a stretch, and a rotation back. The SVD is built from the same kinds of pieces:

\underbrace{U}_{\text{orthogonal!}} \:\:\:\: \underbrace{\Sigma}_{\text{diagonal!}} \:\:\:\: \underbrace{V^T}_{\text{orthogonal!}}

$X = U \Sigma V^T$ can be interpreted similarly. If we think of $f(\vec w) = X \vec w$ as a linear transformation from $\mathbb{R}^d$ to $\mathbb{R}^n$, then $f$ operates on $\vec w$ in three stages:

  1. a rotation by $V^T$ (because $V$ is orthogonal), followed by

  2. a stretch by $\Sigma$, followed by

  3. a rotation by $U$

To illustrate, let’s consider the square (but not symmetric) matrix

X = \begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}

from one of the earlier worked examples. Unfortunately, it’s difficult to visualize the SVD of matrices larger than $2 \times 2$, since then either the $\vec u_i$’s or $\vec v_i$’s (or both) are in $\mathbb{R}^3$ or higher.

$X$’s SVD is

[5403]X=[310110110310]U[35005]Σ[12121212]VT\underbrace{\begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} \frac{3}{\sqrt{10}} & -\frac{1}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} & \frac{3}{\sqrt{10}} \end{bmatrix}}_U \underbrace{\begin{bmatrix} 3\sqrt{5} & 0 \\ 0 & \sqrt{5} \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{V^T}
Image produced in Jupyter
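If you’d like to produce a figure like the one above yourself, here’s a minimal matplotlib sketch (the styling won’t match the figure in the text exactly). It pushes points on the unit circle through $V^T$, then $\Sigma$, then $U$, and plots each stage.

import matplotlib.pyplot as plt

X = np.array([[5, 4],
              [0, 3]])
U, s, Vt = np.linalg.svd(X)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])       # unit circle, one point per column

stages = [('unit circle',                circle),
          ('after $V^T$',                Vt @ circle),
          ('after $\\Sigma V^T$',        np.diag(s) @ Vt @ circle),
          ('after $U \\Sigma V^T = X$',  U @ np.diag(s) @ Vt @ circle)]

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, (title, pts) in zip(axes, stages):
    ax.plot(pts[0], pts[1])
    ax.set_title(title)
    ax.set_aspect('equal')
plt.show()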

Full vs. Compact SVD

Full SVD

The version of the SVD we’ve constructed together is called the full SVD.

In the full SVD, $U$ is $n \times n$, $\Sigma$ is $n \times d$, and $V^T$ is $d \times d$. If $\text{rank}(X) = r$, then

  • The first $r$ columns of $U$ are the left singular vectors of $X$ and are a basis for $\text{colsp}(X)$; the last $n - r$ columns of $U$ are a basis for $\text{nullsp}(X^T)$. Together, the columns of $U$ span $\mathbb{R}^n$.

  • The first $r$ columns of $V$ are the right singular vectors of $X$ and are a basis for $\text{colsp}(X^T)$; the last $d - r$ columns of $V$ are a basis for $\text{nullsp}(X)$. Together, the columns of $V$ span $\mathbb{R}^d$.

The full SVD is nice for proofs, but is a little... annoying to use in real applications, because it contains a lot of 0’s. The additional $n - r$ columns of $U$ and $d - r$ columns of $V$ are included to make $U$ and $V$ orthogonal matrices, but end up getting 0’d out when multiplied by the corresponding 0’s in $\Sigma$.

Remember that $X = U \Sigma V^T$ is equivalent to $XV = U \Sigma$, which says that, for $i = 1, 2, ..., d$,

X \vec v_i = \sigma_i \vec u_i

but when $i > r$, all this says is $\vec 0 = \vec 0$:

  • $X \vec v_i = \vec 0$, since $\vec v_{r+1}$, $\vec v_{r+2}$, ..., $\vec v_d$ are all in $\text{nullsp}(X)$, and

  • $\sigma_i = 0$, so $\sigma_i \vec u_i = 0 \vec u_i = \vec 0$.

Compact SVD

The compact SVD throws away the 0’s in $\Sigma$ and the corresponding columns in $U$ and $V$: it keeps only the first $r$ columns of $U$, the first $r$ diagonal values of $\Sigma$, and the first $r$ rows of $V^T$.

That is, the compact SVD says $X = U_r \Sigma_r V_r^T$, where $U_r$ is $n \times r$, $\Sigma_r$ is $r \times r$, and $V_r^T$ is $r \times d$.

X=[u1ur]Ur[σ1σr]Σr[v1TvrT]VrTX = \underbrace{ \begin{bmatrix} {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \\ {\color{#3d81f6} \vec u_1} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} \vec u_r} \\ {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \end{bmatrix} }_{U_r} \underbrace{ \begin{bmatrix} {\color{orange} \sigma_1} & & \\ & {\color{orange} \ddots} & \\ & & {\color{orange} \sigma_r} \end{bmatrix} }_{\Sigma_r} \underbrace{ \begin{bmatrix} {\color{#d81a60}\text{---} \:\: \vec v_1^T \:\: \text{---}} \\ {\color{#d81a60} \vdots} \\ {\color{#d81a60}\text{---} \:\: \vec v_r^T \:\: \text{---}} \end{bmatrix} }_{V_r^T}

The full and compact SVDs are equivalent, in that $X = U \Sigma V^T = U_r \Sigma_r V_r^T$. But the individual components are of different sizes:

|          | Full SVD     | Compact SVD  |                                                            |
| -------- | ------------ | ------------ | ---------------------------------------------------------- |
| $U$      | $n \times n$ | $n \times r$ | First $r$ columns of $U$ are a basis for $\text{colsp}(X)$  |
| $\Sigma$ | $n \times d$ | $r \times r$ | Number of non-zero $\sigma_i$’s is $r = \text{rank}(X)$     |
| $V^T$    | $d \times d$ | $r \times d$ | First $r$ rows of $V^T$ are a basis for $\text{colsp}(X^T)$ |

$U_r$ and $V_r$ are no longer orthogonal matrices, since only square matrices can be orthogonal. However, their columns are still orthonormal, meaning $U_r^T U_r = I_{r \times r}$ and $V_r^T V_r = I_{r \times r}$.
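In numpy, a compact SVD can be formed by slicing the full SVD – a sketch, using np.linalg.matrix_rank to find $r$:

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

U, s, Vt = np.linalg.svd(X)
r = np.linalg.matrix_rank(X)               # 2

U_r  = U[:, :r]                            # n x r
S_r  = np.diag(s[:r])                      # r x r
Vt_r = Vt[:r, :]                           # r x d

print(np.allclose(U_r @ S_r @ Vt_r, X))        # True: the compact SVD still reconstructs X exactly
print(np.allclose(U_r.T @ U_r, np.eye(r)))     # True: the columns of U_r are orthonormal
print(np.allclose(Vt_r @ Vt_r.T, np.eye(r)))   # True: the rows of V_r^T are orthonormal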


SVD as a Sum

The compact SVD hints at another way to think about $X$.

X=[u1ur]Ur[σ1σr]Σr[v1TvrT]VrT=[σ1u1σrur][v1TvrT]=σ1u1v1T+σ2u2v2T++σrurvrT\begin{align*} X &= \underbrace{ \begin{bmatrix} {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \\ {\color{#3d81f6} \vec u_1} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} \vec u_r} \\ {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \end{bmatrix} }_{U_r} \underbrace{ \begin{bmatrix} {\color{orange} \sigma_1} & & \\ & {\color{orange} \ddots} & \\ & & {\color{orange} \sigma_r} \end{bmatrix} }_{\Sigma_r} \underbrace{ \begin{bmatrix} {\color{#d81a60}\text{---} \:\: \vec v_1^T \:\: \text{---}} \\ {\color{#d81a60} \vdots} \\ {\color{#d81a60}\text{---} \:\: \vec v_r^T \:\: \text{---}} \end{bmatrix} }_{V_r^T} \\ &= \begin{bmatrix} | & \cdots & | \\ {\color{orange} \sigma_1} {\color{#3d81f6} \vec u_1} & \cdots & {\color{orange} \sigma_r} {\color{#3d81f6} \vec u_r} \\ | & \cdots & | \end{bmatrix} \begin{bmatrix} {\color{#d81a60}\text{---} \:\: \vec v_1^T \:\: \text{---}} \\ {\color{#d81a60} \vdots} \\ {\color{#d81a60}\text{---} \:\: \vec v_r^T \:\: \text{---}} \end{bmatrix} \\ &= {\color{orange} \sigma_1} {\color{#3d81f6} \vec u_1} {\color{#d81a60}\vec v_1^T} + {\color{orange} \sigma_2} {\color{#3d81f6} \vec u_2} {\color{#d81a60}\vec v_2^T} + \cdots + {\color{orange} \sigma_r} {\color{#3d81f6} \vec u_r} {\color{#d81a60}\vec v_r^T} \end{align*}

Each term $\sigma_i \vec u_i \vec v_i^T$ is an outer product of $\vec u_i$ and $\vec v_i$, scaled by $\sigma_i$. Outer products are rank-one matrices: each column of $\sigma_i \vec u_i \vec v_i^T$ is a multiple of $\vec u_i$, and each row of it is a multiple of $\vec v_i^T$.

This outer product view of matrix multiplication is not one that we’ve emphasized a ton in this course, but it can be useful in certain contexts, as we’re about to see. To see how it works, let’s revisit our first example.

[3252352205510]X=[1613213231613213230223013260130]U[1500030000000]Σ[16162612120131313]VT\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#ff8c00} \frac{1}{3\sqrt{2}}} & {\color{#aaa} -\frac{1}{\sqrt{3}}} & {\color{#aaa} -\frac{2}{3}} \\ {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#ff8c00} -\frac{1}{3\sqrt{2}}} & {\color{#aaa} -\frac{1}{\sqrt{3}}} & {\color{#aaa} \frac{2}{3}} \\ {\color{#3d81f6} 0} & {\color{#ff8c00} \frac{2\sqrt{2}}{3}} & {\color{#aaa} 0} & {\color{#aaa} \frac{1}{3}} \\ {\color{#3d81f6} \frac{2}{\sqrt{6}}} & {\color{#ff8c00} 0} & {\color{#aaa} \frac{1}{\sqrt{3}}} & {\color{#aaa} 0} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} {\color{#3d81f6} 15} & {\color{#aaa} 0} & {\color{#aaa} 0} \\ {\color{#aaa} 0} & {\color{#ff8c00} 3} & {\color{#aaa} 0} \\ {\color{#aaa} 0} & {\color{#aaa} 0} & {\color{#aaa} 0} \\ {\color{#aaa} 0} & {\color{#aaa} 0} & {\color{#aaa} 0} \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{2}{\sqrt{6}}} \\ {\color{#ff8c00} \frac{1}{\sqrt{2}}} & {\color{#ff8c00} -\frac{1}{\sqrt{2}}} & {\color{#ff8c00} 0} \\ {\color{#aaa} \frac{1}{\sqrt{3}}} & {\color{#aaa} \frac{1}{\sqrt{3}}} & {\color{#aaa} -\frac{1}{\sqrt{3}}} \end{bmatrix}}_{V^T}

The summation view of the SVD says that:

X=  15[1616026][161626]+3[1321322230][12120]=[52525525250005510]rank-one matrix+[1212012120220000]rank-one matrix\begin{align*} X =\;& {\color{#3d81f6} 15} \begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} \\ {\color{#3d81f6} \frac{1}{\sqrt{6}}} \\ {\color{#3d81f6} 0} \\ {\color{#3d81f6} \frac{2}{\sqrt{6}}} \end{bmatrix} \begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{2}{\sqrt{6}}} \end{bmatrix} + {\color{#ff8c00} 3} \begin{bmatrix} {\color{#ff8c00} \frac{1}{3\sqrt{2}}} \\ {\color{#ff8c00} -\frac{1}{3\sqrt{2}}} \\ {\color{#ff8c00} \frac{2\sqrt{2}}{3}} \\ {\color{#ff8c00} 0} \end{bmatrix} \begin{bmatrix} {\color{#ff8c00} \frac{1}{\sqrt{2}}} & {\color{#ff8c00} -\frac{1}{\sqrt{2}}} & {\color{#ff8c00} 0} \end{bmatrix} \\ &= \underbrace{{\color{#3d81f6} \begin{bmatrix} \frac{5}{2} & \frac{5}{2} & 5 \\ \frac{5}{2} & \frac{5}{2} & 5 \\ 0 & 0 & 0 \\ 5 & 5 & 10 \end{bmatrix}}}_{\text{rank-one matrix}} + \underbrace{{\color{orange} \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & 0 \\ -\frac{1}{2} & \frac{1}{2} & 0 \\ 2 & -2 & 0 \\ 0 & 0 & 0 \end{bmatrix}}}_{\text{rank-one matrix}} \end{align*}

Since ${\color{#3d81f6} 15} > {\color{orange} 3}$, the first outer product contributes more to $X$ than the second one does.

We can think of the singular values as representing the importance of the corresponding singular vectors in representing $X$. Since we sort singular values in decreasing order, $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_r > 0$, the first outer product is always the most important one, the second outer product is the second most important one, and so on.
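Here’s the summation view in code – a sketch, again using the example $X$: each term is an outer product np.outer(u_i, v_i) scaled by $\sigma_i$, each term has rank one, and the terms add back up to $X$.

U, s, Vt = np.linalg.svd(X)
r = np.linalg.matrix_rank(X)

terms = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r)]

print(np.round(terms[0], 2))               # the first (most important) rank-one piece
print(np.linalg.matrix_rank(terms[0]))     # 1
print(np.allclose(terms[0] + terms[1], X)) # True: the pieces sum to X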

Low-Rank Approximation

Our most recent observation is that the SVD $X = U \Sigma V^T$ allows us to write $X$ as a sum of rank-one matrices, in decreasing order of importance.

X=σ1u1v1Tmost important+σ2u2v2Tsecond most important++σrurvrTleast importantX = \underbrace{\sigma_1 \vec u_1 \vec v_1^T}_{\text{most important}} + \underbrace{\sigma_2 \vec u_2 \vec v_2^T}_{\text{second most important}} + \cdots + \underbrace{\sigma_r \vec u_r \vec v_r^T}_{\text{least important}}

In many practical applications, the full matrix $X$ can be too large to store or process. In such cases, we can produce a low-rank approximation of $X$ by summing fewer than $r$ of these rank-one matrices. In the example above, $X$ was of rank 2, so a rank-1 approximation of $X$ would just be the first outer product, $\sigma_1 \vec u_1 \vec v_1^T$, which is in blue above.

Example: Image Compression

A common application of the SVD and low-rank approximations is to compress images. How so? Consider the following grayscale image of my (16 year old) dog, Junior.

(Figure: a grayscale photo of Junior.)

This image is 300 pixels wide and 400 pixels tall. Since the image is grayscale, each pixel’s intensity can be represented by a number between 0 and 255, where 0 is black and 255 is white. These intensities can be stored in a $400 \times 300$ matrix. The rank of this matrix is likely 300, since it’s extremely unlikely that any of the 300 columns of the image are exactly representable as a linear combination of other columns.

But, as the SVD reveals, the image can be approximated well by a rank-$k$ matrix, for a $k$ that is much smaller than 300. We build this low-rank approximation of the image by summing up $k$ rank-one matrices.

\text{rank } k \text{ approximation of image} = \sum_{i=1}^k \sigma_i \vec u_i \vec v_i^T

A slider should appear below, allowing you to select a value of $k$ and see the corresponding rank-$k$ approximation of Junior.

(Interactive slider for selecting $k$.)

To store the full image, we need to store $400 \cdot 300 = 120{,}000$ numbers. But to store a rank-$k$ approximation of the image, we only need to store $(1 + 400 + 300)k = 701k$ numbers – only the first $k$ singular values, along with $\vec u_1, \vec u_2, ..., \vec u_k$ (each of which has 400 numbers), and $\vec v_1, \vec v_2, ..., \vec v_k$ (each of which has 300 numbers). If we’re satisfied with a rank-30 approximation, we only need to store $701 \cdot 30 = 21{,}030$ numbers, which is a compression of $\frac{120{,}000}{21{,}030} \approx 5.7$ times!
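Here’s roughly what this compression looks like in code. This is a sketch, not the exact code used to produce the figure above: the file name 'junior.png' is a hypothetical placeholder, and the snippet assumes the file loads as (or is converted to) a 2-D grayscale array.

import numpy as np
import matplotlib.pyplot as plt

img = plt.imread('junior.png')       # hypothetical file name -- substitute any grayscale image you have
if img.ndim == 3:                    # if the image has color channels, average them to get grayscale
    img = img.mean(axis=2)

U, s, Vt = np.linalg.svd(img)

k = 30
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation: the first k outer products summed

fig, axes = plt.subplots(1, 2, figsize=(8, 5))
axes[0].imshow(img, cmap='gray')
axes[0].set_title('original')
axes[1].imshow(img_k, cmap='gray')
axes[1].set_title(f'rank-{k} approximation')
plt.show()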


Computing the SVD in numpy

Finally, I’ll show you how to use numpy to compute the SVD of a matrix. The key function is np.linalg.svd. Let’s apply it to our familiar example,

[3252352205510]X=[1613213231613213230223013260130]U[1500030000000]Σ[16162612120131313]VT\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}
X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])
              
u, s, vt = np.linalg.svd(X)
u.shape, s.shape, vt.shape
((4, 4), (3,), (3, 3))

By default, it computes the full SVD, which is why u is of shape $4 \times 4$ and vt is of shape $3 \times 3$, even though $\text{rank}(X) = 2$.

s is returned as a 1-dimensional array of singular values.

s
array([15., 3., 0.])

If we’d like to use u, s, and vt to reconstruct $X$, we need to reshape s into a matrix with the same shape as $X$.

ss = np.zeros(X.shape)
ss[np.arange(len(s)), np.arange(len(s))] = s
ss
array([[15.,  0.,  0.],
       [ 0.,  3.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
u @ ss @ vt
array([[ 3.,  2.,  5.],
       [ 2.,  3.,  5.],
       [ 2., -2.,  0.],
       [ 5.,  5., 10.]])

Notice: the signs of the columns of $U$ and $V$ are not uniquely determined by the SVD. For example, we could replace $\vec u_1$ with $-\vec u_1$ and $\vec v_1$ with $-\vec v_1$ and still get the same matrix $X$. Compare the $U$ we found by hand with the u returned by np.linalg.svd:

\text{our } U = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}
u
array([[-0.40824829,  0.23570226, -0.45735398, -0.75405909],
       [-0.40824829, -0.23570226, -0.68098866,  0.56038577],
       [-0.        ,  0.94280904, -0.05590867,  0.32861122],
       [-0.81649658, -0.        ,  0.56917132,  0.09683666]])
  • The first column of numpy’s u is the negative of our first column of $U$.

  • Both second columns are the same.

  • The latter two columns are different. Why? There are infinitely many orthonormal bases for the null space of $X^T$, which is what the last two columns of $U$ represent. We just picked a different choice than numpy did.
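A quick numerical check of this sign ambiguity (a sketch): negating a left singular vector together with its matching right singular vector leaves the product $U \Sigma V^T$ unchanged.

U, s, Vt = np.linalg.svd(X)

U_flipped, Vt_flipped = U.copy(), Vt.copy()
U_flipped[:, 0]  *= -1               # flip the sign of u_1 ...
Vt_flipped[0, :] *= -1               # ... and the sign of v_1

Sigma = np.zeros(X.shape)
np.fill_diagonal(Sigma, s)

print(np.allclose(U_flipped @ Sigma @ Vt_flipped, X))   # True: same X either way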


Key Takeaways

  1. All $n \times d$ matrices $X$ have a singular value decomposition $X = U \Sigma V^T$, where $U$ is $n \times n$, $\Sigma$ is $n \times d$, and $V$ is $d \times d$.

  2. The columns of $U$ are orthonormal eigenvectors of $XX^T$; these are called the left singular vectors of $X$.

  3. The columns of $V$ are orthonormal eigenvectors of $X^TX$; these are called the right singular vectors of $X$.

  4. Both $XX^T$ and $X^TX$ share the same non-zero eigenvalues, and the singular values of $X$ are the square roots of these eigenvalues, $\sigma_i = \sqrt{\lambda_i}$. The number of non-zero singular values is equal to the rank of $X$. It’s important that we sort the singular values in decreasing order, so that $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_r > 0$, and place the singular vectors in the columns of $U$ and $V$ in the same order.

  5. A typical recipe for computing the SVD is to:

     1. Compute $X^TX$. Place its unit eigenvectors in the columns of $V$, and place the square roots of its eigenvalues on the diagonal of $\Sigma$.

     2. To find each $\vec u_i$, use $X \vec v_i = \sigma_i \vec u_i$, i.e. $\vec u_i = \frac{1}{\sigma_i} X \vec v_i$.

     3. The above rule only works for $\sigma_i > 0$. If $\sigma_i = 0$, then the remaining $\vec u_i$’s must be eigenvectors of $XX^T$ for the eigenvalue 0, meaning they must lie in $\text{nullsp}(X^T)$.

  6. The SVD allows us to interpret the linear transformation of multiplying by $X$ as a composition of a rotation by $V^T$, a scaling/stretching by $\Sigma$, and a rotation by $U$.

  7. The SVD $X = U \Sigma V^T$ can be viewed as a sum of rank-one matrices:

     X = \sum_{i=1}^r \sigma_i \vec u_i \vec v_i^T

     Each piece $\sigma_i \vec u_i \vec v_i^T$ is a rank-one matrix, consisting of the outer product of $\vec u_i$ and $\vec v_i$. This summation view can be used to compute a low-rank approximation of $X$ by summing fewer than $r$ of these rank-one matrices:

     X_k = \sum_{i=1}^k \sigma_i \vec u_i \vec v_i^T