
10.2. Low-Rank Approximation

Full SVD

The version of the SVD we’ve constructed together is called the full SVD.

In the full SVD, $U$ is $n \times n$, $\Sigma$ is $n \times d$, and $V^T$ is $d \times d$. If $\text{rank}(X) = r$, then only the first $r$ diagonal entries of $\Sigma$ are non-zero.

The full SVD is nice for proofs, but is a little... annoying to use in real applications, because it contains a lot of 0’s. The additional $n - r$ columns of $U$ and $d - r$ columns of $V$ are included to make $U$ and $V$ orthogonal matrices, but end up getting 0’d out when multiplied by the corresponding 0’s in $\Sigma$.

Remember that $X = U \Sigma V^T$ is equivalent to $XV = U \Sigma$, which says that, for $i = 1, 2, ..., d$,

X \vec v_i = \sigma_i \vec u_i

but when $i > r$, we have $\sigma_i = 0$ and $X \vec v_i = \vec 0$, so all this says is $\vec 0 = \vec 0$.

Compact SVD

The compact SVD throws away the 0’s in $\Sigma$ and the corresponding columns in $U$ and $V$. It keeps only the first $r$ columns of $U$, the first $r$ values in $\Sigma$, and the first $r$ rows of $V^T$.

That is, the compact SVD says $X = U_r \Sigma_r V_r^T$, where $U_r$ is $n \times r$, $\Sigma_r$ is $r \times r$, and $V_r^T$ is $r \times d$.

X = \underbrace{ \begin{bmatrix} {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \\ {\color{#3d81f6} \vec u_1} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} \vec u_r} \\ {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \end{bmatrix} }_{U_r} \underbrace{ \begin{bmatrix} {\color{orange} \sigma_1} & & \\ & {\color{orange} \ddots} & \\ & & {\color{orange} \sigma_r} \end{bmatrix} }_{\Sigma_r} \underbrace{ \begin{bmatrix} {\color{#d81a60}\text{---} \:\: \vec v_1^T \:\: \text{---}} \\ {\color{#d81a60} \vdots} \\ {\color{#d81a60}\text{---} \:\: \vec v_r^T \:\: \text{---}} \end{bmatrix} }_{V_r^T}

The full and compact SVDs are equivalent, in that $X = U \Sigma V^T = U_r \Sigma_r V_r^T$. But the individual components are of different sizes:

| | Full SVD | Compact SVD | Notes |
| --- | --- | --- | --- |
| $U$ | $n \times n$ | $n \times r$ | First $r$ columns of $U$ are a basis for $\text{colsp}(X)$ |
| $\Sigma$ | $n \times d$ | $r \times r$ | Number of non-zero $\sigma_i$’s is $r = \text{rank}(X)$ |
| $V^T$ | $d \times d$ | $r \times d$ | First $r$ rows of $V^T$ are a basis for $\text{colsp}(X^T)$ |

$U_r$ and $V_r$ are no longer orthogonal matrices, since only square matrices can be orthogonal. However, their columns are still orthonormal, meaning $U_r^T U_r = I_{r \times r}$ and $V_r^T V_r = I_{r \times r}$.
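To see these shapes concretely, here’s a minimal numpy sketch (we’ll walk through np.linalg.svd more carefully at the end of this section). It slices the compact SVD out of the full one for the $4 \times 3$ example matrix used later in this section, assuming we already know its rank is $r = 2$:

import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

U, s, Vt = np.linalg.svd(X)   # full SVD: U is 4x4, s has 3 entries, Vt is 3x3
r = 2                         # rank(X), assumed known here

U_r = U[:, :r]                # n x r
S_r = np.diag(s[:r])          # r x r
Vt_r = Vt[:r, :]              # r x d

print(U_r.shape, S_r.shape, Vt_r.shape)      # (4, 2) (2, 2) (2, 3)
print(np.allclose(X, U_r @ S_r @ Vt_r))      # True: the compact SVD still reconstructs X
print(np.allclose(U_r.T @ U_r, np.eye(r)))   # True: columns of U_r are orthonormal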

With both the full and compact forms in hand, we can now use the decomposition for low-rank approximation.


SVD as a Sum

The compact SVD hints at another way to think about $X$.

\begin{align*} X &= \underbrace{ \begin{bmatrix} {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \\ {\color{#3d81f6} \vec u_1} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} \vec u_r} \\ {\color{#3d81f6} |} & {\color{#3d81f6} \cdots} & {\color{#3d81f6} |} \end{bmatrix} }_{U_r} \underbrace{ \begin{bmatrix} {\color{orange} \sigma_1} & & \\ & {\color{orange} \ddots} & \\ & & {\color{orange} \sigma_r} \end{bmatrix} }_{\Sigma_r} \underbrace{ \begin{bmatrix} {\color{#d81a60}\text{---} \:\: \vec v_1^T \:\: \text{---}} \\ {\color{#d81a60} \vdots} \\ {\color{#d81a60}\text{---} \:\: \vec v_r^T \:\: \text{---}} \end{bmatrix} }_{V_r^T} \\ &= \begin{bmatrix} | & \cdots & | \\ {\color{orange} \sigma_1} {\color{#3d81f6} \vec u_1} & \cdots & {\color{orange} \sigma_r} {\color{#3d81f6} \vec u_r} \\ | & \cdots & | \end{bmatrix} \begin{bmatrix} {\color{#d81a60}\text{---} \:\: \vec v_1^T \:\: \text{---}} \\ {\color{#d81a60} \vdots} \\ {\color{#d81a60}\text{---} \:\: \vec v_r^T \:\: \text{---}} \end{bmatrix} \\ &= {\color{orange} \sigma_1} {\color{#3d81f6} \vec u_1} {\color{#d81a60}\vec v_1^T} + {\color{orange} \sigma_2} {\color{#3d81f6} \vec u_2} {\color{#d81a60}\vec v_2^T} + \cdots + {\color{orange} \sigma_r} {\color{#3d81f6} \vec u_r} {\color{#d81a60}\vec v_r^T} \end{align*}

Each term ${\color{orange} \sigma_i} {\color{#3d81f6} \vec u_i} {\color{#d81a60}\vec v_i^T}$ is an outer product of ${\color{#3d81f6} \vec u_i}$ and ${\color{#d81a60}\vec v_i}$, scaled by ${\color{orange} \sigma_i}$. Outer products are rank-one matrices: each column of ${\color{orange} \sigma_i} {\color{#3d81f6} \vec u_i} {\color{#d81a60}\vec v_i^T}$ is a multiple of ${\color{#3d81f6} \vec u_i}$, and each row of it is a multiple of ${\color{#d81a60}\vec v_i^T}$.

This outer product view of matrix multiplication is not one that we’ve emphasized a ton in this course, but it can be useful in certain contexts, as we’re about to see. Let’s revisit our first example.

\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#ff8c00} \frac{1}{3\sqrt{2}}} & {\color{#aaa} -\frac{1}{\sqrt{3}}} & {\color{#aaa} -\frac{2}{3}} \\ {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#ff8c00} -\frac{1}{3\sqrt{2}}} & {\color{#aaa} -\frac{1}{\sqrt{3}}} & {\color{#aaa} \frac{2}{3}} \\ {\color{#3d81f6} 0} & {\color{#ff8c00} \frac{2\sqrt{2}}{3}} & {\color{#aaa} 0} & {\color{#aaa} \frac{1}{3}} \\ {\color{#3d81f6} \frac{2}{\sqrt{6}}} & {\color{#ff8c00} 0} & {\color{#aaa} \frac{1}{\sqrt{3}}} & {\color{#aaa} 0} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} {\color{#3d81f6} 15} & {\color{#aaa} 0} & {\color{#aaa} 0} \\ {\color{#aaa} 0} & {\color{#ff8c00} 3} & {\color{#aaa} 0} \\ {\color{#aaa} 0} & {\color{#aaa} 0} & {\color{#aaa} 0} \\ {\color{#aaa} 0} & {\color{#aaa} 0} & {\color{#aaa} 0} \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{2}{\sqrt{6}}} \\ {\color{#ff8c00} \frac{1}{\sqrt{2}}} & {\color{#ff8c00} -\frac{1}{\sqrt{2}}} & {\color{#ff8c00} 0} \\ {\color{#aaa} \frac{1}{\sqrt{3}}} & {\color{#aaa} \frac{1}{\sqrt{3}}} & {\color{#aaa} -\frac{1}{\sqrt{3}}} \end{bmatrix}}_{V^T}

The summation view of the SVD says that:

\begin{align*} X &= {\color{#3d81f6} 15} \begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} \\ {\color{#3d81f6} \frac{1}{\sqrt{6}}} \\ {\color{#3d81f6} 0} \\ {\color{#3d81f6} \frac{2}{\sqrt{6}}} \end{bmatrix} \begin{bmatrix} {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{1}{\sqrt{6}}} & {\color{#3d81f6} \frac{2}{\sqrt{6}}} \end{bmatrix} + {\color{#ff8c00} 3} \begin{bmatrix} {\color{#ff8c00} \frac{1}{3\sqrt{2}}} \\ {\color{#ff8c00} -\frac{1}{3\sqrt{2}}} \\ {\color{#ff8c00} \frac{2\sqrt{2}}{3}} \\ {\color{#ff8c00} 0} \end{bmatrix} \begin{bmatrix} {\color{#ff8c00} \frac{1}{\sqrt{2}}} & {\color{#ff8c00} -\frac{1}{\sqrt{2}}} & {\color{#ff8c00} 0} \end{bmatrix} \\ &= \underbrace{{\color{#3d81f6} \begin{bmatrix} \frac{5}{2} & \frac{5}{2} & 5 \\ \frac{5}{2} & \frac{5}{2} & 5 \\ 0 & 0 & 0 \\ 5 & 5 & 10 \end{bmatrix}}}_{\text{rank-one matrix}} + \underbrace{{\color{orange} \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & 0 \\ -\frac{1}{2} & \frac{1}{2} & 0 \\ 2 & -2 & 0 \\ 0 & 0 & 0 \end{bmatrix}}}_{\text{rank-one matrix}} \end{align*}

Since ${\color{#3d81f6} 15} > {\color{orange} 3}$, the first outer product contributes more to $X$ than the second one does.

We can think of the singular values as representing the importance of the corresponding singular vectors in representing $X$. Since we sort singular values in decreasing order, $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_r > 0$, the first outer product is always the most important one, the second outer product is the second most important, and so on.
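Here’s a quick numerical check of this sum: a minimal sketch that hard-codes the singular values and vectors from the example above and adds up the two outer products with np.outer (nothing here is computed by numpy’s SVD routine).

import numpy as np

sigma1, sigma2 = 15, 3
u1 = np.array([1, 1, 0, 2]) / np.sqrt(6)
u2 = np.array([1, -1, 4, 0]) / (3 * np.sqrt(2))   # note 2*sqrt(2)/3 = 4/(3*sqrt(2))
v1 = np.array([1, 1, 2]) / np.sqrt(6)
v2 = np.array([1, -1, 0]) / np.sqrt(2)

# sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T, the two rank-one pieces above
X_sum = sigma1 * np.outer(u1, v1) + sigma2 * np.outer(u2, v2)
print(X_sum)   # equals X from the example, up to floating-point roundoff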

Low-Rank Approximation

Our most recent observation is that the SVD $X = U \Sigma V^T$ allows us to write $X$ as a sum of rank-one matrices, in decreasing order of importance.

X = \underbrace{\sigma_1 \vec u_1 \vec v_1^T}_{\text{most important}} + \underbrace{\sigma_2 \vec u_2 \vec v_2^T}_{\text{second most important}} + \cdots + \underbrace{\sigma_r \vec u_r \vec v_r^T}_{\text{least important}}

In many practical applications, the full matrix $X$ can be too large to store or process. In such cases, we can produce a low-rank approximation of $X$ by summing fewer than $r$ of these rank-one matrices. In the example above, $X$ was of rank 2, so a rank-1 approximation of $X$ would just be the first outer product, $\sigma_1 \vec u_1 \vec v_1^T$, which is in blue above.
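Concretely, a rank-$k$ approximation is computed by truncating the SVD after $k$ terms. Here’s a minimal sketch (rank_k_approx is just an illustrative helper name, not a library function):

import numpy as np

def rank_k_approx(X, k):
    # Keep only the first k terms of the SVD sum: sigma_1 u_1 v_1^T + ... + sigma_k u_k v_k^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

print(rank_k_approx(X, 1))
# ≈ the blue rank-one matrix above: [[2.5, 2.5, 5], [2.5, 2.5, 5], [0, 0, 0], [5, 5, 10]]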

Example: Image Compression

A common application of the SVD and low-rank approximations is to compress images. How so? Consider the following grayscale image of my (16-year-old) dog, Junior.

(Figure: grayscale photo of Junior.)

This image is 300 pixels wide and 400 pixels tall. Since the image is grayscale, each pixel’s intensity can be represented by a number between 0 and 255, where 0 is black and 255 is white. These intensities can be stored in a $400 \times 300$ matrix. The rank of this matrix is likely 300, since it’s extremely unlikely that any of the 300 columns of the image is exactly representable as a linear combination of the others.

But, as the SVD reveals, the image can be approximated well by a rank-$k$ matrix, for a $k$ that is much smaller than 300. We build this low-rank approximation of the image by summing up $k$ rank-one matrices.

\text{rank } k \text{ approximation of image} = \sum_{i=1}^k \sigma_i \vec u_i \vec v_i^T

A slider should appear below, allowing you to select a value of $k$ and see the corresponding rank-$k$ approximation of Junior.


To store the full image, we need to store $400 \cdot 300 = 120{,}000$ numbers. But to store a rank-$k$ approximation of the image, we only need to store $(1 + 400 + 300)k = 701k$ numbers: the first $k$ singular values, along with $\vec u_1, \vec u_2, ..., \vec u_k$ (each of which has 400 numbers), and $\vec v_1, \vec v_2, ..., \vec v_k$ (each of which has 300 numbers). If we’re satisfied with a rank-30 approximation, we only need to store $701 \cdot 30 = 21{,}030$ numbers, which is a compression of $\frac{120{,}000}{21{,}030} \approx 5.7$ times!
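Here’s a sketch of how this plays out in code. The image array below is a random stand-in for the actual $400 \times 300$ photo of Junior (loading the real image file isn’t shown), so the point here is just the shapes and the storage arithmetic:

import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(400, 300)).astype(float)   # stand-in grayscale image

U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 30
img_k = (U[:, :k] * s[:k]) @ Vt[:k, :]     # rank-30 approximation of the image

full_storage = 400 * 300                   # 120,000 numbers
compressed_storage = k * (1 + 400 + 300)   # 701k = 21,030 numbers
print(full_storage / compressed_storage)   # ≈ 5.7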


Computing the SVD

Finally, let’s take a closer look at how to use numpy to compute the SVD of a matrix. The key function is np.linalg.svd. Let’s apply it to our familiar example,

\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])
              
u, s, vt = np.linalg.svd(X)
u.shape, s.shape, vt.shape
((4, 4), (3,), (3, 3))

By default, it computes the full SVD, which is why u is of shape $4 \times 4$ and vt is of shape $3 \times 3$, even though $\text{rank}(X) = 2$.
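As an aside, np.linalg.svd also accepts full_matrices=False, which returns the “thin” SVD: u then only has min(n, d) = 3 columns. Note that this still keeps all 3 singular values, including the zero one, so it isn’t quite the compact SVD.

u_thin, s_thin, vt_thin = np.linalg.svd(X, full_matrices=False)
u_thin.shape, s_thin.shape, vt_thin.shape   # ((4, 3), (3,), (3, 3))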

s is returned as a 1-dimensional array of singular values.

s
array([15., 3., 0.])

If we’d like to use u, s, and vt to reconstruct $X$, we need to reshape s into a matrix with the same shape as $X$.

# Place the singular values on the diagonal of a matrix with the same shape as X.
ss = np.zeros(X.shape)
ss[np.arange(len(s)), np.arange(len(s))] = s
ss
array([[15.,  0.,  0.],
       [ 0.,  3.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
u @ ss @ vt
array([[ 3.,  2.,  5.],
       [ 2.,  3.,  5.],
       [ 2., -2.,  0.],
       [ 5.,  5., 10.]])
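Alternatively, since only the first $r = 2$ singular values are non-zero, we could skip building ss and reconstruct $X$ directly from the compact SVD; a small sketch:

r = 2
u[:, :r] @ np.diag(s[:r]) @ vt[:r, :]   # same result as u @ ss @ vt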

Notice: The signs of the columns of $U$ and $V$ are not uniquely determined by the SVD. For example, we could replace $\vec u_1$ with $-\vec u_1$ and $\vec v_1$ with $-\vec v_1$ and get the same matrix $X$. The $U$ we found has its first column sign-flipped relative to the u returned by np.linalg.svd (a quick check of this sign-flip fact appears after the list below).

\text{our } U = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}
u
array([[-0.40824829,  0.23570226, -0.45735398, -0.75405909],
       [-0.40824829, -0.23570226, -0.68098866,  0.56038577],
       [-0.        ,  0.94280904, -0.05590867,  0.32861122],
       [-0.81649658, -0.        ,  0.56917132,  0.09683666]])
  • The first column of numpy’s u is the negative of our first column of $U$.

  • Both second columns are the same.

  • The latter two columns are different. Why? There are infinitely many orthonormal bases for the null space of $X^T$, which is what the last two columns of $U$ span. We just picked a different one than numpy did.
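Here’s the quick check promised above: flipping the signs of a matched pair $\vec u_i$, $\vec v_i$ leaves the reconstruction unchanged, since the two sign changes cancel in $\sigma_i \vec u_i \vec v_i^T$.

u_flipped = u.copy()
vt_flipped = vt.copy()
u_flipped[:, 0] *= -1     # flip the sign of u_1...
vt_flipped[0, :] *= -1    # ...and of v_1

np.allclose(u_flipped @ ss @ vt_flipped, X)   # True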


Key Takeaways

To recap:

  1. All $n \times d$ matrices $X$ have a singular value decomposition $X = U \Sigma V^T$, where $U$ is $n \times n$, $\Sigma$ is $n \times d$, and $V$ is $d \times d$.

  2. The columns of $U$ are orthonormal eigenvectors of $XX^T$; these are called the left singular vectors of $X$.

  3. The columns of $V$ are orthonormal eigenvectors of $X^TX$; these are called the right singular vectors of $X$.

  4. Both $XX^T$ and $X^TX$ share the same non-zero eigenvalues; the singular values of $X$ are the square roots of these eigenvalues. The number of non-zero singular values is equal to the rank of $X$. It’s important that we sort the singular values in decreasing order, so that $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_r > 0$, and place the singular vectors in the columns of $U$ and $V$ in the same order.

    \sigma_i = \sqrt{\lambda_i}
  5. A typical recipe for computing the SVD (sketched in code after this list) is to:

    1. Compute $X^TX$. Place its eigenvectors in the columns of $V$, and place the square roots of its eigenvalues on the diagonal of $\Sigma$.

    2. To find each $\vec u_i$, use $X \vec v_i = \sigma_i \vec u_i$, i.e. $\vec u_i = \frac{1}{\sigma_i} X \vec v_i$.

    3. The above rule only works for $\sigma_i > 0$. If $\sigma_i = 0$, then the remaining $\vec u_i$’s must be eigenvectors of $XX^T$ for the eigenvalue 0, meaning they must lie in the nullspace of $X^T$.

  6. The SVD allows us to interpret the linear transformation of multiplying by $X$ as a composition of a rotation by $V^T$, a scaling/stretching by $\Sigma$, and a rotation by $U$.

  7. The SVD $X = U \Sigma V^T$ can be viewed as a sum of rank-one matrices:

    X = \sum_{i=1}^r \sigma_i \vec u_i \vec v_i^T

    Each piece $\sigma_i \vec u_i \vec v_i^T$ is a rank-one matrix, consisting of the outer product of $\vec u_i$ and $\vec v_i$. This summation view can be used to compute a low-rank approximation of $X$ by summing fewer than $r$ of these rank-one matrices.

    X_k = \sum_{i=1}^k \sigma_i \vec u_i \vec v_i^T
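As a capstone, here’s a minimal sketch of the recipe in item 5, applied to our running example. This is for illustration only (it isn’t how np.linalg.svd works internally), and the sign choices may differ from numpy’s:

import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# Step 1: eigendecompose X^T X; sort eigenvalues/eigenvectors in decreasing order.
evals, V = np.linalg.eigh(X.T @ X)           # eigh, since X^T X is symmetric
order = np.argsort(evals)[::-1]
evals, V = evals[order], V[:, order]

sigmas = np.sqrt(np.clip(evals, 0, None))    # singular values: square roots of eigenvalues
r = int(np.sum(sigmas > 1e-10))              # numerical rank

# Step 2: u_i = (1 / sigma_i) X v_i for each non-zero singular value.
U_r = X @ V[:, :r] / sigmas[:r]

print(np.round(sigmas, 6))                                       # ≈ [15.  3.  0.]
print(np.allclose(X, U_r @ np.diag(sigmas[:r]) @ V[:, :r].T))    # True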