Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

9.6. Positive Semidefinite Matrices and the Rayleigh Quotient

In Chapter 9.5, we saw that symmetric matrices can be diagonalized by an orthogonal matrix. In this section, we will use that fact to understand two closely related ideas:

Quadratic Forms and Positive Semidefinite Matrices

If AA is an n×nn \times n matrix, then the function

f(x)=xTAxf(\vec x) = \vec x^T A \vec x

is called a quadratic form. When AA is symmetric, quadratic forms are especially nice because the spectral theorem lets us understand them completely in terms of eigenvalues and eigenvectors.

A positive definite matrix is even stronger: it satisfies

xTAx>0for every x0,\vec x^T A \vec x > 0 \quad \text{for every } \vec x \neq \vec 0,

which is equivalent to saying that all eigenvalues of AA are strictly positive.

Quadratic forms already appeared earlier in the course. For example, the mean-squared error from Chapter 8.1 can be expanded into a quadratic expression in the weight vector w\vec w.

Rsq(w)=1nyXw2R_\text{sq}(\vec w) = \frac{1}{n}\lVert \vec y - X\vec w \rVert^2

So understanding when a quadratic form has a minimum is not just abstract linear algebra; it is directly tied to optimization.

Why are the two definitions of positive semidefiniteness equivalent? Since AA is symmetric, we can write

A=QΛQTA = Q\Lambda Q^T

where QQ is orthogonal and Λ\Lambda is diagonal with eigenvalues λ1,,λn\lambda_1, \ldots, \lambda_n on the diagonal. For any vector x\vec x, let y=QTx\vec y = Q^T \vec x. Then

xTAx=xT(QΛQT)x=yTΛy=i=1nλiyi2\vec x^T A \vec x = \vec x^T(Q\Lambda Q^T)\vec x = \vec y^T \Lambda \vec y = \sum_{i=1}^n \lambda_i y_i^2

This formula is the key. Each yi2y_i^2 is non-negative, so if every eigenvalue λi0\lambda_i \geq 0, then every term λiyi2\lambda_i y_i^2 is non-negative and therefore xTAx0\vec x^T A \vec x \geq 0 for every x\vec x. Conversely, if some eigenvalue were negative, then plugging in the corresponding eigenvector would make xTAx<0\vec x^T A \vec x < 0.

Loading...
Loading...

The picture on the left comes from the positive definite matrix

A=[2112],A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},

whose eigenvalues are 3 and 1. Every output is non-negative, and the level curves are ellipses surrounding a single global minimum at the origin.

The picture on the right comes from the symmetric matrix

A=[1551],A = \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix},

whose eigenvalues are 6 and -4. Because one eigenvalue is negative, the quadratic form takes both positive and negative values, so it cannot be positive semidefinite and it is not convex.

For a symmetric quadratic form, convexity is controlled exactly by positive semidefiniteness. If

f(x)=xTAx,f(\vec x) = \vec x^T A \vec x,

then

f(x)=2AxandHf=2A,\nabla f(\vec x) = 2A\vec x \qquad \text{and} \qquad H_f = 2A,

because A=ATA = A^T. From multivariable calculus, a function is convex exactly when its Hessian is positive semidefinite. So for symmetric quadratic forms,

f(x)=xTAx is convex     A0f(\vec x) = \vec x^T A \vec x \text{ is convex } \iff A \succeq 0

This is one reason positive semidefinite matrices show up constantly in optimization: they tell us that the landscape bends upward in every direction.

But there is still one issue. The value of xTAx\vec x^T A \vec x depends on both the direction of x\vec x and its length. If we double x\vec x, then the value quadruples. To study the effect of direction alone, we normalize by the squared length of the vector.

The Rayleigh Quotient

Suppose AA is a symmetric n×nn \times n matrix. The Rayleigh quotient of AA is the function

g(v)=vTAvvTvg(\vec v) = \frac{\vec v^T A \vec v}{\vec v^T \vec v}

for all non-zero vectors v\vec v.

You should think of this as a normalized quadratic form. The numerator vTAv\vec v^T A \vec v measures the output of the quadratic form, while the denominator vTv=v2\vec v^T \vec v = \lVert \vec v \rVert^2 removes the effect of scale.

Indeed, if c0c \neq 0, then

g(cv)=(cv)TA(cv)(cv)T(cv)=c2vTAvc2vTv=g(v)g(c\vec v) = \frac{(c\vec v)^T A (c\vec v)}{(c\vec v)^T(c\vec v)} = \frac{c^2\vec v^T A \vec v}{c^2\vec v^T \vec v} = g(\vec v)

So the Rayleigh quotient depends only on the direction of v\vec v, not on its magnitude. In particular, if v=1\lVert \vec v \rVert = 1, then

g(v)=vTAv,g(\vec v) = \vec v^T A \vec v,

so the Rayleigh quotient is just the quadratic form restricted to the unit sphere.

In Homework 9, Problem 4, you showed that

g(v)=2vTv(Avg(v)v)\nabla g(\vec v) = \frac{2}{\vec v^T \vec v}\left(A\vec v - g(\vec v)\vec v\right)

If v\vec v is a critical point of gg, then g(v)=0\nabla g(\vec v) = \vec 0, which forces

Av=g(v)vA\vec v = g(\vec v)\vec v

That means every critical point of the Rayleigh quotient is an eigenvector of AA, and the corresponding value of the Rayleigh quotient is the associated eigenvalue.

Loading...

The dashed lines mark the eigenvector directions of AA. Notice what changed compared to the earlier quadratic form plot: the extreme values no longer occur farther and farther away from the origin. After normalization, only direction matters.

The reddest direction is the eigenvector direction corresponding to the largest eigenvalue, which is 6. The bluest direction is the eigenvector direction corresponding to the smallest eigenvalue, which is -4. On the unit circle, those are exactly the maximum and minimum values of the quadratic form.

So for a symmetric matrix AA,

  • the largest possible value of the Rayleigh quotient is the largest eigenvalue of AA, and

  • the smallest possible value of the Rayleigh quotient is the smallest eigenvalue of AA.

This gives a geometric interpretation of eigenvectors: they are the directions where the normalized quadratic form is stationary, and in fact extremized.

Activity 3

The Rayleigh quotient will reappear in Chapter 10.3, where we’ll apply it with A=X~TX~A = \tilde X^T \tilde X to find the best direction for dimensionality reduction.