9.6. Positive Semidefinite Matrices and the Rayleigh Quotient

Quadratic Forms and Positive Semidefinite Matrices

If $A$ is an $n \times n$ matrix, then the function

f(\vec x) = \vec x^T A \vec x

is called a quadratic form. When $A$ is symmetric, quadratic forms are especially nice because the spectral theorem lets us understand them completely in terms of eigenvalues and eigenvectors.
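As a quick sanity check, here is a minimal sketch that evaluates a quadratic form both as the matrix product $\vec x^T A \vec x$ and as an expanded polynomial. The matrix and the input vector are illustrative values chosen for this sketch.

```python
import numpy as np

# Illustrative symmetric matrix and input vector (chosen for this sketch).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, -3.0])

# Matrix form of the quadratic form: x^T A x.
f_matrix = x @ A @ x

# Expanded polynomial form for this A: 2*x1^2 + 2*x1*x2 + 2*x2^2.
x1, x2 = x
f_expanded = 2 * x1**2 + 2 * x1 * x2 + 2 * x2**2

print(f_matrix, f_expanded)
```

Both expressions give the same number. The off-diagonal entries of $A$ contribute twice to the cross term, which is why the coefficient of $x_1 x_2$ is $2 \cdot 1 = 2$.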

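A symmetric matrix $A$ is positive semidefinite if $\vec x^T A \vec x \geq 0$ for every $\vec x$, or equivalently, if all eigenvalues of $A$ are non-negative. A positive definite matrix is even stronger: it satisfies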

\vec x^T A \vec x > 0 \quad \text{for every } \vec x \neq \vec 0,

which is equivalent to saying that all eigenvalues of $A$ are strictly positive.
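The eigenvalue characterization suggests a simple numerical test. The sketch below is one possible way to do it, not a definitive implementation: the helper name `classify_definiteness` and the tolerance `tol` are assumptions introduced here, and the input is assumed to be symmetric.

```python
import numpy as np

def classify_definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues.

    `tol` is a hypothetical tolerance that absorbs floating-point error.
    """
    eigvals = np.linalg.eigvalsh(A)  # real eigenvalues, ascending order
    if np.all(eigvals > tol):
        return "positive definite"
    if np.all(eigvals >= -tol):
        return "positive semidefinite"
    return "not positive semidefinite"

print(classify_definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))  # eigenvalues 1, 3
print(classify_definiteness(np.array([[1.0, 1.0], [1.0, 1.0]])))  # eigenvalues 0, 2
print(classify_definiteness(np.array([[1.0, 5.0], [5.0, 1.0]])))  # eigenvalues -4, 6
```

`np.linalg.eigvalsh` is used because it is designed for symmetric matrices and always returns real eigenvalues.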

Quadratic forms already appeared earlier in the course. For example, the mean-squared error from Chapter 8.1 can be expanded into a quadratic expression in the weight vector $\vec w$:

R_\text{sq}(\vec w) = \frac{1}{n}\lVert \vec y - X\vec w \rVert^2
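To see the quadratic form inside this expression, we can expand the squared norm using $\lVert \vec u \rVert^2 = \vec u^T \vec u$. This is a sketch of the standard expansion, written out here rather than quoted from Chapter 8.1:

R_\text{sq}(\vec w) = \frac{1}{n}(\vec y - X\vec w)^T(\vec y - X\vec w) = \frac{1}{n}\left(\vec w^T X^T X \vec w - 2\vec y^T X \vec w + \vec y^T \vec y\right)

The first term is a quadratic form in $\vec w$ with the symmetric matrix $X^T X$. The remaining terms are linear and constant in $\vec w$.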

So understanding when a quadratic form has a minimum is not just abstract linear algebra; it is directly tied to optimization.

Why are the two definitions of positive semidefiniteness equivalent? Since $A$ is symmetric, we can write

A = Q\Lambda Q^T

where $Q$ is orthogonal and $\Lambda$ is diagonal with eigenvalues $\lambda_1, \ldots, \lambda_n$ on the diagonal. For any vector $\vec x$, let $\vec y = Q^T \vec x$. Then

\vec x^T A \vec x = \vec x^T(Q\Lambda Q^T)\vec x = \vec y^T \Lambda \vec y = \sum_{i=1}^n \lambda_i y_i^2

This formula is the key. Each $y_i^2$ is non-negative, so if every eigenvalue $\lambda_i \geq 0$, then every term $\lambda_i y_i^2$ is non-negative and therefore $\vec x^T A \vec x \geq 0$ for every $\vec x$. Conversely, if some eigenvalue $\lambda_j$ were negative, then plugging in a corresponding eigenvector $\vec x$ would give $\vec x^T A \vec x = \lambda_j \lVert \vec x \rVert^2 < 0$.
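The identity $\vec x^T A \vec x = \sum_i \lambda_i y_i^2$ can be checked numerically. The following is a minimal sketch that assumes an illustrative symmetric matrix and a random test vector, neither of which is special.

```python
import numpy as np

# Illustrative symmetric matrix (chosen for this sketch).
A = np.array([[2.0, 1.0], [1.0, 2.0]])

# eigh returns eigenvalues and an orthogonal matrix Q whose columns are eigenvectors.
eigvals, Q = np.linalg.eigh(A)

rng = np.random.default_rng(0)
x = rng.normal(size=2)

y = Q.T @ x                   # coordinates of x in the eigenvector basis
lhs = x @ A @ x               # quadratic form computed directly
rhs = np.sum(eigvals * y**2)  # weighted sum of squares from the formula

print(np.isclose(lhs, rhs))
```

Changing the random seed or the matrix should not change the conclusion, as long as the matrix stays symmetric.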

import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

x = np.linspace(-3, 3, 250)
y = np.linspace(-3, 3, 250)
X, Y = np.meshgrid(x, y)

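# Symmetric matrices defining the two quadratic forms.
A_psd = np.array([[2, 1], [1, 2]])    # eigenvalues 1 and 3
A_indef = np.array([[1, 5], [5, 1]])  # eigenvalues -4 and 6

def quadratic_form_grid(A, X, Y):
    # Evaluate x^T A x at every grid point for a symmetric 2x2 matrix A.
    return A[0, 0] * X**2 + 2 * A[0, 1] * X * Y + A[1, 1] * Y**2

Z_psd = quadratic_form_grid(A_psd, X, Y)
Z_indef = quadratic_form_grid(A_indef, X, Y)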

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=(
        r"$$\text{A positive definite quadratic form}$$",
        r"$$\text{An indefinite quadratic form}$$",
    ),
    horizontal_spacing=0.12,
)

fig.add_trace(
    go.Contour(
        z=Z_psd,
        x=x,
        y=y,
        colorscale="YlOrRd",
        showscale=False,
        contours=dict(showlabels=True),
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Contour(
        z=Z_indef,
        x=x,
        y=y,
        colorscale="RdBu_r",
        showscale=False,
        contours=dict(showlabels=True),
    ),
    row=1,
    col=2,
)

for col in [1, 2]:
    fig.update_xaxes(
        title_text="x",
        range=[-3, 3],
        showline=True,
        linecolor="black",
        linewidth=1,
        showgrid=True,
        gridcolor="#f0f0f0",
        zeroline=True,
        zerolinecolor="#c7c7c7",
        row=1,
        col=col,
    )
    fig.update_yaxes(
        title_text="y",
        range=[-3, 3],
        showline=True,
        linecolor="black",
        linewidth=1,
        showgrid=True,
        gridcolor="#f0f0f0",
        zeroline=True,
        zerolinecolor="#c7c7c7",
        scaleanchor=f"x{col}",
        row=1,
        col=col,
    )

fig.update_layout(
    title=r"$$\text{Contour plots of } f(\vec x) = \vec x^T A \vec x$$",
    width=950,
    height=450,
    font=dict(family="Palatino, serif", size=16, color="#222"),
    paper_bgcolor="white",
    plot_bgcolor="white",
    margin=dict(l=50, r=40, b=50, t=80),
)

fig.show()

The picture on the left comes from the positive definite matrix

A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},

whose eigenvalues are 3 and 1. Every output is positive except at $\vec x = \vec 0$, and the level curves are ellipses surrounding a single global minimum at the origin.

The picture on the right comes from the symmetric matrix

A = \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix},

whose eigenvalues are 6 and -4. Because one eigenvalue is negative, the quadratic form takes both positive and negative values, so it cannot be positive semidefinite and it is not convex.

For a symmetric quadratic form, convexity is controlled exactly by positive semidefiniteness. If

f(\vec x) = \vec x^T A \vec x,

then

\nabla f(\vec x) = 2A\vec x \qquad \text{and} \qquad H_f = 2A,

because $A = A^T$. From multivariable calculus, a twice-differentiable function is convex exactly when its Hessian is positive semidefinite at every point. Here the Hessian is the constant matrix $2A$, so for symmetric quadratic forms,

f(\vec x) = \vec x^T A \vec x \text{ is convex } \iff A \succeq 0

This is one reason positive semidefinite matrices show up constantly in optimization: they tell us that the landscape never bends downward in any direction.
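The gradient formula $\nabla f(\vec x) = 2A\vec x$ can be checked against finite differences. This is a minimal sketch: the helper `numerical_gradient`, the step size `h`, and the test point are assumptions made for illustration.

```python
import numpy as np

# Illustrative symmetric matrix (chosen for this sketch).
A = np.array([[2.0, 1.0], [1.0, 2.0]])

def f(x):
    # The quadratic form x^T A x.
    return x @ A @ x

def numerical_gradient(func, x, h=1e-6):
    # Central-difference approximation of the gradient, one coordinate at a time.
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

x = np.array([0.5, -1.5])
print(numerical_gradient(f, x))  # finite-difference estimate
print(2 * A @ x)                 # closed-form gradient 2Ax

# The Hessian is the constant matrix 2A; its eigenvalues decide convexity.
print(np.linalg.eigvalsh(2 * A))
```

For this matrix, the Hessian's eigenvalues are all positive, which matches the bowl-shaped contour plot on the left.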

But there is still one issue. The value of $\vec x^T A \vec x$ depends on both the direction of $\vec x$ and its length. If we double $\vec x$, then the value quadruples. To study the effect of direction alone, we normalize by the squared length of the vector.

The Rayleigh Quotient

Suppose $A$ is a symmetric $n \times n$ matrix. The Rayleigh quotient of $A$ is the function

g(\vec v) = \frac{\vec v^T A \vec v}{\vec v^T \vec v}

for all non-zero vectors $\vec v$.

You should think of this as a normalized quadratic form. The numerator $\vec v^T A \vec v$ measures the output of the quadratic form, while the denominator $\vec v^T \vec v = \lVert \vec v \rVert^2$ removes the effect of scale.

Indeed, if $c \neq 0$, then

g(c\vec v) = \frac{(c\vec v)^T A (c\vec v)}{(c\vec v)^T(c\vec v)} = \frac{c^2\vec v^T A \vec v}{c^2\vec v^T \vec v} = g(\vec v)

So the Rayleigh quotient depends only on the direction of $\vec v$, not on its magnitude. In particular, if $\lVert \vec v \rVert = 1$, then

g(\vec v) = \vec v^T A \vec v,

so the Rayleigh quotient is just the quadratic form restricted to the unit sphere.
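Scale invariance is easy to confirm numerically. In this minimal sketch, the helper `rayleigh_quotient`, the test vector, and the scaling factors are illustrative choices.

```python
import numpy as np

def rayleigh_quotient(A, v):
    # g(v) = (v^T A v) / (v^T v), defined for non-zero v.
    v = np.asarray(v, dtype=float)
    return (v @ A @ v) / (v @ v)

A = np.array([[1.0, 5.0], [5.0, 1.0]])
v = np.array([2.0, 1.0])

# Rescaling v by any non-zero constant leaves the quotient unchanged.
for c in [1.0, -3.0, 0.25, 100.0]:
    print(c, rayleigh_quotient(A, c * v))
```

Every line prints the same quotient, including for the negative factor, because $c^2$ cancels between the numerator and the denominator.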

In Homework 9, Problem 4, you showed that

\nabla g(\vec v) = \frac{2}{\vec v^T \vec v}\left(A\vec v - g(\vec v)\vec v\right)

If $\vec v$ is a critical point of $g$, then $\nabla g(\vec v) = \vec 0$. Since $\frac{2}{\vec v^T \vec v} \neq 0$, this forces

A\vec v = g(\vec v)\vec v

That means every critical point of the Rayleigh quotient is an eigenvector of $A$, and the corresponding value of the Rayleigh quotient is the associated eigenvalue.
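The sketch below checks this claim numerically using the gradient formula above. The helper names `rayleigh_quotient` and `rayleigh_gradient` are assumptions introduced for illustration.

```python
import numpy as np

def rayleigh_quotient(A, v):
    # g(v) = (v^T A v) / (v^T v)
    return (v @ A @ v) / (v @ v)

def rayleigh_gradient(A, v):
    # Gradient formula: (2 / v^T v) * (A v - g(v) v)
    g = rayleigh_quotient(A, v)
    return 2.0 / (v @ v) * (A @ v - g * v)

A = np.array([[1.0, 5.0], [5.0, 1.0]])
eigvals, eigvecs = np.linalg.eigh(A)  # columns of eigvecs are eigenvectors

# At each eigenvector, the gradient vanishes and g equals the eigenvalue.
for lam, q in zip(eigvals, eigvecs.T):
    print(lam, rayleigh_quotient(A, q), np.linalg.norm(rayleigh_gradient(A, q)))

# At a vector that is not an eigenvector, the gradient is non-zero.
v = np.array([1.0, 0.0])
print(rayleigh_gradient(A, v))
```

The gradient norms at the eigenvectors are zero up to floating-point error, while the gradient at $(1, 0)$ is clearly non-zero.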

A = np.array([[1, 5], [5, 1]])

v1 = np.linspace(-5, 5, 400)
v2 = np.linspace(-5, 5, 400)
V1, V2 = np.meshgrid(v1, v2)

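# Numerator v^T A v and denominator v^T v, evaluated on the grid.
numerator = A[0, 0] * V1**2 + 2 * A[0, 1] * V1 * V2 + A[1, 1] * V2**2
denominator = V1**2 + V2**2
# The quotient is undefined at the origin, so only divide where the denominator is non-zero.
Z = np.divide(
    numerator,
    denominator,
    out=np.full_like(denominator, np.nan),
    where=denominator > 1e-12,
)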

theta = np.linspace(0, 2 * np.pi, 400)
circle_x = np.cos(theta)
circle_y = np.sin(theta)

eigvals, eigvecs = np.linalg.eigh(A)

fig = go.Figure()
fig.add_trace(
    go.Contour(
        z=Z,
        x=v1,
        y=v2,
        colorscale="RdBu_r",
        contours=dict(coloring="heatmap", showlabels=True),
        colorbar=dict(title="Rayleigh Quotient"),
    )
)

fig.add_trace(
    go.Scatter(
        x=circle_x,
        y=circle_y,
        mode="lines",
        line=dict(color="black", width=3),
        name="unit circle",
    )
)

for eigval, eigvec in zip(eigvals, eigvecs.T):
    direction = eigvec / np.linalg.norm(eigvec)
    fig.add_trace(
        go.Scatter(
            x=[-5 * direction[0], 5 * direction[0]],
            y=[-5 * direction[1], 5 * direction[1]],
            mode="lines",
            line=dict(width=3, dash="dash"),
            name=fr"eigenvalue {eigval:.0f}",
        )
    )

fig.update_layout(
    title=r"$$\text{Rayleigh quotient for } A = \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix}$$",
    width=700,
    height=650,
    font=dict(family="Palatino, serif", size=16, color="#222"),
    paper_bgcolor="white",
    plot_bgcolor="white",
    xaxis=dict(
        title="x",
        range=[-5, 5],
        showline=True,
        linecolor="black",
        linewidth=1,
        showgrid=True,
        gridcolor="#f0f0f0",
        zeroline=True,
        zerolinecolor="#c7c7c7",
    ),
    yaxis=dict(
        title="y",
        range=[-5, 5],
        showline=True,
        linecolor="black",
        linewidth=1,
        showgrid=True,
        gridcolor="#f0f0f0",
        zeroline=True,
        zerolinecolor="#c7c7c7",
        scaleanchor="x",
    ),
    legend=dict(bgcolor="rgba(255,255,255,0.8)"),
)

fig.show()

The dashed lines mark the eigenvector directions of $A$. Notice what changed compared to the earlier quadratic form plot: the extreme values no longer occur farther and farther away from the origin. After normalization, only direction matters.

The reddest direction is the eigenvector direction corresponding to the largest eigenvalue, which is 6. The bluest direction is the eigenvector direction corresponding to the smallest eigenvalue, which is -4. On the unit circle, those are exactly the maximum and minimum values of the quadratic form.

So for a symmetric matrix $A$,

the largest possible value of the Rayleigh quotient is the largest eigenvalue of $A$, and
the smallest possible value of the Rayleigh quotient is the smallest eigenvalue of $A$.
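The reason the extreme values are eigenvalues follows from the diagonalization $A = Q\Lambda Q^T$. As a sketch, write $\vec y = Q^T \vec v$, so that $\vec y^T \vec y = \vec v^T \vec v$ because $Q$ is orthogonal, and let $\lambda_{\min}$ and $\lambda_{\max}$ (notation introduced here) denote the smallest and largest eigenvalues of $A$. Then

g(\vec v) = \frac{\sum_{i=1}^n \lambda_i y_i^2}{\sum_{i=1}^n y_i^2}, \qquad \lambda_{\min} \sum_{i=1}^n y_i^2 \leq \sum_{i=1}^n \lambda_i y_i^2 \leq \lambda_{\max} \sum_{i=1}^n y_i^2

Dividing through by $\sum_{i=1}^n y_i^2 > 0$ gives $\lambda_{\min} \leq g(\vec v) \leq \lambda_{\max}$, with equality on each side when $\vec v$ is an eigenvector for that eigenvalue.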

This gives a geometric interpretation of eigenvectors: they are the directions where the normalized quadratic form is stationary. The eigenvectors for the largest and smallest eigenvalues are the directions where it is maximized and minimized.

Activity 1

Suppose $x, y \in \mathbb{R}$ are not both zero. What are the largest and smallest possible values of the following function?

f(x, y) = \frac{2x^2 + 12xy + 7y^2}{x^2 + y^2}

Solution

First, we should write $f(x, y)$ as a Rayleigh quotient. Let $\vec x = \begin{bmatrix} x \\ y \end{bmatrix}$ and

A = \begin{bmatrix} 2 & 6 \\ 6 & 7 \end{bmatrix}

Then,

f(x, y) = \frac{\vec x^T A \vec x}{\vec x^T \vec x} = g(\vec x)

Before going any further, you should verify that $\vec x^T A \vec x$ is indeed equal to $2x^2 + 12xy + 7y^2$.

Using the logic introduced before this activity, the largest possible value of $f(x, y)$ is the largest eigenvalue of $A$, and the smallest possible value is the smallest eigenvalue of $A$.

The eigenvalues of $A$ must

sum to $\text{trace}(A) = 2 + 7 = 9$, and
multiply to $\det(A) = 2 \cdot 7 - 6^2 = -22$.

So, $A$'s eigenvalues are 11 and -2, since $11 + (-2) = 9$ and $11 \cdot (-2) = -22$. Thus, the largest possible value of $f(x, y)$ is 11, and the smallest possible value of $f(x, y)$ is -2.
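One way to double-check this answer is to compare the eigenvalues of $A$ with a brute-force scan over directions. This is a minimal sketch: sampling the unit circle is enough because the quotient is scale-invariant, and the number of sample points is an arbitrary choice.

```python
import numpy as np

A = np.array([[2.0, 6.0], [6.0, 7.0]])

# Eigenvalues of the symmetric matrix, in ascending order.
eigvals = np.linalg.eigvalsh(A)
print(eigvals)

# Brute-force check: evaluate f on a fine sample of unit-circle directions.
theta = np.linspace(0, 2 * np.pi, 100001)
x, y = np.cos(theta), np.sin(theta)
f = (2 * x**2 + 12 * x * y + 7 * y**2) / (x**2 + y**2)
print(f.min(), f.max())
```

The sampled minimum and maximum should agree with the two eigenvalues up to the resolution of the sampling.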

The Rayleigh quotient will reappear in Chapter 10.3, where we’ll apply it with $A = \tilde X^T \tilde X$ to find the best direction for dimensionality reduction.