
2.9. Inverses

The big concept introduced in Chapter 2.8 was the rank of a matrix, \text{rank}(A), which is the number of linearly independent columns of a matrix. The rank of a matrix told us the dimension of its column space, \text{colsp}(A), which is the set of all vectors that can be created with linear combinations of A’s columns.

The ideas of rank, column space, and linear independence are all closely related, and they all apply to matrices of any shape – wide, tall, or square. This is good, because data from the real world is rarely square: we usually have many more observations (n) than features (d).

The big idea in Chapter 2.9 is that of the inverse of a matrix, A^{-1}, which only applies to square matrices. But, inverses and square matrices can still be useful for our rectangular matrices with real data, since the matrix A^TA is square, no matter the shape of A. And, as we discussed in Chapter 2.8 and as you’re seeing in Homework 5, the matrix A^TA has close ties to the correlations between the columns of A, which you already know have some connection to linear regression.


Motivation: What is an Inverse?

Scalar Addition and Multiplication

In “regular” addition and multiplication, you’re already familiar with the idea of an inverse. In addition, the inverse of a number a is another number a' that satisfies

a + a' = a' + a = 0

0 is the “identity” element in addition, since adding it to any number a doesn’t change the value of a. Of course, a' = -a, so -a is the additive inverse of a. Any number a has an additive inverse; for example, the additive inverse of 2 is -2, and the additive inverse of -2 is 2.

In multiplication, the inverse of a number a is another number a' that satisfies

a \cdot a' = a' \cdot a = 1

1 is the “identity” element in multiplication, since multiplying it by any number a doesn’t change the value of a. Most numbers have a multiplicative inverse of a' = \frac{1}{a}, but 0 does not! Since

0 \cdot a' = 0

for every a', there is no a' that makes 0 \cdot a' equal to 1.

When a multiplicative inverse exists, we can use it to solve equations like

2x = 5

by multiplying both sides by the multiplicative inverse of 2, which is \frac{1}{2}.

\frac{1}{2} (2x) = \frac{1}{2} (5) \implies x = \frac{5}{2}

Matrices

If A is an n \times d matrix, its additive inverse is just -A, which comes from negating each element of A. That part is not all that interesting. What we’re more interested in is the multiplicative inverse of a matrix, which is just referred to as the inverse of the matrix.

Suppose A is some n \times d matrix. We’d like to find an inverse, A', such that when A is multiplied by A' in any order, the result is the identity element for matrix multiplication, I. Recall, the n \times n identity matrix is a matrix with 1s on the diagonal (moving from the top left to the bottom right), and 0s everywhere else. The 3 \times 3 identity matrix is

I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

I_3 \vec x = \vec x for any \vec x \in \mathbb{R}^3, and B I_3 = I_3 B = B for any B \in \mathbb{R}^{3 \times 3}. The same holds true for any I_n, where n is the dimension of the space.

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}

If A is an n \times d matrix, we’d need a single matrix A' that satisfies

AA' = A'A = I

But, in order to evaluate AA', we’d need A' to be of shape d \times \text{something}, and in order to evaluate A'A, we’d need A' to be of shape \text{something} \times n. The only scenario where these are both possible is when A is a square matrix, i.e. n = d, and A' is also a square matrix of shape n \times n.

For example, consider the 2 \times 2 matrix

A = \begin{bmatrix} 2 & 4 \\ 3 & 5\end{bmatrix}

It turns out that A does have an inverse! Its inverse, denoted by A^{-1}, is

A^{-1} = \begin{bmatrix} -5/2 & 2 \\ 3/2 & -1\end{bmatrix}

This is because A^{-1} is the matrix that satisfies both

AA^{-1} = \begin{bmatrix} 2 & 4 \\ 3 & 5\end{bmatrix} \begin{bmatrix} -5/2 & 2 \\ 3/2 & -1\end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix} = I

and

A^{-1}A = \begin{bmatrix} -5/2 & 2 \\ 3/2 & -1\end{bmatrix} \begin{bmatrix} 2 & 4 \\ 3 & 5\end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix} = I

numpy is good at finding inverses, when they exist.

import numpy as np

A = np.array([[2, 4],
              [3, 5]])

np.linalg.inv(A)
array([[-2.5,  2. ],
       [ 1.5, -1. ]])

# The top-left and bottom-right elements are 1s,
# and the top-right and bottom-left elements are 0s.
# 4.4408921e-16 is just 0, with some floating point error baked in.
A @ np.linalg.inv(A)
array([[1.0000000e+00, 0.0000000e+00],
       [4.4408921e-16, 1.0000000e+00]])

But, not all square matrices are invertible, just like not all numbers have multiplicative inverses. What happens if we try to invert

B = \begin{bmatrix} 1 & 2 \\ 2 & 4\end{bmatrix}
B = np.array([[1, 2], 
              [2, 4]])

np.linalg.inv(B)
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Cell In[17], line 4
      1 B = np.array([[1, 2], 
      2               [2, 4]])
----> 4 np.linalg.inv(B)

File ~/miniforge3/envs/pds/lib/python3.10/site-packages/numpy/linalg/linalg.py:561, in inv(a)
    559 signature = 'D->D' if isComplexType(t) else 'd->d'
    560 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 561 ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
    562 return wrap(ainv.astype(result_t, copy=False))

File ~/miniforge3/envs/pds/lib/python3.10/site-packages/numpy/linalg/linalg.py:112, in _raise_linalgerror_singular(err, flag)
    111 def _raise_linalgerror_singular(err, flag):
--> 112     raise LinAlgError("Singular matrix")

LinAlgError: Singular matrix

We’re told the matrix is singular, meaning that it’s not invertible.
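This connects back to Chapter 2.8: B’s second column is just 2 times its first, so its columns are linearly dependent. Here’s a quick check of that in code (a sketch; np.linalg.matrix_rank is numpy’s built-in rank function):

import numpy as np

B = np.array([[1, 2],
              [2, 4]])

# B's columns are linearly dependent (the second is 2 times the first),
# so its rank is 1, not 2.
np.linalg.matrix_rank(B)   # evaluates to 1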

The point of Chapter 2.9 is to understand why some square matrices are invertible and others are not, and what the inverse of an invertible matrix really means.


Linear Transformations

But first, I want us to think about matrix-vector multiplication as something more than just number crunching.

In Chapter 2.8, the running example was the matrix

A = \begin{bmatrix} 5 & 3 & 2 \\ 0 & -1 & 1 \\ 3 & 4 & -1 \\ 6 & 2 & 4 \\ 1 & 0 & 1 \end{bmatrix}

To multiply A by a vector on the right, that vector must be in \mathbb{R}^3, and the result will be a vector in \mathbb{R}^5.

Put another way, if we consider the function T(\vec x) = A \vec x, T maps elements of \mathbb{R}^3 to elements of \mathbb{R}^5, i.e.

T: \mathbb{R}^3 \to \mathbb{R}^5

I’ve chosen the letter T to denote that T is a linear transformation.

Every linear transformation is of the form T(\vec x) = A \vec x. For our purposes, linear transformations and matrix-vector multiplication are the same thing, though in general linear transformations are a more abstract concept (just like how vector spaces can be made up of functions, for example).

For example, the function

f(\vec x) = f\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 \\ x_2 \end{bmatrix}

is a linear transformation from \mathbb{R}^2 to \mathbb{R}^3, and is equivalent to

f(\vec x) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 \\ x_2 \end{bmatrix} = \underbrace{\begin{bmatrix} 2 & 3 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{\text{matrix}\: A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{\vec x}
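To see the equivalence in code, here’s a small numpy sketch (the function name f and the test vector are my own, for illustration): computing f coordinate-by-coordinate gives the same result as multiplying by the matrix A.

import numpy as np

A = np.array([[2, 3],
              [1, 0],
              [0, 1]])

def f(x):
    # f, computed "by hand", straight from the definition above.
    x1, x2 = x
    return np.array([2 * x1 + 3 * x2, x1, x2])

x = np.array([4, -1])

f(x)      # array([ 5,  4, -1])
A @ x     # array([ 5,  4, -1]), the same result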

The function g(x) = 3x is also a linear transformation, from \mathbb{R} to \mathbb{R}.

A non-example of a linear transformation is

h(\vec x) = \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}

because no matrix multiplied by \vec x will produce \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}.

Another non-example, perhaps surprisingly, is

k(x) = -2x + 5

This is the equation of a line in \mathbb{R}^2, which is linear in some sense, but it’s not a linear transformation, since it doesn’t satisfy the two properties of linearity (additivity and homogeneity). For instance, for k(x) to be a linear transformation, we’d need

k(cx) = ck(x)

for any c, x \in \mathbb{R}. But, if we consider c = 3 and x = 1 as an example, we get

k(cx) = k(3 \cdot 1) = k(3) = -2 \cdot 3 + 5 = -1 \\ ck(x) = 3 k(1) = 3 (-2 \cdot 1 + 5) = 3(3) = 9

which are not equal. k(x) = -2x + 5 is an example of an affine transformation, which in general is any function f: \mathbb{R}^d \to \mathbb{R}^n that can be written as f(\vec x) = A \vec x + \vec b, where A is an n \times d matrix and \vec b \in \mathbb{R}^n.

From \mathbb{R}^n to \mathbb{R}^n

While linear transformations exist from \mathbb{R}^2 to \mathbb{R}^5 or \mathbb{R}^{99} to \mathbb{R}^4, it’s in some ways easiest to think about linear transformations with the same domain and codomain, i.e. transformations of the form T: \mathbb{R}^n \to \mathbb{R}^n. This will allow us to explore how transformations stretch, rotate, and reflect vectors in the same space. Linear transformations with the same domain (\mathbb{R}^n) and codomain (\mathbb{R}^n) are represented by n \times n matrices, which gives us a useful setting to think about the invertibility of square matrices, beyond just looking at a bunch of numbers.

To start, let’s consider the linear transformation defined by the matrix

A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}

What happens to a vector in \mathbb{R}^2 when we multiply it by A? Let’s visualize the effect of A on several vectors in \mathbb{R}^2.

Image produced in Jupyter

A scales, or stretches, the input space by a factor of 2 in the x-direction and a factor of -1/3 in the y-direction.

Scaling

Another way of visualizing A is to think about how it transforms the two standard basis vectors of \mathbb{R}^2, which are

{\color{#3d81f6}{\vec u_x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}}}, \quad \color{#3d81f6}{\vec u_y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}}

(In the past I’ve called these \vec e_1 and \vec e_2, but I’ll use \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y} here since I’ll also use E to represent a matrix shortly.)

Note that \color{orange}{A \vec u_x} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} is just the first column of A, and similarly \color{orange}{A \vec u_y} = \begin{bmatrix} 0 \\ -1/3 \end{bmatrix} is the second column of A.

A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}
Image produced in Jupyter

In addition to drawing \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y} on the left and their transformed counterparts \color{orange}{A \vec u_x} and \color{orange}{A \vec u_y} on the right, I’ve also shaded in how the unit square, which is the square containing \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y}, gets transformed. Here, it gets stretched from a square to a rectangle.

Remember that any vector \vec v \in \mathbb{R}^2 is a linear combination of \color{#3d81f6}{\vec u_x} and \color{#3d81f6}{\vec u_y}. For instance,

\begin{bmatrix} 7 \\ 4 \end{bmatrix} = 7 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \underbrace{7 {\color{#3d81f6}{\vec u_x}} + 4 {\color{#3d81f6}{\vec u_y}}}_{\vec v}

So, multiplying A by \vec v is equivalent to multiplying A by a linear combination of \vec u_x and \vec u_y.

A \begin{bmatrix} 7 \\ 4 \end{bmatrix} = A (7 {\color{#3d81f6}{\vec u_x}} + 4 {\color{#3d81f6}{\vec u_y}}) = 7{\color{orange}{A \vec u_x}} + 4{\color{orange}{A \vec u_y}}

and the result is a linear combination of \color{orange}{A \vec u_x} and \color{orange}{A \vec u_y} with the same coefficients!

So, as we move through the following examples, think of the transformed basis vectors \color{orange}{A \vec u_x} and \color{orange}{A \vec u_y} as a new set of “building blocks” that define the transformed space (which is the column space of A).

A is a diagonal matrix, which means it scales vectors. Note that any vector in \mathbb{R}^2 can be transformed by A, not just vectors on or within the unit square; I’m just using these two basis vectors to visualize the transformation.
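Here’s a quick numpy check of that “same coefficients” fact, using the vector \begin{bmatrix} 7 \\ 4 \end{bmatrix} from above (a sketch; nothing here is specific to this particular A):

import numpy as np

A = np.array([[2, 0],
              [0, -1/3]])

u_x = np.array([1, 0])
u_y = np.array([0, 1])
v = 7 * u_x + 4 * u_y            # the vector [7, 4]

# Multiplying A by v gives the same linear combination of the
# transformed basis vectors, with the same coefficients 7 and 4.
A @ v                            # array([14.        , -1.33333333])
7 * (A @ u_x) + 4 * (A @ u_y)    # array([14.        , -1.33333333])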

Rotations and Orthogonal Matrices

What might a non-diagonal matrix do? Let’s consider

B = \begin{bmatrix} \sqrt{2} / 2 & -\sqrt{2} / 2 \\ \sqrt{2} / 2 & \sqrt{2} / 2 \end{bmatrix}

Image produced in Jupyter

B is an orthogonal matrix, which means that its columns are unit vectors and are orthogonal to one another.

\underbrace{B^TB = BB^T = I}_\text{condition for an orthogonal matrix}

Orthogonal matrices rotate (and possibly reflect) vectors in the input space. In general, a matrix that rotates vectors by \theta (radians) counterclockwise is given by

R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}

B = R(\frac{\pi}{4}) rotates vectors by \frac{\pi}{4} radians, i.e. 45^\circ.

To drive home the point I made earlier, any vector \vec v = \begin{bmatrix} x \\ y \end{bmatrix}, once multiplied by B, ends up transforming into

B \underbrace{\begin{bmatrix} x \\ y \end{bmatrix}}_{\vec v} = x ({\color{orange}{B \vec u_x}}) + y ({\color{orange}{B \vec u_y}})
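If you’d like to experiment with rotations yourself, here’s a small sketch (the helper name rotation_matrix is mine) that builds R(\theta), confirms R(\frac{\pi}{4}) matches B, and checks the orthogonality condition:

import numpy as np

def rotation_matrix(theta):
    # R(theta) rotates vectors counterclockwise by theta radians.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

B = rotation_matrix(np.pi / 4)
B                                  # array([[ 0.70710678, -0.70710678],
                                   #        [ 0.70710678,  0.70710678]])

# Orthogonal: B^T B is the identity, up to floating point error.
np.allclose(B.T @ B, np.eye(2))    # True

# Rotating u_x by 45 degrees lands on [sqrt(2)/2, sqrt(2)/2].
B @ np.array([1, 0])               # array([0.70710678, 0.70710678])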

Composing Transformations

We can even apply multiple transformations one after another. This is called composing transformations. For instance,

C = \begin{bmatrix} \sqrt{2} & -\sqrt{2} \\ -\sqrt{2} / 6 & -\sqrt{2} / 6 \end{bmatrix}

is just

C = AB = \underbrace{\begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}}_{\text{scale}} \underbrace{\begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}}_{\text{rotate}}

Image produced in Jupyter

Note that C rotates the input vector, and then scales it. Read the operations from right to left, since C\vec x = AB \vec x = A(B \vec x).

C = AB is different from

D = \begin{bmatrix} \sqrt{2} & \sqrt{2} / 6 \\ \sqrt{2} & -\sqrt{2} / 6 \end{bmatrix} = B A
Image produced in Jupyter
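A quick numpy check that the order of composition matters (A and B here are the scaling and rotation matrices from above):

import numpy as np

A = np.array([[2, 0],
              [0, -1/3]])
B = np.array([[np.sqrt(2)/2, -np.sqrt(2)/2],
              [np.sqrt(2)/2,  np.sqrt(2)/2]])

C = A @ B   # rotate first, then scale
D = B @ A   # scale first, then rotate

# C and D are different matrices, so AB and BA are different transformations.
np.allclose(C, D)   # False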

Shears

E = \begin{bmatrix} 1 & -2/3 \\ 0 & 1 \end{bmatrix}

Image produced in Jupyter

E is a shear matrix. Think of a shear as a transformation that slants the input space along one axis, while keeping the other axis fixed. What helps me interpret shears is looking at them formulaically.

E \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & -2/3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x - 2/3 y \\ y \end{bmatrix}

Note that the y-coordinate of each input vector in \mathbb{R}^2 remains unchanged, while the x-coordinate is shifted by -\frac{2}{3}y, which results in a slanted shape.

Similarly, F is a shear matrix that keeps the x-coordinate fixed, but shifts the y-coordinate, resulting in a slanted shape that is tilted downwards.

F = \begin{bmatrix} 1 & 0 \\ -5/4 & 1 \end{bmatrix}
Image produced in Jupyter

Projections

So far we’ve looked at scaling, rotation, and shear matrices. Yet another type is a projection matrix.

G = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}

Image produced in Jupyter

G \vec x projects \vec x onto the x-axis and throws away the y-coordinate. Note that G maps the unit square to a line, not another four-sided shape.

You might also notice that, unlike the matrices we’ve seen so far, \text{colsp}(G) is not all of \mathbb{R}^2, but rather it’s just a line in \mathbb{R}^2, since G’s columns are not linearly independent.

H below works similarly.

H = \begin{bmatrix} 1 / 2 & -1 \\ 1 & -2 \end{bmatrix}

Image produced in Jupyter

\text{colsp}(H) is the line spanned by \begin{bmatrix} 1 / 2 \\ 1 \end{bmatrix}, so H \vec x will always be some vector on this line.

Put another way, if \vec v = \begin{bmatrix} x \\ y \end{bmatrix}, then H \vec v is

H \underbrace{\begin{bmatrix} x \\ y \end{bmatrix}}_{\vec v} = x ({\color{orange}{H \vec u_x}}) + y ({\color{orange}{H \vec u_y}})

but since \color{orange}{H \vec u_x} and \color{orange}{H \vec u_y} are both on the line spanned by \begin{bmatrix} 1 / 2 \\ 1 \end{bmatrix}, H \vec v is really just a scalar multiple of \begin{bmatrix} 1 / 2 \\ 1 \end{bmatrix}.
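Here’s a numpy sketch of that: no matter which \vec v we pick, H \vec v lands on the line spanned by \begin{bmatrix} 1/2 \\ 1 \end{bmatrix} (the random vectors below are just for illustration):

import numpy as np

H = np.array([[1/2, -1],
              [1,   -2]])

# H has rank 1, since its second column is -2 times its first.
np.linalg.matrix_rank(H)        # 1

rng = np.random.default_rng(23)
for _ in range(3):
    v = rng.standard_normal(2)
    out = H @ v
    # Each output is a scalar multiple of [1/2, 1], so the ratio of its
    # second coordinate to its first is always 2 (unless the output is 0).
    print(out, out[1] / out[0])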

Arbitrary Matrices

Finally, I’ll comment that not all linear transformations have a nice, intuitive interpretation. For instance, consider

J = \begin{bmatrix} 1 / 3 & -1 \\ 1 & -1 / 2 \end{bmatrix}

Image produced in Jupyter

J turns the unit square into a parallelogram. In fact, so did A, B, C, D, E, and F; all of these transformations map the unit square to a parallelogram, with some additional properties (e.g. A’s parallelogram was a rectangle, B’s had equal sides, etc.).

There’s no need to memorize the names of these transformations – after all, they only apply in \mathbb{R}^2 and perhaps \mathbb{R}^3 where we can visualize.

Speaking of \mathbb{R}^3, an arbitrary 3 \times 3 matrix can be thought of as a transformation that maps the unit cube to a parallelepiped (the generalization of a parallelogram to three dimensions).

K = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 1/2 \\ 0 & -1 & 1 / 2 \end{bmatrix}

Interactive figure produced in Jupyter

What do you notice about the transformation defined by L, and how it relates to L’s columns? (Drag the plot around to see the main point.)

L = \begin{bmatrix} 1 & 1/2 & 0 \\ 1 & 1/2 & 0 \\ 1 & 1/2 & 1 \end{bmatrix}

Interactive figure produced in Jupyter

Since L’s columns are linearly dependent, L maps the unit cube to a flat parallelogram.
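You can confirm that dependence numerically (a quick sketch):

import numpy as np

L = np.array([[1, 1/2, 0],
              [1, 1/2, 0],
              [1, 1/2, 1]])

# Only 2 of L's 3 columns are linearly independent, so the image of the
# unit cube is flat: it has no volume.
np.linalg.matrix_rank(L)   # 2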

The Determinant

It turns out that there’s a formula for the

  • area of the parallelogram formed by transforming the unit square by a 2 \times 2 matrix A

  • volume of the parallelepiped formed by transforming the unit cube by a 3 \times 3 matrix A

  • in general, the n-dimensional “volume” of the object formed by transforming the unit n-cube by an n \times n matrix A

That formula is called the determinant. Why do we care? Remember, the goal of this section is to find the inverse of a square matrix A, if it exists, and the determinant will give us one way to check if it does.

In the case of the projection matrices G and H above, we saw that their columns were linearly dependent, and so the transformations G and H mapped the unit square to a line with no area. Similarly above, L mapped the unit cube to a flat parallelogram with no volume. In all other transformations, the matrices’ columns were linearly independent, so the resulting object had a non-zero area (in the case of 2 \times 2 matrices) or volume (in the case of 3 \times 3 matrices).

So, how do we find the determinant of A, denoted \text{det}(A)? Unfortunately, the formula is only convenient for 2 \times 2 matrices: if A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, then \text{det}(A) = ad - bc.

For example, in the transformation

J = \begin{bmatrix} 1 / 3 & -1 \\ 1 & -1 / 2 \end{bmatrix}

the area of the parallelogram formed by transforming the unit square is

\text{det}(J) = \frac{1}{3}\left(-\frac{1}{2}\right) - (-1)(1) = -\frac{1}{6} + 1 = \frac{5}{6}
Image produced in Jupyter
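numpy computes determinants directly, which is a nice way to double-check the arithmetic above (a quick sketch):

import numpy as np

J = np.array([[1/3, -1],
              [1,   -1/2]])

np.linalg.det(J)   # approximately 0.8333, i.e. 5/6, up to floating point error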

Note that a determinant can be negative! Then, the absolute value of the determinant gives the area of the parallelogram. The sign of the determinant depends on the order of the columns of the matrix; swap the columns of J and its determinant would be -\frac{5}{6}. (If A \vec u_x is “to the right” of A \vec u_y, the determinant is positive, like with the standard basis vectors; if A \vec u_x is “to the left” of A \vec u_y, the determinant is negative.)

The determinant of an n \times n matrix can be expressed recursively using a weighted sum of determinants of smaller (n-1) \times (n-1) matrices, called minors. We’ll explore this idea more when necessary, but the computation is not super important: the big idea is that the determinant of A is a single number that tells us whether A’s transformation “loses a dimension” or not.

Two useful properties of the determinant are that

\text{det}(A) = \text{det}(A^T)

and

\text{det}(AB) = \text{det}(A) \text{det}(B)

if A and B are both n \times n matrices.
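Both properties are easy to sanity-check numerically with random matrices (a sketch, not a proof):

import numpy as np

rng = np.random.default_rng(59)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# det(A) = det(A^T)
np.allclose(np.linalg.det(A), np.linalg.det(A.T))                       # True

# det(AB) = det(A) det(B)
np.allclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))  # True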


Inverting a Transformation

Remember, the big goal of this section is to find the inverse of a square matrix A.

Since each square matrix A corresponds to a linear transformation, we can think of A^{-1} (A’s inverse) as “reversing” or “undoing” the transformation.

For example, if A scales vectors, A^{-1} should scale by the reciprocal, so that applying A and then A^{-1} returns the original vector.

The simplest case involves a diagonal matrix, like

A = \begin{bmatrix} 2 & 0 \\ 0 & -1/3 \end{bmatrix}

Image produced in Jupyter

To undo the effect of A, we can apply the transformation

A^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & -3 \end{bmatrix}

Many of the transformations we looked at involving 2 \times 2 matrices are reversible, and hence the corresponding matrices are invertible. If the matrix rotates by \theta, the inverse is a rotation by -\theta. If the matrix shears by “dragging to the right”, the inverse is a shear by “dragging to the left”.

Another way of visualizing whether a transformation is reversible is to ask whether, given a vector (point) on the right, there is exactly one corresponding vector (point) on the left. By exactly one, I mean not 0, and not multiple.

Here, we visualize

F = \begin{bmatrix} 1 & 0 \\ -5/4 & 1 \end{bmatrix}

Given any vector \vec b of the form \vec b = F \vec x, there is exactly one vector \vec x that satisfies this equation.

Image produced in Jupyter

On the other hand, if we look at

H = \begin{bmatrix} 1 / 2 & -1 \\ 1 & -2 \end{bmatrix}

the same does not hold true. Given any vector \vec b \in \text{colsp}(H) on the right, there are infinitely many vectors \vec x such that H \vec x = \vec b. The vectors in pink on the left are all sent to the same vector on the right, \color{#d81a60}{\begin{bmatrix} -1 \\ -2 \end{bmatrix}}.

Image produced in Jupyter

And, there are no vectors on the left that get sent to \begin{bmatrix} -2 \\ 2 \end{bmatrix} on the right. Any vector in \mathbb{R}^2 that isn’t on the line spanned by \begin{bmatrix} -1 \\ -2 \end{bmatrix} is unreachable.

You may recall the following key ideas from discrete math:

  • A function is invertible if and only if it is both one-to-one (injective) and onto (surjective).

  • A function is one-to-one if and only if no two inputs get sent to the same output, i.e. f(x_1) = f(x_2) implies x_1 = x_2.

  • A function is onto if every element of the codomain is an output of the function, i.e. for every y \in Y, there exists an x \in X such that f(x) = y.

The transformation represented by H is neither one-to-one (because \begin{bmatrix} 0 \\ 1 \end{bmatrix} and \begin{bmatrix} 1 \\ 1.5 \end{bmatrix} get sent to the same point) nor onto (because \begin{bmatrix} -2 \\ 0 \end{bmatrix} isn’t mapped to by any vector in \mathbb{R}^2).

In order for a linear transformation to be invertible, it must be both one-to-one and onto, i.e. it must be a bijection. Again, don’t worry if these terms seem foreign: I’ve provided them here to help build connections to other courses if you’ve taken them. If not, the rest of my coverage should still be sufficient.


Inverting a Matrix

The big idea I’m trying to get across is that an n \times n matrix A is invertible if and only if the corresponding linear transformation can be “undone”.

That is, A is invertible if and only if given any vector \vec b \in \mathbb{R}^n, there is exactly one vector \vec x \in \mathbb{R}^n such that A \vec x = \vec b. If the visual intuition from earlier didn’t make this clear, here’s another concrete example. Consider

A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}

\text{rank}(A) = 2, since A’s first two columns are scalar multiples of one another. Let’s consider two possible \vec b’s, each depicting a different case.

  1. \vec b = \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix}. This \vec b is in \text{colsp}(A). The issue with A is that there are infinitely many linear combinations of the columns of A that equal this \vec b.

\underbrace{\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}}_{\text{an } \vec x} = \underbrace{\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} -5 \\ 4 \\ 0 \end{bmatrix}}_{\text{another } \vec x} = ... = \underbrace{\begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix}}_{\vec b}

  2. \vec b = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}. This \vec b is not in \text{colsp}(A), meaning there is no \vec x \in \mathbb{R}^3 such that A \vec x = \vec b.

\underbrace{\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\text{no such } \vec x} = \underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}}_{\vec b}

But, if A’s columns were linearly independent, they’d span all of \mathbb{R}^3, and so A \vec x = \vec b would have a unique solution \vec x for any \vec b we think of.
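numpy’s np.linalg.solve surfaces this exact distinction: it finds the unique solution when one exists, and raises an error for the singular A above. Here’s a sketch (the modified matrix A_full is my own, hypothetical example with linearly independent columns):

import numpy as np

A = np.array([[1, 2, 0],
              [1, 2, 0],
              [1, 2, 1]])
b = np.array([3, 3, 3])

# A has rank 2, so there is no unique solution;
# np.linalg.solve(A, b) raises LinAlgError: Singular matrix.

# With linearly independent columns, the solution is unique.
A_full = np.array([[1, 2, 0],
                   [1, 0, 0],
                   [1, 2, 1]])
np.linalg.solve(A_full, b)   # the unique solution, [3, 0, 0]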

Definition

For an n \times n matrix A, the following statements are equivalent – if one of them is true, all of them are true, and A is called invertible:

  • A’s columns are linearly independent.

  • \text{rank}(A) = n, i.e. \text{colsp}(A) = \mathbb{R}^n.

  • \text{det}(A) \neq 0.

  • For every \vec b \in \mathbb{R}^n, there is exactly one \vec x \in \mathbb{R}^n such that A \vec x = \vec b.

  • There is a matrix A^{-1}, called the inverse of A, such that AA^{-1} = A^{-1}A = I.

These properties are sometimes together called the invertible matrix theorem. This is not an exhaustive list, either, and we’ll see other equivalent properties as time goes on.

Inverse of a 2 \times 2 Matrix

As was the case with the determinant, the general formula for the inverse of a matrix is only convenient for 2 \times 2 matrices. If A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, then

A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

You could solve for this formula by hand, by finding scalars e, f, g, and h such that

\underbrace{\begin{bmatrix} a & b \\ c & d \end{bmatrix}}_A \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

Note that the formula above involves division by ad - bc. If ad - bc = 0, then A is not invertible. But ad - bc is just the determinant of A!

Let’s test it out on some examples. If

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}

then

A^{-1} = \frac{1}{1 \cdot 4 - 2 \cdot 3} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \frac{1}{-2} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}

and indeed both

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

and

\begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

hold.
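The 2 \times 2 formula is short enough to code up directly. Here’s a sketch (the function name inverse_2x2 is mine) that reproduces the result above and agrees with numpy:

import numpy as np

def inverse_2x2(M):
    # Swap the diagonal entries, negate the off-diagonal entries,
    # and divide by the determinant ad - bc.
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return np.array([[d, -b],
                     [-c, a]]) / det

A = np.array([[1, 2],
              [3, 4]])

inverse_2x2(A)                                   # array([[-2. ,  1. ],
                                                 #        [ 1.5, -0.5]])
np.allclose(inverse_2x2(A), np.linalg.inv(A))    # True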

On the other hand,

B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}

is not invertible, since its columns are linearly dependent.

Beyond 2 \times 2 Matrices

For matrices larger than 2 \times 2, the calculation of the inverse is not as straightforward; there’s no simple formula. In the 3 \times 3 case, we’d need to find 9 scalars c_{ij} such that

\underbrace{\begin{bmatrix} 3 & 7 & 1 \\ -2 & 5 & 0 \\ 4 & 2 & 0 \end{bmatrix}}_A \underbrace{\begin{bmatrix} {\color{#3d81f6}c_{11}} & {\color{orange}c_{12}} & {\color{#d81a60}c_{13}} \\ {\color{#3d81f6}c_{21}} & {\color{orange}c_{22}} & {\color{#d81a60}c_{23}} \\ {\color{#3d81f6}c_{31}} & {\color{orange}c_{32}} & {\color{#d81a60}c_{33}} \end{bmatrix}}_{C = A^{-1}} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{I}

This involves solving a system of 3 equations in 3 unknowns, 3 times – one per column of the identity matrix. One such system is

\begin{align*} 3 {\color{#3d81f6}c_{11}} + 7 {\color{#3d81f6}c_{21}} + {\color{#3d81f6}c_{31}} &= 1 \\ -2 {\color{#3d81f6}c_{11}} + 5 {\color{#3d81f6}c_{21}} &= 0 \\ 4 {\color{#3d81f6}c_{11}} + 2 {\color{#3d81f6}c_{21}} &= 0 \end{align*}

You can quickly see how this becomes a pain to solve by hand. Instead, we can use one of two strategies:

  1. Using row reduction, also known as Gaussian elimination, which is an efficient method for solving systems of linear equations without needing to write out each equation explicitly. Row reduction can be used to both find the rank and inverse of a matrix, among other things. More traditional linear algebra courses spend a considerable amount of time on this concept, though I’ve intentionally avoided it in this course to instead spend time on conceptual ideas most relevant to machine learning. That said, you’ll get some practice with it in a future homework.

  2. Using a pre-built function in numpy that does the row reduction for us.

At the end of this section, I give you some advice on how to (and not to) compute the inverse of a matrix in code.
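Just to make the “one column at a time” idea concrete, here’s a numpy sketch: each column of A^{-1} is found by solving A \vec c = \vec e_i, where \vec e_i is a column of the identity matrix, and np.linalg.solve does the row reduction for us.

import numpy as np

A = np.array([[3, 7, 1],
              [-2, 5, 0],
              [4, 2, 0]])

I = np.eye(3)

# Solve A c = e_i for each standard basis vector e_i;
# the solutions are the columns of the inverse.
columns = [np.linalg.solve(A, I[:, i]) for i in range(3)]
A_inv = np.column_stack(columns)

np.allclose(A_inv, np.linalg.inv(A))   # True
np.allclose(A @ A_inv, np.eye(3))      # True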


More Examples

As we’ve come to expect, let’s work through some examples that illustrate important ideas.

Example: Inverting a Product

  1. Suppose A and B are both invertible n \times n matrices. Is AB invertible? If so, what is its inverse?

  2. Suppose A = BC, and A, B, and C are all invertible n \times n matrices. What is the inverse of B?

Example: Inverting a Sum

Suppose A and B are both invertible n \times n matrices. Is A + B invertible? If so, what is its inverse?

Example: Inverting X^TX

Suppose X is an n \times d matrix. Note that X is not square, and so it is not invertible. However, X^TX is a square matrix, and so it is possible that it is invertible.

Explain why X^TX is invertible if and only if X’s columns are linearly independent.

Example: Orthogonal Matrices

Recall, an n \times n matrix Q is orthogonal if Q^T Q = QQ^T = I.

What is the inverse of Q? Explain how this relates to the formula for rotation matrices in \mathbb{R}^2 from earlier,

R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}

Computing the Inverse in Code

I’d like to comment on the computational aspect of finding inverses. Remember that if A is an n \times n matrix, then

A \vec x = \vec b

is a system of n equations in n unknowns.

\underbrace{\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{\vec x} = \underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}}_{\vec b}

One of the big (conceptual) usages of the inverse is to solve such a system. If A is invertible, then we can solve for \vec x by multiplying both sides on the left by A^{-1}:

A^{-1} A \vec x = A^{-1} \vec b \implies \vec x = A^{-1} \vec b

But, in practice, finding A^{-1} is less efficient and more prone to floating point errors than solving the system of n equations in n unknowns directly for the specific \vec b we care about.

  • Solving A \vec x = \vec b involves solving just one system of n equations in n unknowns.

  • Finding A^{-1} involves solving a system of n equations in n unknowns, n times! Each one has the same coefficient matrix A but a different right-hand side \vec b, corresponding to the columns of the identity matrix (which are the standard basis vectors in \mathbb{R}^n).

The more floating point operations we need to do, the more error is introduced into the final results.

An illustrative example is to come!
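In the meantime, here’s a quick sketch of both routes on a random system (np.linalg.solve solves the single system directly, without ever forming A^{-1}):

import numpy as np

rng = np.random.default_rng(2)
n = 500
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

x_solve = np.linalg.solve(A, b)   # solves the one system we care about
x_inv = np.linalg.inv(A) @ b      # computes all of A^{-1} first, then multiplies

# Both give (essentially) the same answer, but solve does less work
# and accumulates less floating point error.
np.allclose(x_solve, x_inv)       # True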