
2.7. Matrices

Introduction

In Chapters 2.4 through 2.6, we focused on understanding the subspace of $\mathbb{R}^n$ spanned by a set of vectors. I’ve been alluding to the fact that to solve the regression problem in higher dimensions, we’d need to be able to project a vector onto the span of a set of vectors. (To be clear, we haven’t done this yet.) Matrices will allow us to achieve this goal elegantly. It will take a few sections to get there.

Just remember that all of this is for machine learning.

For now, let’s consider the following three vectors in $\mathbb{R}^4$:

$$\vec v_1 = \begin{bmatrix} 3 \\ 2 \\ 0 \\ 2 \end{bmatrix}, \quad \vec v_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \\ -2 \end{bmatrix}, \quad \vec v_3 = \begin{bmatrix} 4 \\ 9 \\ 0 \\ 0 \end{bmatrix}$$

If we stack these vectors horizontally, we produce a matrix, which we’ll call $A$:

$$\begin{bmatrix} \mid & \mid & \mid \\ \vec v_1 & \vec v_2 & \vec v_3 \\ \mid & \mid & \mid \end{bmatrix} = \begin{bmatrix} 3 & 1 & 4 \\ 2 & 1 & 9 \\ 0 & -1 & 0 \\ 2 & -2 & 0 \end{bmatrix}$$

$A$ is a matrix with 4 rows and 3 columns, so we might say that $A$ is a $4 \times 3$ matrix, or that $A \in \mathbb{R}^{4 \times 3}$.

It’s common to use the notation $A_{ij}$ to denote the entry in the $i$-th row and $j$-th column of $A$, e.g. $A_{23} = 9$.

In numpy, you can access the entry in the 2nd row and 3rd column of the 2D matrix A using A[1, 2], once we account for the fact that numpy uses 0-based indexing.
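For instance, here’s a quick numpy sketch of the indexing above, using the matrix $A$ we just defined:

```python
import numpy as np

# The matrix A defined above, with 4 rows and 3 columns.
A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])

print(A.shape)   # (4, 3), i.e. 4 rows and 3 columns
print(A[1, 2])   # 9: 0-based row index 1 and column index 2 correspond to A_{23}
```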

I’ve introduced matrices in a very particular way, because I’d like for you to think of them as a collection of vectors, sometimes called column vectors. We’ll come back to this in just a moment. You can also think of a matrix as a collection of row vectors; the example $A$ defined above consists of four row vectors in $\mathbb{R}^3$ stacked vertically, one on top of another.

When we introduced vectors, $n$ was typically the placeholder we’d use for the number of components of a vector. With matrices, I like using $n$ for the number of rows, and $d$ for the number of columns. This means that, in general, a matrix is of shape $n \times d$ and is a member of the set $\mathbb{R}^{n \times d}$. This makes clear that when we’ve collected data, we store each of our $n$ observations in a row of our matrix. Just be aware that using $n \times d$ to denote the shape of a matrix is a little unorthodox; most other textbooks will use $m \times n$ to denote the shape of a matrix. Of course, this is all arbitrary.

Suppose $A$ has $n$ rows and $d$ columns, i.e. $A \in \mathbb{R}^{n \times d}$.

  • If $n > d$, we say that $A$ is tall.

  • If $n = d$, we say that $A$ is square.

  • If $n < d$, we say that $A$ is wide.

$$\underbrace{\begin{bmatrix} 5 & 3 \\ 2 & 1 \\ -1 & \frac{1}{3} \\ 3 & 6 \\ 0 & 1 \end{bmatrix}}_{\text{tall}} \qquad \underbrace{\begin{bmatrix} 3 & 0 & 4 \\ 4 & -\pi & 0 \\ \frac{1}{2} & 0 & 1 \end{bmatrix}}_{\text{square}} \qquad \underbrace{\begin{bmatrix} 1 & 0 & 3 & 2 & -1 \\ \frac{1}{9} & 3 & 0 & 0 & 2 \\ 9 & 0 & 0 & 6 & -3 \end{bmatrix}}_{\text{wide}}$$

As we’ll see in the coming sections, square matrices are the most flexible – there are several properties and operations that are only defined for them. Unfortunately, not every matrix is square. Remember, matrices are used for storing data, and the number of observations, $n$, and the number of features, $d$, don’t need to be the same. (In practice, we’ll often have very tall matrices, i.e. $n \gg d$, but this is not always the case.)


Addition and Scalar Multiplication

Like vectors, matrices support addition and scalar multiplication out-of-the-box, and both behave as you’d expect.

Let’s see an example. Consider the matrices $A$ and $B$ (where $A$ is the same as in the previous example):

$$A = \begin{bmatrix} 3 & 1 & 4 \\ 2 & 1 & 9 \\ 0 & -1 & 0 \\ 2 & -2 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix}$$

Then, the operation $3A - B$ is well-defined, and produces the matrix

$$3A - B = 3 \begin{bmatrix} 3 & 1 & 4 \\ 2 & 1 & 9 \\ 0 & {\color{#3d81f6} \mathbf{-1}} & 0 \\ 2 & -2 & 0 \end{bmatrix} - \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & {\color{#3d81f6} \mathbf{8}} & 9 \\ 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 8 & 1 & 9 \\ 2 & -2 & 21 \\ -7 & \boxed{{\color{#3d81f6} \mathbf{-11}}} & -9 \\ -4 & -17 & -12 \end{bmatrix}$$

I’ve colored the entry at position $(3, 2)$ ${\color{#3d81f6} \text{blue}}$ to help you trace the computation.

$$3 \cdot (-1) - 8 = -11$$
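If you’d like to check this with numpy, addition and scalar multiplication of 2D arrays behave exactly the same way; here’s a quick sketch verifying the arithmetic above:

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])

B = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])

# Scalar multiplication and subtraction are elementwise, so this matches
# the hand computation above.
print(3 * A - B)
# [[  8   1   9]
#  [  2  -2  21]
#  [ -7 -11  -9]
#  [ -4 -17 -12]]
```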

Matrix-Vector Multiplication

Great – we know how to add two matrices, and how to multiply a matrix by a scalar. The natural next step is to figure out how – and why – to multiply two matrices together.

First, a definition. The Golden Rule of matrix multiplication: the product of two matrices (or a matrix and a vector) is only defined when the number of columns of the first equals the number of rows of the second.

Let’s use the Golden Rule. First – as the title of this subsection suggests – we’ll start by computing the product of a matrix and a vector.

Let’s suppose $A$ is the same $4 \times 3$ matrix we’ve become familiar with:

$$A = \begin{bmatrix} 3 & 1 & 4 \\ 2 & 1 & 9 \\ 0 & -1 & 0 \\ 2 & -2 & 0 \end{bmatrix}$$

And let’s suppose $\vec x$ is some vector. Note that we can think of an $n$-dimensional vector as a matrix with $n$ rows and 1 column. In order for the product $A \vec x$ to be valid, $\vec x$ must have 3 elements in it, by the Golden Rule above. To make the example concrete, let’s suppose:

$$\vec x = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$$

How do we multiply $A \vec x$? The following key definition will help us: the product $A \vec x$ is the vector whose $i$-th component is the dot product of the $i$-th row of $A$ with $\vec x$.

Let me say a little more about the dimensions of the output. Below, I’ve written the dimensions of $A$ ($4 \times 3$) and $\vec x$ ($3 \times 1$) next to each other. By the Golden Rule, the inner dimensions, both of which are bolded, must be equal in order for the multiplication to be valid. The dimensions of the output will be the result of looking at the $\boxed{\text{outer dimensions}}$, which here are $4 \times 1$.

$$\underbrace{A}_{\boxed{4} \times \textbf{3}} \:\:\:\: \underbrace{\vec x}_{\textbf{3} \times \boxed{1}} = \underbrace{\text{output}}_{4 \times 1}$$

So, the result of multiplying $A \vec x$ will be a $4 \times 1$ matrix, or in other words, a vector in $\mathbb{R}^4$. Indeed, multiplying a matrix by a vector always results in another vector, and this act of multiplying a matrix by a vector is often thought of as transforming the vector from $\mathbb{R}^d$ to $\mathbb{R}^n$.

So, how do we find those 4 components? As mentioned earlier, we compute each component by taking the dot product of a row in $A$ with $\vec x$.

$$A = \begin{bmatrix} {\color{#3d81f6} \mathbf{3}} & {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{4}} \\ {\color{orange} \mathbf{2}} & {\color{orange} \mathbf{1}} & {\color{orange} \mathbf{9}} \\ {\color{#d81b60} \mathbf{0}} & {\color{#d81b60} \mathbf{-1}} & {\color{#d81b60} \mathbf{0}} \\ {\color{#004d40} \mathbf{2}} & {\color{#004d40} \mathbf{-2}} & {\color{#004d40} \mathbf{0}} \end{bmatrix}, \quad \vec x = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$$

Let’s start with the top row of $A$, ${\color{#3d81f6}\begin{bmatrix} 3 \\ 1 \\ 4 \end{bmatrix}}$. The dot product of two vectors is only defined if they have equal lengths. This is why we’ve instituted the Golden Rule! The Golden Rule tells us we can only multiply $A$ and $\vec x$ if the number of columns in $A$ is the same as the number of components in $\vec x$, which is true here (both of those numbers are 3).

Then, the dot product of the first row of $A$ with $\vec x$ is:

$${\color{#3d81f6}\begin{bmatrix} \mathbf{3} \\ \mathbf{1} \\ \mathbf{4} \end{bmatrix}} \cdot \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} = {\color{#3d81f6} \mathbf{3}} \cdot 1 + {\color{#3d81f6} \mathbf{1}} \cdot 0 + {\color{#3d81f6} \mathbf{4}} \cdot 3 = 15$$

Nice! We’re a quarter of the way there. Now, we just need to compute the remaining three dot products:

  • The dot product of the second row of $A$ with $\vec x$ is ${\color{orange}\begin{bmatrix} \mathbf{2} \\ \mathbf{1} \\ \mathbf{9} \end{bmatrix}} \cdot \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} = 29$.

  • The dot product of the third row of $A$ with $\vec x$ is ${\color{#d81b60}\begin{bmatrix} \mathbf{0} \\ \mathbf{-1} \\ \mathbf{0} \end{bmatrix}} \cdot \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} = 0$.

  • And finally, the dot product of the fourth row of $A$ with $\vec x$ is ${\color{#004d40}\begin{bmatrix} \mathbf{2} \\ \mathbf{-2} \\ \mathbf{0} \end{bmatrix}} \cdot \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} = 2$.

The result of our matrix-vector multiplication, then, is the result of stacking all 4 dot products together into the vector $\begin{bmatrix} {\color{#3d81f6} \mathbf{15}} \\ {\color{orange} \mathbf{29}} \\ {\color{#d81b60} \mathbf{0}} \\ {\color{#004d40} \mathbf{2}} \end{bmatrix}$. To summarize:

$$\boxed{A \vec x = \begin{bmatrix} {\color{#3d81f6} \mathbf{3}} & {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{4}} \\ {\color{orange} \mathbf{2}} & {\color{orange} \mathbf{1}} & {\color{orange} \mathbf{9}} \\ {\color{#d81b60}\mathbf{0}} & {\color{#d81b60} \mathbf{-1}} & {\color{#d81b60} \mathbf{0}} \\ {\color{#004d40} \mathbf{2}} & {\color{#004d40} \mathbf{-2}} & {\color{#004d40} \mathbf{0}} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} = \begin{bmatrix} {\color{#3d81f6} \mathbf{15}} \\ {\color{orange} \mathbf{29}} \\ {\color{#d81b60} \mathbf{0}} \\ {\color{#004d40} \mathbf{2}} \end{bmatrix}}$$

Refer to this example if you’re ever confused about how to multiply a matrix by a vector.
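Here’s the same computation carried out in numpy. (The @ operator performs matrix multiplication; we’ll say more about it later in this section.)

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])
x = np.array([1, 0, 3])

# Matrix-vector multiplication: one dot product per row of A.
print(A @ x)                              # [15 29  0  2]
print(A.shape, x.shape, (A @ x).shape)    # (4, 3) (3,) (4,)
```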

We can’t visualize the output vector as it’s in 4 dimensions, but we can look at the 4 row vectors of $A$ and the vector $\vec x$ in 3D space.

[Interactive 3D plot: the four row vectors of $A$ and the vector $\vec x$, drawn in 3D space.]

Above, you’ll notice that ${\color{purple} \vec x}$ and the third row vector, ${\color{#d81b60} \begin{bmatrix} \mathbf{0} \\ \mathbf{-1} \\ \mathbf{0} \end{bmatrix}}$, are orthogonal (rotate the plot so that you can see this), so the third component in

$$A \vec x = \begin{bmatrix} {\color{#3d81f6} \mathbf{15}} \\ {\color{orange} \mathbf{29}} \\ {\color{#d81b60} \mathbf{0}} \\ {\color{#004d40} \mathbf{2}} \end{bmatrix}$$

is ${\color{#d81b60} \mathbf{0}}$.


The Linear Combination Interpretation

We’ve described matrix-vector multiplication as the result of taking the dot product of each row of $A$ with $\vec x$, and indeed this is the easiest way to actually compute the output. But, there’s another more important interpretation. In the above dot products, you may have noticed:

  • Entries in the first column of $A$ (3, 2, 0, and 2) were always multiplied by the first element of $\vec x$ (1).

  • Entries in the second column of $A$ (1, 1, -1, and -2) were always multiplied by the second element of $\vec x$ (0).

  • Entries in the third column of $A$ (4, 9, 0, and 0) were always multiplied by the third element of $\vec x$ (3).

In other words:

$$A \vec x = \begin{bmatrix} 3 & 1 & 4 \\ 2 & 1 & 9 \\ 0 & -1 & 0 \\ 2 & -2 & 0 \end{bmatrix} \begin{bmatrix} {\color{orange} \mathbf{1}} \\ {\color{orange} \mathbf{0}} \\ {\color{orange} \mathbf{3}} \end{bmatrix} = \underbrace{{\color{orange} \mathbf{1}} \begin{bmatrix} 3 \\ 2 \\ 0 \\ 2 \end{bmatrix} + {\color{orange} \mathbf{0}} \begin{bmatrix} 1 \\ 1 \\ -1 \\ -2 \end{bmatrix} + {\color{orange} \mathbf{3}} \begin{bmatrix} 4 \\ 9 \\ 0 \\ 0 \end{bmatrix}}_{\color{orange} \text{linear combination of columns of } A} = \begin{bmatrix} 15 \\ 29 \\ 0 \\ 2 \end{bmatrix}$$

At the start of this section, we defined $A$ by stacking the vectors $\vec v_1$, $\vec v_2$, and $\vec v_3$ side-by-side, and I told you to think of a matrix as a collection of column vectors. The above result is precisely why – it’s because when we multiply $A$ by $\vec x$, we’re computing a linear combination of the columns of $A$, where the weights are the components of $\vec x$!
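To see the linear combination interpretation numerically, here’s a short numpy sketch that builds the weighted sum of the columns of $A$ by hand and compares it to $A \vec x$:

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])
x = np.array([1, 0, 3])

# Weighted sum of the columns of A, with weights taken from x...
linear_combination = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]

# ...is exactly the matrix-vector product A x.
print(linear_combination)                            # [15 29  0  2]
print(np.array_equal(linear_combination, A @ x))     # True
```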

Since $A \vec x$ produces a linear combination of the columns of $A$, a natural question to ask at this point is whether the columns of $A$ are all linearly independent. $A$ only has 3 columns, each of which is in $\mathbb{R}^4$, so while they may or may not be linearly independent (are they?), we know they cannot span all of $\mathbb{R}^4$, as we’d need at least 4 vectors to reach every element in $\mathbb{R}^4$.

This is the type of thinking we’ll return to in Chapter 2.8. This will lead us to define the rank of a matrix, perhaps the single most important number associated with a matrix.


Matrix Multiplication

Matrix-matrix multiplication – or just “matrix multiplication” – is a generalization of matrix-vector multiplication. Let’s present matrix multiplication in its most general terms.

Definition

If $A \in \mathbb{R}^{n \times d}$ and $B \in \mathbb{R}^{d \times p}$, then the product $AB$ is the $n \times p$ matrix whose entry in row $i$ and column $j$ is the dot product of row $i$ of $A$ with column $j$ of $B$.

Note that if $p = 1$, this reduces to the matrix-vector multiplication case from before. In that case, the only possible value of $j$ is 1, since the output only has 1 column, and the element in row $i$ of the output vector is the dot product of row $i$ in $A$ and the vector $B$ (which we earlier referred to as $\vec x$ in the less general case).

For a concrete example, suppose $A$ and $B$ are defined below:

$$A = \begin{bmatrix} 3 & 1 & 4 \\ 2 & 1 & 9 \\ 0 & -1 & 0 \\ 2 & -2 & 0 \end{bmatrix} \quad B = \begin{bmatrix} 1 & 2 \\ 0 & 7 \\ 3 & 2 \end{bmatrix}$$

The number of columns of $A$ must equal the number of rows of $B$ in order for the product $AB$ to be defined, as the Golden Rule tells us. That is fortunately the case here. Since $A$ has shape $\boxed{4} \times 3$ and $B$ has shape $3 \times \boxed{2}$, the output matrix will have shape $4 \times 2$. Each of those $4 \cdot 2 = 8$ elements will be the dot product of a row in $A$ with a column in $B$.

Here is the product of $A$ and $B$:

$$AB = \begin{bmatrix} {\color{#3d81f6} \mathbf{3}} & {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{4}} \\ {\color{orange} \mathbf{2}} & {\color{orange} \mathbf{1}} & {\color{orange} \mathbf{9}} \\ {\color{#d81b60}\mathbf{0}} & {\color{#d81b60} \mathbf{-1}} & {\color{#d81b60} \mathbf{0}} \\ {\color{#004d40} \mathbf{2}} & {\color{#004d40} \mathbf{-2}} & {\color{#004d40} \mathbf{0}} \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 7 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} {\color{#3d81f6} \mathbf{15}} & {\color{#3d81f6} \mathbf{21}} \\ {\color{orange} \mathbf{29}} & {\color{orange} \mathbf{29}} \\ {\color{#d81b60} \mathbf{0}} & {\color{#d81b60} \mathbf{-7}} \\ {\color{#004d40} \mathbf{2}} & {\color{#004d40} \mathbf{-10}} \end{bmatrix}$$

Let’s see if we can audit where these numbers came from. Let’s consider $(AB)_{32}$, which is the element in row 3 and column 2 of the output. It should have come from the dot product of row 3 of $A$ and column 2 of $B$.

$$AB = \begin{bmatrix} {\color{#3d81f6} \mathbf{3}} & {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{4}} \\ {\color{orange} \mathbf{2}} & {\color{orange} \mathbf{1}} & {\color{orange} \mathbf{9}} \\ {\color{#d81b60}\boxed{\mathbf{0}}} & {\color{#d81b60} \boxed{\mathbf{-1}}} & {\color{#d81b60} \boxed{\mathbf{0}}} \\ {\color{#004d40} \mathbf{2}} & {\color{#004d40} \mathbf{-2}} & {\color{#004d40} \mathbf{0}} \end{bmatrix} \begin{bmatrix} 1 & \boxed{2} \\ 0 & \boxed{7} \\ 3 & \boxed{2} \end{bmatrix} = \begin{bmatrix} {\color{#3d81f6} \mathbf{15}} & {\color{#3d81f6} \mathbf{21}} \\ {\color{orange} \mathbf{29}} & {\color{orange} \mathbf{29}} \\ {\color{#d81b60} \mathbf{0}} & {\color{#d81b60} \boxed{\mathbf{-7}}} \\ {\color{#004d40} \mathbf{2}} & {\color{#004d40} \mathbf{-10}} \end{bmatrix}$$

And indeed, -7 is the dot product of ${\color{#d81b60}\begin{bmatrix} \mathbf{0} \\ \mathbf{-1} \\ \mathbf{0} \end{bmatrix}}$ and $\begin{bmatrix} 2 \\ 7 \\ 2 \end{bmatrix}$.

You should notice that many of the numbers in the output $AB$ look familiar. That’s because we used the same $A$ as we did earlier in the section, and the first column of $B$ is just $\vec x$ from the matrix-vector example. So, the first column in $AB$ is the same as the vector $A \vec x = \begin{bmatrix} 15 \\ 29 \\ 0 \\ 2 \end{bmatrix}$ that we computed earlier. The difference now is that the output $AB$ isn’t just a single vector, but is a matrix with 2 columns. The second column, $\begin{bmatrix} 21 \\ 29 \\ -7 \\ -10 \end{bmatrix}$, comes from multiplying $A$ by the second column in $B$, $\begin{bmatrix} 2 \\ 7 \\ 2 \end{bmatrix}$.

Note that as we add columns to $B$, we’d add columns to the output. If $B$ had 10 columns, then $AB$ would have 10 columns, too, without $A$ needing to change. As long as the Golden Rule – that the number of columns in $A$ equals the number of rows in $B$ – holds, the product $AB$ can be computed, and it has shape $(\text{number of rows in } A) \times (\text{number of columns in } B)$.
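Here’s the same product computed in numpy, just to confirm the arithmetic and the shape of the output:

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])

B = np.array([[1, 2],
              [0, 7],
              [3, 2]])

AB = A @ B
print(AB)
# [[ 15  21]
#  [ 29  29]
#  [  0  -7]
#  [  2 -10]]
print(AB.shape)   # (4, 2): (rows of A) x (columns of B)
```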

Properties

For matrices $A$, $B$, and $C$ of compatible shapes:

  • Matrix multiplication is associative: $(AB)C = A(BC)$.

  • Matrix multiplication is distributive: $A(B + C) = AB + AC$ and $(A + B)C = AC + BC$.

  • Matrix multiplication is not commutative: in general, $AB \neq BA$.

The first two properties – associativity and distributivity – match standard arithmetic properties that we’ve become accustomed to. The associative property allows you to, for example, compute $AB \vec x$ only by using matrix-vector multiplications, since you can first multiply $B \vec x$, which results in a vector, and then multiply $A$ by that vector. (I had you do this in Activity 2 earlier in this section – I hope you did it! 🧐)

The fact that matrix multiplication is not commutative may come as a surprise, as every other form of multiplication you’ve learned about up until this point has been commutative (including the dot product).

In fact, if $AB$ exists, $BA$ may or may not exist! If $A$ is $n \times d$ and $B$ is $d \times p$, then $BA$ only exists if $n = p$. But even then, $AB \neq BA$ in general.

For example, if $A$ is $2 \times 3$ and $B$ is $3 \times 2$, then $AB$ is $2 \times 2$ and $BA$ is $3 \times 3$; here, both products exist, but they cannot be equal since they have different shapes.

Even if $A$ and $B$ are both square matrices with the same shape, $AB \neq BA$ in general. For illustration, consider:

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$$

Then,

$$AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \neq BA = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}$$
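A quick numpy check of this example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A @ B)
# [[19 22]
#  [43 50]]
print(B @ A)
# [[23 34]
#  [31 46]]
print(np.array_equal(A @ B, B @ A))   # False: order matters!
```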

Computation

I’ve shown you the naïve – and by far most common – algorithm for matrix multiplication. If $A$ and $B$ are both square $n \times n$ matrices, then the runtime of the naïve algorithm is $O(n^3)$.
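To make the cubic runtime concrete, here’s a minimal, deliberately slow sketch of the naïve algorithm (the helper name naive_matmul is just for illustration); for square $n \times n$ inputs, the three nested loops are where the $O(n^3)$ comes from. In practice you’d never write this yourself – numpy’s routines are far faster.

```python
import numpy as np

def naive_matmul(A, B):
    """Multiply A (n x d) by B (d x p) using three nested loops."""
    n, d = A.shape
    d2, p = B.shape
    assert d == d2, "Golden Rule: columns of A must equal rows of B"
    out = np.zeros((n, p))
    for i in range(n):              # for each row of the output...
        for j in range(p):          # ...and each column of the output...
            for k in range(d):      # ...accumulate a dot product of length d
                out[i, j] += A[i, k] * B[k, j]
    return out

A = np.array([[3, 1, 4], [2, 1, 9], [0, -1, 0], [2, -2, 0]])
B = np.array([[1, 2], [0, 7], [3, 2]])
print(naive_matmul(A, B))   # matches A @ B
```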

However, there exist more efficient algorithms for matrix multiplication. Strassen’s algorithm is one such example; it describes how to multiply two square $n \times n$ matrices in $O(n^{2.807})$ time. The study of efficient algorithms for matrix multiplication is an active area of research; if you’re interested in learning more, look here.

Matrix multiplication, I might argue, is one of the reasons NVIDIA is the most valuable company in the world. Modern machine learning is built on matrix multiplication, and GPUs are optimized for it. Why? This comment from Reddit does a good job of explaining:

Imagine you have 1 million math assignments to do, they are very simple assignments, but there are a lot that need to be done, they are not dependent on each other so they can be done on any order.

You have two options, distribute them to 10 thousand people to do it in parallel or give them to 10 math experts. The experts are very fast, but hey, there are only 10 of them, the 10 thousand are more suitable for the task because they have the “brute force” for this.

GPUs have thousands of cores, CPUs have tens.

On that note, the @ operator in numpy is used for matrix multiplication; it is a shorthand for np.matmul. You can also use it to multiply a matrix by a vector.
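For example:

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])
B = np.array([[1, 2],
              [0, 7],
              [3, 2]])
x = np.array([1, 0, 3])

print(np.array_equal(A @ B, np.matmul(A, B)))   # True: @ is shorthand for np.matmul
print(A @ x)                                    # matrix-vector products work too: [15 29  0  2]
```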


The Transpose

There’s an important operation on matrices that we haven’t discussed yet.

Definition

The transpose of a matrix $A \in \mathbb{R}^{n \times d}$, denoted $A^T$, is the $d \times n$ matrix whose rows are the columns of $A$. Equivalently, $(A^T)_{ij} = A_{ji}$.

To illustrate, let’s start with our familiar matrix $A$:

$$A = \begin{bmatrix} 3 & {\color{#3d81f6} \mathbf{1}} & 4 \\ 2 & {\color{#3d81f6} \mathbf{1}} & 9 \\ 0 & {\color{#3d81f6} \mathbf{-1}} & 0 \\ 2 & {\color{#3d81f6} \mathbf{-2}} & 0 \end{bmatrix}$$

The transpose of $A$ is:

$$A^T = \begin{bmatrix} 3 & 2 & 0 & 2 \\ {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{-1}} & {\color{#3d81f6} \mathbf{-2}} \\ 4 & 9 & 0 & 0 \end{bmatrix}$$

Note that $A \in \mathbb{R}^{4 \times 3}$ and $A^T \in \mathbb{R}^{3 \times 4}$.

Why would we ever need to do this? To illustrate, suppose $\vec u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}$, and that we’d like to compute the product $A^T \vec u$. (Note that $\vec u$ must be in $\mathbb{R}^4$ in order for $A^T \vec u$ to be defined, unlike $\vec x \in \mathbb{R}^3$ in the product $A \vec x$.) Then:

$$\begin{align*} A^T \vec u &= \begin{bmatrix} 3 & 2 & 0 & 2 \\ 1 & 1 & -1 & -2 \\ 4 & 9 & 0 & 0 \end{bmatrix} \begin{bmatrix} {\color{orange} \mathbf{u_1}} \\ {\color{orange} \mathbf{u_2}} \\ {\color{orange} \mathbf{u_3}} \\ {\color{orange} \mathbf{u_4}} \end{bmatrix} \\ &= {\color{orange} \mathbf{u_1}} \begin{bmatrix} 3 \\ 1 \\ 4 \end{bmatrix} + {\color{orange} \mathbf{u_2}} \begin{bmatrix} 2 \\ 1 \\ 9 \end{bmatrix} + {\color{orange} \mathbf{u_3}} \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} + {\color{orange} \mathbf{u_4}} \begin{bmatrix} 2 \\ -2 \\ 0 \end{bmatrix} \end{align*}$$

This is a linear combination of the rows of $A$, where the weights are the components of $\vec u$. Remember, the standard product $A \vec x$ is a linear combination of the columns of $A$, so the transpose helps us if we want to compute a linear combination of the rows of $A$. (Equivalently, it helps us if we want to compute the dot product of the columns of $A$ with $\vec u$ – see the “Two Pictures” note from earlier in this chapter.)

The transpose also gives us another way of expressing the dot product of two vectors. If ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$ are two vectors in $\mathbb{R}^n$, then ${\color{orange} \vec u}^T$ is a row vector with 1 row and $n$ columns. Multiplying ${\color{orange} \vec u}^T$ by ${\color{#3d81f6} \vec v}$ results in a $1 \times 1$ matrix, which is just the scalar ${\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}$.

$$\vec {\color{orange}u}^T \vec{\color{#3d81f6}v} = \begin{bmatrix} {\color{orange}u_1} & {\color{orange}u_2} & \ldots & {\color{orange}u_n} \end{bmatrix} \begin{bmatrix} {\color{#3d81f6}v_1} \\ {\color{#3d81f6}v_2} \\ \vdots \\ {\color{#3d81f6}v_n} \end{bmatrix} = {\color{orange}u_1}{\color{#3d81f6}v_1} + {\color{orange}u_2}{\color{#3d81f6}v_2} + \ldots + {\color{orange}u_n}{\color{#3d81f6}v_n} = \vec {\color{orange}u} \cdot \vec{\color{#3d81f6}v} = \vec{\color{#3d81f6}v} \cdot \vec {\color{orange}u} = \vec{\color{#3d81f6}v}^T \vec {\color{orange}u}$$

The benefit of using the transpose to express the dot product is that it allows us to write the dot product of two vectors in terms of matrix multiplication, rather than being an entirely different type of operation. (In fact, as we’ve seen here, matrix multiplication is just a generalization of the dot product.)
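Here’s a small numpy sketch of the same idea (the vectors below are just for illustration):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

print(u @ v)          # 32, the dot product
print(np.dot(u, v))   # 32, same thing

# Treating u as a 1 x 3 row vector and v as a 3 x 1 column vector,
# the matrix product u^T v is a 1 x 1 matrix holding the same scalar.
u_row = u.reshape(1, 3)
v_col = v.reshape(3, 1)
print(u_row @ v_col)  # [[32]]
```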

There are other uses for the transpose, too, so it’s a useful tool to have in your toolbox.

Properties

For matrices $A$ and $B$ of compatible shapes and any scalar $c$:

  • $(A^T)^T = A$

  • $(A + B)^T = A^T + B^T$

  • $(cA)^T = cA^T$

  • $(AB)^T = B^T A^T$

The first three properties are relatively straightforward. The last property is a bit more subtle. Try to reason as to why it’s true on your own, then read on to verify your reasoning and to see an example.

The fact that $(AB)^T = B^T A^T$ comes in handy when finding the norm of a matrix-vector product. If $A$ is an $n \times d$ matrix and $\vec x \in \mathbb{R}^d$, then:

$$\lVert A \vec x \rVert^2 = (A \vec x)^T (A \vec x) = \vec x^T A^T A \vec x$$

As we’ll soon see, some matrices $A$ have special properties that make this computation particularly easy.

In numpy, the T attribute is used to compute the transpose of a 2D array.
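For example, here’s a quick sketch that computes $A^T$ and numerically checks the identity $\lVert A \vec x \rVert^2 = \vec x^T A^T A \vec x$ using our familiar $A$ and $\vec x$:

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])
x = np.array([1, 0, 3])

print(A.T)                   # 3 x 4: the rows of A become the columns of A.T
print(A.shape, A.T.shape)    # (4, 3) (3, 4)

# Checking ||Ax||^2 = x^T A^T A x numerically (both are 1070, up to floating point error).
print(np.linalg.norm(A @ x) ** 2)
print(x @ A.T @ A @ x)
```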


The Identity Matrix

Saying “the identity matrix” is a bit ambiguous, as there are infinitely many identity matrices – there’s a $1 \times 1$ identity matrix, a $2 \times 2$ identity matrix, a $3 \times 3$ identity matrix, and so on. Often, the dimension of the identity matrix is implied by context, and if not, we might provide it as a subscript, e.g. $I_n$ for the $n \times n$ identity matrix.

Why is the identity matrix defined this way – with 1s on the diagonal and 0s everywhere else? It’s the matrix equivalent of the number 1 in scalar multiplication, also known as the multiplicative identity. If $c$ is a scalar, then $c \cdot 1 = c$ and $1 \cdot c = c$. (0 is known as the additive identity in scalar multiplication.)

Similarly, if $A$ is a square $n \times n$ matrix and $\vec x \in \mathbb{R}^n$ is a vector, then the $n \times n$ identity matrix $I$ is the unique matrix that satisfies:

  • $I \vec x = \vec x$ for all $\vec x \in \mathbb{R}^n$.

  • $IA = AI = A$ for all $A \in \mathbb{R}^{n \times n}$.

A good exercise is to verify that the identity matrix satisfies these properties.
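Here’s one way to do that check in numpy, using np.eye to construct an identity matrix (shown for the $2 \times 2$ case; the particular A and x below are just for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, -6])

I = np.eye(2)   # the 2 x 2 identity matrix

print(I @ x)                      # [ 5. -6.], same as x
print(np.array_equal(I @ A, A))   # True
print(np.array_equal(A @ I, A))   # True
```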


Preview: Transformations

This section was relatively mechanical, and I didn’t spend much time explaining why we’d multiply two matrices (or a matrix and a vector) together. More context for this operation will come throughout the rest of the chapter.

To conclude, I want to show you some of the magic behind matrix multiplication.

Consider the relatively innocent-looking $2 \times 2$ matrix

$$A = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\[6pt] \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}$$

Below, you’ll see that I’ve drawn out six vectors in $\mathbb{R}^2$.

  • ${\color{orange} \vec u} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $A {\color{orange} \vec u}$

  • ${\color{#3d81f6} \vec v} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$ and $A {\color{#3d81f6} \vec v}$

  • ${\color{#d81b60} \vec w} = \begin{bmatrix} -5 \\ -4 \end{bmatrix}$ and $A {\color{#d81b60} \vec w}$

What do you notice about the vectors $A {\color{orange} \vec u}$, $A {\color{#3d81f6} \vec v}$, and $A {\color{#d81b60} \vec w}$, and how they relate to ${\color{orange} \vec u}$, ${\color{#3d81f6} \vec v}$, and ${\color{#d81b60} \vec w}$?

[Figure: the vectors $\vec u$, $\vec v$, $\vec w$ and their images $A \vec u$, $A \vec v$, $A \vec w$, plotted in the plane.]

$A$ is called a rotation matrix, since it rotates vectors by a certain angle (in this case, $\frac{\pi}{6}$ radians, or $30^\circ$). Rotations are a type of linear transformation.
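If you’d like to verify the rotation claim numerically, here’s a short sketch; it applies $A$ to each of the three vectors above and checks that lengths are preserved (only directions change):

```python
import numpy as np

theta = np.pi / 6   # 30 degrees
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

u = np.array([3, 2])
v = np.array([2, -2])
w = np.array([-5, -4])

for vec in (u, v, w):
    rotated = A @ vec
    # A rotation preserves each vector's length; only its direction changes.
    print(vec, "->", np.round(rotated, 3),
          "| same length?", np.isclose(np.linalg.norm(vec), np.linalg.norm(rotated)))
```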

Not all matrices are rotation matrices; there exist plenty of different types of linear transformations, like reflections, shears, and projections (which should sound familiar). These will all become familiar in Chapter 2.9. All I wanted to show you for now is that matrix multiplication may look like a bunch of random number crunching, but there’s a lot of meaning baked in.