
2.2. The Dot Product

Definitions

In Chapter 2.1, we learned how to add and scale vectors. The next natural operation to consider is how to multiply two vectors together. Let’s start with a definition, and then make sense of it.

The Computational Definition

For two vectors {\color{orange} \vec u}, {\color{#3d81f6} \vec v} \in \mathbb{R}^n, their dot product is the sum of the products of corresponding components: {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} = {\color{orange} u_1} {\color{#3d81f6} v_1} + {\color{orange} u_2} {\color{#3d81f6} v_2} + \cdots + {\color{orange} u_n} {\color{#3d81f6} v_n}. I call this the computational definition because it’s the definition that’s easiest to compute. Let’s work out an example. Consider the vectors {\color{orange} \vec u = \begin{bmatrix} 6 \\ 2 \end{bmatrix}} and {\color{#3d81f6} \vec v = \begin{bmatrix} 5 \\ -3 \end{bmatrix}}. Their dot product is:

{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} = {\color{orange} \begin{bmatrix} 6 \\ 2 \end{bmatrix}} \cdot {\color{#3d81f6} \begin{bmatrix} 5 \\ -3 \end{bmatrix}} = ({\color{orange} 6}) ({\color{#3d81f6} 5}) + ({\color{orange} 2}) ({\color{#3d81f6} -3}) = 30 - 6 = 24

Note that the dot product is one number (24 here), not another vector.
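To make this concrete, here’s a quick check in NumPy (a minimal sketch, using the example vectors above):

```python
import numpy as np

u = np.array([6, 2])
v = np.array([5, -3])

# Sum of the products of corresponding components: 6*5 + 2*(-3) = 30 - 6 = 24.
print(np.dot(u, v))   # 24
print(u @ v)          # 24 -- the @ operator computes the same dot product
```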

What does the dot product of 24 tell us about u\color{orange} \vec u and v\color{#3d81f6} \vec v?


On its own, 24 doesn’t mean much. Let’s imagine we keep v\color{#3d81f6} \vec v fixed, and move u\color{orange} \vec u around. What do you notice about the resulting dot products?

[Figure: a grid of four pairs of vectors, with v fixed and u varying, each pair annotated with the resulting dot product.]

It seems like the dot product has something to do with the angle between u\color{orange} \vec u and v\color{#3d81f6} \vec v:

  • When uv\color{orange} \vec u \cdot \color{#3d81f6} \vec v is large and positive, it seems like the two vectors are pointing in the same direction. (The larger the dot product, the more aligned they are.)

  • When uv\color{orange} \vec u \cdot \color{#3d81f6} \vec v is large and negative, it seems like the two vectors are pointing in opposite directions.

  • When uv\color{orange} \vec u \cdot \color{#3d81f6} \vec v is 0, it seems like the two vectors are perpendicular.
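You can reproduce this pattern numerically. The sketch below (NumPy again; the specific angles are just illustrative choices) keeps {\color{#3d81f6} \vec v} fixed and places a vector of the same length as {\color{orange} \vec u} at several angles relative to {\color{#3d81f6} \vec v}:

```python
import numpy as np

v = np.array([5.0, -3.0])
v_angle = np.arctan2(v[1], v[0])        # direction of v, in radians
u_length = np.linalg.norm([6.0, 2.0])   # keep u the same length as in the example

# Place u at several angles relative to v and watch the dot product's sign.
for offset_deg in [0, 45, 90, 135, 180]:
    angle = v_angle + np.radians(offset_deg)
    u = u_length * np.array([np.cos(angle), np.sin(angle)])
    print(f"angle: {offset_deg:>3} degrees, dot product: {np.dot(u, v):7.2f}")
```

The dot product is largest when the angle is 0, shrinks to 0 at 90 degrees, and is most negative at 180 degrees.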

In fact, there’s another equivalent definition of the dot product that makes this relationship explicit.

The Geometric Definition

The geometric definition expresses the dot product in terms of the two vectors’ lengths and the angle \theta between them. To prove why the two definitions are equivalent, we’ll need to learn a bit more about the properties of the dot product. For now, let’s just try to interpret this new formula. Here are both definitions of the dot product, for two vectors {\color{orange} \vec u} and {\color{#3d81f6} \vec v} in \mathbb{R}^n:

\begin{aligned} {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} &= {\color{orange} u_1} {\color{#3d81f6} v_1} + {\color{orange} u_2} {\color{#3d81f6} v_2} + \cdots + {\color{orange} u_n} {\color{#3d81f6} v_n} \\ &= \left\| {\color{orange} \vec u} \right\| \left\| {\color{#3d81f6} \vec v} \right\| \cos \theta \end{aligned}

How does the function cosθ\cos \theta behave? Remember, θ\theta is the angle between the two vectors.

[Figure: plot of cos θ for θ between 0° and 180°, decreasing from 1 to −1 and crossing 0 at 90°.]

This explains what we saw in the earlier grid, which contained four pairs of vectors along with their dot products. To recap the logic:

\substack{\text{two vectors point} \\ \text{in similar directions}} \implies \theta \text{ small} \implies \cos \theta \text{ close to 1} \implies \text{dot product large}

There’s another hugely important property that the plot of cosθ\cos \theta reveals. Hugely.

Orthogonal Vectors

Orthogonal is just a fancy word for perpendicular. Both words mean that the two vectors are at a right angle (90°) to each other. Equivalently, two vectors {\color{orange} \vec u} and {\color{#3d81f6} \vec v} are orthogonal if and only if their dot product is 0: {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} = 0.

As an example in \mathbb{R}^2, the vectors {\color{orange} \vec u = \begin{bmatrix} 1 \\ 2 \end{bmatrix}} and {\color{#3d81f6} \vec v = \begin{bmatrix} -10 \\ 5 \end{bmatrix}} are orthogonal:

  • Computationally, {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} = ({\color{orange} 1})({\color{#3d81f6} -10}) + ({\color{orange} 2})({\color{#3d81f6} 5}) = -10 + 10 = 0.

  • Geometrically, the angle between them is 90 degrees, so \cos \theta = 0, meaning {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} = {\color{orange}\left\| \vec u \right\|} {\color{#3d81f6} \left\| \vec v \right\|} \cos \theta = 0.

[Figure: the orthogonal vectors u = (1, 2) and v = (−10, 5) drawn in the plane, meeting at a right angle.]

For an example in \mathbb{R}^3, the vectors {\color{#d81b60} \vec w = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}} and {\color{#004d40} \vec r = \begin{bmatrix} -5 \\ 2 \\ \frac{3}{2} \end{bmatrix}} are also orthogonal:

  • Computationally, {\color{#d81b60} \vec w} \cdot {\color{#004d40} \vec r} = ({\color{#d81b60} 3})({\color{#004d40} -5}) + ({\color{#d81b60} 6})({\color{#004d40} 2}) + ({\color{#d81b60} 2})({\color{#004d40} \frac{3}{2}}) = -15 + 12 + 3 = 0.

  • Geometrically, the angle between them is 90 degrees, so \cos \theta = 0, meaning {\color{#d81b60} \vec w} \cdot {\color{#004d40} \vec r} = {\color{#d81b60} \left\| \vec w \right\|} {\color{#004d40} \left\| \vec r \right\|} \cos \theta = 0.
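Both checks are one line each in NumPy (a minimal sketch, using the same vectors as above):

```python
import numpy as np

# R^2 example: dot product is 0, so the vectors are orthogonal.
u = np.array([1, 2])
v = np.array([-10, 5])
print(np.dot(u, v))   # 0

# R^3 example: also orthogonal.
w = np.array([3, 6, 2])
r = np.array([-5, 2, 1.5])
print(np.dot(w, r))   # 0.0
```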


What does a right angle look like in 4 or higher dimensions? I’m not sure, but that’s the beauty of abstraction once again – this definition of orthogonality works in any dimension, just like our definitions of the dot product. If two vectors are orthogonal, I like thinking of them as being “as different as possible”, in contrast to two vectors that point in the same direction.

Orthogonality, as it turns out, is crucial to our goal of framing the linear regression problem in terms of linear algebra. I’m not a big proponent of asking you to memorize things (I’d rather you internalize them through practice!), but the definition of orthogonality is one that you need to remember.


Properties of the Dot Product

For any vectors {\color{orange} \vec u}, {\color{#3d81f6} \vec v}, \vec w \in \mathbb{R}^n and any scalar c \in \mathbb{R}, the dot product satisfies:

  • Commutativity: {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} = {\color{#3d81f6} \vec v} \cdot {\color{orange} \vec u}.

  • Distributivity: {\color{orange} \vec u} \cdot ({\color{#3d81f6} \vec v} + \vec w) = {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} + {\color{orange} \vec u} \cdot \vec w.

  • Associativity with a scalar: (c {\color{orange} \vec u}) \cdot {\color{#3d81f6} \vec v} = c ({\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}) = {\color{orange} \vec u} \cdot (c {\color{#3d81f6} \vec v}).

We’ve already taken the commutative property for granted (the angle between {\color{orange} \vec u} and {\color{#3d81f6} \vec v} is the same as the angle between {\color{#3d81f6} \vec v} and {\color{orange} \vec u}), and we’re about to see a powerful application of the distributive property (though it’s a good exercise to see if you can verify it yourself).

Let me comment on the last property, the associativity of the dot product with respect to a scalar. In standard multiplication, the associativity property for scalars a,b,cRa, b, c \in \mathbb{R} says that abc=(ab)c=a(bc)abc = (ab)c = a(bc). However, this does not hold for the dot product, because the dot product of three vectors has no meaning! Instead, the modified associativity property for the dot product concerns itself with two vectors and a scalar.
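Here’s a quick numerical illustration of all three properties (a sketch using randomly generated NumPy vectors; the seed and dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 4))   # three random vectors in R^4
c = 2.5

print(np.isclose(np.dot(u, v), np.dot(v, u)))                     # commutativity
print(np.isclose(np.dot(u, v + w), np.dot(u, v) + np.dot(u, w)))  # distributivity
print(np.isclose(c * np.dot(u, v), np.dot(c * u, v)))             # associativity with a scalar
print(np.isclose(c * np.dot(u, v), np.dot(u, c * v)))             # ...on either vector
```

All four lines print True.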

Dot Product and the Vector Norm

What is the dot product of a vector with itself? If vRn{\color{#3d81f6} \vec v} \in \mathbb{R}^n, then:

\begin{aligned} {\color{#3d81f6} \vec v} \cdot {\color{#3d81f6} \vec v} &= {\color{#3d81f6} v_1} {\color{#3d81f6} v_1} + {\color{#3d81f6} v_2} {\color{#3d81f6} v_2} + \cdots + {\color{#3d81f6} v_n} {\color{#3d81f6} v_n} \\ &= {\color{#3d81f6} v_1}^2 + {\color{#3d81f6} v_2}^2 + \cdots + {\color{#3d81f6} v_n}^2 \\ &= \left\| {\color{#3d81f6} \vec v} \right\|^2 \end{aligned}

The fact that \boxed{\vec v \cdot \vec v = \left\| \vec v \right\|^2} unlocks a variety of powerful analyses, and it’s such a core definition that I’ve boxed it.
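You can see this identity numerically too (a minimal sketch, reusing an example vector from earlier):

```python
import numpy as np

v = np.array([5.0, -3.0])

print(np.dot(v, v))                # 34.0
print(np.linalg.norm(v) ** 2)      # 34.0, up to floating-point roundoff
print(np.isclose(np.dot(v, v), np.linalg.norm(v) ** 2))   # True
```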

For example, we now have the tools to prove that the geometric cosine definition of the dot product is equal to the computational definition! Let’s try and show that:

{\color{orange} u_1} {\color{#3d81f6} v_1} + {\color{orange} u_2} {\color{#3d81f6} v_2} + \cdots + {\color{orange} u_n} {\color{#3d81f6} v_n} = \left\| {\color{orange} \vec u} \right\| \left\| {\color{#3d81f6} \vec v} \right\| \cos \theta

Let’s consider two arbitrary vectors in Rn\mathbb{R}^n, u\color{orange} \vec u and v\color{#3d81f6} \vec v. (I’ve drawn them below as vectors in R2\mathbb{R}^2, but we won’t assume anything in particular about two dimensional space, and we won’t put specific numbers to them, since our proof should be general.)

Along with them, let’s consider their difference, uv\color{#d81b60} \vec u - \vec v. This step may seem arbitrary, but we’ll see why it’s useful soon.

[Figure: the vectors u and v drawn from a common tail, with their difference u − v drawn as a dashed arrow from the tip of v to the tip of u.]

A common point of confusion is whether the tip of the vector {\color{#d81b60} \vec u - \vec v} should be at the tip of {\color{orange} \vec u} or the tip of {\color{#3d81f6} \vec v}. To verify that the above diagram is correct, note that if you walk along {\color{#3d81f6} \vec v}, then along {\color{#d81b60} \vec u - \vec v}, you end up at the tip of {\color{orange} \vec u}, which matches what we’d expect from the expression {\color{#3d81f6} \vec v} + ({\color{orange} \vec u} - {\color{#3d81f6} \vec v}) = {\color{orange} \vec u}.

I’d like to try and find an expression involving θ\theta (the angle between u\color{orange} \vec u and v\color{#3d81f6} \vec v) and the dot product of u\color{orange} \vec u and v\color{#3d81f6} \vec v, without using the cosine definition of the dot product (since that’s what I’m trying to prove).

  1. First, let’s consider a rule we perhaps haven’t touched in a few years: the cosine law. The cosine law says that for any triangle with sides of length aa, bb, and cc, with an angle of CC opposite side cc,

    \begin{align*} c^2 &= a^2 + b^2 - 2ab \cos C \end{align*}

    We can apply this rule to the triangle formed by u\color{orange} \vec u, v\color{#3d81f6} \vec v, and uv\color{#d81b60} \vec u - \vec v (the dashed line in the diagram above). The cosine law tells us that

    \lVert {\color{#d81b60} \vec u - \vec v} \rVert^2 = \lVert {\color{orange} \vec u} \rVert^2 + \lVert {\color{#3d81f6} \vec v} \rVert^2 - 2 \lVert {\color{orange} \vec u} \rVert \lVert {\color{#3d81f6} \vec v} \rVert \cos \theta

    There’s not much more I can do with this right now.

  2. Above, it’d be nice to have an expression for \lVert {\color{#d81b60} \vec u - \vec v} \rVert^2 that involves the dot product of {\color{orange} \vec u} and {\color{#3d81f6} \vec v}. Let’s try and find one. I will use the fact that \lVert {\color{#d81b60} \vec u - \vec v} \rVert^2 = ({\color{#d81b60} \vec u - \vec v}) \cdot ({\color{#d81b60} \vec u - \vec v}).

\begin{align*} \lVert {\color{#d81b60} \vec u - \vec v} \rVert^2 &= ({\color{orange} \vec u} - {\color{#3d81f6} \vec v}) \cdot ({\color{orange} \vec u} - {\color{#3d81f6} \vec v}) \\ &= {\color{orange} \vec u} \cdot {\color{orange} \vec u} \underbrace{- {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} - {\color{#3d81f6} \vec v} \cdot {\color{orange} \vec u}}_{\text{why are these two terms the same?}} + {\color{#3d81f6} \vec v} \cdot {\color{#3d81f6} \vec v} \\ &= {\color{orange} \vec u} \cdot {\color{orange} \vec u} - 2{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} + {\color{#3d81f6} \vec v} \cdot {\color{#3d81f6} \vec v} \\ &= \lVert {\color{orange} \vec u} \rVert^2 + \lVert {\color{#3d81f6} \vec v} \rVert^2 - 2 {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} \end{align*}

Let’s take a step back. Independently, we’ve found two expressions for uv2\lVert {\color{#d81b60} \vec u - \vec v} \rVert^2:

  1. \lVert {\color{#d81b60} \vec u - \vec v} \rVert^2 = \lVert {\color{orange} \vec u} \rVert^2 + \lVert {\color{#3d81f6} \vec v} \rVert^2 - 2 \lVert {\color{orange} \vec u} \rVert \lVert {\color{#3d81f6} \vec v} \rVert \cos \theta

  2. \lVert {\color{#d81b60} \vec u - \vec v} \rVert^2 = \lVert {\color{orange} \vec u} \rVert^2 + \lVert {\color{#3d81f6} \vec v} \rVert^2 - 2 {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}

These must be equal! Setting the two right-hand sides equal to each other gives us

\lVert {\color{orange} \vec u} \rVert^2 + \lVert {\color{#3d81f6} \vec v} \rVert^2 - 2 \lVert {\color{orange} \vec u} \rVert \lVert {\color{#3d81f6} \vec v} \rVert \cos \theta = \lVert {\color{orange} \vec u} \rVert^2 + \lVert {\color{#3d81f6} \vec v} \rVert^2 - 2 {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}

Subtracting the common terms from both sides gives us

-2 \lVert {\color{orange} \vec u} \rVert \lVert {\color{#3d81f6} \vec v} \rVert \cos \theta = -2 {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}

And finally, dividing both sides by -2 gives us

\boxed{\lVert {\color{orange} \vec u} \rVert \lVert {\color{#3d81f6} \vec v} \rVert \cos \theta = {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}}

This completes the proof that the two formulas for the dot product are equivalent! This is an extremely important proof, and proofs of this type will appear in labs, homeworks, and exams moving forward. You’re not expected to recall the cosine law from memory, but given the cosine law, you will eventually need to be able to produce something like this on your own.
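As a quick numerical sanity check (a sketch using the example vectors from the start of this section, with θ measured directly from each vector’s direction so that we’re not secretly reusing the dot product):

```python
import numpy as np

u = np.array([6.0, 2.0])
v = np.array([5.0, -3.0])

# Computational definition: sum of products of corresponding components.
computational = np.dot(u, v)

# Geometric definition: ||u|| ||v|| cos(theta), with theta measured directly
# from the two vectors' directions.
theta = np.arctan2(u[1], u[0]) - np.arctan2(v[1], v[0])
geometric = np.linalg.norm(u) * np.linalg.norm(v) * np.cos(theta)

print(computational, geometric)              # both are 24 (up to roundoff)
print(np.isclose(computational, geometric))  # True
```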

An implication of this equality, as we saw in Activity 2, is that the angle between u\color{orange} \vec u and v\color{#3d81f6} \vec v can be found by using

\cos \theta = \frac{{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}}{\left\| {\color{orange} \vec u} \right\| \left\| {\color{#3d81f6} \vec v} \right\|}

The expression on the right can be thought of as a normalized dot product, where we take the dot product and divide it by the product of the norms of the two vectors. You can also view it as the dot product of the unit vectors \vec U = \frac{{\color{orange} \vec u}}{\left\| {\color{orange} \vec u} \right\|} and \vec V = \frac{{\color{#3d81f6} \vec v}}{\left\| {\color{#3d81f6} \vec v} \right\|}.
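This gives a direct recipe for finding the angle between two vectors. A minimal sketch, again with the example vectors from earlier:

```python
import numpy as np

u = np.array([6.0, 2.0])
v = np.array([5.0, -3.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(cos_theta)    # angle in radians

print(np.degrees(theta))        # roughly 49.4 degrees -- positive dot product, acute angle
```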

Cauchy-Schwarz and Triangle Inequalities

In Chapter 2.1, I stated – without proof! – that the vector norm satisfies the triangle inequality, which says that for any vectors u\color{orange} \vec u and v\color{#3d81f6} \vec v in Rn\mathbb{R}^n,

\left\| {\color{orange} \vec u} + {\color{#3d81f6} \vec v} \right\| \leq \left\| {\color{orange} \vec u} \right\| + \left\| {\color{#3d81f6} \vec v} \right\|

Remember, intuitively, this says that the length of one side of a triangle cannot be longer than the sum of the lengths of the other two sides, if you consider the triangle formed by u\color{orange} \vec u, v\color{#3d81f6} \vec v, and u+v{\color{orange} \vec u} + \color{#3d81f6} \vec v.

We now have the tools to prove this. But first, let me start by introducing a new inequality, the Cauchy-Schwarz inequality. This inequality says that for any vectors u\color{orange} \vec u and v\color{#3d81f6} \vec v in Rn\mathbb{R}^n,

|{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}| \leq \left\| {\color{orange} \vec u} \right\| \left\| {\color{#3d81f6} \vec v} \right\|

Gilbert Strang’s book calls the Cauchy-Schwarz inequality the most important inequality in mathematics, and the instructors of future machine learning courses specifically requested that we include it in EECS 245.

Why is the Cauchy-Schwarz inequality true? Try and reason about it yourself.

Equipped with the Cauchy-Schwarz inequality, we can now prove the triangle inequality. I want to show that u+vu+v\lVert {\color{orange} \vec u} + {\color{#3d81f6} \vec v} \rVert \leq \lVert {\color{orange} \vec u} \rVert + \lVert {\color{#3d81f6} \vec v} \rVert.

Let me start by expanding u+v2\lVert {\color{orange} \vec u} + {\color{#3d81f6} \vec v} \rVert^2:

\begin{aligned} \lVert {\color{orange} \vec u} + {\color{#3d81f6} \vec v} \rVert^2 &= ({\color{orange} \vec u} + {\color{#3d81f6} \vec v}) \cdot ({\color{orange} \vec u} + {\color{#3d81f6} \vec v}) \\ &= \lVert {\color{orange} \vec u} \rVert^2 + 2 {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} + \lVert {\color{#3d81f6} \vec v} \rVert^2 \end{aligned}

We can’t use the Cauchy-Schwarz inequality here just yet, because it says something about uv|{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}| but there isn’t an absolute value around uv{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} above. But, we can use the fact that:

x \leq |x|

for all x \in \mathbb{R}. (If x is non-negative, then |x| is just x itself; if x is negative, then |x| = -x is positive and hence greater than x. Either way, |x| can never be less than x.)

Applying this to uv{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}, we get:

{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} \leq | {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} |

Then, from the Cauchy-Schwarz inequality, we know that:

|{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}| \leq \left\| {\color{orange} \vec u} \right\| \left\| {\color{#3d81f6} \vec v} \right\|

Putting these two inequalities together, we get:

{\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} \leq \left\| {\color{orange} \vec u} \right\| \left\| {\color{#3d81f6} \vec v} \right\|

Back to the main proof. Using the most recent inequality above, we have:

\begin{aligned} \lVert {\color{orange} \vec u} + {\color{#3d81f6} \vec v} \rVert^2 &= ({\color{orange} \vec u} + {\color{#3d81f6} \vec v}) \cdot ({\color{orange} \vec u} + {\color{#3d81f6} \vec v}) \\ &= \lVert {\color{orange} \vec u} \rVert^2 + 2 {\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v} + \lVert {\color{#3d81f6} \vec v} \rVert^2 \\ &\leq \lVert {\color{orange} \vec u} \rVert^2 + 2 \left\| {\color{orange} \vec u} \right\| \left\| {\color{#3d81f6} \vec v} \right\| + \lVert {\color{#3d81f6} \vec v} \rVert^2 \\ &= \left( \lVert {\color{orange} \vec u} \rVert + \lVert {\color{#3d81f6} \vec v} \rVert \right)^2 \end{aligned}

Both sides are non-negative, so taking the square root of both sides preserves the inequality, and we get:

\lVert {\color{orange} \vec u} + {\color{#3d81f6} \vec v} \rVert \leq \lVert {\color{orange} \vec u} \rVert + \lVert {\color{#3d81f6} \vec v} \rVert

This completes the proof of the triangle inequality!

To recap:

  • The Cauchy-Schwarz inequality says that | {\color{orange} \vec{u}} \cdot {\color{#3d81f6} \vec{v}}| \leq \| {\color{orange} \vec{u}}\| \| {\color{#3d81f6} \vec{v}}\|.

  • The triangle inequality says that \|{\color{orange} \vec{u}} + {\color{#3d81f6} \vec{v}}\| \leq \|{\color{orange} \vec{u}}\| + \|{\color{#3d81f6} \vec{v}}\|.

Both of these inequalities are true for any vectors u\color{orange} \vec u and v\color{#3d81f6} \vec v.
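If you’d like some empirical reassurance to go along with the proofs, here’s a quick spot-check (not a proof!) on many random pairs of vectors, sketched in NumPy with an arbitrary seed and dimension:

```python
import numpy as np

rng = np.random.default_rng(245)

# Check both inequalities on 1,000 random pairs of vectors in R^5.
for _ in range(1000):
    u, v = rng.standard_normal((2, 5))
    assert abs(np.dot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12      # Cauchy-Schwarz
    assert np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v) + 1e-12  # triangle

print("Both inequalities held for all 1,000 random pairs.")
```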