4.4. Lines, Planes, Hyperplanes, and the Cross Product

Think of this section as a brief detour from the main storyline of the course. Here, I’ll cover how to describe lines, planes, and hyperplanes in $\mathbb{R}^n$, and how they relate to our understanding of spans and subspaces from earlier in this chapter.

Lines

Intuition in $\mathbb{R}^2$ and $\mathbb{R}^3$

We’re very familiar with lines in $\mathbb{R}^2$. The line $y = 2x + 3$, in $\mathbb{R}^2$, is the set of all $(x, y)$ coordinates that satisfy the equation $y = 2x + 3$.

The line $y = 2x + 3$ is in slope-intercept form, which more generally looks like $y = w_0 + w_1 x$. Sometimes, we may write lines in standard form, like $2x - y + 3 = 0$, or more generally, $ax + by + c = 0$.

Let’s kick things up a notch and consider lines in $\mathbb{R}^3$. What is the equation of the line below?


As we saw in Chapter 4.1, the line shown above is the span of the vector ${\color{#3d81f6}\vec v = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}}$. It passes through the origin, $(0, 0, 0)$, and passes through the point $(2, -1, 3)$.

Ideally, we’d be able to express the line as a function

$$z = f(x, y)$$

as we did in the $\mathbb{R}^2$ case, where $y = f(x) = mx + b$.

Unfortunately, there is no way to express the lines in $\mathbb{R}^3$, or $\mathbb{R}^4$, or $\mathbb{R}^n$ for $n > 2$ as a simple function. Why not? If there were some formula for $z$ (the height of the line) in terms of $x$ and $y$, that would imply that we should be able to plug in any $x$ and any $y$ to get an output $z$. But, the line above only works for very specific combinations of $x$ and $y$. For instance, there’s no point on the line above that has $x = 1$ and $y = 1$. Rather, when $x = 1$, $y$ is forced to be $-\frac{1}{2}$, and $z$ is forced to be $\frac{3}{2}$.

The key idea that I stressed in Chapter 4.1 is that lines are 1-dimensional objects, meaning that the location of any point on the line can be described using a single free variable.

So, the equation of the line above is

$$L = t\begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}, \quad t \in \mathbb{R}$$

$t$ here is a free variable, sometimes called a parameter (though this term is confusing in the context of our course), meaning we can set it to whatever we’d like. The line is the set of all points that can be reached by plugging in different values of $t$.

Since the line is really a set of points, I should have written it as

$$L = \left\{ t\begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} \mid t \in \mathbb{R} \right\}$$

but I’ll use the former notation for brevity.

Equivalently, you can think of the line as three separate functions of $t$. Pick a $t$. Then, $L$ is

$$\begin{aligned} x &= 2t \\ y &= -t \\ z &= 3t \end{aligned}$$

Drag the value of $t$ below to see how $t$ allows us to move along the line.


The line $L$ above passes through the origin, since if we set $t = 0$, we get the point $(0, 0, 0)$. This matches what we’d expect out of the span of a single vector, since $0 {\color{#3d81f6}\vec v} = \vec 0$.
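To make the role of the free variable concrete, here is a minimal NumPy sketch of the line spanned by the vector above. The helper name `point_on_line` is made up for illustration; the only assumption is that NumPy is available.

```python
import numpy as np

# Direction vector of the line from the text.
v = np.array([2.0, -1.0, 3.0])

def point_on_line(t):
    """Return the point t * v on the line spanned by v."""
    return t * v

# Each value of the free variable t gives exactly one point on the line.
for t in [-1.0, 0.0, 0.5, 1.0]:
    print(t, point_on_line(t))
```

Setting `t = 0.5` is the only way to get an x-coordinate of 1, which is why y and z are then forced to specific values.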

But how do we express a line that passes through some other fixed point that isn’t the origin? Such a line might not be the span of a single vector, since the span of a single vector is always a line that passes through the origin. But, it’s good to know how to think about lines in this more general form.

Lines in Parametric Form

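A line in $\mathbb{R}^n$ that passes through the point $\vec p_0$ in the direction of a non-zero vector $\vec v$ can be written in parametric form as $L = \vec p_0 + t\vec v, t \in \mathbb{R}$.

This definition is not specific to 2-dimensional or 3-dimensional space; it works in any $\mathbb{R}^n$. (Technically, I’m mixing the meaning of a point and a vector here, but as long as we remember that points describe positions and vectors describe directions, we should be fine.) Here’s a line in $\mathbb{R}^{100}$: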

$$L = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ \vdots \\ 100 \end{bmatrix} + t \begin{bmatrix} -11 \\ 12 \\ -13 \\ 14 \\ \vdots \\ 110 \end{bmatrix}, \quad t \in \mathbb{R}$$

Note that the parametric form of a line is not unique! Since the parametric definition of a line depends on a “starting point” $\vec p_0$, we can pick any starting point we’d like. We can also scale the direction vector by any non-zero scalar. So,

$$L_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + t \begin{bmatrix} -3 \\ 4 \end{bmatrix}, \quad t \in \mathbb{R}$$

is the same line as

$$L_2 = \begin{bmatrix} -2 \\ 6 \end{bmatrix} + t \begin{bmatrix} 6 \\ -8 \end{bmatrix}, \quad t \in \mathbb{R}$$

once you consider all possible values of $t$ in both cases. (I know this is a little confusing, since plugging the same value of $t$ into $L_1$ and into $L_2$ will give you different points, but remember that $L_1$ and $L_2$ are sets, and so we need to consider all possible values of $t$.)

Below is a plot of $L_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + t \begin{bmatrix} -3 \\ 4 \end{bmatrix}, t \in \mathbb{R}$.

Image produced in Jupyter

And here’s $L_2 = \begin{bmatrix} -2 \\ 6 \end{bmatrix} + t \begin{bmatrix} 6 \\ -8 \end{bmatrix}, t \in \mathbb{R}$.

Image produced in Jupyter

Note that we end up with the same line, despite the different starting points and direction vectors!
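One way to check this numerically, sketched below under the assumption that we are working in the two-dimensional case: two parametric lines are the same set when their direction vectors are parallel and the starting point of one lies on the other. The helper names `det2` and `same_line` are hypothetical.

```python
import numpy as np

p1, d1 = np.array([1.0, 2.0]), np.array([-3.0, 4.0])   # starting point and direction of L_1
p2, d2 = np.array([-2.0, 6.0]), np.array([6.0, -8.0])  # starting point and direction of L_2

def det2(a, b):
    """2x2 determinant of [a b]; it is zero exactly when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

def same_line(p1, d1, p2, d2):
    """True when the two parametric lines in R^2 describe the same set of points."""
    directions_parallel = np.isclose(det2(d1, d2), 0.0)
    # p2 is on the first line exactly when p2 - p1 is parallel to d1.
    start_on_other_line = np.isclose(det2(p2 - p1, d1), 0.0)
    return bool(directions_parallel and start_on_other_line)

print(same_line(p1, d1, p2, d2))
```

Here the second starting point is reached from the first by setting $t = 1$, and the second direction is $-2$ times the first, so both conditions hold.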

The following activities give you some practice with the parametric form of a line.

Activity 1

Activity 2

Activity 3

Activity 4


Planes

Lines are 1-dimensional objects, whether they exist in $\mathbb{R}^2$, or $\mathbb{R}^3$, or $\mathbb{R}^{47}$, or in general $\mathbb{R}^n$.

Similarly, planes are 2-dimensional objects. In $\mathbb{R}^2$, since there only exist two dimensions in the first place, the entirety of the coordinate system is one single plane, which we call the $xy$-plane.

Let’s start by building intuition for planes in $\mathbb{R}^3$, the most natural setting for them, and then generalize.

Planes in $\mathbb{R}^3$

For example, let’s draw:

  1. ${\color{#3d81f6} 3x + 4y - 5z - 12 = 0}$, or equivalently $z = \frac{3}{5}x + \frac{4}{5}y - \frac{12}{5}$

  2. ${\color{orange} -5x - 3y - z = 0}$, or equivalently $z = -5x - 3y$

You’ll notice that they intersect at a line. This is not a coincidence; any two non-parallel planes in $\mathbb{R}^3$ will intersect at a line.


Note that both planes are flat surfaces that extend infinitely in all directions. The fact that the blue plane is cut off at the edges is just due to how I’m plotting the planes, not that there’s some boundary within which the plane is defined.
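The claim that these two planes meet in a line can be checked numerically. The sketch below is one possible approach, not the only one: it fixes a value of $z$, solves the remaining $2 \times 2$ system for $x$ and $y$, and uses two such points to build a parametric line. The function names are made up for illustration.

```python
import numpy as np

# Move z to the right-hand side of each standard-form equation:
#   blue:    3x + 4y = 12 + 5z
#   orange: -5x - 3y = z
A = np.array([[3.0, 4.0], [-5.0, -3.0]])

def point_on_both_planes(z):
    """Solve for the (x, y) that puts (x, y, z) on both planes."""
    x, y = np.linalg.solve(A, np.array([12.0 + 5.0 * z, z]))
    return np.array([x, y, z])

def residuals(p):
    """Left-hand sides of both plane equations; both are 0 on the intersection."""
    x, y, z = p
    return np.array([3 * x + 4 * y - 5 * z - 12, -5 * x - 3 * y - z])

# Two points on both planes determine the line of intersection.
p0 = point_on_both_planes(0.0)
direction = point_on_both_planes(1.0) - p0

for t in [-2.0, 0.0, 3.5]:
    print(t, np.round(residuals(p0 + t * direction), 10))
```

Every point of the form `p0 + t * direction` satisfies both equations up to floating-point error, which is exactly what it means for the intersection to be a line.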

You’ll notice that the blue plane is relatively shallow, while the orange plane is relatively steep. Why?

I find the form $z = Ax + By + C$ easier to understand intuitively, since it shows the rate of change of $z$ with respect to $x$ and $y$ more clearly. Starting with $z = Ax + By + C$, we have that

$$\frac{\partial z}{\partial x} = A, \quad \frac{\partial z}{\partial y} = B$$

In this example, the blue plane has $A = \frac{3}{5}$ and $B = \frac{4}{5}$, while the orange plane has $A = -5$ and $B = -3$, which explains their relative steepness.
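The slopes above can be read directly off the standard form: solving $ax + by + cz + d = 0$ for $z$ gives $A = -a/c$ and $B = -b/c$. Here is a small sketch of that conversion; the function name `slopes_from_standard_form` is hypothetical, and exact fractions are used only to keep the output readable.

```python
from fractions import Fraction

def slopes_from_standard_form(a, b, c):
    """For ax + by + cz + d = 0 with c != 0, return (dz/dx, dz/dy) = (-a/c, -b/c)."""
    if c == 0:
        raise ValueError("z cannot be written as a function of x and y when c = 0")
    return Fraction(-a, c), Fraction(-b, c)

print(slopes_from_standard_form(3, 4, -5))    # blue plane: 3x + 4y - 5z - 12 = 0
print(slopes_from_standard_form(-5, -3, -1))  # orange plane: -5x - 3y - z = 0
```

The constant $d$ does not affect either slope, so it is left out of the function's arguments.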

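That said, be careful, since a plane need not have a non-zero coefficient on $z$. For example, $3x + 4y = 0$ and $3x + 4y = 5$ are perfectly valid planes in $\mathbb{R}^3$, and they happen to be parallel. Neither can be written in the form $z = Ax + By + C$, since $z$ does not appear in either equation.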


A key property of planes is that they are flat. Sure, we know that intuitively, but what does it actually mean?

This property is not true in general for other surfaces.


The Cross Product

I first mentioned planes back in Chapter 3.1, when we intuitively discussed the fact that the set of all linear combinations of two non-collinear vectors in $\mathbb{R}^3$ forms a plane. We discussed this idea at length in Chapter 4.1, too.

So, given two vectors ${\color{orange} \vec u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}}, {\color{#3d81f6} \vec v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}} \in \mathbb{R}^3$, how do we find the equation of the plane they span, in standard form?


The standard form of a plane in $\mathbb{R}^3$ is

$$ax + by + cz + d = 0$$

We know that the plane spanned by two vectors in $\mathbb{R}^3$ must contain the zero vector, since $0{\color{orange} \vec u} + 0{\color{#3d81f6} \vec v} = \vec 0$. This means that the point $(x, y, z) = (0, 0, 0)$ must satisfy the equation of the plane. Plugging $(x, y, z) = (0, 0, 0)$ into $ax + by + cz + d = 0$ gives us $d = 0$.

So, I’m searching for a plane of the form $ax + by + cz = 0$. Plugging in ${\color{orange} \vec u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}}$ tells me that $a$, $b$, and $c$ must satisfy

$$a{\color{orange} u_1} + b{\color{orange} u_2} + c{\color{orange} u_3} = 0$$

Similarly, $a$, $b$, and $c$ must also satisfy

$$a{\color{#3d81f6} v_1} + b{\color{#3d81f6} v_2} + c{\color{#3d81f6} v_3} = 0$$

Look closely. The left-hand side of both equations looks a lot like the dot product of $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$ with each of ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$. Since those dot products must both be 0 (coming from the right-hand side of each equation), we’re really just looking for a vector $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$ that’s orthogonal to both ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$.

There are infinitely many vectors orthogonal to a particular pair of vectors ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$, meaning there are infinitely many possible values of $a$, $b$, and $c$ that satisfy the above equations. (There are 2 equations but 3 unknowns, so we’d expect there to be infinitely many solutions.)

But, one property that all of these vectors share is that they all lie along the same line through the origin, pointing either in the same direction or in exactly the opposite direction: if $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$ is orthogonal to ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$, then so is any non-zero scalar multiple of $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$.


One particular vector (i.e. set of coefficients $a$, $b$, and $c$) that satisfies the above equations is the cross product of ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$.
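For reference, the component formula usually used to define the cross product in $\mathbb{R}^3$ is written out below. It is the standard textbook definition rather than something derived in this section. The second line checks orthogonality to $\vec u$ by expanding the dot product, where every term cancels in pairs; the check for $\vec v$ works the same way.

$$
\vec u \times \vec v = \begin{bmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{bmatrix}
$$

$$
\vec u \cdot (\vec u \times \vec v) = u_1(u_2 v_3 - u_3 v_2) + u_2(u_3 v_1 - u_1 v_3) + u_3(u_1 v_2 - u_2 v_1) = 0
$$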

There’s a lot of meaning baked into the definition of the cross product, but most of it is more relevant in a traditional engineering or physics context. For example, the cross product is anticommutative, meaning that the order you compute it in matters.

$${\color{orange} \vec u} \times {\color{#3d81f6} \vec v} = -({\color{#3d81f6} \vec v} \times {\color{orange} \vec u})$$

That’s the type of statement we won’t bother investigating further here. The key fact that is relevant for us right now is that the vector ${\color{orange} \vec u} \times {\color{#3d81f6} \vec v}$ is orthogonal to both ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$.

Activity 5

Activity 6

Let’s use the cross product to concretely find the equation of the plane spanned by two vectors in $\mathbb{R}^3$. Suppose ${\color{orange} \vec u = \begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix}}$ and ${\color{#3d81f6} \vec v = \begin{bmatrix} -2 \\ 3 \\ 0 \end{bmatrix}}$.

The cross product of ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$ is given by

$${\color{orange} \vec u} \times {\color{#3d81f6} \vec v} = \begin{bmatrix} {\color{orange}2} \cdot {\color{#3d81f6}0} - {\color{orange}1} \cdot {\color{#3d81f6}3} \\ {\color{orange}1} \cdot {\color{#3d81f6}(-2)} - {\color{orange}5} \cdot {\color{#3d81f6}0} \\ {\color{orange}5} \cdot {\color{#3d81f6}3} - {\color{orange}2} \cdot {\color{#3d81f6}(-2)} \end{bmatrix} = \begin{bmatrix} -3 \\ -2 \\ 19 \end{bmatrix}$$

The equation of the plane spanned by ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$ is then given by

$$-3x - 2y + 19z = 0$$
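The computation above can be double-checked with NumPy's built-in `np.cross`. This is a quick sanity check rather than a new method; the particular combination `2 * u - 7 * v` is an arbitrary choice made for illustration.

```python
import numpy as np

u = np.array([5, 2, 1])
v = np.array([-2, 3, 0])

# The cross product gives the coefficients (a, b, c) of ax + by + cz = 0.
normal = np.cross(u, v)
print(normal)

# Both dot products should be 0, since the normal is orthogonal to u and v.
print(normal @ u, normal @ v)

# Any linear combination of u and v lies in the plane, so it should
# satisfy the plane equation too.
point = 2 * u - 7 * v
print(normal @ point)
```

Because every point in the span is a linear combination of `u` and `v`, orthogonality to those two vectors is enough to guarantee that the whole plane satisfies the equation.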

The vector that the cross product returns is sometimes called the normal vector of the plane. Normal is another term for orthogonal or perpendicular. For the plane $-3x - 2y + 19z = 0$, the normal vector is $\begin{bmatrix} -3 \\ -2 \\ 19 \end{bmatrix}$, as that vector is orthogonal to the two vectors ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$ that span the plane. When we’re looking at the standard form of the equation of a plane in $\mathbb{R}^3$, the normal vector is just the coefficients of $x$, $y$, and $z$ in the equation $ax + by + cz = 0$.

There are infinitely many normal vectors for a given plane, since we can multiply any normal vector by a non-zero scalar and still get a normal vector. For example, $\begin{bmatrix} -3 \\ -2 \\ 19 \end{bmatrix}$ is a normal vector for the plane $-3x - 2y + 19z = 0$, and so are $\begin{bmatrix} -6 \\ -4 \\ 38 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 2/3 \\ -19/3 \end{bmatrix}$. Equivalently, $-6x - 4y + 38z = 0$ and $x + \frac{2}{3}y - \frac{19}{3}z = 0$ are ways to write the same plane we’ve been looking at.

Activity 7

Planes in Parametric Form

The cross product is a construct that only exists in 3 dimensions. Why is that? The cross product relies on the fact that the vectors ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$ are linearly independent, meaning they span a plane, and that there is only one direction in $\mathbb{R}^3$ that is orthogonal to that plane. The cross product returns a vector in that direction. But, given two vectors in $\mathbb{R}^4$, for instance, there are infinitely many directions that are orthogonal to both of those two vectors, so it’s hard to think of an operation that returns any one of them.

All of that is to say, in $\mathbb{R}^4$ and above, we can’t express planes in standard form, the same way we can’t express lines in $\mathbb{R}^3$ in standard form. Instead, we’ll need to resort to their parametric form.

Again, the formal way of stating this definition is to treat the plane like a set of points that obeys an inclusion condition.

$$P = \left\{ \vec p_0 + s\vec u + t\vec v \mid s, t \in \mathbb{R} \right\}$$

This definition is very similar to the definition of the parametric form of a line in $\mathbb{R}^n$; the only difference is that instead of one direction vector, we have two. For instance,

$$P = \begin{bmatrix} 3 \\ 8 \\ 1 \\ 2 \\ -7 \\ \pi \end{bmatrix} + s\begin{bmatrix} 1 \\ 0 \\ 2 \\ -1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 5 \\ 2 \\ -1 \\ 3 \\ 1 \\ 0 \end{bmatrix}$$

is a plane in $\mathbb{R}^6$, and you should think of it as a 2-dimensional “slice” of 6-dimensional space.
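As a rough sketch of what this looks like in code, the plane above can be treated as a function of the two free variables $s$ and $t$. The helper name `point_on_plane` is hypothetical, and the rank check is just one way to confirm that the two direction vectors are linearly independent.

```python
import numpy as np

p0 = np.array([3, 8, 1, 2, -7, np.pi])           # starting point
u = np.array([1, 0, 2, -1, 0, 0], dtype=float)   # first direction vector
v = np.array([5, 2, -1, 3, 1, 0], dtype=float)   # second direction vector

def point_on_plane(s, t):
    """Return the point p0 + s*u + t*v in R^6."""
    return p0 + s * u + t * v

# Two linearly independent directions are what make this a 2-dimensional object.
print(np.linalg.matrix_rank(np.column_stack([u, v])))

print(point_on_plane(0, 0))    # the starting point itself
print(point_on_plane(1, -2))   # another point on the plane
```

Since both direction vectors have a 0 in their last component, every point on this particular plane has last coordinate $\pi$, no matter which $s$ and $t$ are chosen.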

Activity 8

Activity 9


Hyperplanes

So far, we’ve learned how to think of lines and planes in arbitrarily high dimensions. We can’t visualize a plane in $\mathbb{R}^{76}$, but we have some intuition that it’s a 2-dimensional “slice” of 76-dimensional space.

On the topic of slices:

  • A line is a 1-dimensional “slice” of 2-dimensional space.

  • A plane is a 2-dimensional “slice” of 3-dimensional space.

In general, a hyperplane is an $(n-1)$-dimensional “slice” of $n$-dimensional space.

The most common way of representing a hyperplane is the form $\vec a \cdot \vec x + b = 0$.

  • Example: $2x_1 + 3x_2 - 5 = 0$ is a hyperplane in $\mathbb{R}^2$, defined by the vector $\vec a = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $b = -5$. This is just a line in $\mathbb{R}^2$. (If it helps to see that this is a line, relabel $x_1$ and $x_2$ as $x$ and $y$.)

  • Example: $x_1 + x_2 + x_3 = 0$ is a hyperplane in $\mathbb{R}^3$, defined by the vector $\vec a = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $b = 0$. This is just a plane in $\mathbb{R}^3$.
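As a small worked step for the first example, relabeling $x_1$ and $x_2$ as $x$ and $y$ and solving for $y$ recovers the familiar slope-intercept form of a line:

$$
2x + 3y - 5 = 0 \iff y = -\frac{2}{3}x + \frac{5}{3}
$$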

Hyperplanes are hugely important in machine learning, particularly in the context of classification. You should think of a hyperplane in $\mathbb{R}^n$ as a boundary that divides all of $\mathbb{R}^n$ into two halves: every point that isn’t on the hyperplane itself is either above it or below it.

For example, the hyperplane $\begin{bmatrix} -3 \\ -2 \\ 19 \end{bmatrix} \cdot \vec x = 0$ is shown below. Any point in $\mathbb{R}^3$ that isn’t on it is either above it, meaning $\begin{bmatrix} -3 \\ -2 \\ 19 \end{bmatrix} \cdot \vec x > 0$, or below it, meaning $\begin{bmatrix} -3 \\ -2 \\ 19 \end{bmatrix} \cdot \vec x < 0$. (Yes, this hyperplane is just the plane $-3x - 2y + 19z = 0$ from earlier!)
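The above/below test comes down to the sign of a single dot product. Here is a minimal sketch of it; the function name `side_of_hyperplane`, the tolerance, and the test points are all made up for illustration.

```python
import numpy as np

# The hyperplane a . x + b = 0 from the example above.
a = np.array([-3.0, -2.0, 19.0])
b = 0.0

def side_of_hyperplane(x, a, b, tol=1e-12):
    """Classify x as 'above', 'below', or 'on' the hyperplane a . x + b = 0."""
    value = a @ x + b
    if abs(value) <= tol:
        return "on"
    return "above" if value > 0 else "below"

# Hypothetical test points, chosen only for illustration.
print(side_of_hyperplane(np.array([0.0, 0.0, 1.0]), a, b))  # a . x = 19
print(side_of_hyperplane(np.array([1.0, 1.0, 0.0]), a, b))  # a . x = -5
print(side_of_hyperplane(np.array([5.0, 2.0, 1.0]), a, b))  # a . x = 0
```

The same function works unchanged in any number of dimensions, since it only relies on the dot product.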


Another more concrete example of a hyperplane comes from looking at the diabetes classification problem first introduced in Homework 3. There, we explored a dataset of several patients, each of whom had two features (a glucose level and a body mass index, or BMI) along with a binary label indicating whether they have diabetes or not.

Image produced in Jupyter

In Homework 3, we introduced the $k$-nearest neighbors ($k$-NN) classifier. You might recall that the decision boundary of a $k$-NN classifier looks like a bunch of irregularly shaped blobs in the feature space ($\mathbb{R}^2$ here).

Image produced in Jupyter

Another common family of classifiers is linear classifiers, where the decision boundary is a hyperplane. One such linear classifier is the logistic regression classifier. On this dataset, its decision boundary is plotted below.

Image produced in Jupyter

Here the decision boundary looks like a line because the data is only 2-dimensional, but in general (with more than two features) a linear classifier’s decision boundary is a hyperplane in $\mathbb{R}^n$. The $\vec w$ in the decision boundary equation $\vec w \cdot \vec x + b = 0$ comes from minimizing empirical risk, for some model and loss function!

We can even peek at the decision boundary:

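from sklearn.linear_model import LogisticRegression

# X_train (glucose and BMI columns) and y_train (diabetes labels) are assumed to be defined already.
model = LogisticRegression()
model.fit(X_train, y_train)
model.coef_
array([[0.04, 0.08]])
model.intercept_
array([-7.85])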

This is telling us that the decision boundary is of the form

$$0.04 \cdot \text{Glucose}_i + 0.08 \cdot \text{BMI}_i - 7.85 = 0$$

or

$$\underbrace{\begin{bmatrix} 0.04 \\ 0.08 \end{bmatrix}}_{\vec w^*} \cdot \underbrace{\begin{bmatrix} \text{Glucose}_i \\ \text{BMI}_i \end{bmatrix}}_{\vec x_i} - 7.85 = 0$$

If this classifier used more features, then the decision boundary would involve more terms. Either way, it would be a hyperplane in $\mathbb{R}^d$, where $d$ is the number of features used. Here, $d = 2$, so the decision boundary is a $(d-1)$-dimensional hyperplane in $\mathbb{R}^2$, i.e. a line in $\mathbb{R}^2$.
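To connect this back to the “above or below” picture, here is a small sketch that uses the rounded coefficients reported above. The patient values are hypothetical and not taken from the dataset, and the helper names `boundary_bmi` and `side` are made up for illustration.

```python
import numpy as np

# Rounded coefficients reported above (glucose first, then BMI).
w = np.array([0.04, 0.08])
b = -7.85

def boundary_bmi(glucose):
    """BMI at which w . x + b = 0 for a given glucose level."""
    return (-b - w[0] * glucose) / w[1]

def side(glucose, bmi):
    """Sign of w . x + b: +1 on one side of the line, -1 on the other, 0 on it."""
    return int(np.sign(w @ np.array([glucose, bmi]) + b))

# Hypothetical patients, not taken from the dataset.
print(boundary_bmi(100.0))                   # point on the boundary at glucose = 100
print(side(150.0, 35.0), side(90.0, 22.0))   # which side each patient falls on
```

Solving the boundary equation for BMI is the same move as rewriting a line from standard form into slope-intercept form, which is only possible here because the coefficient on BMI is non-zero.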

The specifics of logistic regression and how it works are beyond the scope of our course, and certainly not relevant to this section of the notes. I’ve provided this example here just to give you context for where hyperplanes come up in machine learning.