
2.2. Detour: Partial Derivatives

Partial Derivatives

How do we take the derivative of a function with multiple input variables?

$$R_\text{sq}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^n (y_i - (w_0 + w_1 x_i))^2$$

To illustrate, let’s focus on a simpler function with two input variables:

$$f(x,y) = \frac{x^2 + y^2}{9}$$

This is a quadratic function of two variables, and its graph is known as a paraboloid.

Before optimizing the regression parameters, we take a brief detour into functions of multiple variables and partial derivatives.

*(Interactive visualization)*

In the single-input case – i.e., for functions of the form $f: \mathbb{R} \to \mathbb{R}$ – the derivative $\frac{\text{d}}{\text{d}x}f(x)$ captured $f(x)$'s rate of change along the $x$-axis, which was the only axis of motion.

The function $f(x, y)$ has two input variables, and so there are two directions along which we can move. As such, we need two “derivatives” to describe the rate of change of $f(x, y)$ – one for the $x$-axis and one for the $y$-axis. Think of this as a science experiment, where we hold all other variables fixed in order to isolate the effect of changing just one. Our solution to this dilemma comes in the form of partial derivatives.

If $f$ has $n$ input variables, it has $n$ partial derivatives, one for each axis. The function $f(x, y) = \frac{x^2 + y^2}{9}$ has two partial derivatives, $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$. (The symbol you're seeing, $\partial$, is a stylized, “curly” d, often read aloud as “partial,” and is used specifically for partial derivatives.)

Let me show you how to compute partial derivatives before we visualize them. We'll start with $\frac{\partial f}{\partial x}(x, y)$.

$$\begin{align*} f(x,y) &= \frac{x^2 + y^2}{9} \\ \frac{\partial f}{\partial x}(x, y) &=\frac{\partial}{\partial x}\!\left(\frac{x^2+y^2}{9}\right) \\[4pt] &=\frac{1}{9}\,\frac{\partial}{\partial x}(x^2+y^2) \\ &=\frac{1}{9}\!\left(\frac{\partial}{\partial x}x^2+ \underbrace{\frac{\partial}{\partial x}y^2}_{=0}\right) \\ &=\frac{1}{9}\,(2x+0) \\ &=\frac{2x}{9} \end{align*}$$

The result, $\frac{\partial f}{\partial x}(x, y) = \frac{2x}{9}$, is a function of $x$ and $y$. It tells us the rate of change of $f(x,y)$ along the $x$-axis, at any point $(x, y)$. It just so happens that this function doesn't involve $y$, since we chose a relatively simple function $f$, but we'll see more sophisticated examples soon.

Following similar steps, you'll see that $\frac{\partial f}{\partial y}(x, y) = \frac{2y}{9}$. This gives us:

$$\frac{\partial f}{\partial x}(x, y) = \frac{2x}{9}, \quad \frac{\partial f}{\partial y}(x, y) = \frac{2y}{9}$$
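If you'd like to double-check these results, a symbolic algebra library can compute partial derivatives for us. Here's a small sketch using sympy (my addition, purely a sanity check, not part of the course material):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = (x**2 + y**2) / 9

# sympy's diff treats every other symbol as a constant,
# which is exactly what a partial derivative does
df_dx = sp.diff(f, x)
df_dy = sp.diff(f, y)

print(df_dx)  # 2*x/9
print(df_dy)  # 2*y/9
```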

Let’s pick an arbitrary point and see what the partial derivatives tell us about it. Consider, say, $(-3, 0.5)$:

  • $\frac{\partial f}{\partial x}(-3, 0.5) = \frac{2(-3)}{9} = -\frac{2}{3}$, so if we hold ${\color{orange} y}$ constant, ${\color{orange} f}$ decreases as ${\color{orange} x}$ increases.

  • $\frac{\partial f}{\partial y}(-3, 0.5) = \frac{2(0.5)}{9} = \frac{1}{9}$, so if we hold ${\color{#3d81f6} x}$ constant, ${\color{#3d81f6} f}$ increases as ${\color{#3d81f6} y}$ increases.
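We can also sanity-check these two values numerically. The sketch below (my addition) approximates each partial derivative with a central finite difference – nudging one input in both directions while holding the other fixed, exactly as the definitions above suggest:

```python
def f(x, y):
    return (x**2 + y**2) / 9

def partial_x(f, x, y, h=1e-6):
    # hold y fixed, nudge x in both directions
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # hold x fixed, nudge y in both directions
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(f, -3, 0.5))  # close to -2/3
print(partial_y(f, -3, 0.5))  # close to 1/9
```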

*(Interactive visualization)*

Above, we’ve shown the tangent lines in both the $x$ and $y$ directions at the point $(-3, 0.5)$. After all, the derivative of a function at a point tells us the slope of the tangent line at that point; that interpretation remains true with partial derivatives.

Let’s look at a more complex example. Consider:

$$g(x, y) = x^3 - 3xy^2 + 2 \sin(x) \cos(y)$$

*(Interactive visualization)*

Both partial derivatives are functions of both $x$ and $y$, which is typically what we’ll see.

$$\begin{align*} g(x, y) &= x^3 - 3xy^2 + 2 \sin(x) \cos(y) \\ \frac{\partial g}{\partial x}(x, y) &= 3x^2 - 3y^2 + 2 \cos(x) \cos(y) \\ \frac{\partial g}{\partial y}(x, y) &= -6xy - 2 \sin(x) \sin(y) \end{align*}$$
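As with $f$, these two derivatives can be verified symbolically; a quick sympy sketch (again my addition, not part of the original notes):

```python
import sympy as sp

x, y = sp.symbols('x y')
g = x**3 - 3*x*y**2 + 2*sp.sin(x)*sp.cos(y)

# differentiating with respect to x treats y as a constant, and vice versa
dg_dx = sp.diff(g, x)  # 3*x**2 - 3*y**2 + 2*cos(x)*cos(y)
dg_dy = sp.diff(g, y)  # -6*x*y - 2*sin(x)*sin(y)

print(dg_dx)
print(dg_dy)
```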

To compute $\frac{\partial g}{\partial x}(x, y)$, we treated $y$ as a constant. Let me try to make more sense of this.

To help visualize, we’ve drawn the function ${\color{#d81b60} g(x, y)}$, along with the plane ${\color{#3d81f6} y = a}$. The slider lets you change the value of ${\color{#3d81f6} a}$ being considered, i.e., it lets you change the constant value that we’re assigning to $y$.

The intersection of ${\color{#d81b60} g(x, y)}$ and ${\color{#3d81f6} y = a}$ is marked as a gold curve and is a function of $x$ alone.

*(Interactive visualization)*

Drag the slider to ${\color{#3d81f6} y = 1.40}$, for example, and look at the gold curve that results. The expression below tells you the derivative of that gold curve with respect to $x$.

$$\frac{\partial g}{\partial x}(x, {\color{#3d81f6}1.40}) = 3x^2 - 3({\color{#3d81f6}1.40})^2 + 2 \cos(x) \cos({\color{#3d81f6}1.40}) = \underbrace{3x^2 + 0.34 \cos(x) - 5.88}_\text{derivative of {\color{gold}\textbf{gold curve}} w.r.t. $x$}$$
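The two decimal constants come from plugging $y = 1.40$ into the derivative; here's a quick check of the arithmetic (my addition):

```python
import math

y0 = 1.40
constant_term = 3 * y0**2           # 3(1.40)^2
cos_coefficient = 2 * math.cos(y0)  # 2 cos(1.40), with 1.40 in radians

print(round(constant_term, 2))    # 5.88
print(round(cos_coefficient, 2))  # 0.34
```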

Thinking in three dimensions can be difficult, so don’t fret if you’re confused as to what all of these symbols mean – this is all a bit confusing to me too. (Are professors allowed to say this?) Nonetheless, I hope these interactive visualizations are helping you make some sense of the formulas, and if there’s anything I can do to make them clearer, please do tell me!

Optimization

To minimize (or maximize) a function $f: \mathbb{R} \to \mathbb{R}$, we solved for critical points, which were points where the (single-variable) derivative was 0, and used the second derivative test to classify them as minima or maxima (or neither, as in the case of $f(x) = x^3$ at $x = 0$).

The analog in the $\mathbb{R}^2 \rightarrow \mathbb{R}$ case is solving for the points where both partial derivatives are 0, which corresponds to the points where the function is neither increasing nor decreasing along either axis.

In the case of our first example,

$$f(x, y) = \frac{x^2 + y^2}{9}$$

the partial derivatives were relatively simple,

$$\frac{\partial f}{\partial x} = \frac{2x}{9}, \quad \frac{\partial f}{\partial y} = \frac{2y}{9}$$

and both are 0 when $x = y = 0$. So, $(0, 0)$ is a critical point, and we can see visually that it’s a global minimum.

(Notice that above I wrote $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ instead of $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ to save space, but don’t forget that both partial derivatives are functions of both $x$ and $y$ in general.)
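For a function this simple, we can even have a computer solve the system $\frac{\partial f}{\partial x} = 0$, $\frac{\partial f}{\partial y} = 0$ directly; a sympy sketch (my addition):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = (x**2 + y**2) / 9

# set both partial derivatives to 0 and solve the resulting system
critical = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
print(critical)  # the only solution is x = 0, y = 0
```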

*(Interactive visualization)*

There is a second derivative test for functions of multiple variables, but it’s a bit more complicated than the single variable case, and to give you an honest explanation of it, I’ll need to introduce you to quite a bit of linear algebra first. So, we’ll table that thought for now.

The function $g(x, y) = x^3 - 3xy^2 + 2 \sin(x) \cos(y)$ has much more complicated partial derivatives, and so it’s difficult to solve for its critical points by hand. Fear not – in Chapter 8, when we discover the technique of gradient descent, we’ll learn how to minimize such functions just by using their partial derivatives, even when we can’t solve for where they’re 0.

With partial derivatives in hand, we can now minimize mean squared error to solve for the optimal regression parameters.