Partial Derivatives¶
How do we take the derivative of a function with multiple input variables?
To illustrate, let’s focus on a simpler function with two input variables:

$$f(x, y) = x^2 + y^2$$

This is a quadratic function of two variables, and its graph is known as a paraboloid.
Before optimizing the regression parameters, we take a brief detour into functions of multiple variables and partial derivatives.
In the single-input case – i.e., for functions of the form $f(x)$ – the derivative $\frac{\text{d}f}{\text{d}x}$ captured $f$’s rate of change along the $x$-axis, which was the only axis of motion.
The function $f(x, y) = x^2 + y^2$ has two input variables, and so there are two directions along which we can move. As such, we need two “derivatives” to describe the rate of change of $f$ – one for the $x$-axis and one for the $y$-axis. Think of this like a controlled science experiment: to isolate the effect of one variable, we must hold the other fixed. Our solution to this dilemma comes in the form of partial derivatives.
If $f$ has $n$ input variables, it has $n$ partial derivatives, one for each axis. The function $f(x, y) = x^2 + y^2$ has two partial derivatives, $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$. (The symbol you’re seeing, $\partial$, is a stylized letter d – often read “partial” or “del” – and is used specifically for partial derivatives.)
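In case a formal definition is helpful, a partial derivative is just the ordinary limit definition of the derivative, applied while the other variable is held fixed:

$$\frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}$$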
Let me show you how to compute partial derivatives before we visualize them. We’ll start with $\frac{\partial f}{\partial x}$. To find it, we treat $y$ as a constant and differentiate with respect to $x$ alone:

$$\frac{\partial f}{\partial x} = \frac{\partial}{\partial x} \left( x^2 + y^2 \right) = 2x + 0 = 2x$$

The result, $2x$, is a function of $x$ and $y$. It tells us the rate of change of $f$ along the $x$-axis, at any point $(x, y)$. It just so happens that this function doesn’t involve $y$ since we chose a relatively simple function $f$, but we’ll see more sophisticated examples soon.

Following similar steps, you’ll see that $\frac{\partial f}{\partial y} = 2y$. This gives us:

$$\frac{\partial f}{\partial x} = 2x \qquad \frac{\partial f}{\partial y} = 2y$$
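If you’d like to verify these by computer, here’s a minimal sketch using the SymPy library (my choice of tool – nothing in this section requires it):

```python
import sympy as sp

# Symbolic variables and our paraboloid f(x, y) = x^2 + y^2.
x, y = sp.symbols("x y")
f = x**2 + y**2

# sp.diff differentiates with respect to one variable,
# treating the other as a constant -- exactly the "partial" rule.
print(sp.diff(f, x))  # 2*x
print(sp.diff(f, y))  # 2*y
```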
Let’s pick an arbitrary point and see what the partial derivatives tell us about it. Consider, say, $(x, y) = (-1, 2)$:

- $\frac{\partial f}{\partial x}(-1, 2) = 2(-1) = -2 < 0$, so if we hold $y$ constant, $f$ decreases as $x$ increases.
- $\frac{\partial f}{\partial y}(-1, 2) = 2(2) = 4 > 0$, so if we hold $x$ constant, $f$ increases as $y$ increases.
Above, we’ve shown the tangent lines in both the $x$ and $y$ directions at the point $(-1, 2)$. After all, the derivative of a function at a point tells us the slope of the tangent line at that point; that interpretation remains true with partial derivatives.
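To connect the picture to numbers, here’s a quick check of those two tangent-line slopes at $(-1, 2)$, continuing the SymPy sketch from above:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + y**2

point = {x: -1, y: 2}
# Slope of the tangent line in the x-direction: negative, so f is decreasing.
print(sp.diff(f, x).subs(point))  # -2
# Slope of the tangent line in the y-direction: positive, so f is increasing.
print(sp.diff(f, y).subs(point))  # 4
```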
Let’s look at a more complex example. Consider:
Both partial derivatives are functions of both $x$ and $y$, which is typically what we’ll see.
To compute $\frac{\partial f}{\partial x}$, we treated $y$ as a constant. Let me try to make more sense of this.
To help visualize, we’ve drawn the function $f(x, y)$, along with the plane $y = k$. The slider lets you change the value of $k$ being considered, i.e., it lets you change the constant value that we’re assigning to $y$.
The intersection of $f$ and the plane $y = k$ is marked as a gold curve, and is a function of $x$ alone.
Drag the slider to any value of $y$ and look at the gold curve that results. The expression below tells you the derivative of that gold curve with respect to $x$.
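The gold-curve idea can also be imitated numerically: fix $y$ at the slider’s value, and the slope of the resulting one-variable slice is precisely $\frac{\partial f}{\partial x}$. Here’s a small sketch, using the simple paraboloid from before since the interactive’s more complex function isn’t reproduced here:

```python
def f(x, y):
    return x**2 + y**2  # stand-in for the plotted surface

k = 1.0    # the constant value of y chosen by the slider
x0 = 0.5   # where along the gold curve we measure the slope
h = 1e-6   # small step for a finite-difference estimate

# The gold curve is the one-variable slice g(x) = f(x, k).
# Its ordinary derivative at x0...
slice_slope = (f(x0 + h, k) - f(x0 - h, k)) / (2 * h)

# ...matches the partial derivative df/dx = 2x evaluated at (x0, k).
print(slice_slope)  # ~1.0, and indeed 2 * x0 == 1.0
```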
Thinking in three dimensions can be difficult, so don’t fret if you’re confused as to what all of these symbols mean – this is all a bit confusing to me too. (Are professors allowed to say this?) Nonetheless, I hope these interactive visualizations are helping you make some sense of the formulas, and if there’s anything I can do to make them clearer, please do tell me!
Activity 1
Find all three partial derivatives of the function:
Optimization¶
To minimize (or maximize) a function $f(x)$, we solved for critical points, which were points where the (single-variable) derivative was 0, and used the second derivative test to classify them as minima or maxima (or neither, as in the case of $f(x) = x^3$ at $x = 0$).
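For reference, here’s the computation behind that “neither” case, assuming the classic $f(x) = x^3$ example:

$$f(x) = x^3 \quad \Rightarrow \quad f'(x) = 3x^2, \quad f'(0) = 0, \quad f''(0) = 0$$

The second derivative test is inconclusive at $x = 0$, and indeed $x^3$ has neither a minimum nor a maximum there.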
The analog in the two-variable case is solving for the points where both partial derivatives are 0, which corresponds to the points where the function is neither increasing nor decreasing along either axis.
In the case of our first example,

$$f(x, y) = x^2 + y^2$$

the partial derivatives were relatively simple,

$$\frac{\partial f}{\partial x} = 2x \qquad \frac{\partial f}{\partial y} = 2y$$

and both are 0 when $(x, y) = (0, 0)$. So, $(0, 0)$ is a critical point, and we can see visually that it’s a global minimum.
(Notice that above I wrote $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ instead of $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ to save space, but don’t forget that both partial derivatives are functions of both $x$ and $y$ in general.)
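Here’s the same critical-point computation done symbolically – again a sketch that assumes the SymPy library from earlier:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + y**2

# A critical point is where both partial derivatives vanish at once.
critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(critical_points)  # [{x: 0, y: 0}] -- the global minimum we saw visually
```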
There is a second derivative test for functions of multiple variables, but it’s a bit more complicated than the single variable case, and to give you an honest explanation of it, I’ll need to introduce you to quite a bit of linear algebra first. So, we’ll table that thought for now.
The more complex function from earlier has much more complicated partial derivatives, and so it’s difficult to solve for its critical points by hand. Fear not – in Chapter 8, when we discover the technique of gradient descent, we’ll learn how to minimize such functions just by using their partial derivatives, even when we can’t solve for where they’re 0.
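As a tiny preview of that idea (Chapter 8 develops it properly), here’s a sketch that nudges a point downhill on our simple paraboloid using only its partial derivatives; the starting point and step size are arbitrary choices of mine:

```python
# Partial derivatives of f(x, y) = x**2 + y**2.
def df_dx(x, y):
    return 2 * x

def df_dy(x, y):
    return 2 * y

x, y = 3.0, -4.0  # arbitrary starting point
step = 0.1        # arbitrary step size

for _ in range(100):
    # Step a little bit "downhill" along each axis.
    x, y = x - step * df_dx(x, y), y - step * df_dy(x, y)

print(x, y)  # both nearly 0 -- the critical point from before
```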
Activity 2
Find the values of $w_0$ and $w_1$ that minimize the function:
Here, we’ve used $w_0$ and $w_1$ to denote the two input variables, rather than $x$ and $y$.
With partial derivatives in hand, we can now minimize mean squared error to solve for the optimal regression parameters.