
Chapter 0.2. Derivatives

Calculus is the study of rates of change, and without it, modern machine learning would not be possible. In many ways, machine learning is about optimizing quantities – making the best possible predictions, or making prediction errors as small as possible – and calculus is the tool that enables us to perform this optimization.

We’ll address these lofty goals throughout the semester. Here, we will review key ideas from a first course in calculus (e.g. Math 115).

Tangent Lines

Suppose $f$ is a function that takes in a single real number and outputs a single real number, i.e. $f: \mathbb{R} \to \mathbb{R}$.

If $f$ is a function, then the derivative of $f$ is another function, sometimes denoted $f'$, such that $f'(x)$ is the “instantaneous” rate of change of $f$ at the input $x$.

To understand what I mean by “instantaneous” rate of change, let’s consider an example. Suppose $f(x) = \frac{1}{4}x^2 - 3$. The graph of $f$ is shown below, in blue, along with a slider for input values of $x$. Drag the slider.

At any point $x$, the tangent line to the graph of $f$ is the best linear approximation of the graph of $f$ near $x$.

For instance, consider when $x = -4$. At $x = -4$, $f(-4) = \frac{1}{4}(-4)^2 - 3 = 1$.

The tangent line at $x = -4$ is the line through the point $(-4, 1)$ that best approximates $f$ near $x = -4$, among all lines that pass through $(-4, 1)$. In the plot above, use your mouse to zoom in on the point $(-4, 1)$, and you’ll see that when zoomed in, the tangent line and original function are very difficult to distinguish.

This intuitive definition of the derivative – as being the slope of the tangent line – is the most important way to think about the derivative. The formal definition is also important, and we’ll get there next, but you should have enough context for this first activity.


Formal Definition

Secant Lines

Let’s review the more formal definition of the derivative. First, remember the general formula for the slope of the line between two points $(x_1, y_1)$ and $(x_2, y_2)$:

$$\frac{y_2 - y_1}{x_2 - x_1}$$

Let’s say we’re trying to find the slope of the tangent line at $x = a$, which is the line that passes through the point $(a, f(a))$ whose slope is the instantaneous rate of change of $f$ at $x = a$. To find that instantaneous rate of change, we can find the slope of the line between $(a, f(a))$ and some other point $(b, f(b))$, where $b - a$ is as close to 0 as possible.

In the example below, as $b$ approaches $a = 1$, the slope of the line between $(1, f(1))$ and $(b, f(b))$ approaches the slope of the tangent line at $x = 1$. Note that the formal name for the line between any two points on a function is a secant line.

Image produced in Jupyter

Limits

Let’s be more precise, using the idea of a limit from Calculus 1. Recall, $\displaystyle \lim_{x \rightarrow a} g(x) = L$ is pronounced “the limit of $g(x)$ as $x$ approaches $a$ is $L$”. If $\displaystyle \lim_{x \rightarrow a} g(x) = L$, then as $x$ gets closer and closer to $a$, $g(x)$ gets closer and closer to $L$. (Intuitively, you might think this must always mean that $g(a) = L$, but that’s not always the case.)
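
As a standard textbook illustration of that last remark (not an example taken from the course), consider $g(x) = \frac{x^2 - 1}{x - 1}$. It is undefined at $x = 1$, since the denominator is 0 there, yet its limit as $x$ approaches 1 exists:

```latex
\[
\lim_{x \to 1} \frac{x^2 - 1}{x - 1}
= \lim_{x \to 1} \frac{(x - 1)(x + 1)}{x - 1}
= \lim_{x \to 1} (x + 1)
= 2
\]
```

So the limit is 2 even though $g(1)$ does not exist.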

The slope of the tangent line at $x = a$ is the limit of the slope of the line between $(a, f(a))$ and $(b, f(b))$ as $b$ approaches $a$:

$$\text{slope of tangent line at } a = \lim_{b \to a}\frac{f(b)-f(a)}{b-a}$$

This definition alone can help us compute some common derivatives. For instance, if $f(x) = x^2$, then the slope of the tangent line at $x = a$ is:

$$\lim_{b \to a}\frac{f(b)-f(a)}{b-a} = \lim_{b \to a}\frac{b^2-a^2}{b-a} = \lim_{b \to a}\frac{(b-a)(b+a)}{b-a} = \lim_{b \to a}(b+a) = 2a$$

Here, I used the difference of squares formula, $b^2 - a^2 = (b-a)(b+a)$. To find the slope of the tangent line using the limit definition, you’ll need to use some sort of algebraic manipulation to simplify the expression, because the limit of the denominator is 0, and we can’t divide by 0.
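
To see this limit numerically, here is a minimal Python sketch. The helper name `secant_slope` and the choice $a = 3$ are my own illustrative assumptions, not part of the course materials. It computes the slope of the secant line for values of $b$ that sit closer and closer to $a$.

```python
def f(x):
    return x ** 2


def secant_slope(f, a, b):
    """Slope of the line through (a, f(a)) and (b, f(b)); requires b != a."""
    return (f(b) - f(a)) / (b - a)


a = 3.0  # illustrative choice; the limit above predicts a slope of 2a = 6
gaps = [1.0, 0.1, 0.01, 0.001]  # values of b - a, shrinking toward 0
slopes = [secant_slope(f, a, a + gap) for gap in gaps]

for gap, slope in zip(gaps, slopes):
    print(f"b - a = {gap}: secant slope = {slope}")
```

For this particular $f$, each secant slope works out to $a + b$, so the printed values should creep toward $2a = 6$ as the gap shrinks.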

Instead of thinking of the slope of the tangent line as the limit of the slope of the line between the points $(a, f(a))$ and $(b, f(b))$ as $b \rightarrow a$, a more general (and equivalent!) definition of the derivative is the limit of the slope of the line between the points $(x, f(x))$ and $(x+h, f(x+h))$ as $h \rightarrow 0$.
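
Written as a formula, this second definition is the standard one found in calculus textbooks; it follows from the earlier limit by setting $h = b - a$. Using the $f'$ notation, and checking it against $f(x) = x^2$:

```latex
\[
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
\]
\[
\lim_{h \to 0} \frac{(x + h)^2 - x^2}{h}
= \lim_{h \to 0} \frac{2xh + h^2}{h}
= \lim_{h \to 0} (2x + h)
= 2x
\]
```

This agrees with the slope $2a$ found above, with $x$ playing the role of $a$.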

Derivatives

I will not use this formal definition much in this class, but it’s good to understand where it comes from and why it works.

There are two equivalent notations for the derivative: $\frac{\text{d}f}{\text{d}x}(x)$ and $f'(x)$. I used the notation $f'(x)$ in the previous section since it’s easier to write and more commonly used in calculus courses. However, I’ll use the notation $\frac{\text{d}f}{\text{d}x}(x)$ from now on, as it’ll make the transition to multivariable calculus more natural when we get there.

Often, for brevity, I will drop the $(x)$ and just write $\frac{\text{d}f}{\text{d}x}$. As an example, suppose $g(x) = \sin^2(x) + 3 \log (x)$, where $\log( \cdot )$ is the natural logarithm (with base $e$). Then, the derivative of $g$ is:

$$\frac{\text{d}g}{\text{d}x} = 2\sin(x)\cos(x) + \frac{3}{x}$$

$\frac{\text{d}g}{\text{d}x}$ (equivalently, $\frac{\text{d}g}{\text{d}x}(x)$) is a function, not a number. To get a number as an output, we need to plug in a value for $x$. For example, $\frac{\text{d}g}{\text{d}x}(\pi)$ is the number ($\frac{3}{\pi}$) corresponding to the slope of the tangent line to $g$ at $x = \pi$.
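
As a sanity check on the value $\frac{3}{\pi}$, here is a small Python sketch. It compares the formula above against a central difference approximation, which is a numerical technique I am swapping in for illustration; the function names and the step size `h` are assumptions of mine.

```python
import math


def g(x):
    return math.sin(x) ** 2 + 3 * math.log(x)


def dg_dx(x):
    """Derivative of g from the text: 2 sin(x) cos(x) + 3 / x."""
    return 2 * math.sin(x) * math.cos(x) + 3 / x


def central_difference(func, x, h=1e-6):
    """Approximate the slope of func at x from two nearby points."""
    return (func(x + h) - func(x - h)) / (2 * h)


exact = dg_dx(math.pi)
approx = central_difference(g, math.pi)
print(exact, approx, 3 / math.pi)
```

All three printed numbers should agree to several decimal places, since $\sin(\pi) = 0$ removes the first term of the derivative.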

To actually find $\frac{\text{d}g}{\text{d}x}$, we used several derivative rules, which we’ll now review.


Rules and Examples

These rules can all be proved using the formal definition of the derivative, but those proofs won’t be emphasized in this class.

Attempt each of the following examples before peeking at their solutions.

Example: Polynomials

Differentiate $f(x) = 4x^5 + 3x^2 + 2x + 1$.

Example: Reciprocals

Differentiate $f(x) = \frac{1}{x^2 - 1}$.

Example: Quotients

Differentiate $f(x) = \frac{x^2 + 1}{x^2 - 1}$.
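
For checking your work, here is one way the three examples above can be worked. I am assuming the standard power, chain, and quotient rules from a first calculus course; other routes reach the same answers.

```latex
\[
\frac{\text{d}}{\text{d}x}\left(4x^5 + 3x^2 + 2x + 1\right) = 20x^4 + 6x + 2
\]
\[
\frac{\text{d}}{\text{d}x}\left(x^2 - 1\right)^{-1}
= -\left(x^2 - 1\right)^{-2} \cdot 2x
= -\frac{2x}{(x^2 - 1)^2}
\]
\[
\frac{\text{d}}{\text{d}x}\left(\frac{x^2 + 1}{x^2 - 1}\right)
= \frac{2x(x^2 - 1) - (x^2 + 1)(2x)}{(x^2 - 1)^2}
= -\frac{4x}{(x^2 - 1)^2}
\]
```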

Example: Trigonometric and Logarithmic Functions

Differentiate $f(x) = \log(\cos(e^x))$.

To do so, you’ll need to remember a few other common derivatives that the five key rules don’t cover.

Trigonometric Derivatives:

$$\frac{\text{d}}{\text{d}x} \sin(x) = \cos(x)$$
$$\frac{\text{d}}{\text{d}x} \cos(x) = -\sin(x)$$

Logarithmic Derivatives: If $\log( \cdot )$ is the natural logarithm (with base $e$), then:

$$\frac{\text{d}}{\text{d}x} \log(x) = \frac{1}{x}$$
$$\frac{\text{d}}{\text{d}x} \log_b(x) = \frac{1}{x \log(b)}$$

Exponential Derivatives:

$$\frac{\text{d}}{\text{d}x} e^x = e^x$$
$$\frac{\text{d}}{\text{d}x} a^x = a^x \log(a)$$
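
With these in hand, one way to differentiate $f(x) = \log(\cos(e^x))$ is to apply the chain rule from the outside in, differentiating $\log$, then $\cos$, then $e^x$:

```latex
\[
\frac{\text{d}f}{\text{d}x}
= \frac{1}{\cos(e^x)} \cdot \left(-\sin(e^x)\right) \cdot e^x
= -e^x \tan(e^x)
\]
```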

The chain rule is extremely pervasive in machine learning, which is why I’ve included examples like the one above. This site contains dozens more examples of the chain rule in practice.


Optimization

Maxima and Minima

As stated at the start of this section, calculus is a tool for optimization – that is, finding the inputs that maximize or minimize a function. Let’s be more precise about what we mean by “maximize” and “minimize”.

Consider $f(x) = \frac{1}{4}x^4 + \frac{1}{3} x^3 - x^2 + 2$, shown below.

Image produced in Jupyter

Where is $f(x)$ maximized and minimized?

  • At $x = -2$, $f(x)$ is less than it is at all other inputs. This means $(-2, f(-2))$ is a global minimum.
  • At $x = 0$, $f(x)$ looks like it is greater than it is at all other inputs, but only if you restrict your attention to points near $x = 0$. This means $(0, f(0))$ is a local maximum. $(0, f(0))$ is not a global maximum because there are plenty of points where $f(x) > f(0)$ – they are just not immediately adjacent to $x = 0$.
  • Similarly, $(1, f(1))$ is a local minimum.

$f(x)$ does not have a global maximum, since $f(x)$ approaches infinity as $x$ increases beyond $x = 1$ or decreases beyond $x = -2$. If we were to restrict the domain (or set of possible inputs) of $f(x)$ to, say, $[-3, 3]$, then there would be a global maximum at $x = 3$.

Note that we usually care more about the inputs that maximize or minimize a function, rather than the actual values of the function at those inputs. In the above example, the fact that $x = -2$ is a global minimum is important; the fact that $f(-2) = -\frac{2}{3}$ is not as important.
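
These claims can be checked by brute force. The sketch below evaluates $f$ on a grid over $[-3, 3]$ and reports the inputs with the smallest and largest outputs. The grid spacing of 0.01 is an arbitrary choice of mine, and a grid search like this is only a sanity check, not a general optimization method.

```python
def f(x):
    return x ** 4 / 4 + x ** 3 / 3 - x ** 2 + 2


# Evenly spaced inputs from -3 to 3 in steps of 0.01 (an arbitrary resolution).
xs = [i / 100 for i in range(-300, 301)]

x_min = min(xs, key=f)  # input with the smallest output on the grid
x_max = max(xs, key=f)  # input with the largest output on the grid

print(x_min, f(x_min))
print(x_max, f(x_max))
```

On this restricted domain, the smallest output occurs at $x = -2$ and the largest at the endpoint $x = 3$, matching the discussion above.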

To recap:

Note that maxima is the plural of maximum, minima is the plural of minimum, and extrema refers to both maxima and minima.

Below, you’ll find several other examples of functions with varying numbers and types of extrema. Pay close attention to the relationship between the two functions in the second row, $\color{#d81b60}h(x)$ and $\color{#004d40}k(x)$. $\color{#004d40}k(x)$ results from stretching $\color{#d81b60}h(x)$ vertically and shifting it up vertically, and has its extrema at the same inputs.

Image produced in Jupyter

Critical Points

How do we actually find the extrema of a function, especially when we can’t graph the function? The derivative plays a crucial role.

Let’s revisit the function $f(x) = \frac{1}{4}x^4 + \frac{1}{3} x^3 - x^2 + 2$, shown in blue along with its derivative, $\frac{\text{d}f}{\text{d}x} = x^3 + x^2 - 2x$, shown in orange.

Image produced in Jupyter

You should notice that the derivative is 0 at all three extrema we identified earlier – the global minimum at $x = -2$, the local maximum at $x = 0$, and the local minimum at $x = 1$. Intuitively, the derivative is 0 at a maximum or minimum because the tangent lines at these points are horizontal (with slope 0), as the function is neither increasing nor decreasing at these points.

In the region between $x = -2$ and $x = 0$, the derivative is positive, meaning the function is increasing.

Solving for the inputs that make the derivative 0 – i.e., finding the critical points – is a necessary, but not sufficient, step. If all we know is that the derivative is 0 at a point, we don’t know whether the point is a maximum or minimum. It may not be either, such as in the case of $f(x) = x^3$, which has a critical point at $x = 0$ that is neither a maximum nor a minimum.
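
For the running example, the critical points can be found by hand, because the derivative factors. This worked step is my own addition:

```latex
\[
x^3 + x^2 - 2x = x(x^2 + x - 2) = x(x + 2)(x - 1) = 0
\implies x = -2, \; x = 0, \; \text{or } x = 1
\]
```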

Second Derivatives

To be able to determine whether a critical point of $f(x)$ is a maximum or minimum, we need to look at the second derivative of $f(x)$. If the (first) derivative of $f(x)$ is a function that describes the rate at which $f(x)$ is changing, the second derivative – denoted $\frac{\text{d}^2f}{\text{d}x^2}$ – is a function that describes the rate at which the derivative is changing.

Physics provides us with an analogy that helps us understand the role of the second derivative. Suppose you’re driving down a straight road, and $s(t)$ is your position on the road at time $t$, relative to your starting point (so a negative value of $s(t)$ means you’ve moved backwards).

Then, $v(t) = \frac{\text{d}s}{\text{d}t}$ is your velocity (the rate at which your position is changing) and $a(t) = \frac{\text{d}^2s}{\text{d}t^2}$ is your acceleration (the rate at which your velocity is changing).

  • If $\frac{\text{d}s}{\text{d}t} > 0$ and $\frac{\text{d}^2s}{\text{d}t^2} = 0$, you are moving forward at a constant speed (say, on cruise control).
  • If $\frac{\text{d}s}{\text{d}t} > 0$ and $\frac{\text{d}^2s}{\text{d}t^2} > 0$, you are moving forward and your speed is increasing (you are accelerating).
  • If $\frac{\text{d}s}{\text{d}t} > 0$ and $\frac{\text{d}^2s}{\text{d}t^2} < 0$, you are moving forward, but your speed is decreasing, and eventually, your car will come to a halt.
  • Cases where $\frac{\text{d}s}{\text{d}t} < 0$ correspond to driving backwards!

Let’s put this in the context of our running example, $f(x) = \frac{1}{4}x^4 + \frac{1}{3} x^3 - x^2 + 2$. The second derivative of $f(x)$ is:

$$\begin{aligned}
\frac{\text{d}^2f}{\text{d}x^2} &= \frac{\text{d}}{\text{d}x} \left( \frac{\text{d}}{\text{d}x} \left( \frac{1}{4}x^4 + \frac{1}{3} x^3 - x^2 + 2 \right) \right) \\
&= \frac{\text{d}}{\text{d}x} \left( \underbrace{x^3 + x^2 - 2x}_{\text{first derivative of } f(x)} \right) \\
&= 3x^2 + 2x - 2
\end{aligned}$$

The second derivative, $\frac{\text{d}^2f}{\text{d}x^2}$, is a function, not a number. What does the second derivative look like, relative to the original function and first derivative?

Image produced in Jupyter

$f(x)$ is a polynomial of degree 4, $\frac{\text{d}f}{\text{d}x}$ is a polynomial of degree 3, and $\frac{\text{d}^2f}{\text{d}x^2}$ is a polynomial of degree 2 – the degree drops by one each time, as a consequence of the power rule.

Recall that $f(x)$ has critical points at $x = -2$, $x = 0$, and $x = 1$, which we’ve highlighted in all three plots above. Our goal is to find an algebraic approach for determining whether these points are maxima, minima, or neither; we shouldn’t rely on the graph of $f(x)$ alone, since we won’t always be able to see its graph.

At all three of these points, the first derivative is 0, meaning that the function is neither increasing nor decreasing at these points. But the second derivative $\frac{\text{d}^2f}{\text{d}x^2} = 3x^2 + 2x - 2$ gives us additional information:

  • At $x = -2$, $\frac{\text{d}^2f}{\text{d}x^2}(-2) = 6$, which is positive. So, at $x = -2$, $f(x)$ is neither increasing nor decreasing, but is also “speeding up”, since the second derivative is positive. So, as we move to the right of $x = -2$, the slope of the tangent line will increase, causing the function to increase. If the function increases to the right of $x = -2$, then $x = -2$ must correspond to a local minimum of $f(x)$.

  • At $x = 0$, $\frac{\text{d}^2f}{\text{d}x^2}(0) = -2$, which is negative. So, at $x = 0$, $f(x)$ is neither increasing nor decreasing, but is also “slowing down”. So, as we move to the right of $x = 0$, the slope of the tangent line will decrease, causing the function to decrease. If the function decreases to the right of $x = 0$, then $x = 0$ must correspond to a local maximum of $f(x)$.

  • At $x = 1$, $\frac{\text{d}^2f}{\text{d}x^2}(1) = 3$ is positive, which, using the logic from the $x = -2$ case, means that $x = 1$ also corresponds to a local minimum of $f(x)$.
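
The three cases above can be reproduced in a few lines of Python. This is a minimal sketch with hypothetical helper names (`d2f_dx2`, `classify`); the third branch covers a second derivative of exactly 0, where the sign gives no information.

```python
def d2f_dx2(x):
    """Second derivative of the running example: 3x^2 + 2x - 2."""
    return 3 * x ** 2 + 2 * x - 2


def classify(x):
    """Label a critical point by the sign of the second derivative there."""
    curvature = d2f_dx2(x)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


critical_points = [-2, 0, 1]
labels = {x: classify(x) for x in critical_points}
print(labels)
```

Note that `classify` assumes its input is already a critical point; it does not check that the first derivative is 0 there.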

Convexity

The sign of the second derivative is useful for more than just determining whether a critical point is a local maximum or minimum. Below, we’ve plotted $f(x)$, along with annotations for the regions where the second derivative is positive and negative.

Image produced in Jupyter

When the second derivative is positive, the function is concave opening up, also known as convex. You should think of convex functions as “bowl-shaped” or “smiling”. When the second derivative is negative, the function is concave opening down, or simply concave; the equivalent analogy is that concave down regions are “upside-down bowls” or “sad faces”.

From the perspective of finding local minima, if a function is concave up at a critical point, then we must be at the bottom of a bowl – a local minimum – and if a function is concave down at a critical point, we must be at the top of a hill, corresponding to a local maximum.

If a function is concave up across its entire domain – unlike in the example above, but like in $f(x) = x^2$ – then any local minimum must be a global minimum. Convexity is a hugely important concept in optimization and machine learning, and we’ll see it again in more detail throughout the course.

The points at which the second derivative is 0 and changes sign are called inflection points; they mark where the function switches between concave up and concave down. $f(x)$ has two inflection points, marked by vertical dotted lines above, roughly at $x = -1.22$ and $x = 0.55$. These are the roots of the quadratic equation $\frac{\text{d}^2f}{\text{d}x^2} = 3x^2 + 2x - 2 = 0$.
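
The two approximate values come from the quadratic formula. As a quick check, here is a short Python sketch; the variable names are mine.

```python
import math

# Coefficients of the second derivative 3x^2 + 2x - 2.
a, b, c = 3, 2, -2

discriminant = b ** 2 - 4 * a * c  # 4 + 24 = 28
roots = sorted(
    (-b + sign * math.sqrt(discriminant)) / (2 * a) for sign in (-1, 1)
)
print([round(r, 2) for r in roots])  # [-1.22, 0.55]
```

In exact form, the roots are $\frac{-1 \pm \sqrt{7}}{3}$.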

We’ve implicitly used a second derivative test for determining whether a critical point is a local maximum or minimum:
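
In its standard textbook form, stated here as a reminder, the test applies to a critical point $a$ of a twice-differentiable function $f$, meaning $\frac{\text{d}f}{\text{d}x}(a) = 0$:

```latex
\[
\frac{\text{d}^2f}{\text{d}x^2}(a) > 0 \implies \text{local minimum at } a, \qquad
\frac{\text{d}^2f}{\text{d}x^2}(a) < 0 \implies \text{local maximum at } a, \qquad
\frac{\text{d}^2f}{\text{d}x^2}(a) = 0 \implies \text{inconclusive}
\]
```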

Again, the second derivative test only tries to tell us whether critical points are local maxima or minima; it does not tell us whether they are global maxima or minima.

Let’s look at another example, particularly one where the second derivative test is inconclusive. Consider $f(x) = x^2 \sin(x)$, shown below.

Image produced in Jupyter

$f(x) = x^2 \sin(x)$ is oscillatory, like $\sin(x)$, but its oscillations grow without bound, so it has no global extrema; see here for a larger graph of it. Above, we’ve plotted $f(x)$ within the domain $[-2\pi, 2\pi]$, and we see several local maxima and minima.

The first and second derivatives of $f(x) = x^2 \sin(x)$ are given by:

$$\begin{aligned}
\frac{\text{d}f}{\text{d}x} &= x^2 \left( \frac{\text{d}}{\text{d}x} \sin(x) \right) + \left( \frac{\text{d}}{\text{d}x} x^2 \right) \sin(x) \\
&= x^2 \cos(x) + 2x \sin(x) \\ \\
\frac{\text{d}^2f}{\text{d}x^2} &= \frac{\text{d}}{\text{d}x} \left( x^2 \cos(x) + 2x \sin(x) \right) \\
&= x^2 \left( \frac{\text{d}}{\text{d}x} \cos(x) \right) + \left( \frac{\text{d}}{\text{d}x} x^2 \right) \cos(x) + 2x \left( \frac{\text{d}}{\text{d}x} \sin(x) \right) + \left( \frac{\text{d}}{\text{d}x} 2x \right) \sin(x) \\
&= - x^2 \sin(x) + 2x \cos(x) + 2x \cos(x) + 2 \sin(x) \\
&= 4x \cos(x) - (x^2 - 2) \sin(x)
\end{aligned}$$

Solving for the critical points of $f(x)$ by setting $\frac{\text{d}f}{\text{d}x} = x^2 \cos(x) + 2x \sin(x) = 0$ is no easy task, as there are infinitely many solutions, most of which cannot be solved for by hand. We’ll learn how to write code to approximate solutions to $\frac{\text{d}f}{\text{d}x} = 0$ in Chapter 4 of the course, when we study gradient descent. There are also infinitely many inflection points, since $\frac{\text{d}^2f}{\text{d}x^2} = 0$ has infinitely many solutions, meaning that there are many regions where $f(x)$ is concave up and many others where it is concave down.

However, one critical point is easy to spot: $x = 0$. At $x = 0$, the derivative is 0:

$$\frac{\text{d}f}{\text{d}x}(0) = 0^2 \cos(0) + 2 \cdot 0 \cdot \sin(0) = 0$$

$x = 0$ is also an inflection point, since the second derivative is also 0 there (and changes sign as $x$ passes through 0):

$$\frac{\text{d}^2f}{\text{d}x^2}(0) = 4 \cdot 0 \cdot \cos(0) - (0^2 - 2) \sin(0) = 0$$

To be clear, not every inflection point is a critical point, and not every critical point is an inflection point; $x = 0$ just happens to be both.

If we look at the graph of $f(x)$ near $x = 0$, we’ll see that $x = 0$ corresponds to neither a local maximum nor a local minimum, but rather, a region where $f(x)$ is very flat. If we weren’t able to graph $f(x)$, we could try and determine its behavior around $(0, f(0))$ by looking at points immediately to the left and right of $x = 0$ – say, $(0.001, f(0.001))$ and $(-0.001, f(-0.001))$. If $f(0.001) > f(0)$ and $f(-0.001) > f(0)$, then $x = 0$ would be a local minimum (but that’s not the case here).
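
Here is a minimal Python sketch of that neighbor check, using the same offset of 0.001. The variable names are mine, and comparing just two nearby points is a heuristic rather than a proof.

```python
import math


def f(x):
    return x ** 2 * math.sin(x)


step = 0.001  # the small offset used in the text
left, center, right = f(-step), f(0), f(step)
print(left, center, right)

# A local minimum needs both neighbors above f(0); a local maximum, both below.
is_local_min = left > center and right > center
is_local_max = left < center and right < center
print(is_local_min, is_local_max)
```

Here the left neighbor is slightly below $f(0) = 0$ and the right neighbor is slightly above it, so both flags come out `False`: the function passes through $x = 0$ while still increasing on either side.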


Continuity and Differentiability

Finally, I’ll remark that we’ve presented derivatives, extrema, and optimization all in the most ideal setting: where the functions we’re working with are continuous and differentiable. A function is continuous if its graph can be drawn without lifting a pen; any point where the graph has a “jump” or “break” is a discontinuity. (Of course, there is a more formal definition of continuity, but this is a good enough illustration for now.) A function is differentiable if its derivative exists everywhere; otherwise, there exist some points at which the derivative does not exist.

Most relevant functions in machine learning are continuous, but non-differentiable functions do appear, so it’s worth understanding what they are and how to deal with them. Let’s look at a few examples.

Example 1: $\color{#3d81f6} f(x) = |x|$

Image produced in Jupyter

$\color{#3d81f6} {f(x) = |x|}$ is continuous everywhere, as intuitively, we can draw its graph without lifting our pen. It is differentiable everywhere except at $x = 0$; the reason it is not differentiable at $x = 0$ is that the slopes approaching it from the left ($-1$) and right ($1$) are different, and in order for a derivative at $x = a$ to exist, the limit of the slopes approaching $a$ from the left and right must be the same.

$$\frac{\text{d}{\color{#3d81f6} f}}{\text{d}x} = \begin{cases} -1 & x < 0 \\ 1 & x > 0 \\ \text{undefined} & x = 0 \end{cases}$$
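
The mismatch between the two sides is easy to see numerically. The sketch below computes secant slopes from just to the left and just to the right of $x = 0$; the helper name `one_sided_slope` and the step size are illustrative assumptions of mine.

```python
def one_sided_slope(func, a, h):
    """Slope of the secant line between a and a + h (h may be negative)."""
    return (func(a + h) - func(a)) / h


h = 1e-6
slope_from_right = one_sided_slope(abs, 0.0, h)
slope_from_left = one_sided_slope(abs, 0.0, -h)
print(slope_from_left, slope_from_right)
```

The two slopes stay at $-1$ and $1$ no matter how small `h` gets, so there is no single limiting slope at $x = 0$.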

Example 2: $\color{orange} g(x) = \begin{cases} x^3-3x + 4 & x \neq 2 \\ 1 & x = 2 \end{cases}$

Image produced in Jupyter

$\color{orange} g(x) = \begin{cases} x^3-3x + 4 & x \neq 2 \\ 1 & x = 2 \end{cases}$ is continuous and differentiable everywhere, except at $x = 2$, where it is neither.

$$\frac{\text{d}{\color{orange} g}}{\text{d}x} = \begin{cases} 3x^2 - 3 & x \neq 2 \\ \text{undefined} & x = 2 \end{cases}$$

Example 3: $\color{#d81b60} h(x)=\begin{cases} x^2 + \frac{1}{2}x + 1 & x < 0 \\ \sqrt{x + 1} & x \geq 0 \end{cases}$

Image produced in Jupyter

$\color{#d81b60} h(x)=\begin{cases} x^2 + \frac{1}{2}x + 1 & x < 0 \\ \sqrt{x + 1} & x \geq 0 \end{cases}$ is continuous and differentiable everywhere, despite being a piecewise function. Its individual pieces are continuous, and the entire function is continuous because the “left” and “right” functions at the connection point of $x = 0$ have the same value.

$$\frac{\text{d}{\color{#d81b60} h}}{\text{d}x} = \begin{cases} 2x + \frac{1}{2} & x < 0 \\ \frac{1}{2\sqrt{x + 1}} & x \geq 0 \end{cases}$$

Since the two piecewise derivatives agree at $x = 0$, $\color{#d81b60}h(x)$ is differentiable at $x = 0$ (and across its entire domain).
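
Spelled out, the check at $x = 0$ plugs 0 into both pieces, first for the values and then for the slopes:

```latex
\[
0^2 + \tfrac{1}{2}(0) + 1 = 1 = \sqrt{0 + 1},
\qquad
2(0) + \tfrac{1}{2} = \tfrac{1}{2} = \frac{1}{2\sqrt{0 + 1}}
\]
```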


Example 4: $\color{#004d40} k(x)=\begin{cases} -{(x+2)}^2+5 & x < -2 \\ \frac{1}{2}x+6 & x \geq -2,\, x \neq 4 \\ \text{undefined} & x=4 \end{cases}$

Image produced in Jupyter

$\color{#004d40} k(x)=\begin{cases} -{(x+2)}^2+5 & x < -2 \\ \frac{1}{2}x+6 & x \geq -2,\, x \neq 4 \\ \text{undefined} & x=4 \end{cases}$ is continuous everywhere, except at $x = 4$, where it is undefined, leaving a “break” in the graph, so it is neither continuous nor differentiable there. In addition, $\color{#004d40}k(x)$ is not differentiable at $x = -2$ because the slopes approaching it from the left ($0$) and right ($\frac{1}{2}$) are different.

$$\frac{\text{d}{\color{#004d40} k}}{\text{d}x} = \begin{cases} -2(x+2) & x < -2 \\ \text{undefined} & x = -2 \\ \frac{1}{2} & x > -2,\, x \neq 4 \\ \text{undefined} & x=4 \end{cases}$$

An important point is that any function that is differentiable everywhere is also continuous everywhere; differentiability is a stronger condition than continuity. Plenty of functions are continuous but not differentiable, like $\color{#3d81f6} f(x) = |x|$ in Example 1.