We have built up most of the tools that we need to express derivatives of complicated functions in terms of derivatives of simpler known functions. We started by learning how to evaluate
derivatives of sums, products and quotients
derivatives of constants and monomials
These tools allow us to compute derivatives of polynomials and rational functions. In the previous sections, we added exponential and trigonometric functions to our list. The final tool we add is called the chain rule. It tells us how to take the derivative of a composition of two functions. That is, if we know \(f(x)\) and \(g(x)\) and their derivatives, then the chain rule tells us the derivative of \(f\big(g(x)\big)\text{.}\)
Before we get to the statement of the rule, let us look at an example showing how such a composition might arise in the “real world”.
Example 2.9.1. Walking towards a campfire.
You are out in the woods after a long day of mathematics and are walking towards your campfire on a beautiful still night. The heat from the fire means that the air temperature depends on your position. Let your position at time \(t\) be \(x(t)\text{.}\) The temperature of the air at position \(x\) is \(f(x)\text{.}\) What instantaneous rate of change of temperature do you feel at time \(t\text{?}\)
Because your position at time \(t\) is \(x=x(t)\text{,}\) the temperature you feel at time \(t\) is \(F(t)=f\big(x(t)\big)\text{.}\)
The instantaneous rate of change of temperature that you feel is \(F'(t)\text{.}\) We have a complicated function, \(F(t)\text{,}\) constructed by composing two simpler functions, \(x(t)\) and \(f(x)\text{.}\)
We wish to compute the derivative, \(F'(t) = \diff{}{t} f( x(t) )\text{,}\) of the complicated function \(F(t)\) in terms of the derivatives, \(x'(t)\) and \(f'(x)\text{,}\) of the two simple functions. This is exactly what the chain rule does.
Subsection 2.9.1 Statement of the Chain Rule
Theorem2.9.2.The chain rule — version 1.
Let \(a \in \mathbb{R}\) and let \(g(x)\) be a function that is differentiable at \(x=a\text{.}\) Now let \(f(u)\) be a function that is differentiable at \(u=g(a)\text{.}\) Then the function \(F(x) = f(g(x))\) is differentiable at \(x=a\) and
\begin{align*}
F'(a) &= f'\big(g(a)\big)\cdot g'(a)
\end{align*}
Here, as was the case earlier in this chapter, we have been very careful to give the point at which the derivative is evaluated a special name (i.e. \(a\)). But of course this evaluation point can really be any point (where the derivative is defined). So it is very common to just call the evaluation point “\(x\)” rather than give it a special name like “\(a\)”, like this:
Theorem2.9.3.The chain rule — version 2.
Let \(f\) and \(g\) be differentiable functions. Then
\begin{align*}
\diff{}{x} f\big(g(x)\big) &= f'\big(g(x)\big)\cdot g'(x)
\end{align*}
Notice that when we form the composition \(f\big(g(x)\big)\) there is an “outside” function (namely \(f\)) and an “inside” function (namely \(g\)). The chain rule tells us that to differentiate a composition we differentiate the outside function, evaluate that derivative at the inside function, and then multiply by the derivative of the inside function.
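In Leibniz notation, if we write \(y=f(u)\) and \(u=g(x)\text{,}\) the chain rule takes the form \(\diff{y}{x} = \diff{y}{u}\cdot\diff{u}{x}\text{,}\) which looks as though the \(\mathrm{d}u\) terms cancel. Of course, \(\mathrm{d}u\) is not, by itself, a number or variable that can be cancelled. (In this context \(\mathrm{d}u\) is called a differential. There are ways to understand and manipulate these in calculus but they are beyond the scope of this course.) But this is still a good memory aid.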
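The "differentiate the outside, then multiply by the derivative of the inside" recipe can be sanity-checked numerically. The sketch below is only an illustration: the functions \(f(u)=\sqrt{u}\) and \(g(x)=1+x^2\), the helper names, and the step size are our own choices, not part of the text. It compares the chain rule value with a central-difference approximation of the derivative of the composition.

```python
import math

def chain_rule(f_prime, g, g_prime, x):
    """Chain rule value f'(g(x)) * g'(x) at the point x."""
    return f_prime(g(x)) * g_prime(x)

def central_difference(F, x, h=1e-6):
    """Numerical approximation of F'(x); h is an assumed small step."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Illustrative (assumed) choice of outside and inside functions.
f = math.sqrt                                  # outside: f(u) = sqrt(u)
f_prime = lambda u: 0.5 / math.sqrt(u)         # f'(u) = 1 / (2 sqrt(u))
g = lambda x: 1 + x**2                         # inside: g(x) = 1 + x^2
g_prime = lambda x: 2 * x                      # g'(x) = 2x

x0 = 2.0
exact = chain_rule(f_prime, g, g_prime, x0)            # f'(g(x0)) * g'(x0)
approx = central_difference(lambda x: f(g(x)), x0)     # slope of the composition

print(exact, approx)
```

The two printed numbers should agree to many decimal places. Agreement at one point is a consistency check, not a proof.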
The hardest part about applying the chain rule is recognising when the function you are trying to differentiate is really the composition of two simpler functions. This takes a little practice. We can warm up with a couple of simple examples.
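Example 2.9.5. Derivative of a power of \(\sin x\).
Let \(f(u) = u^5\) and \(g(x) = \sin(x)\text{.}\) Then set \(F(x) = f\big(g(x)\big) = \big(\sin(x)\big)^5\text{.}\) To find the derivative of \(F(x)\) we can simply apply the chain rule, since the pieces of the composition have been laid out for us. Here they are.
\begin{align*}
f(u) &= u^5 & f'(u) &= 5u^4\\
g(x) &= \sin(x) & g'(x) &= \cos(x)
\end{align*}
We now put them together as the chain rule tells us:
\begin{align*}
F'(x) &= f'\big(g(x)\big)\cdot g'(x)\\
&= 5\big(g(x)\big)^4\cdot \cos(x)\\
&= 5\sin^4(x)\cdot\cos(x)
\end{align*}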
This example shows one of the ways that the chain rule appears very frequently — when we need to differentiate the power of some simpler function. More generally we have the following.
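Example 2.9.6. Derivative of a power of a function.
Let \(f(u) = u^n\) and let \(g(x)\) be any differentiable function. Set \(F(x) = f\big(g(x)\big) = g(x)^n\text{.}\) Then, since \(f'(u) = n u^{n-1}\text{,}\)
\begin{align*}
F'(x) &= f'\big(g(x)\big)\cdot g'(x) = n\, g(x)^{n-1}\cdot g'(x)
\end{align*}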
Again we should approach this by first writing down \(f\) and \(g\) and their derivatives and then putting everything together as the chain rule tells us.
This example shows a second way that the chain rule appears very frequently — when we need to differentiate some function of \(ax+b\text{.}\) More generally we have the following.
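Example 2.9.8. Derivative of \(f(ax+b)\).
Let \(a,b \in \mathbb{R}\) and let \(f(x)\) be a differentiable function. Set \(g(x) = ax+b\text{.}\) Then, since \(g'(x) = a\text{,}\)
\begin{align*}
\diff{}{x} f(ax+b) &= \diff{}{x} f\big(g(x)\big) = f'\big(g(x)\big)\cdot g'(x) = f'(ax+b)\cdot a
\end{align*}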
Let us now go back to our motivating campfire example. There we had
\begin{align*}
f(x) &= \text{ temperature at position $x$}\\
x(t) &= \text{ position at time $t$}\\
F(t) &= f(x(t)) = \text{ temperature at time $t$}
\end{align*}
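Applying the chain rule (version 2, with \(t\) playing the role of \(x\) and \(x(t)\) playing the role of the inside function \(g\)) to these gives the rate of change of the temperature that you feel:
\begin{align*}
F'(t) &= \diff{}{t} f\big(x(t)\big) = f'\big(x(t)\big)\cdot x'(t)
\end{align*}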
Notice that the units of measurement on both sides of the chain rule equation agree, as indeed they must. To see this, let us assume that \(t\) is measured in seconds, that \(x(t)\) is measured in metres and that \(f(x)\) is measured in degrees. Because of this \(F(t) = f\big(x(t)\big)\) must also be measured in degrees (since it is a temperature).
What about the derivatives? These are rates of change. So
\(F'(t)\) has units \(\frac{\rm degrees}{\rm second}\text{,}\)
\(f'(x)\) has units \(\frac{\rm degrees}{\rm metre}\text{,}\) and
\(x'(t)\) has units \(\frac{\rm metre}{\rm second}\text{.}\)
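Hence the product \(f'\big(x(t)\big)\cdot x'(t)\) has units \(\frac{\rm degrees}{\rm metre}\cdot\frac{\rm metre}{\rm second} = \frac{\rm degrees}{\rm second}\text{,}\) which are the same units as \(F'(t)\text{.}\) So the units on both sides of the equation agree. Checking that the units on both sides of an equation agree is a good check of consistency, but of course it does not prove that both sides are in fact the same.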
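To make the campfire computation concrete, here is a minimal sketch with made-up data. The temperature profile \(f(x) = 15 + 60e^{-x}\) (degrees, with \(x\) the distance in metres from the fire) and the walk \(x(t) = 10 - 0.5t\) (metres, \(t\) in seconds) are purely hypothetical choices for illustration, as are all function names. The code evaluates \(f'\big(x(t)\big)\cdot x'(t)\) and compares it with a numerical estimate of \(F'(t)\).

```python
import math

def temperature(x):
    """f(x) in degrees; hypothetical profile, x = distance from fire in metres."""
    return 15.0 + 60.0 * math.exp(-x)

def temperature_gradient(x):
    """f'(x) in degrees per metre."""
    return -60.0 * math.exp(-x)

def position(t):
    """x(t) in metres; hypothetical walker approaching at 0.5 m/s."""
    return 10.0 - 0.5 * t

def velocity(t):
    """x'(t) in metres per second."""
    return -0.5

def felt_temperature(t):
    """F(t) = f(x(t)) in degrees."""
    return temperature(position(t))

def felt_rate(t):
    """F'(t) = f'(x(t)) * x'(t) in degrees per second (chain rule)."""
    return temperature_gradient(position(t)) * velocity(t)

t0 = 12.0
h = 1e-5  # assumed small time step for the numerical estimate
numeric = (felt_temperature(t0 + h) - felt_temperature(t0 - h)) / (2 * h)

print(felt_rate(t0), numeric)
```

In this sketch both factors are negative (the temperature falls as distance grows, and the distance shrinks over time), so their product is positive: the walker feels the air warming.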
Subsection2.9.2(Optional) — Derivation of the Chain Rule
First, let’s review what our goal is. We have been given a function \(g(x)\text{,}\) that is differentiable at some point \(x=a\text{,}\) and another function \(f(u)\text{,}\) that is differentiable at the point \(u=b = g(a)\text{.}\) We have defined the composite function \(F(x) = f\big(g(x)\big)\) and we wish to show that
\begin{align*}
F'(a) &= f'\big(g(a)\big)\cdot g'(a)
\end{align*}
We are going to use manipulation tricks similar to those we used back in the proofs of the arithmetic of derivatives in Section 2.5. Unfortunately, we have already used up the symbols “\(F\)” and “\(H\)”, so we are going to make use of the Greek letters \(\gamma, \varphi\text{.}\)
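As was the case in our derivation of the product rule, it is convenient to introduce a couple of new functions. Set
\begin{align*}
\varphi(k) &= \frac{f(b+k)-f(b)}{k} & \gamma(h) &= \frac{g(a+h)-g(a)}{h}
\end{align*}
so that \(\lim_{k\to 0}\varphi(k) = f'(b)\) and \(\lim_{h\to 0}\gamma(h) = g'(a)\text{.}\) Write \(H = h\gamma(h) = g(a+h)-g(a)\text{,}\) which tends to \(0\) as \(h\to 0\) because \(g\) is continuous at \(a\text{.}\) Then
\begin{align*}
F'(a) &= \lim_{h\to 0}\frac{f\big(g(a+h)\big)-f\big(g(a)\big)}{h}\\
&= \lim_{h\to 0}\frac{f(b+H)-f(b)}{H}\cdot\frac{H}{h}\\
&= \lim_{h\to 0}\varphi(H)\cdot\gamma(h)\\
&= f'(b)\cdot g'(a) = f'\big(g(a)\big)\cdot g'(a)
\end{align*}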
This is exactly the RHS of the chain rule. It is possible to have \(H=0\) in the second line above. But that possibility is easy to deal with:
If \(g'(a)\ne 0\text{,}\) then, since \(\lim_{h \to 0} \gamma(h) = g'(a)\text{,}\) \(H= h \gamma(h)\) cannot be \(0\) for small nonzero \(h\text{.}\) Technically, there is an \(h_0\gt 0\) such that \(H= h \gamma(h)\ne 0\) for all \(0 \lt |h| \lt h_0\text{.}\) In taking the limit \(h\to 0\text{,}\) above, we need only consider \(0 \lt |h| \lt h_0\) and so, in this case, the above computation is completely correct.
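If \(g'(a)=0\text{,}\) the above computation is still fine provided we exclude all \(h\)’s for which \(H= h \gamma(h) = 0\text{.}\) When \(g'(a)=0\text{,}\) the right hand side, \(f'\big(g(a)\big) \cdot g'(a)\text{,}\) of the chain rule is \(0\text{.}\) So the above computation gives
\begin{align*}
\lim_{\substack{h\to 0\\ \gamma(h)\ne 0}} \frac{f\big(g(a+h)\big)-f\big(g(a)\big)}{h} &= f'\big(g(a)\big)\cdot g'(a) = 0
\end{align*}
On the other hand, when \(H=0\) we have \(g(a+h)=g(a)\text{,}\) so \(f\big(g(a+h)\big)-f\big(g(a)\big)=0\) and the difference quotient is exactly \(0\) for those \(h\text{.}\) Either way the limit is \(0 = f'\big(g(a)\big)\cdot g'(a)\text{,}\) as required.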
We’ll now use the chain rule to compute some more derivatives.
Example 2.9.11. \(\diff{}{x}\big(1+3x\big)^{75}\).
Find \(\diff{}{x}\big(1+3x\big)^{75}\text{.}\)
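This is a concrete version of Example 2.9.8. We are to find the derivative of a function that is built up by first computing \(1+3x\) and then taking the \(75^{\rm th}\) power of the result. So we set
\begin{align*}
f(u) &= u^{75} & f'(u) &= 75u^{74}\\
g(x) &= 1+3x & g'(x) &= 3
\end{align*}
and the chain rule gives
\begin{align*}
\diff{}{x}\big(1+3x\big)^{75} &= f'\big(g(x)\big)\cdot g'(x) = 75\big(1+3x\big)^{74}\cdot 3 = 225\big(1+3x\big)^{74}
\end{align*}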
In this example we are to compute the derivative of \(\sin\) with a (slightly) complicated argument. So we apply the chain rule with \(f\) being \(\sin\) and \(g(x)\) being the complicated argument. That is, we set
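In this example we are to compute the derivative of the cube root of a (moderately) complicated argument, namely \(\sin(x^2)\text{.}\) So we apply the chain rule with \(f\) being “cube root” and \(g(x)\) being the complicated argument. That is, we set
\begin{align*}
f(u) &= \sqrt[3]{u} = u^{\frac{1}{3}} & f'(u) &= \tfrac{1}{3}u^{-\frac{2}{3}}\\
g(x) &= \sin(x^2) & g'(x) &= 2x\cos(x^2)
\end{align*}
Here \(g'(x)\) was itself computed with the chain rule, with outside function \(\sin\) and inside function \(x^2\text{.}\) Putting the pieces together,
\begin{align*}
\diff{}{x}\sqrt[3]{\sin(x^2)} &= f'\big(g(x)\big)\cdot g'(x) = \tfrac{1}{3}\big(\sin(x^2)\big)^{-\frac{2}{3}}\cdot 2x\cos(x^2) = \frac{2x\cos(x^2)}{3\big(\sin(x^2)\big)^{\frac{2}{3}}}
\end{align*}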
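Indeed it is not too hard to generalise further (in the manner of Example 2.6.6) to find the derivative of the composition of 4 or more functions (though things start to become tedious to write down):
\begin{align*}
\diff{}{x} f_1\Big(f_2\big(f_3(f_4(x))\big)\Big) &= f_1'\Big(f_2\big(f_3(f_4(x))\big)\Big)\cdot f_2'\big(f_3(f_4(x))\big)\cdot f_3'\big(f_4(x)\big)\cdot f_4'(x)
\end{align*}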
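The pattern for longer compositions (one factor per layer, each derivative evaluated at the value fed into that layer) is mechanical enough to sketch in code. The helper below and the four sample layers are hypothetical, chosen only for illustration. It works from the innermost function outwards, multiplying up the derivative factors, and compares the result with a central-difference estimate.

```python
import math

def derivative_of_composition(layers, x):
    """Derivative at x of f1(f2(...fn(x)...)).

    layers is a list of (function, derivative) pairs, outermost first.
    """
    value = x
    derivative = 1.0
    # Start with the innermost layer and work outwards.
    for func, dfunc in reversed(layers):
        derivative *= dfunc(value)  # this layer's derivative at its own input
        value = func(value)         # output becomes the next layer's input
    return derivative

# Hypothetical example: sin(exp((0.3x + 0.1)^2)), listed outermost first.
layers = [
    (math.sin, math.cos),
    (math.exp, math.exp),
    (lambda u: u**2, lambda u: 2 * u),
    (lambda x: 0.3 * x + 0.1, lambda x: 0.3),
]

def composed(x):
    """Evaluate the full composition at x."""
    for func, _ in reversed(layers):
        x = func(x)
    return x

x0 = 0.5
h = 1e-6  # assumed small step for the numerical estimate
exact = derivative_of_composition(layers, x0)
approx = (composed(x0 + h) - composed(x0 - h)) / (2 * h)

print(exact, approx)
```

With a single layer pair plus an inner function this reduces to the two-function chain rule; each extra layer contributes one more factor to the product.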
This time we are to compute the derivative of \(\cos\) with a really complicated argument.
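So, to start, we apply the chain rule with \(g(x)=\frac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\) being the really complicated argument and \(f\) being \(\cos\text{.}\) That is, \(f(u)=\cos(u)\text{.}\) Since \(f'(u)=-\sin(u)\text{,}\) the chain rule gives
\begin{align*}
\diff{}{x}\cos\left(\frac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\right) &= -\sin\left(\frac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\right)\cdot\diff{}{x}\left\{\frac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\right\}
\end{align*}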
This reduced our problem to that of computing the derivative of the really complicated argument \(\tfrac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\text{.}\) We can think of the argument as being built up out of three pieces, namely \(x^5\text{,}\) multiplied by \(\sqrt{3+x^6}\text{,}\) divided by \({(4+x^2)}^3\text{,}\) or, equivalently, multiplied by \({(4+x^2)}^{-3}\text{.}\) So we may rewrite \(\tfrac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\) as \(x^5\,\big(3+x^6\big)^{\frac{1}{2}}\ {(4+x^2)}^{-3}\text{,}\) and then apply the product rule to reduce the problem to that of computing the derivatives of the three pieces.
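Written out, that step looks as follows. This is a sketch that assumes the product rule for three factors, in which each term differentiates one factor and leaves the other two unchanged:
\begin{align*}
\diff{}{x}\left\{x^5\,\big(3+x^6\big)^{\frac{1}{2}}\,{(4+x^2)}^{-3}\right\}
&= \diff{}{x}\big\{x^5\big\}\cdot\big(3+x^6\big)^{\frac{1}{2}}\,{(4+x^2)}^{-3}\\
&\quad + x^5\cdot\diff{}{x}\Big\{\big(3+x^6\big)^{\frac{1}{2}}\Big\}\cdot{(4+x^2)}^{-3}\\
&\quad + x^5\,\big(3+x^6\big)^{\frac{1}{2}}\cdot\diff{}{x}\Big\{{(4+x^2)}^{-3}\Big\}
\end{align*}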
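This has reduced our problem to computing the derivatives of \(x^5\text{,}\) which is easy, and of \({(3+x^6)}^{\frac{1}{2}}\) and \({(4+x^2)}^{-3}\text{,}\) both of which can be done by the chain rule. Doing so,
\begin{align*}
\diff{}{x}\left\{x^5\,\big(3+x^6\big)^{\frac{1}{2}}\,{(4+x^2)}^{-3}\right\}
&= 5x^4\cdot\big(3+x^6\big)^{\frac{1}{2}}\,{(4+x^2)}^{-3}\\
&\quad + x^5\cdot\tfrac{1}{2}\big(3+x^6\big)^{-\frac{1}{2}}\cdot 6x^5\cdot{(4+x^2)}^{-3}\\
&\quad + x^5\,\big(3+x^6\big)^{\frac{1}{2}}\cdot(-3){(4+x^2)}^{-4}\cdot 2x
\end{align*}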
Now we can clean things up in a sneaky way by observing
differentiating \(x^5\text{,}\) to get \(5x^4\text{,}\) is the same as multiplying \(x^5\) by \(\frac{5}{x}\text{,}\) and
differentiating \({(3+x^6)}^{\frac{1}{2}}\) to get \(\frac{1}{2}(3+x^6)^{-\frac{1}{2}}\cdot 6x^5\) is the same as multiplying \({(3+x^6)}^{\frac{1}{2}}\) by \(\frac{3x^5}{3+x^6}\text{,}\) and
differentiating \({(4+x^2)}^{-3}\) to get \(-3{(4+x^2)}^{-4}\cdot 2x\) is the same as multiplying \({(4+x^2)}^{-3}\) by \(-\frac{6x}{4+x^2}\text{.}\)
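Using these sneaky tricks we can write our solution quite neatly:
\begin{align*}
\diff{}{x}\cos\left(\frac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\right) &= -\sin\left(\frac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\right)\cdot\frac{x^5\sqrt{3+x^6}}{{(4+x^2)}^3}\cdot\left\{\frac{5}{x}+\frac{3x^5}{3+x^6}-\frac{6x}{4+x^2}\right\}
\end{align*}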
This method of cleaning up the derivative of a messy product is actually something more systematic in disguise — namely logarithmic differentiation. We will come to this later.
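A long computation like this one is easy to get wrong, so a numerical spot check can be reassuring. The sketch below assembles the derivative from the three multipliers listed above, namely \(\frac{5}{x}\text{,}\) \(\frac{3x^5}{3+x^6}\) and \(-\frac{6x}{4+x^2}\text{,}\) and compares it with a central-difference estimate. The function names, the test point \(x=1.5\) and the step size are our own assumptions. Note that the multiplier form requires \(x\ne 0\text{.}\)

```python
import math

def g(x):
    """The complicated argument x^5 sqrt(3 + x^6) / (4 + x^2)^3."""
    return x**5 * math.sqrt(3 + x**6) / (4 + x**2) ** 3

def F(x):
    """The function being differentiated, cos(g(x))."""
    return math.cos(g(x))

def F_prime(x):
    """Derivative in the tidy form -sin(g) * g * (sum of multipliers); needs x != 0."""
    multipliers = 5 / x + 3 * x**5 / (3 + x**6) - 6 * x / (4 + x**2)
    return -math.sin(g(x)) * g(x) * multipliers

x0 = 1.5   # assumed test point
h = 1e-6   # assumed small step for the numerical estimate
numeric = (F(x0 + h) - F(x0 - h)) / (2 * h)

print(F_prime(x0), numeric)
```

Agreement at a single point does not prove the formula, but a sign error or a dropped factor would almost certainly show up as a mismatch.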
Exercises 2.9.4 Exercises
Exercises — Stage 1
1.
Suppose the amount of kelp in a harbour depends on the number of urchins. Urchins eat kelp: when there are more urchins, there is less kelp, and when there are fewer urchins, there is more kelp. Suppose further that the number of urchins in the harbour depends on the number of otters, who find urchins extremely tasty: the more otters there are, the fewer urchins there are.
Let \(O\text{,}\) \(U\text{,}\) and \(K\) be the populations of otters, urchins, and kelp, respectively.
Is \(\diff{K}{U}\) positive or negative?
Is \(\diff{U}{O}\) positive or negative?
Is \(\diff{K}{O}\) positive or negative?
Remark: An urchin barren is an area where unchecked sea urchin grazing has decimated the kelp population, which in turn causes the other species that shelter in the kelp forests to leave. Introducing otters to urchin barrens is one intervention to increase biodiversity. A short video with a more complex view of otters and urchins in Canadian waters is available on YouTube: youtube.com
2.
Suppose \(A, B, C, D\) and \(E\) are functions describing an interrelated system, with the following signs: \(\diff{A}{B} \gt 0\text{,}\) \(\diff{B}{C} \gt 0\text{,}\) \(\diff{C}{D} \lt 0\text{,}\) and \(\diff{D}{E} \gt 0\text{.}\) Is \(\diff{A}{E}\) positive or negative?
Exercises — Stage 2
3.
Evaluate the derivative of \(f(x)=\cos(5x+3)\text{.}\)
4.
Evaluate the derivative of \(f(x)=\left({x^2+2}\right)^5\text{.}\)
5.
Evaluate the derivative of \(T(k)=\left({4k^4+2k^2+1}\right)^{17}\text{.}\)
6.
Evaluate the derivative of \(f(x)=\sqrt{\dfrac{x^2+1}{x^2-1}}\text{.}\)
7.
Evaluate the derivative of \(f(x)=e^{\cos(x^2)}\text{.}\)
A particle moves along the Cartesian plane from time \(t=-\pi/2\) to time \(t=\pi/2\text{.}\) The \(x\)-coordinate of the particle at time \(t\) is given by \(x=\cos t\text{,}\) and the \(y\)-coordinate is given by \(y=\sin t\text{,}\) so the particle traces a curve in the plane. When does the tangent line to that curve have slope \(-1\text{?}\)
32.(✳).
Show that, for all \(x \gt 0\text{,}\) \(e^{x+x^2} \gt 1+x\text{.}\)
33.
We know that \(\sin (2x) = 2\sin x \cos x\text{.}\) What other trig identity can you derive from this, using differentiation?
34.
Evaluate the derivative of \(f(x)=\sqrt[3]{\dfrac{e^{\csc x^2}}{ \sqrt{x^3-9} \tan x }}\text{.}\) You do not have to simplify your answer.
35.
Suppose a particle is moving in the Cartesian plane over time. For any real number \(t \geq 0\text{,}\) the coordinates of the particle at time \(t\) are given by \((\sin t, \cos^2 t)\text{.}\)
Sketch a graph of the curve traced by the particle in the plane by plotting points, and describe how the particle moves along it over time.
What is the slope of the curve traced by the particle at time \(t=\dfrac{10\pi}{3}\text{?}\)