Chain Rule
Description
The Chain Rule is the fundamental tool for differentiating composite functions: functions inside other functions, such as $\sin(x^2)$ or $(2x+1)^5$. It states that the derivative of a composite function $f(g(x))$ is the derivative of the outer function, evaluated at the inner function, multiplied by the derivative of the inner function: $\frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x)$.
A common mnemonic is **"derivative of the outside times derivative of the inside."**
**Intuitive Analogy (Gears):** Think of three gears connected in a chain, and let $x$, $u$, and $y$ be the rotations of Gears A, B, and C. If Gear B turns 3 times as fast as Gear A ($du/dx = 3$) and Gear C turns 2 times as fast as Gear B ($dy/du = 2$), then Gear C turns $2 \times 3 = 6$ times as fast as Gear A. You simply multiply the rates of change.
In Leibniz notation, if $y = f(u)$ and $u = g(x)$, then: $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$
This notation is suggestive because the $du$ terms appear to "cancel out" like fractions, although derivatives are not literally fractions. The Chain Rule lets us peel back the layers of nested functions like an onion, differentiating one layer at a time.
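The layer-by-layer multiplication can be sketched in code with forward-mode automatic differentiation using dual numbers, where every value carries its derivative along with it. This is a minimal illustration under our own assumptions: the `Dual` class and the `d_square` and `d_sin` helpers are hypothetical names, not part of any library. Each helper multiplies its outer derivative, evaluated at the inner value, by the inner derivative it receives, which is the Chain Rule applied to $\sin(x^2)$.

```python
import math


class Dual:
    """A value paired with its derivative with respect to x (hypothetical helper)."""

    def __init__(self, val, der):
        self.val = val  # the function value at x
        self.der = der  # the derivative of that value with respect to x


def d_square(u):
    # Chain Rule: d/dx (u^2) = 2u * du/dx
    return Dual(u.val ** 2, 2 * u.val * u.der)


def d_sin(u):
    # Chain Rule: d/dx sin(u) = cos(u) * du/dx
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)


x = Dual(1.5, 1.0)    # seed: dx/dx = 1
y = d_sin(d_square(x))  # y = sin(x^2), built from the inside out

print(y.der)                       # derivative carried by the dual numbers
print(math.cos(1.5 ** 2) * 2 * 1.5)  # hand-computed f'(g(x)) * g'(x)
```

Seeding `der` with `1.0` means "the derivative of $x$ with respect to itself"; each further layer multiplies in one more factor.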
History & Origins
The Chain Rule is one of the oldest rules in calculus, closely tied to its invention.
- **Gottfried Wilhelm Leibniz (1676):** Leibniz discovered the rule while differentiating algebraic expressions such as $\sqrt{a + bz + cz^2}$, which he treated as the square root of the inner polynomial $a + bz + cz^2$. His first attempt contained a calculation error, which he later corrected.
- **Guillaume de l'Hôpital (1696):** He included the rule in his textbook *Analyse des Infiniment Petits*, the first textbook on differential calculus.
- **Modern proofs:** Early proofs relied on infinitesimals, which were controversial. Rigorous proofs using limits ($\epsilon$-$\delta$ arguments) were developed in the 19th century to handle edge cases such as oscillating functions.
Proof using Limits
We use the limit definition of the derivative for the composite function $f(g(x))$.
Let $y = f(g(x))$. We want to find $\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}$.
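Assume that $g(x+h) \neq g(x)$ for all sufficiently small $h \neq 0$, and multiply and divide by $g(x+h) - g(x)$: $\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h}$.
Let $k = g(x+h) - g(x)$. As $h \to 0$, $k \to 0$, because $g$ is differentiable (and therefore continuous) at $x$.
Substitute $k$ and split the limit of the product into a product of limits, which is valid because both limits exist: $\lim_{k \to 0} \frac{f(g(x)+k) - f(g(x))}{k} \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}$.
The first term is the derivative of the outer function evaluated at the inner function, $f'(g(x))$, which exists because $f$ is differentiable at $g(x)$.
The second term is the derivative of the inner function, $g'(x)$.
Result: $f'(g(x)) \cdot g'(x)$.
The assumption $g(x+h) \neq g(x)$ fails for inner functions that return to the value $g(x)$ arbitrarily close to $x$, such as $g(x) = x^2 \sin(1/x)$ (with $g(0) = 0$) at $x = 0$. A fully rigorous proof avoids dividing by $g(x+h) - g(x)$ but reaches the same result.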
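To watch the two factors from the proof converge, the sketch below evaluates them for shrinking $h$. The choices $f = \sin$, $g(x) = x^2$, and $x = 1$ are illustrative assumptions; for them the first factor should approach $f'(g(1)) = \cos(1) \approx 0.5403$ and the second should approach $g'(1) = 2$.

```python
import math

f = math.sin           # outer function (illustrative choice)
g = lambda t: t * t    # inner function (illustrative choice)
x = 1.0

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    k = g(x + h) - g(x)                            # change in the inner function
    outer_quotient = (f(g(x) + k) - f(g(x))) / k   # tends to f'(g(x)) = cos(1)
    inner_quotient = k / h                         # tends to g'(x) = 2
    # The product equals the original difference quotient of f(g(x))
    print(h, outer_quotient, inner_quotient, outer_quotient * inner_quotient)

print("f'(g(x)) * g'(x) =", math.cos(g(x)) * 2 * x)
```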
Variables
| Symbol | Meaning |
|---|---|
| $f(u)$ | Outer function |
| $g(x)$ | Inner function, often written $u = g(x)$ |
| $f'$ | Derivative of the outer function |
| $g'$ | Derivative of the inner function |
Examples
Polynomial Power
Problem: Find the derivative of $h(x) = (3x^2 + 1)^5$.
Solution: $30x(3x^2 + 1)^4$
- Identify Inner and Outer: Inner $u = 3x^2 + 1$. Outer $y = u^5$.
- Differentiate Outer: $\frac{dy}{du} = 5u^4 = 5(3x^2 + 1)^4$.
- Differentiate Inner: $\frac{du}{dx} = 6x$.
- Multiply: $5(3x^2 + 1)^4 \cdot 6x$.
- Simplify: $30x(3x^2 + 1)^4$.
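A quick numerical sanity check compares the Chain Rule answer with a central-difference estimate of the derivative. The helper names `h`, `h_prime`, and `central_difference`, the sample points, and the step size are illustrative assumptions; agreement at a few points is evidence, not a proof.

```python
def h(x):
    # h(x) = (3x^2 + 1)^5
    return (3 * x ** 2 + 1) ** 5


def h_prime(x):
    # Chain Rule result: 5(3x^2 + 1)^4 * 6x = 30x(3x^2 + 1)^4
    return 30 * x * (3 * x ** 2 + 1) ** 4


def central_difference(func, x, step=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + step) - func(x - step)) / (2 * step)


for x in [-1.0, 0.0, 0.5, 2.0]:
    print(x, h_prime(x), central_difference(h, x))
```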
Trigonometric Function
Problem: Differentiate $y = \cos(e^x)$.
Solution: $-e^x \sin(e^x)$
- Identify functions: Outer is $\cos(u)$, Inner is $u = e^x$.
- Derivative of Outer: $\frac{d}{du}(\cos(u)) = -\sin(u)$.
- Derivative of Inner: $\frac{d}{dx}(e^x) = e^x$.
- Apply Chain Rule: $-\sin(e^x) \cdot e^x$.
- Result: $-e^x \sin(e^x)$.
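The recipe "outer derivative at the inner function, times inner derivative" can be packaged as a small higher-order function. `chain` is a hypothetical helper written for this sketch; it builds the derivative of $\cos(e^x)$ from its pieces, and the result is compared with the hand-derived $-e^x \sin(e^x)$ at the arbitrary point $x = 0.3$.

```python
import math


def chain(outer_prime, inner, inner_prime):
    """Return the function x -> outer'(inner(x)) * inner'(x) (hypothetical helper)."""
    return lambda x: outer_prime(inner(x)) * inner_prime(x)


# y = cos(e^x): outer is cos(u) with derivative -sin(u); inner is u = e^x with derivative e^x
dy_dx = chain(lambda u: -math.sin(u), math.exp, math.exp)

x = 0.3
print(dy_dx(x))                               # derivative built by the helper
print(-math.exp(x) * math.sin(math.exp(x)))   # hand-derived result
```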
Common Mistakes
Forgetting the inner derivative
The most common error is differentiating the outside but forgetting to multiply by $g'(x)$. For $(2x)^3$, getting $3(2x)^2$ is wrong. It should be $3(2x)^2 \cdot 2 = 6(2x)^2$.
Changing the inside function
When you take the derivative of the outside, $f'(g(x))$, keep the inside $g(x)$ exactly as it is; do not replace it with $g'(x)$ inside the parentheses. For example, the derivative of $\sin(x^2)$ is $\cos(x^2) \cdot 2x$, not $\cos(2x) \cdot 2x$.
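Both mistakes are easy to catch numerically. This sketch, assuming an arbitrary test point $x = 1.2$ and a central-difference helper of our own, compares the correct and mistaken derivatives of $(2x)^3$ and of $\sin(x^2)$ (an illustrative second function) against a numerical estimate.

```python
import math


def central_difference(func, x, step=1e-6):
    """Numerical estimate of func'(x) (illustrative helper)."""
    return (func(x + step) - func(x - step)) / (2 * step)


x = 1.2  # arbitrary test point

# Mistake 1: forgetting the inner derivative of (2x)^3
numeric_cube = central_difference(lambda t: (2 * t) ** 3, x)
forgot_inner = 3 * (2 * x) ** 2        # wrong: missing the inner factor 2
correct_cube = 3 * (2 * x) ** 2 * 2    # right: 6(2x)^2 = 24x^2

# Mistake 2: changing the inside of sin(x^2)
numeric_sin = central_difference(lambda t: math.sin(t * t), x)
changed_inside = math.cos(2 * x) * 2 * x   # wrong: cos(g'(x)) instead of cos(g(x))
correct_sin = math.cos(x * x) * 2 * x      # right: cos(x^2) * 2x

print(numeric_cube, correct_cube, forgot_inner)
print(numeric_sin, correct_sin, changed_inside)
```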
Real-World Applications
Machine Learning: Backpropagation
This is one of the most important modern applications. Neural networks "learn" by adjusting their weights to minimize a loss function that measures their error. The **Backpropagation** algorithm computes the gradient of the loss with respect to every weight in the network by applying the Chain Rule repeatedly, working backwards from the output layer to the input layer.
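A minimal sketch of backpropagation for a single neuron, assuming a sigmoid activation and squared-error loss: $z = wx + b$, $a = \sigma(z)$, $L = (a - t)^2$. The function names and numeric values are illustrative, and real frameworks automate this across many layers, but the backward pass is the same chain of local derivatives: $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$. A finite-difference check confirms the gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, t):
    # Forward pass: z = w*x + b, a = sigmoid(z), L = (a - t)^2
    return (sigmoid(w * x + b) - t) ** 2


def backprop(w, b, x, t):
    # Forward pass, keeping the intermediate values
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: multiply local derivatives from the output inward
    dL_da = 2 * (a - t)      # derivative of (a - t)^2
    da_dz = a * (1 - a)      # derivative of the sigmoid
    dL_dz = dL_da * da_dz    # Chain Rule
    dL_dw = dL_dz * x        # dz/dw = x
    dL_db = dL_dz * 1.0      # dz/db = 1
    return dL_dw, dL_db


w, b, x, t = 0.8, -0.2, 1.5, 1.0   # illustrative weight, bias, input, target
grad_w, grad_b = backprop(w, b, x, t)

eps = 1e-6
numeric_w = (loss(w + eps, b, x, t) - loss(w - eps, b, x, t)) / (2 * eps)
numeric_b = (loss(w, b + eps, x, t) - loss(w, b - eps, x, t)) / (2 * eps)
print(grad_w, numeric_w)
print(grad_b, numeric_b)
```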
Physics: Doppler Effect
When rates of change are nested, the Chain Rule lets physicists link them together. For example, the perceived frequency of a car's horn depends on the car's velocity relative to the listener, which in turn depends on time as the car moves, so the rate at which the pitch changes is built from both rates.
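A simplified sketch, assuming a sound source that accelerates directly toward a stationary listener with speed $v(t) = At$ below the speed of sound $c$. The observed frequency is then $f_{obs} = f_0 \frac{c}{c - v}$, so the Chain Rule gives $\frac{df_{obs}}{dt} = \frac{df_{obs}}{dv} \cdot \frac{dv}{dt} = \frac{f_0 c}{(c - v)^2} \cdot A$. The constants are illustrative assumptions, and a car that drives past the listener, rather than straight at them, needs a more detailed model.

```python
C = 343.0    # speed of sound in air, m/s (assumed, roughly 20 °C)
F0 = 440.0   # emitted frequency, Hz (illustrative)
A = 2.0      # constant acceleration toward the listener, m/s^2 (illustrative)


def speed(t):
    # v(t): source speed toward the listener, starting from rest
    return A * t


def observed_frequency(v):
    # Doppler shift for a source moving directly toward a stationary listener
    return F0 * C / (C - v)


def dfreq_dt(t):
    v = speed(t)
    dfreq_dv = F0 * C / (C - v) ** 2   # outer derivative, evaluated at v(t)
    dv_dt = A                           # inner derivative
    return dfreq_dv * dv_dt             # Chain Rule


t = 10.0
step = 1e-5
numeric = (observed_frequency(speed(t + step)) - observed_frequency(speed(t - step))) / (2 * step)
print(dfreq_dt(t), numeric)  # rate of change of the perceived pitch, Hz per second
```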
Frequently Asked Questions
Can I use it for 3 functions?
Yes! For $f(g(h(x)))$, you peel it like an onion: $f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x)$. You just keep multiplying by the derivative of the next inner layer.
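The "keep multiplying" pattern generalizes to any number of layers. In this sketch, `chain_derivative` is a hypothetical helper that walks a list of (function, derivative) pairs from the innermost layer outward, evaluating each layer's derivative at that layer's input; the example $\sin(\cos(x^2))$ and the point $x = 0.7$ are illustrative choices.

```python
import math


def chain_derivative(layers, x):
    """Value and derivative of a composition of layers at x.

    `layers` lists (function, derivative) pairs from innermost to outermost
    (hypothetical helper, for illustration only).
    """
    value, slope = x, 1.0
    for func, func_prime in layers:
        slope *= func_prime(value)  # this layer's derivative, evaluated at its input
        value = func(value)         # move outward one layer
    return value, slope


# y = sin(cos(x^2)): h(x) = x^2, g(u) = cos(u), f(u) = sin(u)
layers = [
    (lambda u: u * u, lambda u: 2 * u),
    (math.cos, lambda u: -math.sin(u)),
    (math.sin, math.cos),
]

x = 0.7
value, slope = chain_derivative(layers, x)
by_hand = math.cos(math.cos(x * x)) * (-math.sin(x * x)) * 2 * x  # f'(g(h(x))) * g'(h(x)) * h'(x)
print(slope, by_hand)
```

Because scalar multiplication is commutative, the factors can be collected inside-out, as here, or outside-in; backpropagation uses the outside-in order, which is efficient when one output depends on many inputs.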
How do I identify the inner function?
Look for parentheses. Whatever is inside the parentheses (or under a square root, or in the exponent) is usually the inner function $g(x)$.