Chain Rule

$\frac{d}{dx}[f(g(x))] = f'(g(x)) \cdot g'(x)$

Description

The Chain Rule is the fundamental tool for differentiating composite functions, that is, functions nested inside other functions, such as $\sin(x^2)$ or $(2x+1)^5$. It states that the derivative of a composite function is the derivative of the outer function, evaluated at the inner function, multiplied by the derivative of the inner function.

A common mnemonic is **"derivative of the outside times derivative of the inside."**

**Intuitive Analogy (Gears):** Think of three gears connected in a chain. If Gear A turns twice as fast as Gear B ($dy/du=2$), and Gear B turns three times as fast as Gear C ($du/dx=3$), then Gear A turns $2 \times 3 = 6$ times as fast as Gear C ($dy/dx=6$). You simply multiply the rates of change.

In Leibniz notation, if $y = f(u)$ and $u = g(x)$, then: $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$

This notation is incredibly intuitive because it looks like the fractions "cancel out" the $du$ terms, although derivatives aren't technically fractions. The Chain Rule allows us to peel back the layers of complex functions like an onion, solving one layer at a time.
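As a quick numerical sanity check of the Leibniz form, the following Python sketch estimates $dy/du$ and $du/dx$ separately with central differences and compares their product against a direct estimate of $dy/dx$. The helper name `central_diff`, the step size `h`, and the choice of $y = \sin(u)$ with $u = x^2$ are illustrative assumptions, not part of the rule itself.

```python
import math

def central_diff(func, point, h=1e-6):
    """Approximate func'(point) with a symmetric difference quotient."""
    return (func(point + h) - func(point - h)) / (2 * h)

def g(x):
    # Inner function u = g(x) (illustrative choice)
    return x ** 2

def f(u):
    # Outer function y = f(u) (illustrative choice)
    return math.sin(u)

x0 = 1.3
u0 = g(x0)

dy_du = central_diff(f, u0)                   # rate of y with respect to u, at u = g(x0)
du_dx = central_diff(g, x0)                   # rate of u with respect to x, at x = x0
dy_dx = central_diff(lambda x: f(g(x)), x0)   # direct rate of y with respect to x

product = dy_du * du_dx
print(product, dy_dx)  # the two estimates should agree to several decimal places
```

Note that $dy/du$ is evaluated at $u = g(x_0)$, not at $x_0$; this is the numerical counterpart of writing $f'(g(x))$ rather than $f'(x)$.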

History & Origins

The Chain Rule is one of the oldest rules in calculus, closely tied to its invention.

Gottfried Wilhelm Leibniz (1676): Leibniz discovered the rule specifically to handle algebraic functions of the form $\sqrt{a + bz + cz^2}$. He made a calculation error in his first attempt but corrected it, realizing the relationship between variables.

Guillaume de l'Hôpital (1696): He included the rule in his textbook Analyse des Infiniment Petits, the first textbook on differential calculus.

Modern Proofs: While early proofs relied on infinitesimals (which were controversial), rigorous proofs using limits ($\epsilon$-$\delta$) were developed in the 19th century to handle edge cases like oscillating functions.

Proof using Limits

We use the limit definition of the derivative for the composite function $f(g(x))$.

Let $y = f(g(x))$. We want to find $\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}$.

Multiply and divide by $[g(x+h) - g(x)]$ (this step assumes $g(x+h) \neq g(x)$ for all sufficiently small $h \neq 0$): $\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h}$.

Let $k = g(x+h) - g(x)$. As $h \to 0$, $k \to 0$, because $g$ is differentiable at $x$ and therefore continuous there.

Substitute $g(x+h) = g(x) + k$ and split the limit of the product into a product of limits (both exist): $\lim_{k \to 0} \frac{f(g(x)+k) - f(g(x))}{k} \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}$.

The first term is the derivative of the outer function $f'(g(x))$.

The second term is the derivative of the inner function $g'(x)$.

Result: $f'(g(x)) \cdot g'(x)$.
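The division by $g(x+h) - g(x)$ in the argument above is only legitimate when that difference is nonzero. One standard way to cover the remaining case, sketched here with an auxiliary function $Q$ (a name introduced only for this illustration), is to build the limit value into the difference quotient:

```latex
\[
Q(k) =
\begin{cases}
\dfrac{f(g(x)+k) - f(g(x))}{k}, & k \neq 0, \\[1ex]
f'(g(x)), & k = 0.
\end{cases}
\]
\[
\frac{f(g(x+h)) - f(g(x))}{h}
  = Q\bigl(g(x+h) - g(x)\bigr) \cdot \frac{g(x+h) - g(x)}{h}
  \qquad (h \neq 0)
\]
\[
\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}
  = Q(0) \cdot g'(x)
  = f'(g(x)) \cdot g'(x)
\]
```

The identity in the second line holds even when $g(x+h) = g(x)$, since both sides are then zero. $Q$ is continuous at $0$ because $f$ is differentiable at $g(x)$, which justifies passing to the limit in the last line.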

Variables

| Symbol | Meaning |
| --- | --- |
| `f(u)` | Outer function |
| `g(x)` | Inner function (u) |
| `f'` | Derivative of outer function |
| `g'` | Derivative of inner function |

Examples

Basic Calculation

Problem: Find the derivative of $y = (3x^2 + 1)^5$.

Solution:

$y' = 30x(3x^2 + 1)^4$

Polynomial Power

Problem: Find the derivative of $h(x) = (3x^2 + 1)^5$.

Solution: $30x(3x^2 + 1)^4$

Identify Inner and Outer: Inner $u = 3x^2 + 1$. Outer $y = u^5$.
Differentiate Outer: $\frac{dy}{du} = 5u^4 = 5(3x^2 + 1)^4$.
Differentiate Inner: $\frac{du}{dx} = 6x$.
Multiply: $5(3x^2 + 1)^4 \cdot 6x$.
Simplify: $30x(3x^2 + 1)^4$.
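The same steps can be mirrored in a short Python sketch. The function names `inner`, `inner_prime`, `outer`, `outer_prime`, `h_prime`, and `simplified` are chosen here for illustration only; the point is that the unsimplified Chain Rule product and the simplified closed form give the same value.

```python
def inner(x):
    # u = 3x^2 + 1
    return 3 * x ** 2 + 1

def inner_prime(x):
    # du/dx = 6x
    return 6 * x

def outer(u):
    # y = u^5
    return u ** 5

def outer_prime(u):
    # dy/du = 5u^4
    return 5 * u ** 4

def h_prime(x):
    # Chain Rule: derivative of the outer function at inner(x),
    # times the derivative of the inner function at x
    return outer_prime(inner(x)) * inner_prime(x)

def simplified(x):
    # Simplified closed form: 30x(3x^2 + 1)^4
    return 30 * x * (3 * x ** 2 + 1) ** 4

x0 = 2
print(h_prime(x0), simplified(x0))  # prints "1713660 1713660"
```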

Trigonometric Function

Problem: Differentiate $y = \cos(e^x)$.

Solution: $-e^x \sin(e^x)$

Identify functions: Outer is $\cos(u)$, Inner is $u = e^x$.
Derivative of Outer: $\frac{d}{du}(\cos(u)) = -\sin(u)$.
Derivative of Inner: $\frac{d}{dx}(e^x) = e^x$.
Apply Chain Rule: $-\sin(e^x) \cdot e^x$.
Result: $-e^x \sin(e^x)$.

Common Mistakes

❌ Mistake

Forgetting the inner derivative

✅ Correction

The most common error is differentiating the outside but forgetting to multiply by $g'(x)$. For $(2x)^3$, the answer $3(2x)^2$ is wrong. The correct derivative is $3(2x)^2 \cdot 2 = 6(2x)^2 = 24x^2$.
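A numerical comparison makes the missing factor visible. In this Python sketch (the names `wrong`, `right`, and `numeric` are illustrative), the incomplete answer is off by exactly the inner derivative $g'(x) = 2$:

```python
def y(x):
    # The function being differentiated: (2x)^3
    return (2 * x) ** 3

def wrong(x):
    # Outer derivative only: forgets to multiply by g'(x) = 2
    return 3 * (2 * x) ** 2

def right(x):
    # Outer derivative times inner derivative
    return 3 * (2 * x) ** 2 * 2

def numeric(func, x, h=1e-6):
    # Central-difference estimate of the derivative
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.5
# The numeric estimate should be close to right(x0), which is twice wrong(x0)
print(wrong(x0), right(x0), numeric(y, x0))
```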

❌ Mistake

Changing the inside function

✅ Correction

When you take the derivative of the outside, $f'(g(x))$, you must keep the inside $g(x)$ exactly as it is. Don't change it to $g'(x)$ inside the parentheses. For example, the derivative of $\sin(x^2)$ is $\cos(x^2) \cdot 2x$, not $\cos(2x)$.

Real-World Applications

Machine Learning: Backpropagation

This is arguably one of the most influential modern applications. Neural networks "learn" by adjusting weights to minimize errors. The **Backpropagation** algorithm calculates the gradient of the loss function with respect to every weight in the network. This amounts to applying the Chain Rule repeatedly, working backwards from the output layer to the input layer.
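To make the idea concrete, here is a minimal sketch of a backward pass, assuming a single neuron with a sigmoid activation and a squared-error loss. The function names `forward` and `backward` and all numeric values are hypothetical choices for illustration; real frameworks automate this bookkeeping across many layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, target):
    """Forward pass; returns the loss and the values the backward pass needs."""
    z = w * x + b             # linear layer
    a = sigmoid(z)            # activation
    loss = (a - target) ** 2  # squared error
    return loss, (x, a, target)

def backward(cache):
    """Backward pass: multiply local derivatives from the loss back to the parameters."""
    x, a, target = cache
    dloss_da = 2 * (a - target)   # d(loss)/da
    da_dz = a * (1 - a)           # sigmoid'(z), written in terms of a
    dloss_dz = dloss_da * da_dz   # Chain Rule through the activation
    dloss_dw = dloss_dz * x       # dz/dw = x
    dloss_db = dloss_dz * 1.0     # dz/db = 1
    return dloss_dw, dloss_db

w, b, x, target = 0.5, -0.2, 1.5, 1.0
loss, cache = forward(w, b, x, target)
grad_w, grad_b = backward(cache)

# Finite-difference check of the weight gradient (a sanity check, not part of backpropagation)
eps = 1e-6
numeric_w = (forward(w + eps, b, x, target)[0] - forward(w - eps, b, x, target)[0]) / (2 * eps)
print(grad_w, numeric_w)  # the two values should agree closely
```

Each line of `backward` is one link of the chain: the gradient arriving from the layer above is multiplied by the local derivative of the current step.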

Physics: Doppler Effect

When calculating rates of change where dependencies are nested (e.g., how the perceived frequency of sound changes as a car moves, where position depends on time), the Chain Rule allows physicists to link these rates together.

Frequently Asked Questions

Can I use it for 3 functions?

Yes! For $f(g(h(x)))$, you peel it like an onion: $f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x)$. You just keep multiplying by the derivative of the next inner layer.
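The "keep multiplying" pattern generalizes to any number of layers. The following Python sketch uses a hypothetical helper, `chain_derivative`, that walks from the innermost function outward, multiplying in each layer's derivative evaluated at that layer's input. The example functions $h(x) = x^2$, $g(u) = e^u$, and $f(v) = \sin(v)$ are illustrative choices.

```python
import math

def chain_derivative(layers, x):
    """Derivative of the composition of `layers` at x.

    `layers` is a list of (function, derivative) pairs ordered from the
    innermost layer to the outermost one.
    """
    value = x
    derivative = 1.0
    for func, func_prime in layers:
        derivative *= func_prime(value)  # local derivative at this layer's input
        value = func(value)              # pass the value on to the next layer out
    return derivative

# Example: f(g(h(x))) with h(x) = x^2, g(u) = e^u, f(v) = sin(v)
layers = [
    (lambda x: x ** 2, lambda x: 2 * x),  # h and h'
    (math.exp, math.exp),                 # g and g'
    (math.sin, math.cos),                 # f and f'
]

x0 = 0.7
by_chain = chain_derivative(layers, x0)
# Written out by hand: f'(g(h(x))) * g'(h(x)) * h'(x)
by_formula = math.cos(math.exp(x0 ** 2)) * math.exp(x0 ** 2) * 2 * x0
print(by_chain, by_formula)  # the two values should match up to rounding
```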

How do I identify the inner function?

Look for parentheses. Whatever is inside the parentheses (or under a square root, or in the exponent) is usually the inner function $g(x)$.