Normal Distribution
Description
The Normal Distribution, often called the **Bell Curve** or Gaussian Distribution, is one of the most widely used probability distributions in statistics. It describes data in which most values cluster around a central average (the mean), with progressively fewer values appearing farther from the center.
It is defined by two parameters:
* **Mean ($\mu$):** The center of the peak.
* **Standard Deviation ($\sigma$):** How wide or spread out the curve is.
The **Empirical Rule (68-95-99.7)** states that for a normal distribution:
* 68% of data falls within 1 standard deviation of the mean.
* 95% falls within 2 standard deviations.
* 99.7% falls within 3 standard deviations.
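The three percentages can be reproduced from the error function. The sketch below relies on the standard identity $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$; the helper name `prob_within` is illustrative only.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

The rule's 68, 95, and 99.7 are rounded versions of these values.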
History & Origins
The discovery of the normal distribution is a tale of three mathematicians.
- Abraham de Moivre (1733): He was the first to notice the bell shape while approximating coin flips (the binomial distribution) for large numbers of trials.
- Carl Friedrich Gauss (1809): He applied the distribution to analyze errors in astronomical observations. Because of his work, it is often called the "Gaussian Distribution."
- Pierre-Simon Laplace (1812): He proved an early version of the Central Limit Theorem, which explains why the normal distribution appears so often in nature, from human heights to test scores.
Why the constant 1/√(2π)?
The area under the entire curve must equal 1 (100% probability). To derive the constant that makes this true, we evaluate the famous Gaussian Integral.
Let $I = \int_{-\infty}^{\infty} e^{-x^2} dx$.
Square it: $I^2 = \int_{-\infty}^{\infty} e^{-x^2} dx \int_{-\infty}^{\infty} e^{-y^2} dy = \int \int e^{-(x^2+y^2)} dx dy$.
Convert to Polar Coordinates: $x^2 + y^2 = r^2$ and $dx dy = r dr d\theta$.
The integral becomes $\int_0^{2\pi} d\theta \int_0^{\infty} e^{-r^2} r dr$.
Solve the inner integral using substitution ($u=r^2$): It evaluates to $1/2$.
Multiply by $2\pi$: $I^2 = 2\pi(1/2) = \pi$.
So $I = \sqrt{\pi}$.
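The standard normal curve uses $e^{-x^2/2}$ rather than $e^{-x^2}$. Substituting $x = t/\sqrt{2}$ rescales the result to $\int_{-\infty}^{\infty} e^{-t^2/2} dt = \sqrt{2}\,I = \sqrt{2\pi}$, so dividing by $\sqrt{2\pi}$ makes the total area equal to 1. This gives the normalization constant $\frac{1}{\sqrt{2\pi}}$; for a general standard deviation $\sigma$, the same argument gives $\frac{1}{\sigma\sqrt{2\pi}}$.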
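The result can be checked numerically. The sketch below uses a plain midpoint rule (the helper `integrate` is illustrative, not a library function) and truncates the range to $[-10, 10]$, outside of which both integrands are negligible. It compares $\int e^{-x^2} dx$ with $\sqrt{\pi}$ and $\int e^{-x^2/2} dx$, the kernel of the standard normal curve, with $\sqrt{2\pi}$.

```python
import math

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Gaussian integral: should be close to sqrt(pi)
gauss = integrate(lambda x: math.exp(-x * x), -10, 10)

# Kernel of the standard normal curve: should be close to sqrt(2*pi)
kernel = integrate(lambda x: math.exp(-x * x / 2), -10, 10)

print(gauss, math.sqrt(math.pi))
print(kernel, math.sqrt(2 * math.pi))
```

Dividing the second integrand by $\sqrt{2\pi}$ therefore gives a curve with total area 1.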
Variables
| Symbol | Meaning |
|---|---|
| $f(x)$ | Probability Density (Height of curve) |
| $\mu$ | Mean (Average/Center) |
| $\sigma$ | Standard Deviation (Spread/Width) |
| $x$ | Value you are checking |
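These symbols fit together in the standard density formula $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. A minimal sketch of it in Python follows; the function name `normal_pdf` is chosen here for illustration.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The curve peaks at the mean and is symmetric around it
print(normal_pdf(100, mu=100, sigma=15))
print(normal_pdf(85, mu=100, sigma=15) == normal_pdf(115, mu=100, sigma=15))  # True
```

Python's standard library offers the same calculation as `statistics.NormalDist(mu, sigma).pdf(x)`.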
Examples
Basic Calculation
Problem: IQ Scores are normally distributed with Mean=100 and SD=15. What is the Z-score for an IQ of 130?
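Solution: Z = 2
- Apply the Z-score formula: $Z = (x - \mu) / \sigma$.
- Substitute: $Z = (130 - 100) / 15 = 30 / 15 = 2$.
- An IQ of 130 is 2 standard deviations above the mean.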
Factory Quality Control
Problem: A machine fills cereal boxes. The mean weight is 500g with a standard deviation of 5g. What percentage of boxes are between 495g and 505g?
Solution: 68%
- Identify limits: 495g and 505g.
- Calculate distance from mean: $500 - 495 = 5$g and $505 - 500 = 5$g.
- Convert to Standard Deviations: $5\text{g} = 1\sigma$.
- Apply Empirical Rule: The area within $\pm 1\sigma$ is roughly 68%.
- Conclusion: About 68% of boxes are in this range.
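The same answer can be computed from the cumulative distribution function instead of the Empirical Rule. This sketch uses `statistics.NormalDist` from the Python standard library (available since Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

fill = NormalDist(mu=500, sigma=5)  # box weights in grams

# P(495 < X < 505) is the area under the curve between the two limits
share = fill.cdf(505) - fill.cdf(495)
print(f"{share:.4f}")  # 0.6827
```

The CDF approach also works for limits that are not whole multiples of $\sigma$, where the Empirical Rule does not apply directly.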
Grading on a Curve
Problem: A test has a mean of 70 and SD of 10. To get an A, you need to be in the top 2.5%. What score do you need?
Solution: ~90
- Identify the threshold: Top 2.5%.
- From the 68-95-99.7 rule, 95% is in the middle. The remaining 5% is in the tails (2.5% low, 2.5% high).
- The top 2.5% starts at about $+2$ Standard Deviations (more precisely, $+1.96$).
- Calculate score: $\text{Mean} + 2\sigma$.
- Substitute: $70 + 2(10) = 70 + 20 = 90$.
- You need a score of 90.
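The Empirical Rule rounds the cutoff to 2 standard deviations. For a more exact figure, the inverse CDF can be used, as in this sketch with `statistics.NormalDist`.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)

# The top 2.5% begins at the 97.5th percentile
cutoff = scores.inv_cdf(0.975)
print(f"{cutoff:.1f}")  # 89.6
```

The exact cutoff sits at about 1.96 standard deviations above the mean, so the rule-of-thumb answer of 90 is slightly conservative.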
Common Mistakes
Thinking the PDF value is a probability
The value $f(x)$ is the *density*, not the probability. Probability is the *area* under the curve between two points. For a specific point $x$, the probability is technically 0.
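One way to see the difference: a density can be larger than 1, which a probability never can. The sketch below uses a deliberately narrow distribution ($\sigma = 0.1$, chosen only for illustration).

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=0.1)

# Density at the peak: a height, not a probability
density = narrow.pdf(0)
print(f"{density:.2f}")  # 3.99

# Probability of landing in a small interval around the peak: an area
prob = narrow.cdf(0.05) - narrow.cdf(-0.05)
print(f"{prob:.4f}")  # 0.3829
```

The density of about 3.99 is not a contradiction, because the curve is tall but very narrow and its total area is still 1.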
Confusing Mean and Median
In a perfect normal distribution, Mean = Median = Mode. But in skewed real-world data, they differ. The Bell Curve formula assumes perfect symmetry.
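A small illustration with made-up numbers (the `incomes` list is hypothetical, in thousands): a single large value pulls the mean far above the median, which a symmetric bell curve cannot represent.

```python
from statistics import mean, median

# Hypothetical right-skewed data: one value far above the rest
incomes = [30, 32, 35, 38, 40, 42, 45, 250]

avg = mean(incomes)
mid = median(incomes)
print(avg, mid)  # mean is 64, median is 39
```

When the mean and median differ this much, fitting a normal distribution to the data is likely to mislead.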
Real-World Applications
Six Sigma Manufacturing
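In manufacturing, "Six Sigma" is a quality goal. It means the process is so precise that the specification limits sit 6 standard deviations from the mean. The commonly quoted figure of 3.4 defects per million opportunities assumes the process mean may drift by up to 1.5 standard deviations over time, leaving 4.5 standard deviations to the nearest limit. With no drift, a true 6-standard-deviation limit corresponds to roughly 0.002 defects per million.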
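The 3.4 figure can be checked from the tail of the standard normal distribution. This sketch assumes the conventional 1.5σ long-term drift of the process mean used in Six Sigma practice, so the nearest limit is 4.5σ away; the helper `tail_per_million` is illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

def tail_per_million(k: float) -> float:
    """Expected count per million beyond k standard deviations on one side."""
    return std.cdf(-k) * 1_000_000

# With a 1.5-sigma drift, the nearest limit is 4.5 sigma away
print(f"{tail_per_million(4.5):.1f}")  # 3.4

# With no drift, both 6-sigma tails together are far smaller
print(2 * tail_per_million(6))
```

Using `cdf(-k)` rather than `1 - cdf(k)` avoids losing precision when the tail probability is very small.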
Finance: Value at Risk (VaR)
Banks use the normal distribution to model the risk of financial portfolios. By estimating the standard deviation (volatility) of portfolio returns, they calculate the loss that should not be exceeded over a given timeframe at a chosen confidence level, such as 95% or 99%.
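A minimal sketch of one common approach, often called parametric (variance-covariance) VaR, under the assumption that returns are normally distributed. The function name `parametric_var` and all the numbers (portfolio value, zero mean return, 2% daily volatility) are hypothetical.

```python
from statistics import NormalDist

def parametric_var(value: float, mu: float, sigma: float, confidence: float = 0.95) -> float:
    """Loss not exceeded with probability `confidence`, assuming normal returns."""
    # Return at the lower tail, e.g. the 5th percentile for 95% confidence
    worst_return = NormalDist(mu, sigma).inv_cdf(1 - confidence)
    return -worst_return * value

# Hypothetical portfolio: 1,000,000 in value, 2% daily volatility, zero mean return
var_95 = parametric_var(value=1_000_000, mu=0.0, sigma=0.02)
print(round(var_95))
```

With these inputs the one-day 95% VaR is roughly 32,900. Real returns often have heavier tails than the normal distribution, so a figure like this can understate extreme losses.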
Frequently Asked Questions
What is a Z-score?
A Z-score tells you how many standard deviations a data point is from the mean: $Z = (x - \mu) / \sigma$. Because it removes the original units, it lets you compare values measured on different scales (like SAT vs ACT scores).
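The formula translates directly into code. In the sketch below, the SAT and ACT means and standard deviations are hypothetical placeholders chosen for illustration, not official figures.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# IQ example: mean 100, standard deviation 15
print(z_score(130, mu=100, sigma=15))  # 2.0

# Hypothetical test statistics, for illustration only
sat_z = z_score(1300, mu=1050, sigma=200)
act_z = z_score(28, mu=21, sigma=5)
print(sat_z, act_z)
```

Under these assumed parameters the ACT score has the higher Z-score (1.4 versus 1.25), so it is the stronger result relative to its own test.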
Why is it called "Normal"?
Because 19th-century statisticians found that errors in measurement "normally" followed this pattern. It became the standard or "normal" expectation for random data.