The Central Limit Theorem
Why averages of random samples tend toward a normal distribution — and how this single fact makes all of classical statistics possible.
The most useful theorem in statistics
Roll a six-sided die 30 times and record the average. Then do that 10,000 times.
The die's own distribution is perfectly flat — every face has equal probability. But the distribution of your 10,000 averages will be shaped like a bell curve, centered on 3.5.
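Now do the same experiment with a heavily skewed distribution: flip a biased coin where heads appears 90% of the time, scoring heads as 1 and tails as 0. Again collect sample averages. Again: a bell curve, this time centered on 0.9, though with a coin this lopsided it is still visibly asymmetric at 30 flips and only smooths out with larger samples.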
This is the Central Limit Theorem. It says that the distribution of sample means is approximately normal, regardless of the shape of the underlying population distribution, as long as the sample size is large enough.
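The two experiments above can be tried directly. The following is a minimal sketch using only Python's standard library; the helper names (`sample_means`, `roll_die`, `flip_biased_coin`) and the fixed seed are illustrative choices, not part of the original experiment.

```python
import random
import statistics

def sample_means(draw, sample_size, num_samples, seed=0):
    """Return num_samples averages, each taken over sample_size calls to draw(rng)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

def roll_die(rng):
    # Fair six-sided die: every face equally likely.
    return rng.randint(1, 6)

def flip_biased_coin(rng):
    # 1 for heads (probability 0.9), 0 for tails.
    return 1 if rng.random() < 0.9 else 0

die_means = sample_means(roll_die, sample_size=30, num_samples=10_000)
coin_means = sample_means(flip_biased_coin, sample_size=30, num_samples=10_000)

print("die:  mean of averages =", round(statistics.fmean(die_means), 3))
print("die:  sd of averages   =", round(statistics.stdev(die_means), 3))
print("coin: mean of averages =", round(statistics.fmean(coin_means), 3))
print("coin: sd of averages   =", round(statistics.stdev(coin_means), 3))
```

The die averages should cluster near 3.5 and the coin averages near 0.9. Plotting a histogram of either list shows the bell shape, with the coin's curve noticeably rougher at this sample size.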
Why this matters so much
Statistics is built on a fundamental problem: we can't observe entire populations. We measure samples and try to say something about the world. This only works if we understand how samples behave.
The CLT is the answer. It tells us that sample means are predictable and well-behaved even when individual observations are not. This is why:
- A pollster can survey 1,000 people and make confident statements about 300 million.
- A pharmaceutical trial with a few hundred participants can establish drug efficacy.
- A quality control engineer can inspect 50 items from a production run and assess the whole batch.
Without the CLT, none of this would have a rigorous mathematical foundation.
The two things it tells you
Part one: shape. The distribution of sample means is approximately normal, regardless of the population's shape. The approximation gets better as sample size increases. For many real-world distributions, a sample size of around 30 is enough.
Part two: spread. The spread of that bell curve depends on sample size. Specifically, the standard deviation of the sample mean, called the standard error, is:
$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $n$ is the sample size. Larger samples produce more precise estimates. Quadruple your sample size, and you halve the standard error.
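To see the square-root relationship numerically, here is a small sketch that evaluates the standard error, the population standard deviation divided by the square root of the sample size, for a fair die at three sample sizes. The function name `standard_error` is illustrative; the die's standard deviation is the exact value $\sqrt{35/12}$.

```python
import math

def standard_error(population_sd, sample_size):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return population_sd / math.sqrt(sample_size)

# Standard deviation of one roll of a fair six-sided die.
die_sd = math.sqrt(35 / 12)

# Each step quadruples the sample size, so each standard error is half the previous one.
for n in (30, 120, 480):
    print(n, round(standard_error(die_sd, n), 4))
```

Each row's standard error is half the one before it, which is the "quadruple the sample, halve the error" rule in action.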
A polling example
A pollster wants to estimate the fraction of voters who support a candidate. They survey 1,000 randomly selected voters. 53% say yes.
What can they say about the true population proportion?
The CLT says the sampling distribution of that 53% estimate is approximately normal with a standard error of roughly:
$$\sqrt{\frac{0.53 \times 0.47}{1000}} \approx 0.016$$
Multiplying by 1.96, the multiplier for 95% confidence, gives about 0.03. So the true proportion is within about ±3 percentage points of 53%, with 95% confidence. That's where "margin of error" comes from: it's the standard error scaled to a confidence level. The CLT is what makes the arithmetic valid.
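The polling arithmetic can be written out as a short sketch. It assumes the usual standard error for a sample proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$, and a 95% confidence level (multiplier 1.96). The function names are illustrative.

```python
import math

def proportion_standard_error(p_hat, n):
    """Standard error of a sample proportion p_hat from n independent responses."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def margin_of_error(p_hat, n, z=1.96):
    """Standard error scaled by z; z = 1.96 corresponds to 95% confidence."""
    return z * proportion_standard_error(p_hat, n)

p_hat, n = 0.53, 1000
se = proportion_standard_error(p_hat, n)
moe = margin_of_error(p_hat, n)

print(f"standard error  = {se:.4f}")
print(f"margin of error = {moe:.4f}")
print(f"95% interval    = ({p_hat - moe:.3f}, {p_hat + moe:.3f})")
```

The standard error comes out near 0.016 and the margin of error near 0.031, which is the "±3%" usually quoted for a poll of 1,000 people.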
The formal statement
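Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2$. Define the sample mean:
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
Then as $n \to \infty$, the standardized sample mean converges in distribution to the standard normal:
$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)$$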
The remarkable part of this statement is what's missing from it: any description of what the $X_i$ look like. They could be uniform, exponential, Bernoulli, highly skewed. It doesn't matter. The means will still be approximately normally distributed, and the approximation sharpens as $n$ grows.
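One way to check the formal statement empirically is to standardize sample means from a skewed distribution and compare them with a standard normal. The sketch below uses an exponential distribution with rate 1 (mean 1, standard deviation 1) and counts how many standardized means fall within ±1.96, which should be close to 95% if the normal approximation holds. The sample size of 50, the 5,000 repetitions, and the function names are arbitrary illustrative choices.

```python
import math
import random
import statistics

def standardized_means(draw, mu, sigma, n, num_samples, seed=0):
    """Return sqrt(n) * (sample mean - mu) / sigma for num_samples samples of size n."""
    rng = random.Random(seed)
    zs = []
    for _ in range(num_samples):
        xbar = statistics.fmean(draw(rng) for _ in range(n))
        zs.append(math.sqrt(n) * (xbar - mu) / sigma)
    return zs

def draw_exponential(rng):
    # Exponential with rate 1: strongly right-skewed, mean 1, standard deviation 1.
    return rng.expovariate(1.0)

zs = standardized_means(draw_exponential, mu=1.0, sigma=1.0, n=50, num_samples=5_000)

# For a standard normal, about 95% of values lie within +/- 1.96.
within = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print("fraction within +/-1.96:", round(within, 3))
```

Swapping in a different `draw` function, along with that distribution's own mean and standard deviation, is a quick way to see how the quality of the approximation depends on the shape of the population and on $n$.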
Where it doesn't apply
The CLT has conditions:
- Independence: observations must not depend on one another. Time series data, spatial data, and clustered data typically violate this.
- Finite variance: a few important distributions (like the Cauchy distribution, whose mean and variance are both undefined) do not have a finite variance. For these, the CLT fails.
- "Large enough" $n$: for very skewed distributions, you need more data. The rule of thumb of $n \geq 30$ is a starting point, not a guarantee.
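The finite-variance condition is easy to see in simulation. The sketch below compares how the spread of sample means changes with sample size for a normal population and for a Cauchy population. It measures spread with the interquartile range, because the standard deviation is not meaningful for Cauchy data. The standard library has no Cauchy sampler, so `draw_cauchy` uses the inverse-CDF formula; all function names are illustrative.

```python
import math
import random
import statistics

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    # Standard Cauchy via its inverse CDF: tan(pi * (u - 1/2)) for uniform u.
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(draw, n, num_samples=2_000, seed=0):
    """Interquartile range of num_samples sample means, each over n draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

# Columns: sample size, IQR of normal means, IQR of Cauchy means.
for n in (1, 10, 100):
    print(n, round(iqr_of_means(draw_normal, n), 3), round(iqr_of_means(draw_cauchy, n), 3))
```

For the normal population the spread shrinks roughly in proportion to $1/\sqrt{n}$. For the Cauchy population it stays near 2 at every sample size: averaging more observations does not make the estimate any more precise.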
In finance, asset returns are heavy-tailed enough that the CLT's normal approximation breaks down at the extremes. The bell curve underestimates how often large losses occur. The CLT is powerful, but it's a theorem about averages approaching normality — the individual outcomes can still be very non-normal.