
The Central Limit Theorem

Why averages of random samples tend toward a normal distribution, and how this single fact underpins much of classical statistics.

The most useful theorem in statistics

Roll a six-sided die 30 times and record the average. Then do that 10,000 times.

The die's own distribution is perfectly flat — every face has equal probability. But the distribution of your 10,000 averages will be shaped like a bell curve, centered on 3.5.

Now do the same experiment with a heavily skewed distribution: flip a biased coin where heads appears 90% of the time. Again collect sample averages. Again: a bell curve.

This is the Central Limit Theorem. It says that the distribution of sample means is approximately normal, regardless of the shape of the underlying population distribution, as long as the sample size is large enough.
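The two experiments above can be reproduced in a few lines. This is a minimal sketch using only Python's standard library; the helper names (`sample_means`, `roll_die`, `biased_coin`, `share_within_one_sd`) and the fixed seed are illustrative choices, not part of the article. As a rough check of the bell shape, it reports the share of sample means that fall within one standard deviation of their center, which is about 68% for a normal distribution.

```python
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each averaging `n` calls to draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

def roll_die(rng):
    # Fair six-sided die: a flat distribution over 1..6.
    return rng.randint(1, 6)

def biased_coin(rng):
    # Heads (1) with probability 0.9, tails (0) otherwise: heavily skewed.
    return 1 if rng.random() < 0.9 else 0

def share_within_one_sd(values):
    """Fraction of values within one sample SD of the sample mean."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)

die_means = sample_means(roll_die, n=30, trials=10_000)
coin_means = sample_means(biased_coin, n=30, trials=10_000)

for label, means in (("die", die_means), ("coin", coin_means)):
    print(label,
          round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3),
          round(share_within_one_sd(means), 3))
```

The die averages should center near 3.5 and the coin averages near 0.9. The one-standard-deviation share is only a quick sanity check, not a formal test of normality, and the coin case is still visibly lumpy at a sample size of 30 because each average can take only a handful of values.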

Why this matters so much

Statistics is built on a fundamental problem: we can't observe entire populations. We measure samples and try to say something about the world. This only works if we understand how samples behave.

The CLT is the answer. It tells us that sample means are predictable and well-behaved even when individual observations are not. This is why:

  • A pollster can survey 1,000 people and make confident statements about 300 million.
  • A pharmaceutical trial with a few hundred participants can establish drug efficacy.
  • A quality control engineer can inspect 50 items from a production run and assess the whole batch.

Without the CLT, none of this would have a rigorous mathematical foundation.

The two things it tells you

Part one: shape. The distribution of sample means is approximately normal, regardless of the population's shape. The approximation gets better as sample size increases. For many real-world distributions, a sample size of around 30 is enough.

Part two: spread. The spread of that bell curve depends on sample size. Specifically, the standard deviation of the sample mean — called the standard error — is:

Standard error of the mean

$$\text{SE} = \frac{\sigma}{\sqrt{n}}$$

Here $\sigma$ is the population standard deviation and $n$ is the sample size. Larger samples produce more precise estimates. Quadruple your sample size, and you halve the standard error.
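To make the square-root scaling concrete, here is a small sketch that evaluates the formula for a fair six-sided die, whose population standard deviation is $\sqrt{35/12} \approx 1.71$. The function name `standard_error` and the chosen sample sizes are illustrative assumptions.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Population SD of a fair six-sided die: variance is (6**2 - 1) / 12 = 35/12.
sigma_die = math.sqrt(35 / 12)

# Each step quadruples n, so each standard error is half the previous one.
for n in (25, 100, 400):
    print(n, round(standard_error(sigma_die, n), 4))
```

Going from 25 to 100 to 400 rolls per sample halves the standard error at each step, which is why extra precision gets expensive: ten times the precision costs a hundred times the data.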

[Interactive figure: Central limit theorem simulator. Panels: source distribution and sample means, with the sample size n adjustable from 1 to 50 and a readout of the samples collected and the standard error (σ/√n).]

A polling example

A pollster wants to estimate the fraction of voters who support a candidate. They survey 1,000 randomly selected voters. 53% say yes.

What can they say about the true population proportion?

The CLT says the sampling distribution of that 53% estimate is approximately normal with a standard error of roughly:

$$\text{SE} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.53 \times 0.47}{1000}} \approx 0.016$$

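At the usual 95% confidence level, the interval extends about 1.96 standard errors on either side of the estimate, so the true proportion is within about ±3 percentage points of 53%, with high confidence. That's where "margin of error" comes from: it's the standard error scaled to a confidence level. The CLT is what makes the arithmetic valid.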
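The polling arithmetic can be written out as a short sketch. It assumes a 95% confidence level, which the article does not state explicitly, and uses `statistics.NormalDist` from the standard library to get the matching normal quantile. The function name `proportion_standard_error` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_standard_error(p_hat, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

p_hat, n = 0.53, 1000
se = proportion_standard_error(p_hat, n)

# Assumed confidence level: 95%, i.e. the 97.5th percentile of N(0, 1), about 1.96.
z_95 = NormalDist().inv_cdf(0.975)
margin = z_95 * se
low, high = p_hat - margin, p_hat + margin

print(f"SE = {se:.4f}, margin = {margin:.3f}, interval = ({low:.3f}, {high:.3f})")
```

The margin comes out at roughly 0.031, or about three percentage points. With these numbers the lower end of the interval sits just under 50%, so a 53% result from 1,000 respondents does not by itself settle which side of a majority the candidate is on.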

The formal statement

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2$. Define the sample mean:

Sample mean

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Then as $n \to \infty$, the standardized sample mean converges in distribution to the standard normal:

Central Limit Theorem

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

The remarkable part of this statement is what's missing from it: any description of what the $X_i$ look like. They could be uniform, exponential, Bernoulli, highly skewed; it doesn't matter. The means will still be approximately normally distributed, and the approximation sharpens as $n$ grows.
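One way to watch the convergence is to standardize simulated sample means exactly as in the formal statement and compare them with the standard normal. The sketch below draws from an Exponential(1) distribution (so $\mu = 1$ and $\sigma = 1$), a choice made here purely for illustration because it is strongly skewed. It tracks a single statistic: the share of standardized means at or below zero, which is exactly 0.5 for $\mathcal{N}(0, 1)$. The names `standardized_means` and `gaps` are hypothetical.

```python
import math
import random
import statistics

def standardized_means(n, trials, rng):
    """Standardized sample means of Exponential(1) draws (mu = 1, sigma = 1)."""
    mu, sigma = 1.0, 1.0
    se = sigma / math.sqrt(n)
    return [
        (statistics.fmean(rng.expovariate(1.0) for _ in range(n)) - mu) / se
        for _ in range(trials)
    ]

rng = random.Random(42)
target = statistics.NormalDist().cdf(0.0)  # 0.5 for the standard normal

gaps = {}
for n in (1, 5, 30, 100):
    z = standardized_means(n, trials=10_000, rng=rng)
    share_below_zero = sum(v <= 0 for v in z) / len(z)
    gaps[n] = abs(share_below_zero - target)
    print(n, round(share_below_zero, 3), round(gaps[n], 3))
```

At $n = 1$ the share is far from 0.5 because the exponential distribution puts most of its mass below its mean. The gap shrinks as $n$ grows, though for a distribution this skewed it is still visible at $n = 30$.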

Where it doesn't apply

The CLT has conditions:

  • Independence: observations must not depend on one another. Time series data, spatial data, and clustered data typically violate this.
  • Finite variance: some heavy-tailed distributions have infinite variance, and the Cauchy distribution has no defined mean or variance at all. For these, the classical CLT fails.
  • "Large enough" $n$: for very skewed distributions, you need more data. The rule of thumb of $n \geq 30$ is a starting point, not a guarantee.

In finance, asset returns are heavy-tailed enough that the CLT's normal approximation breaks down at the extremes. The bell curve underestimates how often large losses occur. The CLT is powerful, but it's a theorem about averages approaching normality — the individual outcomes can still be very non-normal.
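The finite-variance condition can be seen failing in a simulation. The sketch below compares sample means of standard Cauchy draws with sample means of die rolls and measures spread by the interquartile range (IQR), since the standard deviation is not meaningful for Cauchy data. The Cauchy draws are generated by the inverse-CDF formula $\tan(\pi(U - 0.5))$, and the helper names (`cauchy`, `die`, `iqr_of_means`) are illustrative.

```python
import math
import random
import statistics

def cauchy(rng):
    # Standard Cauchy draw via the inverse CDF: tan(pi * (U - 0.5)).
    return math.tan(math.pi * (rng.random() - 0.5))

def die(rng):
    return rng.randint(1, 6)

def iqr_of_means(draw, n, trials, rng):
    """Interquartile range of `trials` sample means of size `n`."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(7)
cauchy_iqrs, die_iqrs = {}, {}
for n in (1, 10, 100):
    cauchy_iqrs[n] = iqr_of_means(cauchy, n, 5_000, rng)
    die_iqrs[n] = iqr_of_means(die, n, 5_000, rng)
    print(n, round(cauchy_iqrs[n], 2), round(die_iqrs[n], 2))
```

The die column shrinks roughly like $1/\sqrt{n}$, as the standard error formula predicts. The Cauchy column stays near 2 at every sample size: the average of $n$ standard Cauchy draws is itself standard Cauchy, so averaging buys no precision at all.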
