

The Central Limit Theorem

Why averages of random samples tend toward a normal distribution — and how this single fact makes all of classical statistics possible.

Before this: Standard Deviation and Variance, Sampling

The most useful theorem in statistics

Roll a six-sided die 30 times and record the average. Then do that 10,000 times.

The die's own distribution is perfectly flat — every face has equal probability. But the distribution of your 10,000 averages will be shaped like a bell curve, centered on 3.5.
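Here is a minimal Python sketch of the die experiment using only the standard library. The function name `die_sample_means` and the fixed seed are illustrative choices, not an established API. The sketch prints the mean and spread of the 10,000 averages, then a rough text histogram whose bars should trace out a bell shape.

```python
import random
import statistics
from collections import Counter

def die_sample_means(n_rolls=30, n_experiments=10_000, seed=0):
    """Return n_experiments averages, each from n_rolls fair six-sided die rolls."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n_rolls))
        for _ in range(n_experiments)
    ]

means = die_sample_means()
print(round(statistics.fmean(means), 2))  # close to 3.5
print(round(statistics.stdev(means), 3))  # close to sqrt(35/12) / sqrt(30), about 0.31

# Crude histogram: bucket each average to the nearest 0.25, one '#' per 100 averages.
counts = Counter(round(m * 4) / 4 for m in means)
for center in sorted(counts):
    print(f"{center:4.2f} {'#' * (counts[center] // 100)}")
```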

Now do the same experiment with a heavily skewed distribution: flip a biased coin where heads appears 90% of the time. Again collect sample averages. The bell takes shape more slowly this time, but with enough flips per sample you get the same result: a bell curve.
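How slowly does the bell appear for a skewed source? This sketch records heads as 1 and tails as 0, so each sample average is a proportion. It then measures the skewness of those averages at 30 and 300 flips per sample. The helper names are illustrative. The exact value comes from the standard skewness formula for a binomial proportion, $(1 - 2p)/\sqrt{np(1-p)}$.

```python
import math
import random
import statistics

def coin_sample_means(p_heads, n_flips, n_experiments=5_000, seed=1):
    """Averages of n_flips biased coin flips, recording heads as 1 and tails as 0."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_heads for _ in range(n_flips)) / n_flips
        for _ in range(n_experiments)
    ]

def skewness(xs):
    """Sample skewness: about 0 for a symmetric bell curve, negative for a long left tail."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - mu) / sd) ** 3 for x in xs)

p = 0.9
results = {}
for n in (30, 300):
    simulated = skewness(coin_sample_means(p, n))
    exact = (1 - 2 * p) / math.sqrt(n * p * (1 - p))  # skewness of a binomial proportion
    results[n] = (simulated, exact)
    print(f"n={n}: simulated skewness {simulated:.2f}, exact {exact:.2f}")
```

At 30 flips the exact skewness is about -0.49, which gives a visibly lopsided histogram. At 300 flips it drops to about -0.15. The skewness of a mean shrinks like $1/\sqrt{n}$, which is why skewed sources need larger samples before the bell looks convincing.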

This is the Central Limit Theorem. It says that the distribution of sample means is approximately normal, regardless of the shape of the underlying population distribution, as long as that distribution has finite variance and the sample size is large enough.

Why this matters so much

Statistics is built on a fundamental problem: we can't observe entire populations. We measure samples and try to say something about the world. This only works if we understand how samples behave.

The CLT is the answer. It tells us that sample means are predictable and well-behaved even when individual observations are not. This is why:

A pollster can survey 1,000 people and make confident statements about 300 million.
A pharmaceutical trial with a few hundred participants can establish drug efficacy.
A quality control engineer can inspect 50 items from a production run and assess the whole batch.

Without the CLT, none of this would have a rigorous mathematical foundation.

The two things it tells you

Part one: shape. The distribution of sample means is approximately normal, regardless of the population's shape. The approximation gets better as sample size increases. For many real-world distributions, a sample size of around 30 is enough.

Part two: spread. The spread of that bell curve depends on sample size. Specifically, the standard deviation of the sample mean — called the standard error — is:

Standard error of the mean

$\text{SE} = \frac{\sigma}{\sqrt{n}}$

Where $\sigma$ is the population standard deviation and $n$ is the sample size. Larger samples produce more precise estimates. Quadruple your sample size, and you halve the standard error.
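The formula can be checked by simulation. This sketch assumes an Exponential(rate = 1) population, chosen because it is skewed and has $\sigma = 1$, which makes $\sigma/\sqrt{n}$ easy to read off. It measures the spread of many sample means at $n = 25$ and again at $n = 100$. The function name `empirical_se` is illustrative.

```python
import math
import random
import statistics

def empirical_se(n, n_experiments=10_000, seed=2):
    """Standard deviation of sample means of size n from Exponential(rate=1), whose sigma is 1."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_experiments)
    ]
    return statistics.stdev(means)

se_25 = empirical_se(25)
se_100 = empirical_se(100)  # quadruple the sample size
print(f"n=25:  empirical {se_25:.3f} vs sigma/sqrt(n) = {1 / math.sqrt(25):.3f}")
print(f"n=100: empirical {se_100:.3f} vs sigma/sqrt(n) = {1 / math.sqrt(100):.3f}")
print(f"ratio: {se_25 / se_100:.2f}")  # close to 2: four times the data, half the error
```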

Central limit theorem simulator (interactive): a sample-size slider from n = 1 to n = 50 (default n = 10), with panels showing the source distribution and the resulting sample means, plus counters for samples collected and the standard error (σ/√n).

A polling example

A pollster wants to estimate the fraction of voters who support a candidate. They survey 1,000 randomly selected voters. 53% say yes.

What can they say about the true population proportion?

A proportion is just the mean of yes/no answers coded as 1 and 0, so the CLT applies. The sampling distribution of that 53% estimate is approximately normal, with a standard error of roughly:

$\text{SE} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.53 \times 0.47}{1000}} \approx 0.016$

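Multiplying by 1.96 gives about 0.031, so 53% plus or minus about 3 percentage points is a 95% confidence interval for the true proportion. That is where the "margin of error" comes from: it is the standard error multiplied by the critical value for the chosen confidence level (1.96 for 95%). The CLT is what makes the arithmetic valid.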
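Here is the same arithmetic as a short sketch. `margin_of_error` is a hypothetical helper. It uses the normal approximation that the CLT justifies, with the sample proportion plugged in for $p$. The critical value comes from `statistics.NormalDist().inv_cdf` and is about 1.96 at 95% confidence.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation (CLT) standard error and margin of error for a sample proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return se, z * se

se, moe = margin_of_error(0.53, 1000)
print(f"SE = {se:.4f}, margin = ±{moe:.3f}")
print(f"95% interval: {0.53 - moe:.3f} to {0.53 + moe:.3f}")
```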

The formal statement

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite, nonzero variance $\sigma^2$. Define the sample mean:

Sample mean

$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$
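Two lines of algebra show where the $\sigma/\sqrt{n}$ in the standard error comes from. The mean only needs linearity of expectation. The variance step uses independence: the covariance terms between different $X_i$ vanish, so the variances simply add.

$\mathbb{E}[\bar{X}_n] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \frac{1}{n} \cdot n\mu = \mu$

$\operatorname{Var}(\bar{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}$

Taking the square root gives $\sigma/\sqrt{n}$, the scale used to standardize $\bar{X}_n$ in the theorem.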

Then as $n \to \infty$, the standardized sample mean converges in distribution to the standard normal:

Central Limit Theorem

$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$

The remarkable part of this statement is what's missing from it: any description of what the $X_i$ look like. They could be uniform, exponential, Bernoulli, highly skewed — it doesn't matter. The means will still be approximately normally distributed, and the approximation sharpens as $n$ grows.
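For one skewed population, the convergence can be computed exactly rather than simulated. Assume the $X_i$ are Exponential with rate 1, so $\mu = \sigma = 1$ (an illustrative choice). Their sum $S_n$ then has an Erlang distribution, whose tail is a finite Poisson sum: $P(S_n > s) = \sum_{k=0}^{n-1} e^{-s} s^k / k!$. The sketch below computes the probability that the standardized mean lands below -1.96 or above +1.96 and compares each tail with the normal value of about 0.025. The helper names are illustrative.

```python
import math
from statistics import NormalDist

def erlang_sf(s, n):
    """P(S > s) for S a sum of n independent Exponential(rate=1) variables.
    Uses P(S > s) = sum_{k<n} e^{-s} s^k / k!, evaluated in log space to avoid overflow."""
    if s <= 0:
        return 1.0
    return sum(math.exp(-s + k * math.log(s) - math.lgamma(k + 1)) for k in range(n))

def tail_probs(n, z=1.96):
    """Exact P(Z < -z) and P(Z > z) for Z = (mean - 1) / (1 / sqrt(n))."""
    upper = erlang_sf(n + z * math.sqrt(n), n)
    lower = 1.0 - erlang_sf(n - z * math.sqrt(n), n)
    return lower, upper

normal_tail = 1 - NormalDist().cdf(1.96)  # about 0.025 on each side
for n in (1, 10, 100, 1000):
    lower, upper = tail_probs(n)
    print(f"n={n:5d}  lower={lower:.4f}  upper={upper:.4f}  normal={normal_tail:.4f}")
```

At $n = 1$ the lower tail is exactly 0, because an exponential variable cannot fall 1.96σ below its mean, while the upper tail is about 0.052. As $n$ grows, both tails move toward 0.025 and the gap between them shrinks.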

Where it doesn't apply

The CLT has conditions:

Independence: observations must be independent of one another. Time series data, spatial data, and clustered data often violate this, because nearby or grouped observations tend to be correlated.
Finite variance: some important distributions (like the Cauchy distribution, which has no defined mean or variance) fall outside the theorem's assumptions. For these, the CLT fails.
"Large enough" $n$: for very skewed distributions, you need more data. The rule of thumb of $n \geq 30$ is a starting point, not a guarantee.
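The Cauchy failure is easy to see numerically. This sketch draws standard Cauchy values with the inverse-CDF method, $\tan(\pi(U - 1/2))$, and compares the interquartile range (IQR) of sample means against a standard normal source. The helper names are illustrative. For the normal source, the IQR shrinks like $1/\sqrt{n}$. For the Cauchy source it stays near 2 however large $n$ gets, because the mean of $n$ standard Cauchy variables is itself standard Cauchy.

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def normal(rng):
    """Standard normal draw, for comparison."""
    return rng.gauss(0.0, 1.0)

def iqr_of_means(draw, n, n_experiments=2_000, seed=4):
    """Interquartile range of n_experiments sample means, each from n draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(n_experiments)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

results = {n: (iqr_of_means(cauchy, n), iqr_of_means(normal, n)) for n in (1, 10, 100)}
for n, (iqr_cauchy, iqr_normal) in results.items():
    print(f"n={n:3d}  Cauchy IQR {iqr_cauchy:.2f}  normal IQR {iqr_normal:.3f}")
```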

In finance, asset returns are heavy-tailed enough that the CLT's normal approximation breaks down at the extremes. The bell curve underestimates how often large losses occur. The CLT is powerful, but it's a theorem about averages approaching normality — the individual outcomes can still be very non-normal.


