Sampling

How a poll of 1,000 people can reliably represent 330 million — and why it breaks the moment sampling stops being random.

Size isn't the problem you think it is

A poll of 1,000 people can reliably represent 330 million. That sounds wrong — how can a fraction so small tell you anything about the whole? But it works, and the math is surprisingly clean.

The key insight: accuracy depends almost entirely on sample size, not on what fraction of the population you've sampled — as long as that fraction stays small. A sample of 1,000 from a country of 330 million is essentially as accurate as a sample of 1,000 from a city of 100,000. Population size barely matters until your sample becomes a large slice of it — at which point a finite population correction actually makes the smaller population a little easier to pin down.
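A minimal sketch of that comparison, assuming simple random sampling without replacement and the textbook finite population correction factor $\sqrt{(N-n)/(N-1)}$ applied to the standard error of a proportion. The function name and the 50% proportion are illustrative choices, not something the article specifies.

```python
import math
from typing import Optional


def standard_error(p: float, n: int, population: Optional[int] = None) -> float:
    """Standard error of a sample proportion.

    If `population` is given, apply the finite population correction
    for simple random sampling without replacement.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se


# Same sample size, very different population sizes (figures from the text).
se_city = standard_error(0.5, 1_000, population=100_000)
se_country = standard_error(0.5, 1_000, population=330_000_000)

print(f"city of 100,000:        SE = {se_city:.4%}")
print(f"country of 330 million: SE = {se_country:.4%}")
```

Both values round to about 1.6 percentage points; the city's is smaller by less than 0.01 points, which is the correction at work.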

You can feel why it's true: if your sample is truly random, each additional observation adds new signal regardless of how many people you haven't yet asked. The standard error of an estimate falls with the sample size $n$, not the population size $N$. (The Central Limit Theorem is the separate result that tells you the shape of that error, approximately normal, which is what turns a standard error into a margin of error.)
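One way to check this intuition is a small simulation. The sketch below assumes a hypothetical population in which exactly half of the people answer "yes", draws repeated random samples without replacement, and measures how much the sample proportion varies from draw to draw. The function name, the trial count, and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def spread_of_estimates(population_size: int, n: int, trials: int,
                        rng: random.Random) -> float:
    """Standard deviation of the sample proportion across repeated samples.

    The population is modelled as IDs 0..population_size-1, where the
    first half answer "yes" (a true proportion of 50%).
    """
    yes_cutoff = population_size // 2
    estimates = []
    for _ in range(trials):
        # Sampling from a range object draws without replacement and
        # avoids building the whole population in memory.
        sample = rng.sample(range(population_size), n)
        yes = sum(1 for person in sample if person < yes_cutoff)
        estimates.append(yes / n)
    return statistics.pstdev(estimates)


rng = random.Random(0)  # fixed seed so the run is repeatable
spread_city = spread_of_estimates(100_000, n=1_000, trials=300, rng=rng)
spread_country = spread_of_estimates(330_000_000, n=1_000, trials=300, rng=rng)
spread_small_n = spread_of_estimates(330_000_000, n=250, trials=300, rng=rng)

print(f"N = 100,000,     n = 1,000: spread = {spread_city:.4f}")
print(f"N = 330,000,000, n = 1,000: spread = {spread_country:.4f}")
print(f"N = 330,000,000, n = 250:   spread = {spread_small_n:.4f}")
```

With only 300 trials the figures are noisy, but the two n = 1,000 spreads should land close to each other despite the 3,300-fold difference in population size, while cutting n to 250 should roughly double the spread.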

What "random" actually requires

Here's where it breaks. Random does not mean haphazard. It means every person in the population has a known, nonzero probability of being selected (in the simplest case, an equal one).

In 1936, the Literary Digest mailed out roughly 10 million ballots to predict the presidential election and got more than 2 million back. It predicted a landslide for Alf Landon. Roosevelt instead won 523 electoral votes to Landon's 8, one of the most lopsided results in U.S. history. The sample was enormous and wrong, because the mailing list drew heavily on automobile owners and telephone subscribers, who were wealthier and more Republican than voters as a whole. The sample wasn't random. It was a mirror of a slice.

This is selection bias: the sample differs from the population in a way that's related to what you're measuring. A customer satisfaction survey that counts only the customers who choose to reply is measuring people who bother to respond, a different group than all customers.
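The Literary Digest pattern, a huge biased sample losing to a small random one, can be reproduced with a toy simulation. Everything in this sketch is hypothetical: the population size, the 30% "reachable" share, and the support rates are invented to illustrate the mechanism and are not historical figures.

```python
import random

rng = random.Random(1936)  # fixed seed for repeatability

# Hypothetical population: 30% are "reachable" (say, phone and car owners),
# and reachable people support candidate A less often than everyone else.
POPULATION_SIZE = 200_000
population = []
for _ in range(POPULATION_SIZE):
    reachable = rng.random() < 0.30
    supports_a = rng.random() < (0.35 if reachable else 0.65)
    population.append((reachable, supports_a))


def support_share(people):
    """Fraction of the given people who support candidate A."""
    return sum(1 for _, supports_a in people if supports_a) / len(people)


true_support = support_share(population)

# Huge but biased: 50,000 people drawn only from the reachable slice.
reachable_only = [person for person in population if person[0]]
biased_estimate = support_share(rng.sample(reachable_only, 50_000))

# Small but random: 1,000 people drawn from everyone.
random_estimate = support_share(rng.sample(population, 1_000))

print(f"true support:             {true_support:.1%}")
print(f"biased sample (n=50,000): {biased_estimate:.1%}")
print(f"random sample (n=1,000):  {random_estimate:.1%}")
```

The biased sample is fifty times larger, yet it estimates support within the reachable slice rather than within the population, so adding more respondents only makes it more precisely wrong.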

How error shrinks with size

When sampling is random, error follows a precise pattern: it shrinks with the square root of sample size.

Double your sample → error falls by about 30%. To cut error in half, you need 4× as many observations. To cut it by 10×, you need 100×.

This is why 1,000 is a common poll size. Going from 1,000 to 10,000 responses improves accuracy by a factor of about 3 — useful but rarely worth 10× the cost.
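The arithmetic behind those figures is short enough to check directly. This sketch assumes only the 1/√n relationship stated above; `error_ratio` is an illustrative helper name.

```python
import math


def error_ratio(size_multiplier: float) -> float:
    """Factor by which random-sampling error changes when n is multiplied."""
    return 1 / math.sqrt(size_multiplier)


for multiplier in (2, 4, 10, 100):
    ratio = error_ratio(multiplier)
    print(f"{multiplier:>3}x the sample -> error x {ratio:.3f} "
          f"({1 - ratio:.0%} smaller)")
```

Doubling gives a factor of about 0.707 (the "about 30%" reduction), 4× gives exactly half, and 10× gives about 0.316, which is the "factor of about 3" improvement from 1,000 to 10,000 responses.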

Standard error of a proportion

$SE = \sqrt{\frac{p(1-p)}{n}}$

Where p is the estimated proportion and n is the sample size. For a 50/50 split (the worst case, where the variance is largest) with n = 1,000, the standard error is about 1.6 percentage points. Multiplying by about 1.96 for 95% confidence gives a margin of error of roughly ±3 points, which is the figure typically reported alongside polls.
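The formula can also be run in reverse to ask how many respondents a target margin requires. The sketch below assumes the usual normal-approximation multiplier of 1.96 for a 95% margin of error; the function names are illustrative.

```python
import math

Z_95 = 1.96  # assumed normal-approximation multiplier for 95% confidence


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error for a sample proportion: z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin: float, p: float = 0.5, z: float = Z_95) -> int:
    """Smallest n whose margin of error is at most `margin`."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))


moe_1000 = margin_of_error(0.5, 1_000)
n_for_3_points = required_sample_size(0.03)

print(f"margin at n = 1,000: ±{moe_1000:.1%}")
print(f"n needed for ±3 points: {n_for_3_points}")
```

Under these assumptions the margin at n = 1,000 comes out at about 3.1 points, and a ±3-point margin needs 1,068 respondents, which is why so many polls settle near that size.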

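[Interactive figure: "√n Shrinkage". A control sets the sample size n between 10 and 10,000 and displays the standard error and the 95% margin of error, with error ∝ 1/√n. At n = 10 the readout shows a standard error of 0.1581 and a margin of error of ±31.0 pp.]

At n = 10, the margin is enormous: your estimate barely constrains anything.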
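The readout above can be reproduced offline across the same range of sample sizes. This sketch assumes the worst-case proportion p = 0.5 and a 1.96 multiplier for the 95% margin; `margin_table` is a hypothetical helper, not part of the original interactive.

```python
import math

Z_95 = 1.96  # assumed multiplier for a 95% margin of error


def margin_table(sizes, p=0.5, z=Z_95):
    """Return (n, standard error, 95% margin in percentage points) rows."""
    rows = []
    for n in sizes:
        se = math.sqrt(p * (1 - p) / n)
        rows.append((n, se, z * se * 100))
    return rows


rows = margin_table((10, 100, 1_000, 10_000))
for n, se, margin_pp in rows:
    print(f"n = {n:>6,}: standard error {se:.4f}, margin ±{margin_pp:.1f} pp")
```

Each hundredfold increase in n divides the margin by ten, from about ±31 points at n = 10 down to about ±1 point at n = 10,000.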

The practical checklist

Before trusting a sample statistic, ask:

How was the sample selected? Random and representative, or convenient?
Who is missing? Non-response is a form of selection bias.
What is the sample size? Affects the margin of error, not the bias.
Does the sample match the population on known characteristics? Age, geography, income — if these are off, the sample is probably not representative.
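The last check on the list can be sketched as a simple comparison of group shares. The age breakdowns and the 5-point flagging threshold below are hypothetical values chosen for illustration; in practice the population shares would come from a census or another trusted benchmark.

```python
def composition_gaps(sample_shares, population_shares):
    """Sample share minus population share per group, in percentage points."""
    return {
        group: round((sample_shares.get(group, 0.0) - share) * 100, 1)
        for group, share in population_shares.items()
    }


# Hypothetical age breakdowns, for illustration only.
population_age = {"18-34": 0.30, "35-54": 0.33, "55+": 0.37}
sample_age = {"18-34": 0.15, "35-54": 0.30, "55+": 0.55}

gaps = composition_gaps(sample_age, population_age)

# Arbitrary threshold: flag groups that are off by more than 5 points.
flagged = [group for group, gap in gaps.items() if abs(gap) > 5.0]

for group, gap in gaps.items():
    print(f"{group}: {gap:+.1f} pp")
print("groups to investigate:", flagged)
```

Here the youngest group is under-represented by 15 points and the oldest over-represented by 18, a warning sign that whatever the survey measures may be skewed in the same direction. A clean match on known characteristics is reassuring but does not prove the sample is unbiased on the thing being measured.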

Getting a random sample is harder than it sounds. Clinical trials spend enormous effort on recruitment precisely because convenience samples (whoever shows up) are almost never representative of the patients the treatment will eventually reach.
