Base Rate Neglect

Why a positive result from a 99%-accurate test can still be wrong most of the time, and why our intuition about probability is systematically broken.

A test that is 99% accurate, yet its positives are wrong 99% of the time

A disease affects 1 in 10,000 people. A new test for it is 99% accurate in both directions: it correctly identifies 99% of people who have the disease (its sensitivity), and it mistakenly flags only 1% of healthy people (its false positive rate).

You take the test. It comes back positive. How worried should you be?

Most people say: very worried. The test is 99% accurate, so there must be a 99% chance you have the disease.

The real answer: the chance you have the disease is roughly 1%.

Why your intuition is wrong

The mistake is ignoring the base rate — how common the disease is in the first place. Rare conditions are rare. That seems obvious, but it has a non-obvious implication: even a very accurate test will generate mostly false positives when the thing it's detecting almost never happens.

Here's the arithmetic. Imagine testing 10,000 people.

People who have the disease:

1 in 10,000 people have it, so: 1 person
The test catches 99% of them, so: 0.99 true positives (call it 1)

People who don't have the disease:

That's 9,999 people
The test wrongly flags 1% of them: ~100 false positives

So out of the ~101 people who test positive, only about 1 actually has the disease. That's just under 1%.
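
The head count above can be reproduced without rounding. What follows is a minimal sketch, not part of the original article: the function name `screening_counts` and its parameter names are made up for illustration, and the inputs are the figures from the example (10,000 people, 1-in-10,000 prevalence, 99% sensitivity, 1% false positive rate).

```python
def screening_counts(population, prevalence, sensitivity, false_positive_rate):
    """Expected numbers of true and false positives in a screened group."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity               # sick people the test catches
    false_positives = healthy * false_positive_rate   # healthy people wrongly flagged
    return true_positives, false_positives


# Figures from the example in the text.
tp, fp = screening_counts(10_000, 1 / 10_000, 0.99, 0.01)
share_real = tp / (tp + fp)

print(f"true positives:  {tp:.2f}")
print(f"false positives: {fp:.2f}")
print(f"share of positive tests that are real: {share_real:.2%}")
```

Keeping the fractional people (0.99 and 99.99) instead of rounding them to 1 and 100 changes the answer only slightly, which is why the rounded head count is a safe shortcut.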

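[Interactive figure: Base rate neglect. A grid of 10,000 people (each square = 10 people) with sliders for disease prevalence (0.5% to 5%), sensitivity or true positive rate (80% to 99%), and specificity or true negative rate (80% to 99%). Squares are colored by outcome: has disease and tests positive, no disease and tests positive, has disease and tests negative, no disease and tests negative. At the default settings of 1% prevalence, 95% sensitivity, and 90% specificity, 8.8% of positive tests are real: 95 true positives and 990 false positives among 1,085 positive tests.]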

The mammogram problem — and why doctors get it wrong

In a famous series of surveys, Gerd Gigerenzer asked physicians a version of this question with breast cancer screening data: roughly 1% prevalence, 90% sensitivity, 9% false positive rate. The question: if a woman tests positive on a routine mammogram, what is the probability she actually has cancer?

Most doctors estimated 50–90%. The correct answer is closer to 9%.
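
The same head count used for the rare disease works here. As a worked sketch, imagine 1,000 women screened (the group size is an arbitrary choice for illustration): 10 have cancer and 9 of them test positive, while about 89 of the 990 without cancer also test positive.

$P(\text{cancer} \mid \text{positive}) = \frac{0.90 \times 10}{0.90 \times 10 + 0.09 \times 990} = \frac{9}{9 + 89.1} \approx 0.092$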

The doctors weren't innumerate — they were ignoring the base rate. A positive mammogram is more likely to be a false alarm than a real diagnosis. This isn't an argument against screening; it's an argument for confirmatory testing before acting on a single positive result.
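
To see why a confirmatory test helps, the result of the first test can be treated as the new base rate for the second. The sketch below is illustrative only: the helper `update` is a made-up name, and it assumes the second test has the same 90% sensitivity and 9% false positive rate as the first and that its errors are independent of the first test's errors. In practice a follow-up test may have different error rates, and errors may be correlated.

```python
def update(prior, sensitivity, false_positive_rate):
    """Probability of disease after one positive result, given a prior."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)


prior = 0.01  # base rate before any test (mammogram example)
after_first = update(prior, 0.90, 0.09)

# Assumption: the second test's errors are independent of the first's,
# so the first posterior can serve as the prior for the second update.
after_second = update(after_first, 0.90, 0.09)

print(f"after one positive:  {after_first:.1%}")
print(f"after two positives: {after_second:.1%}")
```

Under these assumptions, a second positive raises the probability from roughly 9% to roughly 50%, because the second test starts from a much higher base rate than the first one did.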

The base rate is the starting point

Before you update on any new piece of evidence, you have to ask: what is the prior probability? How common is this thing in the reference class of people who get tested?

The base rate is not just background information. It is the foundation that determines what any subsequent test result can tell you.

Screening a low-risk population for a rare disease? Expect mostly false positives.
Screening a high-risk population for the same disease? The base rate is higher, so positive results carry far more weight.
A spam filter flags an email: spam is common, so the base rate is high and the flag probably means something.
A fraud detection system flags a transaction: fraud is rare, so even a sophisticated system can find that most of its flags are false alarms.

The same test, producing the same positive result, can mean completely different things depending on who gets tested.
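
A short sketch makes the dependence on the base rate concrete. It holds the test fixed at 99% sensitivity and a 1% false positive rate and varies only the base rate. The function name `positive_predictive_value` (a common term for the probability that a positive result is real) and the list of base rates are chosen here for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability that a positive result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical base rates, from very rare to as common as a coin flip.
base_rates = [0.0001, 0.001, 0.01, 0.1, 0.5]

# Same test every time: 99% sensitivity, 1% false positive rate.
results = {p: positive_predictive_value(p, 0.99, 0.01) for p in base_rates}

for p, ppv in results.items():
    print(f"base rate {p:>7.2%} -> P(condition | positive) = {ppv:.1%}")
```

When the base rate equals the false positive rate (1% here), true and false positives are almost equally common, so a positive result is close to a coin flip.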

Where base rate neglect causes the most harm

Journalism: "People who eat X are twice as likely to develop Y." Twice a tiny base rate is still a tiny rate. Relative risk makes rare outcomes sound dramatic.
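
A quick calculation shows how a doubled risk can still be a small one. The numbers below are hypothetical and do not come from any study: a baseline risk of 1 in 100,000 and a relative risk of 2. The function name `absolute_risk_change` is also made up for this sketch.

```python
def absolute_risk_change(baseline_risk, relative_risk):
    """Return the risk among the exposed and the absolute increase over baseline."""
    exposed_risk = baseline_risk * relative_risk
    return exposed_risk, exposed_risk - baseline_risk


# Hypothetical figures for illustration only.
baseline = 1 / 100_000   # 1 case per 100,000 people
exposed, increase = absolute_risk_change(baseline, 2.0)  # "twice as likely"

extra_cases_per_100k = increase * 100_000
print(f"baseline risk: {baseline:.4%}")
print(f"exposed risk:  {exposed:.4%}")
print(f"extra cases per 100,000 people: {extra_cases_per_100k:.1f}")
```

With these made-up inputs, "twice as likely" amounts to one additional case per 100,000 people.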

Legal proceedings: DNA evidence with a 1-in-a-million match probability sounds overwhelming. But if you're searching a database of millions of people, the chance of a coincidental match is substantial.
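
The database effect can be estimated in a few lines. This is a rough sketch under stated assumptions: the database size of 5 million is hypothetical, every profile is treated as an independent chance of a coincidental match, and the question of whether the true source is in the database at all is ignored.

```python
match_probability = 1 / 1_000_000   # chance a random person matches (from the text)
database_size = 5_000_000           # hypothetical database size

# Expected number of people who match purely by coincidence.
expected_matches = database_size * match_probability

# Chance of at least one coincidental match, assuming independent profiles.
p_at_least_one = 1 - (1 - match_probability) ** database_size

print(f"expected coincidental matches: {expected_matches:.1f}")
print(f"chance of at least one:        {p_at_least_one:.1%}")
```

The 1-in-a-million figure describes a single comparison. Searching millions of profiles makes millions of comparisons, so a match somewhere in the database is close to expected even if everyone in it is innocent.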

Data science: A model with 99% accuracy on a dataset where 99% of records belong to one class may have learned nothing. A model that predicts the majority class every time gets the same score.
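
The accuracy trap is easy to reproduce. The sketch below builds a hypothetical dataset of 1,000 records with a 1% positive class and scores a "model" that always predicts the majority class. Recall (the share of actual positives the model finds) is added here as one possible check, not as something the article prescribes.

```python
# Hypothetical dataset: 10 positives (1) and 990 negatives (0).
labels = [1] * 10 + [0] * 990

# A "model" that has learned nothing: it always predicts the majority class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: how many of the actual positives were found.
found = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = found / sum(labels)

print(f"accuracy: {accuracy:.1%}")
print(f"recall:   {recall:.1%}")
```

Comparing a model against this do-nothing baseline is one way to keep the class base rate in view when reading an accuracy figure.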

Public health: Mandatory testing programs for low-prevalence conditions generate enormous numbers of false positives, causing harm — anxiety, unnecessary follow-up procedures, treatment side effects — to healthy people.

The formal statement

The calculation we did above is Bayes' theorem in disguise. The probability of having the disease given a positive test is:

Posterior probability (Bayes)

$P(\text{disease} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{disease}) \cdot P(\text{disease})}{P(\text{positive})}$

The denominator is the total probability of testing positive:

Total probability of a positive test

$P(\text{positive}) = P(\text{pos} \mid \text{disease}) \cdot P(\text{disease}) + P(\text{pos} \mid \text{no disease}) \cdot P(\text{no disease})$

With a 1-in-10,000 base rate, 99% sensitivity, and 1% false positive rate:

$P(\text{positive}) = 0.99 \times 0.0001 + 0.01 \times 0.9999 = 0.010098 \approx 0.0101$

$P(\text{disease} \mid \text{positive}) = \frac{0.99 \times 0.0001}{0.010098} \approx 0.0098$

About 0.98%. That is roughly 98 times the base rate of 0.01%, so the positive result is strong evidence. But it starts from such a low prior that the disease remains unlikely: still under 1%.

This is why Bayes' theorem exists: to make this correction automatically, keeping the base rate in the calculation where intuition forgets to put it.
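
The two formulas above translate directly into a small function. This is a minimal sketch: the function name `posterior` and its parameter names are illustrative, and Python's `fractions.Fraction` is used only so the result comes out exact instead of rounded.

```python
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' theorem for a positive test: returns (P(disease | positive), P(positive))."""
    # Total probability of a positive test (the denominator).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive, p_positive


# Figures from the text: 1-in-10,000 base rate, 99% sensitivity, 1% false positives.
post, p_pos = posterior(Fraction(1, 10_000), Fraction(99, 100), Fraction(1, 100))

print(f"P(positive)           = {p_pos} (about {float(p_pos):.4f})")
print(f"P(disease | positive) = {post} (about {float(post):.4f})")
```

With exact arithmetic the posterior for this example is 1/102, which matches the rounded figure of about 0.0098.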
