Survivorship Bias

Why the data that reaches you is never the full story — and how the missing failures quietly corrupt every conclusion you draw from winners.

The engineers were wrong. Wald was right.

During World War II, the U.S. military analyzed damage patterns on bombers returning from missions. The bullet holes clustered on the wings and fuselage. The engineers' conclusion: reinforce those areas.

Abraham Wald, a statistician, stopped them. The planes with holes in the wings and fuselage had made it back. That was precisely the problem. The aircraft hit in the engines and cockpit hadn't returned at all — they were at the bottom of the English Channel. The data told the engineers where a plane could afford to be hit. The absence of damage at the engines and cockpit was the real signal.

Reinforce the areas with no holes.

This is survivorship bias: your sample is conditioned on survival, so every failure has been quietly filtered out before you start counting.

The mechanics of the mistake

The engineers weren't irrational. They were doing something that feels like good empiricism — looking at real damage on real planes. The flaw was in who got to show up in the dataset. Planes that returned were not a random sample of all planes that flew. They were a selected subset: the ones that could survive damage to wings and fuselage.

Any inference drawn from that sample applies only to survivors. Extending it to the full population requires asking what the non-survivors looked like — and they were never in the room.
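The filtering can be reproduced with a small simulation. This is a minimal sketch under made-up assumptions: every plane takes one hit in a uniformly random section, and the loss probabilities in `LOSS_PROB` are illustrative values chosen for the example, not historical figures.

```python
import random

# Illustrative assumption: chance that a single hit in each section downs the plane.
LOSS_PROB = {"wings": 0.05, "fuselage": 0.05, "engines": 0.60, "cockpit": 0.60}

def simulate(n_planes=100_000, seed=0):
    rng = random.Random(seed)
    sections = list(LOSS_PROB)
    all_hits = dict.fromkeys(sections, 0)       # hits on every plane that flew
    returned_hits = dict.fromkeys(sections, 0)  # hits visible on planes that came back
    for _ in range(n_planes):
        section = rng.choice(sections)          # hits are spread evenly by construction
        all_hits[section] += 1
        if rng.random() >= LOSS_PROB[section]:  # the plane survives this hit
            returned_hits[section] += 1
    return all_hits, returned_hits

all_hits, returned_hits = simulate()
total_all = sum(all_hits.values())
total_returned = sum(returned_hits.values())
for s in LOSS_PROB:
    print(f"{s:9s} all planes: {all_hits[s] / total_all:.2f}   "
          f"returned planes: {returned_hits[s] / total_returned:.2f}")
```

Hits are evenly spread across all planes by construction, yet the returned planes show far fewer engine and cockpit holes. Someone who only sees the second column would conclude those sections are rarely hit.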

[Interactive figure: Survivorship bias, the Wald problem. Damage recorded on returning bombers, with an option to reveal where the missing planes were hit.]

In the revealed view, the hits on the missing planes appear at the engines and cockpit. The returning planes have no holes there. That blank space is the information.

The same mistake, everywhere

The WWII story is memorable, but the pattern runs through modern data work constantly.

Mutual fund track records. A fund family advertises that its funds have beaten the market over the past decade. What the advertisement omits: the funds that underperformed were quietly merged or delisted over that same decade. The survivors look outstanding precisely because the failures have been removed from the average. The 10-year return figure is real — but the population it describes no longer exists.
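A rough simulation shows how this arises even when no fund has any skill. It is a sketch under stated assumptions: annual excess returns are zero-mean noise, and the rule that a fund is closed once its cumulative excess return drops below -10% is a hypothetical threshold picked for illustration.

```python
import random
import statistics

def simulate_funds(n_funds=5_000, years=10, seed=1):
    rng = random.Random(seed)
    all_totals, survivor_totals = [], []
    for _ in range(n_funds):
        total, alive = 0.0, True
        for _ in range(years):
            total += rng.gauss(0.0, 0.05)  # excess return vs. the market, no skill
            if total < -0.10:              # hypothetical closure rule
                alive = False              # merged or delisted; record stops here
                break
        all_totals.append(total)
        if alive:
            survivor_totals.append(total)
    return all_totals, survivor_totals

all_totals, survivor_totals = simulate_funds()
print("funds at start:     ", len(all_totals))
print("funds still listed: ", len(survivor_totals))
print("mean excess return, all funds:  ", round(statistics.fmean(all_totals), 3))
print("mean excess return, survivors:  ", round(statistics.fmean(survivor_totals), 3))
```

Averaged over every fund that existed at the start, the excess return sits near zero. Averaged over the funds still listed, it is clearly positive, purely because the losers were removed.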

"Successful founders dropped out of college." The canonical examples are real: Gates, Jobs, Zuckerberg. What's missing: the much larger number of people who dropped out of college to pursue a startup and failed. Those founders aren't giving keynotes or appearing in profiles. The sample you're reasoning from consists entirely of survivors, and the lesson you extract — drop out if you have conviction — is built on invisible failures.

Backtested trading strategies. A strategy that "worked" over the past 15 years has been selected, implicitly, from the space of strategies someone bothered to test and publish. Strategies that failed during testing were abandoned and never written up. The published record skews toward lucky fits. When you run the strategy forward, the selection pressure is gone, and so is the edge.
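The effect is easy to reproduce with strategies that have no edge at all. In this sketch, each strategy's daily returns are pure zero-mean noise, and "publishing" the top 5% by in-sample performance is an assumed stand-in for the informal filtering described above.

```python
import random
import statistics

def noise_returns(rng, n_days):
    # A strategy with no edge: daily returns are zero-mean noise.
    return [rng.gauss(0.0, 0.01) for _ in range(n_days)]

def run(n_strategies=500, n_days=250, seed=2):
    rng = random.Random(seed)
    results = []
    for _ in range(n_strategies):
        in_sample = statistics.fmean(noise_returns(rng, n_days))   # the backtest
        out_sample = statistics.fmean(noise_returns(rng, n_days))  # running it forward
        results.append((in_sample, out_sample))
    # Keep only the best backtests, as if the rest were never written up.
    results.sort(key=lambda r: r[0], reverse=True)
    published = results[: n_strategies // 20]
    pub_in = statistics.fmean([r[0] for r in published])
    pub_out = statistics.fmean([r[1] for r in published])
    return pub_in, pub_out

pub_in, pub_out = run()
print(f"published strategies, mean daily return in backtest: {pub_in:.5f}")
print(f"published strategies, mean daily return going forward: {pub_out:.5f}")
```

The published strategies look profitable in the backtest because they were chosen for it. Going forward, their average return falls back toward zero, which is all they ever had.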

Building and bridge survival. Ancient Roman concrete structures that still stand are remarkable. But they're the subset that survived millennia of weather, earthquakes, and use. We have no direct record of how many Roman construction projects failed immediately or within a century. The durability of the survivors tells us something, but not as much as it seems.

The formal structure

Survivorship bias is a form of selection bias. Your observed sample $S$ is not drawn uniformly from the population $P$; it is conditioned on the event "survived" (or "succeeded," "was published," "was listed"). Call that conditioning event $A$.

The quantity you can estimate from your data is $E[X \mid A]$ — the expected value of whatever you're measuring, given survival. The quantity you want is $E[X]$ — the expected value across the full population.

These are guaranteed to be equal when $X$ and $A$ are independent, that is, when survival is unrelated to the thing you're measuring. In almost every real case of interest, they are not independent. Bombers hit in the engines were far less likely to return than bombers hit in the wings, so where a plane was hit is tied directly to whether it shows up in the sample. Funds that survive a decade of market conditions are not a random draw from the original fund universe.

The core inequality

$E[X \mid \text{survived}] \neq E[X]$

The gap between these two quantities is the bias. It has no fixed sign: sometimes survivors look better than the full population (mutual funds), sometimes they look worse. But it is non-zero whenever "survival" is correlated with $X$.
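A short derivation makes the size and sign of the gap explicit. The indicator $\mathbf{1}_A$ (equal to 1 when the unit survived and 0 otherwise) and the shorthand $p = P(A) > 0$ are notation introduced for this sketch.

$E[X \mathbf{1}_A] = p \, E[X \mid A]$

$\operatorname{Cov}(X, \mathbf{1}_A) = E[X \mathbf{1}_A] - E[X] \, E[\mathbf{1}_A] = p \, (E[X \mid A] - E[X])$

$E[X \mid A] - E[X] = \dfrac{\operatorname{Cov}(X, \mathbf{1}_A)}{P(A)}$

The bias is the covariance between the measured quantity and survival, divided by the survival rate. Its sign follows the sign of the covariance, and for a given covariance it grows as survival becomes rarer. If $X$ and $A$ are independent, the covariance is zero and the gap disappears.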

The reference-class habit

The practical antidote is to ask: what is the correct reference class? For the engineers, the reference class was not "planes that returned" — it was "all planes that flew the mission." For the fund analyst, it is not "funds currently listed" — it is "all funds that existed at the start of the period."

Once you have the right reference class, the missing data becomes visible as an absence. You may not be able to fill it in, but you can at least know it's missing and discount your conclusions accordingly.
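One way to make the absence concrete is to compare the sample you hold against the reference class it should represent. The sketch below uses hypothetical fund IDs, and the helper name `coverage` is made up for this example.

```python
def coverage(reference_class, observed):
    """Report how much of the reference class actually appears in the sample."""
    reference = set(reference_class)
    seen = set(observed) & reference
    missing = reference - seen
    return {
        "observed_share": len(seen) / len(reference),
        "missing": sorted(missing),
    }

# Hypothetical IDs: every fund that existed at the start of the period,
# versus the funds that are still listed today.
funds_at_start = ["F01", "F02", "F03", "F04", "F05", "F06", "F07", "F08"]
funds_listed_now = ["F01", "F03", "F04", "F08"]

report = coverage(funds_at_start, funds_listed_now)
print(report)
# {'observed_share': 0.5, 'missing': ['F02', 'F05', 'F06', 'F07']}
```

The check does not recover the missing returns, but it turns "some funds may have closed" into a number: here, any average computed from the listed funds describes only half of the original population.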

This connects to a broader habit: whenever you read a causal claim built on a "winners" sample, ask what the losers looked like and where they went. The data that never reached you is often the most important data of all. See also confounding variables for the related case where a hidden third variable distorts what you can see, and base rate neglect for the mistake of ignoring how rare success is in the first place.
