Playground

Interactive companions to selected Concepts and Field Notes — not a substitute for them. Adjust the sliders, pressure-test the mental model, then follow the link back to the full explanation.

Statistics

Distribution Shape Explorer

Read the concept →

Switch between symmetric, skewed, and bimodal data. Slide the bin count and watch shape emerge — or vanish — while mean and median lines reveal the skew.

Widget: Distribution shape explorer (bin count slider, 5 to 50 bins, default 20; mean and median lines). Default readout: mean 50.7, median 50.3. Symmetric distribution: mean and median agree, and either is a fair summary.
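You can check the skew signal on synthetic data by comparing the mean and the median. This is a minimal sketch that uses only Python's standard library. The three generators (a normal, a shifted exponential, and a 50/50 mix of two normals) are assumptions standing in for the widget's data, not its actual source.

```python
import random
from statistics import mean, median

rng = random.Random(0)  # fixed seed so reruns give the same numbers
N = 5_000

# Hypothetical stand-ins for the widget's three data shapes.
shapes = {
    "symmetric": [rng.gauss(50, 10) for _ in range(N)],
    "right-skewed": [30 + rng.expovariate(1 / 20) for _ in range(N)],  # long right tail
    "bimodal": [rng.gauss(35, 5) if rng.random() < 0.5 else rng.gauss(65, 5) for _ in range(N)],
}

summary = {name: (mean(data), median(data)) for name, data in shapes.items()}
for name, (m, med) in summary.items():
    # A positive mean-minus-median gap points to a long right tail.
    print(f"{name:13s} mean={m:6.1f} median={med:6.1f} gap={m - med:+5.1f}")
```

A right tail drags the mean above the median. Here the gap is roughly 6 units, because a shifted exponential with mean 20 has its median only about 20 × ln 2 ≈ 13.9 above the shift. For the bimodal mix, the median falls in the sparse gap between the peaks and can move a lot from run to run.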

Mean vs. Median Puller

Read the concept →

Drag the dots along a number line. Pull one far to the right and watch the mean chase it while the median barely moves.

Widget: Mean vs. median puller (draggable dots on a number line; mean and median lines). Starting readout: mean 63.9, median 61.0. Drag the right-side dot far out and watch which line chases it.
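Here is the same effect in a few lines. Moving one extreme value changes the sum, and therefore the mean, but it does not change which value sits in the middle. The dot positions below are hypothetical.

```python
from statistics import mean, median

# Hypothetical dot positions on a number line (not the widget's actual data).
dots = [40, 52, 58, 61, 65, 70, 75]

# Drag the right-most dot far out to the right.
pulled = dots[:-1] + [300]

before = (mean(dots), median(dots))
after = (mean(pulled), median(pulled))
print(f"before: mean={before[0]:.1f} median={before[1]:.1f}")
print(f"after:  mean={after[0]:.1f} median={after[1]:.1f}")
```

The median stays at 61 because the middle dot never moves. The mean jumps from about 60.1 to about 92.3: with seven dots, every unit the outlier travels adds 1/7 of a unit to the mean.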

Variance Explorer

Read the concept →

Adjust the spread of a distribution and add outliers. Watch how mean, variance, and standard deviation respond in real time.

Widget: Variance explorer (spread slider, std dev 2 to 30, default 10; outlier count, default 0). Default readout: mean 50.6, variance 85.6, std dev 9.25.
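Variance averages squared deviations from the mean, so a few far-out points can dominate it. The standard deviation is its square root, back in the data's units (9.25² ≈ 85.6, matching the readout). The sketch below is minimal. It assumes sample variance with an n − 1 denominator, since the page does not say which definition the widget uses, and it adds two hypothetical outliers.

```python
import random
from statistics import mean, stdev, variance

def spread_summary(values):
    """Mean, sample variance (n - 1 denominator), and sample standard deviation."""
    return mean(values), variance(values), stdev(values)

rng = random.Random(0)
base = [rng.gauss(50, 10) for _ in range(200)]  # spread (std dev) = 10, the widget default
with_outliers = base + [150.0, 160.0]           # two hypothetical outliers

for label, data in (("no outliers", base), ("2 outliers", with_outliers)):
    m, var, sd = spread_summary(data)
    print(f"{label:12s} mean={m:6.2f} variance={var:8.2f} std dev={sd:6.2f}")
```

Two points out of 202 are enough to roughly double the variance, because each one contributes the square of a deviation of about 100.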

Confidence Interval Coverage

Read the concept →

Draw repeated samples and see how often the 95% confidence interval actually captures the true mean — and what happens when it misses.

Widget: Confidence interval simulator. Draw samples to see intervals accumulate; each 95% interval is marked by whether it contains or misses the true mean μ = 50.
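A coverage check can be sketched as repeated sampling: draw a sample, build the interval, and count how often it contains μ. The population standard deviation (10) and sample size (30) below are assumptions. The interval is the normal-approximation mean ± 1.96 · s/√n.

```python
import random
from statistics import NormalDist, mean, stdev

TRUE_MEAN = 50.0  # μ = 50, as in the widget
SIGMA = 10.0      # assumed population std dev
N = 30            # assumed size of each sample
Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

def draw_interval(rng):
    """Draw one sample and return a normal-approximation 95% CI for its mean."""
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    half_width = Z * stdev(sample) / N ** 0.5
    m = mean(sample)
    return m - half_width, m + half_width

rng = random.Random(42)
trials = 2_000
intervals = [draw_interval(rng) for _ in range(trials)]
hits = sum(lo <= TRUE_MEAN <= hi for lo, hi in intervals)
coverage = hits / trials
print(f"{hits} of {trials} intervals contain μ (coverage {coverage:.1%})")
```

Expect coverage near 95%, and usually a little below it. Because the sample standard deviation is plugged in, the t critical value for 29 degrees of freedom (about 2.045) would give exact coverage for normal data, and 1.96 is slightly too narrow.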

Regression Fit & Residuals

Read the concept →

Control slope, noise, and sample size to see how a line fits data and where residuals come from.

Widget: Regression playground (true slope -3.0 to 3.0, default 1.0; noise 0 to 80, default 30; sample size 10 to 100, default 50). Default readout: fitted slope 0.82, intercept 9.0, R² 0.404.
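The fitted slope, intercept, and R² come from ordinary least squares, which has a closed form when there is one predictor. This sketch assumes x values uniform on [0, 100], a true intercept of 0, and Gaussian noise whose standard deviation equals the noise setting. The widget's actual data-generating process is not stated. Each residual is the observed y minus the fitted line at that x.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]  # observed minus fitted
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

TRUE_SLOPE, NOISE, N = 1.0, 30.0, 50  # the widget's default slider values
rng = random.Random(1)
xs = [rng.uniform(0, 100) for _ in range(N)]             # assumed x range
ys = [TRUE_SLOPE * x + rng.gauss(0, NOISE) for x in xs]  # assumed: true intercept 0
slope, intercept, r2 = fit_line(xs, ys)
print(f"fitted slope={slope:.2f} intercept={intercept:.1f} R²={r2:.3f}")
```

With noise set to 0 the residuals vanish and R² is exactly 1. As noise grows, residuals make up more of the total variation, and R² falls.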

Sample Means Converge to a Bell Curve

Read the concept →

Pick a source distribution: uniform, skewed, or bimodal. Collect sample means and watch their distribution settle into a bell curve as the sample size grows, whatever the source looks like.

Widget: Central limit theorem simulator (sample size n = 1 to 50, default 10; panels for the source distribution and the sample means). Take samples to begin. Readouts: samples collected; standard error σ/√n = 0.913 at n = 10.
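The standard-error readout is σ/√n. Below is a minimal simulation. It assumes a uniform source on [0, 10]: its σ is 10/√12 ≈ 2.887, so σ/√10 ≈ 0.913, which happens to equal the readout. The page does not confirm that this is the widget's source.

```python
import random
from statistics import mean, pstdev

LOW, HIGH = 0.0, 10.0             # assumed uniform source on [0, 10]
SIGMA = (HIGH - LOW) / 12 ** 0.5  # std dev of a uniform distribution
n = 10                            # sample size (the widget default)

rng = random.Random(7)
means = [mean(rng.uniform(LOW, HIGH) for _ in range(n)) for _ in range(5_000)]

theoretical_se = SIGMA / n ** 0.5
empirical_se = pstdev(means)
center = (LOW + HIGH) / 2
# For a bell curve, about 95% of sample means fall within 1.96 standard errors of the center.
within = sum(abs(m - center) <= 1.96 * theoretical_se for m in means) / len(means)
print(f"σ/√n = {theoretical_se:.3f}, empirical SE = {empirical_se:.3f}, within ±1.96 SE: {within:.1%}")
```

At n = 10, the fraction of means within ±1.96 standard errors of the center is already close to the 95% a bell curve predicts, even though the source is flat.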

Null Distribution Explorer

Read the concept →

Drop an observed z-statistic and watch the tail shade. Then switch to p-curve mode and run 1,000 null experiments: the p-values scatter uniformly, with about 5% below 0.05 by chance alone.

Widget: Null distribution explorer (observed z-statistic, -4.00 to 4.00, default 2.00). Readout at z = 2.00: |z| = 2.00, two-sided p = 0.0455, frequency about 1 in 22. Uncommon under the null, but it happens in about 1 in 22 experiments by chance.
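The p-value readout is the two-sided tail area of a standard normal beyond |z|. Python's `statistics.NormalDist` reproduces it without extra dependencies. The p-curve part draws 1,000 z-statistics from the null itself, matching the count in the text.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # null distribution of a z-statistic

def two_sided_p(z):
    """P(|Z| >= |z|) when Z is standard normal."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

p = two_sided_p(2.0)
print(f"z = 2.00: p = {p:.4f}, about 1 in {round(1 / p)}")

# p-curve mode: under the null, p-values are uniform, so about 5% land below 0.05.
rng = random.Random(3)
null_ps = [two_sided_p(rng.gauss(0, 1)) for _ in range(1_000)]
share_below = sum(q < 0.05 for q in null_ps) / len(null_ps)
print(f"share of 1,000 null p-values below 0.05: {share_below:.1%}")
```

The first line reports p = 0.0455, about 1 in 22, the same as the widget readout. The share below 0.05 wobbles around 5% from seed to seed.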

Effect Size Overlap

Read the concept →

Slide Cohen's d from 0 to 2 and watch two distributions pull apart. The shaded overlap shrinks while the chance a treated individual beats an untreated one climbs from 50% to 92%.

Widget: Effect size overlap (Cohen's d, 0.00 to 2.00, default 0.50). Readout at d = 0.50: P(treated > ctrl) 64%, visual overlap 80%. Cohen's 'medium' effect (d ≈ 0.5): there is a ~64% chance that a treated individual beats an untreated one, and the curves visibly pull apart.
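Consider two normal curves with equal spread whose means differ by d standard deviations. The difference between a random treated and a random control value then has mean d and standard deviation √2, so the chance that the treated value is larger is Φ(d/√2). The overlapping area is 2Φ(−d/2). Reading the widget's "visual overlap" as that overlapping area is an assumption. Even so, this sketch reproduces the readouts: 64% and 80% at d = 0.5, and 92% superiority at d = 2.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def prob_superiority(d):
    """P(random treated > random control) for two unit-variance normals d apart: Φ(d/√2)."""
    return PHI(d / 2 ** 0.5)

def overlap(d):
    """Overlapping area of two unit-variance normal curves d apart: 2Φ(-|d|/2)."""
    return 2 * PHI(-abs(d) / 2)

for d in (0.0, 0.5, 1.0, 2.0):
    print(f"d={d:.1f}  P(treated > ctrl)={prob_superiority(d):.0%}  overlap={overlap(d):.0%}")
```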

√n Shrinkage

Read the concept →

Slide the sample size on a log scale from 10 to 10,000. Watch the margin of error shrink as 1/√n — then flip to mode B and watch the same tiny effect become 'highly significant' on n alone.

Widget: √n shrinkage (sample size n on a log scale, 10 to 10,000; error ∝ 1/√n). Readout at n = 10: margin of error (95%) ±31.0 pp, standard error 0.1581. At n = 10 the margin is enormous, so your estimate barely constrains anything.
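The margin of error for a proportion is 1.96 · √(p(1 − p)/n). Assuming the worst case p = 0.5 reproduces the readout at n = 10: standard error 0.1581 and margin ±31.0 pp. The second loop is a hypothetical take on mode B. It holds a tiny effect fixed at 0.05 standard deviations, and the effect's z-statistic grows as √n.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def margin_of_error(n, p=0.5):
    """Standard error and 95% margin of error for a proportion (p = 0.5 is the worst case)."""
    se = (p * (1 - p) / n) ** 0.5
    return se, Z95 * se

for n in (10, 100, 1_000, 10_000):
    se, moe = margin_of_error(n)
    print(f"n={n:>6,}  SE={se:.4f}  margin=±{moe * 100:.1f} pp")

def z_for_effect(effect, n, sd=1.0):
    """z-statistic for a fixed mean difference (in units of sd) measured with n observations."""
    return effect / (sd / n ** 0.5)

# Mode B idea: the same tiny effect (0.05 sd, hypothetical) at growing n.
for n in (10, 100, 10_000):
    print(f"n={n:>6,}  z={z_for_effect(0.05, n):.2f}")
```

Every 100-fold increase in n cuts the margin tenfold: ±31.0 pp at n = 10 becomes ±3.1 pp at n = 1,000. Over the same range, the 0.05-sd effect goes from z ≈ 0.16 at n = 10 to z = 5 at n = 10,000.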

Chartist Fallacy

Read the concept →

Three financial charts. Pick the one with a real upward trend — then reveal that all three are zero-drift random walks. Dial drift up to feel where 'real' starts.

Widget: Chartist fallacy (drift per step μ, 0.000 to 0.150, default 0.000; three charts and a "Your pick" readout). Three financial charts: which one shows a real upward trend?
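Each chart can be modeled as a random walk in which every step adds the drift μ plus noise. This sketch assumes 250 steps, a starting value of 100, and unit Gaussian step noise. None of these values come from the widget.

```python
import random

def random_walk(steps, drift, rng, start=100.0, step_sd=1.0):
    """Price path where each step adds a constant drift plus Gaussian noise."""
    path = [start]
    for _ in range(steps):
        path.append(path[-1] + drift + rng.gauss(0, step_sd))
    return path

rng = random.Random(11)
charts = {label: random_walk(250, drift=0.0, rng=rng) for label in "ABC"}  # zero drift: no real trend
for label, path in charts.items():
    print(f"chart {label}: start={path[0]:.1f} end={path[-1]:.1f}")

# Drift accumulates linearly (drift * steps); noise only grows like step_sd * sqrt(steps).
steps = 250
for drift in (0.0, 0.05, 0.15):
    print(f"drift={drift:.2f}: expected gain {drift * steps:5.1f} vs noise sd {steps ** 0.5:4.1f}")
```

Under these assumptions, the maximum drift of 0.150 adds 37.5 over 250 steps, against a noise standard deviation of about 15.8. Small drifts are therefore easily swamped by the wandering that zero-drift paths do on their own.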

Trend or Noise

Read the concept →

Two panels. One holds a hidden upward trend; one is pure noise. Identify the trend across rounds — at small n, your accuracy will hover around 50%.

Widget: Trend or noise (points per panel n, 10 to 200, default 30; Left and Right buttons; readouts for correct picks and accuracy). Two panels: one holds a hidden upward trend, the other is pure noise. Identify the trend.
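One way to see why small n is hard is to imagine an idealized observer who fits a least-squares slope to each panel and picks the steeper one. The trend size (0.01 per point) and unit noise below are hypothetical, since the widget's hidden trend is not stated.

```python
import random

def ols_slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., n - 1."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

def observer_accuracy(n, rounds, rng, trend=0.01, noise_sd=1.0):
    """Share of rounds in which the panel with the larger fitted slope is the trended one."""
    correct = 0
    for _ in range(rounds):
        trended = [trend * x + rng.gauss(0, noise_sd) for x in range(n)]
        noise = [rng.gauss(0, noise_sd) for _ in range(n)]
        correct += ols_slope(trended) > ols_slope(noise)
    return correct / rounds

rng = random.Random(8)
acc = {n: observer_accuracy(n, 1_000, rng) for n in (10, 30, 200)}
for n, a in acc.items():
    print(f"n={n:>3}: accuracy {a:.1%}")
```

Under these assumptions the observer does barely better than a coin flip at n = 10 and is near-perfect at n = 200. With x-values 0, 1, …, n − 1, the slope's standard error shrinks roughly like n^(−3/2).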

Multiple Testing on Pure Noise

Read the concept →

Run K significance tests on pure noise, then watch Bonferroni and Benjamini-Hochberg sweep the false positives away. At K=20 with no correction, expect about one 'hit' — even though nothing is real.

Widget: Jelly Bean Lab (number of tests K, 1 to 100, default 20). Example run of K = 20 p-values: T1 0.627, T2 0.003, T3 0.527, T4 0.981, T5 0.968, T6 0.281, T7 0.613, T8 0.721, T9 0.426, T10 0.995, T11 0.455, T12 0.489, T13 0.139, T14 0.404, T15 0.248, T16 0.154, T17 0.489, T18 0.067, T19 0.395, T20 0.767. Readouts: tests run, significant, and expected at p < 0.05 (1.0). When K = 20 colors are tested at α = 0.05, about 1.0 'hit' is expected from pure noise.
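Bonferroni compares each p-value to α/K. Benjamini-Hochberg, in its step-up version, sorts the p-values and rejects the r smallest, where r is the largest rank with p_(r) ≤ r · α/K. Here is a minimal sketch of both, applied to the twenty p-values from the example run (T1 to T20).

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject test i when p_i <= alpha / K."""
    k = len(pvalues)
    return [p <= alpha / k for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up BH: reject the r smallest p-values, r being the largest rank with p_(r) <= r * alpha / K."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])  # indices from smallest to largest p
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / k:
            cutoff = rank
    rejected = [False] * k
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# The twenty p-values from the example run (T1 to T20), all from tests on pure noise.
example = [0.627, 0.003, 0.527, 0.981, 0.968, 0.281, 0.613, 0.721, 0.426, 0.995,
           0.455, 0.489, 0.139, 0.404, 0.248, 0.154, 0.489, 0.067, 0.395, 0.767]
uncorrected = [p < 0.05 for p in example]
print("uncorrected:", sum(uncorrected))
print("Bonferroni:", sum(bonferroni(example)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(example)))
```

Without correction, only T2 (p = 0.003) counts as a hit. Both corrections reject nothing. Bonferroni's threshold is 0.05/20 = 0.0025, and no sorted p-value clears its BH threshold either.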

Probability

False Positives Under Low Prevalence

Read the concept →

Adjust test accuracy and disease prevalence. See how many of the positive results are actually false alarms — and why rare conditions are hard to screen for.

Widget: Base rate neglect (disease prevalence 0.5% to 5%, default 1%; sensitivity 80% to 99%, default 95%; specificity 80% to 99%, default 90%). Grid: 10,000 people, each square = 10 people, colored by disease status and test result. Readout at the defaults: 8.8% of positive tests are real (95 true positives, 990 false positives, among 1,085 positive tests).
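The readout follows from pushing a population through the test. True positives are population × prevalence × sensitivity, and false positives are population × (1 − prevalence) × (1 − specificity). The sketch below is minimal and uses the widget's default settings. The function name is illustrative.

```python
def screening_counts(prevalence, sensitivity, specificity, population=10_000):
    """Expected true positives, false positives, and share of positives that are real."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity            # sick people the test catches
    false_pos = healthy * (1 - specificity)  # healthy people the test wrongly flags
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

tp, fp, ppv = screening_counts(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(f"{tp:.0f} true positives, {fp:.0f} false positives, {ppv:.1%} of positives are real")

for prevalence in (0.005, 0.01, 0.05):  # the slider's range
    share = screening_counts(prevalence, 0.95, 0.90)[2]
    print(f"prevalence {prevalence:.1%}: {share:.1%} of positives are real")
```

With 10,000 people this gives 95 true positives and 990 false positives, so 95 / 1,085 ≈ 8.8% of positives are real.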

Posterior Probability by Scenario

Read the concept →

See why a test that flags 95% of true cases can still be wrong about most of its positive results when the condition is rare: the base rate and the false-positive rate decide the answer. Switch between disease testing, spam filtering, and fraud detection.

Widget: Bayes visualizer (prior slider, 0.1% to 50.0%, default 1.0%; a 100-person grid colored as tests positive and has the condition, has the condition but not detected, false positive, or healthy and tests negative).

A disease affects 1% of people. The test correctly identifies 95% of those who have it, but it also flags 10% of healthy people. A positive test moves the probability from the 1.0% prior to an 8.8% posterior.

Out of 100,000 people tested (1,000 with the condition, 99,000 without):

950 test positive and have the condition
9,900 test positive but do not

Of 10,850 positive results, 8.8% actually have the condition.
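You can compute the same posterior with Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. The likelihood ratio of a positive result is sensitivity / false-positive rate = 0.95 / 0.10 = 9.5. At a 1% prior the odds go from 1:99 to 9.5:99, which is 9.5 / 108.5 ≈ 8.8%. The sketch sweeps the prior across the slider's range. The function name is illustrative.

```python
def posterior(prior, sensitivity=0.95, false_positive_rate=0.10):
    """P(condition | positive test) via Bayes' rule in odds form."""
    likelihood_ratio = sensitivity / false_positive_rate  # how much a positive result multiplies the odds
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

for prior in (0.001, 0.01, 0.10, 0.50):  # spans the prior slider
    print(f"prior {prior:.1%} -> posterior {posterior(prior):.1%}")
```

At the top of the slider (a 50% prior), the same test yields a posterior of about 90%.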

Monty Hall: Should You Switch?

Read the concept →

Choose a door. The host reveals a goat. Should you switch? Run a thousand trials and let the numbers settle the argument.

Widget: Monty Hall simulator. Pick a door to start.
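A simulation under the standard rules settles the question. The host knows where the car is and always opens a different door that hides a goat. The trial count and seed are arbitrary.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car (so it hides a goat).
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)  # the one remaining closed door
    return pick == car

rng = random.Random(2024)
trials = 10_000
stay_rate = sum(play(False, rng) for _ in range(trials)) / trials
switch_rate = sum(play(True, rng) for _ in range(trials)) / trials
print(f"stay wins {stay_rate:.1%}, switch wins {switch_rate:.1%}")
```

Switching wins exactly when the first pick was wrong, which happens 2/3 of the time. The printed rates therefore land near 33% for staying and 67% for switching.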

Causal Reasoning

When a Hidden Variable Drives Both

Read the concept →

Ice cream sales track drownings closely, until you recolor by season and the link vanishes. The DAG below names what you just saw: a third variable driving both.

Widget: Confounder DAG. Pooled r = 0.86; within-season r = -0.08. More ice cream, more drownings: a strong correlation that sounds causal.

Correlation Failure Modes

Read the concept →

Four real correlations, each broken by a different mechanism. Click through and watch the taxonomy emerge: confounder, reverse causation, chance, selection bias.

Widget: Why a correlation lies (Pearson r and mechanism readouts; first example r = 0.85). Ice cream sales track drownings tightly, but does ice cream cause drowning?

ML Intuition

Overfitting Explorer

Read the concept →

Fit polynomials of increasing degree to noisy data. Watch the training error fall while the test error climbs — and see exactly where the model starts memorizing instead of learning.

Widget: Overfitting explorer (polynomial degree 1 to 10, default 3, labeled "Good fit"; fitted curve, true function (sin), and training points). Readout at degree 3: training error 0.0974, test error 0.1071.
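Plain least-squares polynomial fits reproduce the train/test pattern. This sketch assumes a sin(πx) target on [−1, 1], noise with standard deviation 0.3, 15 training points, and degrees 1, 3, and 9. It solves the normal equations in pure Python, which is adequate for low degrees on [−1, 1]. In practice, `numpy.polyfit` is the usual tool.

```python
import math
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via the normal equations."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(ata, aty)

def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial on (xs, ys)."""
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((pred - y) ** 2 for pred, y in zip(preds, ys)) / len(xs)

rng = random.Random(3)

def noisy_sine(n):
    """n points from sin(pi * x) on [-1, 1] plus Gaussian noise (sd 0.3, an assumption)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return xs, [math.sin(math.pi * x) + rng.gauss(0, 0.3) for x in xs]

train_x, train_y = noisy_sine(15)
test_x, test_y = noisy_sine(200)
errors = {}
for degree in (1, 3, 9):
    coeffs = polyfit(train_x, train_y, degree)
    errors[degree] = (mse(coeffs, train_x, train_y), mse(coeffs, test_x, test_y))
    print(f"degree {degree}: train MSE {errors[degree][0]:.3f}, test MSE {errors[degree][1]:.3f}")
```

Training error can only fall as the degree rises, because each model contains the previous one as a special case. Test error has no such guarantee, and it typically climbs once the curve starts chasing individual noisy points.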

Data Thinking

Survivorship Bias — Wald's Planes

Read the concept →

See the WWII bomber damage pattern that Abraham Wald used to show that the planned reinforcement targeted the wrong parts of the plane.

Widget: Survivorship bias, the Wald problem. Panel: returning planes (observed), showing damage recorded on returning bombers. Engineers wanted to reinforce the areas with the most holes.

How Peeking Inflates False Positives

Read the field note →

Simulate 5,000 A/B tests under no real effect. Adjust how often you peek at the dashboard and watch the false positive rate climb from 5% to over 25%.

Widget: Peeking simulator (5,000 simulated A/B tests with no real effect; looks per test 1 to 30, default 10). Readouts: fixed-N false positive rate and peek-and-stop false positive rate. The dashed line marks the nominal 5% rate that a single test at a fixed, pre-planned sample size is supposed to guarantee.
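A small Monte Carlo reproduces the inflation. Each test accumulates observations with no true difference, and the peeker runs a z-test at evenly spaced looks and stops at the first significant one. Modeling each observation as one standard-normal difference between arms is a simplification. The sketch also runs 2,000 tests of 200 observations rather than the widget's 5,000, so it finishes quickly.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold, about 1.96

def run_test(rng, n_total, looks):
    """Simulate one A/B test with no true effect.

    Returns (fixed_n_significant, peek_significant). The peeker checks a z-test at
    `looks` evenly spaced points and stops at the first significant result.
    """
    checkpoints = {round(n_total * k / looks) for k in range(1, looks + 1)}
    running_sum = 0.0
    peek_hit = False
    for i in range(1, n_total + 1):
        running_sum += rng.gauss(0, 1)  # simplified per-observation difference; mean 0 under the null
        if not peek_hit and i in checkpoints and abs(running_sum / i ** 0.5) > Z_CRIT:
            peek_hit = True  # the peeker would stop here and ship the "winner"
    fixed_hit = abs(running_sum / n_total ** 0.5) > Z_CRIT  # one test at the planned sample size
    return fixed_hit, peek_hit

rng = random.Random(9)
tests, n_total, looks = 2_000, 200, 10
results = [run_test(rng, n_total, looks) for _ in range(tests)]
fixed_fpr = sum(f for f, _ in results) / tests
peek_fpr = sum(p for _, p in results) / tests
print(f"fixed-N FPR {fixed_fpr:.1%}, peek-and-stop FPR {peek_fpr:.1%}")
```

The last look coincides with the planned sample size, so every fixed-N false positive is also a peeking false positive. The extra hits come from the earlier looks. With 10 looks, the peek-and-stop rate should land well above 5%, at around one in five.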

Average vs. Typical User Lifetime

Read the field note →

Slide the power-user fraction from 0 to 30% and watch the mean lifetime climb past 20× the median. The 'average user' is the gap between these two numbers.

Widget: Average vs. typical user lifetime (power-user fraction q, 0.00 to 0.30, default 0.15; median and mean lines). Readouts at q = 0.15: median lifetime 3.5, mean/median ratio 22.2×. With 15% power users, the mean lifetime is about 22× the median. The "average user" is the gap.
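The gap comes from a long right tail. The page does not state its lifetime model, so this sketch assumes exponential lifetimes with a mean of 5 for typical users and 500 for power users. Those values are hypothetical and do not reproduce the exact 22.2× readout. The mean is linear in q, and the sketch finds the median numerically by bisection on the mixture's CDF.

```python
import math

def mixture_stats(q, typical_mean=5.0, power_mean=500.0):
    """Mean and median lifetime when a fraction q of users are long-lived power users.

    Both groups have exponential lifetimes (a hypothetical modelling choice).
    """
    mean = (1 - q) * typical_mean + q * power_mean  # linear in q

    def cdf(t):
        return (1 - q) * (1 - math.exp(-t / typical_mean)) + q * (1 - math.exp(-t / power_mean))

    lo, hi = 0.0, 10 * power_mean
    for _ in range(100):  # bisection: find t with cdf(t) = 0.5
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return mean, (lo + hi) / 2

for q in (0.0, 0.05, 0.15, 0.30):
    mean, median = mixture_stats(q)
    print(f"q={q:.2f}: median {median:5.2f}, mean {mean:6.2f}, mean/median {mean / median:5.1f}x")
```

At q = 0 the ratio is only about 1.44, since an exponential's mean divided by its median is 1/ln 2. A 15% slice of power users pushes the mean to 79.25, while the median only creeps from about 3.47 to about 4.4.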