Regression Intuition

What a regression line actually is, why it's the best line, and what R² is really telling you.

Before this: Correlation vs. Causation; Standard Deviation and Variance

What is a regression line trying to do?

You have a scatter plot. Points spread across the chart. You want a single line that summarizes the relationship between x and y.

Many different lines could be drawn through the cloud of points. Regression picks the best one, but best by what standard?

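Regression playground (interactive)

Controls: Slope (-3.0 to 3.0), Noise (0 to 80), Sample size (10 to 100). Readouts: fitted slope, intercept, R². Example state: slope 1.0, noise 30, and sample size 50 give a fitted slope of 0.82, an intercept of 9.0, and R² of 0.404.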

The vertical lines connecting each point to the fitted line are the residuals (the errors). Adjust the slope and noise sliders and watch how the residuals change. Regression chooses the line that makes those residuals as small as possible overall, measured by the sum of their squares.

Least squares: why we square the errors

For any candidate line, every point has a residual: the vertical distance between the actual y and the line's predicted y.

We want to minimize total error. But residuals can be positive or negative — they'd cancel out if we just summed them. Squaring them solves this and also penalizes large errors more than small ones.
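To see the cancellation concretely, here is a minimal Python sketch. The five data points and the two candidate lines are made up for illustration; they are not taken from the playground. Both lines have residuals that sum to zero, so a plain sum cannot tell them apart, while the sum of squares can.

```python
# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def residuals(slope, intercept):
    """Vertical gap between each actual y and the line y = intercept + slope * x."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Candidate 1: a flat line at the mean of y, which ignores the upward trend.
flat = residuals(slope=0.0, intercept=4.0)
# Candidate 2: a tilted line that follows the trend.
tilted = residuals(slope=0.6, intercept=2.2)

# Plain sums: positive and negative residuals cancel for both lines.
sum_flat = sum(flat)
sum_tilted = sum(tilted)

# Sums of squares: every residual contributes, and large ones count more.
sse_flat = sum(r * r for r in flat)
sse_tilted = sum(r * r for r in tilted)

print(f"plain sums:      flat={sum_flat:.2f}, tilted={sum_tilted:.2f}")
print(f"sums of squares: flat={sse_flat:.2f}, tilted={sse_tilted:.2f}")
```

The tilted line ends up with the smaller sum of squares, which matches the visual impression that it tracks the points better.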

The regression line minimizes the sum of squared residuals (hence "least squares"):

Ordinary least squares

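$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Here $\beta = (\beta_0, \beta_1)$ holds the intercept and slope, and $\hat{y}_i = \beta_0 + \beta_1 x_i$ is the line's prediction for point $i$.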

This has a closed-form solution: the slope and intercept can be computed directly from the data without iterating or guessing.
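For a single predictor, the standard closed-form result is that the slope equals $\sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and the intercept equals $\bar{y}$ minus the slope times $\bar{x}$. The sketch below computes both in plain Python. The function name `fit_line` and the sample data are illustrative assumptions, not part of the playground.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means, and variation of x alone.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data for illustration only.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

No loop over candidate lines is needed: one pass to get the means and one pass to get the two sums is enough.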

What R² is actually measuring

R² (R-squared) measures how much of the variance in y is explained by x. For a standard least-squares line fit with an intercept, it ranges from 0 to 1. (Out-of-sample, or for models forced through the origin, R² can go negative — a sign the line predicts worse than the mean.)

R² = 1: the line fits perfectly — all points are on it, no residuals
R² = 0: the line is no better than just predicting the mean every time
R² = 0.7: knowing x accounts for 70% of the variation in y; 30% remains unexplained
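In formula terms, R² is usually computed as 1 minus the ratio of two sums of squares: the squared residuals around the fitted line, divided by the squared deviations of y around its own mean. A minimal Python sketch follows, with a hypothetical helper `r_squared` and made-up data.

```python
def r_squared(xs, ys):
    """R² of the least-squares line (with intercept) fitted to xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Error left over after using the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Error you would have if you predicted the mean of y every time.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y is constant, so R² is undefined")
    return 1.0 - ss_res / ss_tot

# Made-up data for illustration only.
r2_noisy = r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
r2_perfect = r_squared([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(f"noisy data: {r2_noisy:.2f}, points exactly on a line: {r2_perfect:.2f}")
```

Reading the ratio this way makes the benchmark explicit: R² compares the line against the simplest possible predictor, the mean of y.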

Try the Noise slider in the playground. As noise increases, the points scatter further from the true line and R² falls. As sample size increases, the fitted slope tends to land closer to the true one, because more data averages out the noise. Any single sample can still miss by chance.
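The same two effects can be reproduced offline with a small simulation. This is a sketch under stated assumptions: x is drawn uniformly from 0 to 100, the noise is Gaussian with the given standard deviation, and the true intercept is 0. The playground's actual data generator is not documented here, so the numbers will not match it.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def simulate(true_slope, noise_sd, n, rng):
    """Draw one sample (assumed generator), fit it, return (fitted slope, R²)."""
    xs = [rng.uniform(0.0, 100.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    slope, intercept = fit_line(xs, ys)
    return slope, r_squared(xs, ys, slope, intercept)

rng = random.Random(0)  # fixed seed so reruns give the same numbers

# Effect 1: more noise, lower R² (same true slope, same sample size).
_, r2_low_noise = simulate(1.0, 5.0, 2000, rng)
_, r2_high_noise = simulate(1.0, 80.0, 2000, rng)

# Effect 2: more data, smaller typical slope error (averaged over many samples).
def mean_abs_slope_error(n, trials=200):
    errors = [abs(simulate(1.0, 30.0, n, rng)[0] - 1.0) for _ in range(trials)]
    return sum(errors) / trials

err_small = mean_abs_slope_error(10)
err_large = mean_abs_slope_error(1000)

print(f"R² at low noise: {r2_low_noise:.3f}, at high noise: {r2_high_noise:.3f}")
print(f"mean slope error, n=10: {err_small:.3f}, n=1000: {err_large:.3f}")
```

The slope error is averaged over repeated samples because a single small sample can land close to the true slope by luck; the improvement with sample size is a statement about typical behavior.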

The assumptions hiding under regression

Ordinary linear regression assumes:

The relationship between x and y is actually linear
The errors are independent of each other
The variance of errors is constant across all values of x (homoscedasticity)
The errors are approximately normally distributed (this matters for confidence intervals and p-values, not for the fitted line's predictions)

Violating these doesn't always break the model, but it changes what you can claim. Always plot your data and residuals before trusting regression output.
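When a plot is not at hand, a rough numeric look at the residuals can flag the same problems. The sketch below uses made-up data in which the error grows with x, then compares the residual spread in the low-x half against the high-x half. It is a crude illustration of checking for non-constant variance, not a formal test, and the helper names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rms(values):
    """Root mean square: a simple measure of typical residual size."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Made-up data: y = 2x plus an error whose size grows with x (alternating sign).
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x + (0.1 * x if i % 2 == 0 else -0.1 * x) for i, x in enumerate(xs)]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# xs is already sorted, so slicing splits the low-x points from the high-x points.
half = len(xs) // 2
spread_low = rms(residuals[:half])
spread_high = rms(residuals[half:])

print(f"residual spread, low x:  {spread_low:.3f}")
print(f"residual spread, high x: {spread_high:.3f}")
```

A clearly larger spread at one end suggests the constant-variance assumption is doubtful, which is the same fan shape a residual plot would show.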

Where regression shows up

Economics: how does an extra year of education affect earnings?
Medicine: what's the relationship between dose and response?
Engineering: predicting output from inputs in a manufacturing process
Machine learning: linear regression is one of the simplest supervised learning algorithms, and understanding it is a foundation for understanding more complex models
