
Regression Intuition

What a regression line actually is, why it's the best line, and what R² is really telling you.

What is a regression line trying to do?

You have a scatter plot. Points spread across the chart. You want a single line that summarizes the relationship between x and y.

Any line would fit in some sense. Regression finds the best one — but best by what standard?

Regression playground (interactive figure: sliders for slope, noise, and sample size; readouts for the fitted slope and intercept)

The vertical lines connecting each point to the fitted line are the residuals: the errors the line makes at each point. Adjust the slope and noise sliders and watch how the residuals change. Regression picks the line that makes these residuals, taken together, as small as possible.

Least squares: why we square the errors

For any candidate line, every point has a residual: the vertical distance between the actual y and the line's predicted y.

We want to minimize total error. But residuals can be positive or negative — they'd cancel out if we just summed them. Squaring them solves this and also penalizes large errors more than small ones.
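A quick numeric illustration of the cancellation problem, on a made-up five-point dataset. Any line passing through the mean point (x̄, ȳ) has residuals that sum to exactly zero, because each residual is (yᵢ − ȳ) − m(xᵢ − x̄) and both kinds of deviation sum to zero. So the raw sum cannot tell the least-squares line from an absurd one; the sum of squares can. The helper `residual_sum_and_sse` is hypothetical, written only for this sketch.

```python
def residual_sum_and_sse(xs, ys, slope, intercept):
    """Return (sum of residuals, sum of squared residuals) for y_hat = intercept + slope * x."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(residuals), sum(r * r for r in residuals)

# Made-up toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 2.0, 4.0, 4.0]
x_mean = sum(xs) / len(xs)  # 3.0
y_mean = sum(ys) / len(ys)  # 3.0

# Two lines through the mean point (x_mean, y_mean).
lines = {
    "least squares": (0.5, y_mean - 0.5 * x_mean),  # slope 0.5, intercept 1.5
    "absurd": (-2.0, y_mean + 2.0 * x_mean),        # slope -2.0, intercept 9.0
}
results = {name: residual_sum_and_sse(xs, ys, m, b) for name, (m, b) in lines.items()}
for name, (total, sse) in results.items():
    print(f"{name}: sum = {total}, sum of squares = {sse}")
# prints:
# least squares: sum = 0.0, sum of squares = 1.5
# absurd: sum = 0.0, sum of squares = 64.0
```

Both residual sums are zero, yet the absurd line's squared error is more than forty times larger.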

The regression line minimizes the sum of squared residuals (hence "least squares"):

Ordinary least squares

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \beta_0 + \beta_1 x_i$$

This has a closed-form solution: the slope and intercept can be computed directly from the data without iterating or guessing.
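For a single predictor, the closed form is the standard pair of formulas below, with β̂₁ the slope, β̂₀ the intercept, and x̄, ȳ the sample means:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

A minimal sketch of those formulas in plain Python, using lists rather than any particular library; `fit_line` is a name chosen for this illustration:

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y_hat = intercept + slope * x for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope formula (any 1/n factors would cancel).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean  # makes the line pass through (x_mean, y_mean)
    return slope, intercept

# Made-up toy data.
slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 3, 2, 4, 4])
print(slope, intercept)  # prints: 0.5 1.5
```

A single pass over the data gives the answer, with no iteration or guessing. The intercept formula also shows that the least-squares line always passes through (x̄, ȳ). On Python 3.10 or later, the standard library's `statistics.linear_regression` computes the same slope and intercept.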

What R² is actually measuring

R² (R-squared) measures how much of the variance in y is explained by x. For a standard least-squares line fit with an intercept, it ranges from 0 to 1. (Out-of-sample, or for models forced through the origin, R² can go negative — a sign the line predicts worse than the mean.)

  • R² = 1: the line fits perfectly — all points are on it, no residuals
  • R² = 0: the line is no better than just predicting the mean every time
  • R² = 0.7: knowing x accounts for 70% of the variation in y; 30% remains unexplained
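Concretely, R² compares the line's squared residuals with the squared deviations around the mean, which is exactly the "predict the mean every time" baseline in the second bullet:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

A small sketch of that definition, on a made-up five-point dataset whose least-squares line is ŷ = 1.5 + 0.5x; the function name `r_squared` is chosen for illustration:

```python
def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - SS_res / SS_tot for the line y_hat = intercept + slope * x."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - y_mean) ** 2 for y in ys)  # total variation around the mean
    if ss_tot == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    return 1.0 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 2, 4, 4]
print(r_squared(xs, ys, 0.5, 1.5))   # least-squares line: prints 0.625
print(r_squared(xs, ys, 0.0, 3.0))   # always predict the mean (3.0): prints 0.0
print(r_squared(xs, ys, -2.0, 9.0))  # a poor line: prints -15.0
```

The last line is not the least-squares fit, so nothing keeps its R² above zero: it predicts far worse than the mean.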

Try the Noise slider in the playground. As noise increases, the points scatter further from the true line and R² tends to fall. As sample size increases, the fitted slope tends to land closer to the true one, because more data averages out the noise.
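To see both trends outside the playground, here is a small simulation sketch. The "true" line y = 1 + 2x, x drawn uniformly from [0, 10], Gaussian noise, 200 repetitions per setting, and the fixed seed are all assumptions for illustration, not the playground's actual settings. It averages over many simulated datasets because any single dataset can buck the trend.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one predictor."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def r_squared(xs, ys, slope, intercept):
    """1 - SS_res / SS_tot for the given line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical "true" relationship: an assumption for this sketch.
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, 1.0
TRIALS = 200

def simulate(rng, n, noise_sd):
    """Draw n noisy points from the true line, fit OLS, return (fitted slope, R^2)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, noise_sd) for x in xs]
    slope, intercept = fit_line(xs, ys)
    return slope, r_squared(xs, ys, slope, intercept)

rng = random.Random(42)  # fixed seed so the run is reproducible

# More data: the average gap between fitted and true slope shrinks.
slope_error = {}
for n in (10, 100, 1000):
    errors = [abs(simulate(rng, n, noise_sd=5.0)[0] - TRUE_SLOPE) for _ in range(TRIALS)]
    slope_error[n] = sum(errors) / TRIALS
    print(f"n={n:5d}  mean |slope error| = {slope_error[n]:.3f}")

# More noise: the average R^2 falls.
mean_r2 = {}
for noise_sd in (1.0, 5.0, 20.0):
    r2s = [simulate(rng, 50, noise_sd)[1] for _ in range(TRIALS)]
    mean_r2[noise_sd] = sum(r2s) / TRIALS
    print(f"noise sd={noise_sd:4.1f}  mean R^2 = {mean_r2[noise_sd]:.3f}")
```

Exact numbers depend on the seed, but the mean slope error should shrink roughly like 1/√n (the slope's standard error is the noise level divided by the square root of Σ(xᵢ − x̄)², which grows with n), and mean R² should drop as the noise standard deviation grows.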

The assumptions hiding under regression

Ordinary linear regression assumes:

  1. The relationship between x and y is actually linear
  2. The errors are independent of each other
  3. The variance of errors is constant across all values of x (homoscedasticity)
  4. The errors are approximately normally distributed (for inference, not prediction)

Violating these doesn't always break the model, but it changes what you can claim. Always plot your data and residuals before trusting regression output.
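Plotting residuals against x is the usual check. As a dependency-free stand-in, the sketch below fits a straight line to deliberately curved data (y = x², chosen as an assumption to violate linearity) and prints one sign per residual in x order. Residuals from a well-specified model should look like random scatter; here they form long runs instead.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one predictor."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Deliberately curved data: y = x^2 violates the linearity assumption.
xs = [float(x) for x in range(-5, 6)]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# A text stand-in for a residual plot: one sign per point, ordered by x.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(slope, intercept)  # prints: 0.0 10.0
print(signs)             # prints: ++-------++
```

The fitted slope is 0, which would suggest x has no effect on y even though y is completely determined by x. The residuals still sum to zero; only their systematic pattern reveals the problem.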

Where regression shows up

  • Economics: how does an extra year of education affect earnings?
  • Medicine: what's the relationship between dose and response?
  • Engineering: predicting output from inputs in a manufacturing process
  • Machine learning: linear regression is one of the simplest supervised learning algorithms, and understanding it lays the foundation for many more complex models
