Social Science Statistics

The Standard Error of Estimate (SEE) measures the accuracy of predictions in linear regression. It tells you how far observed values typically deviate from the regression line — similar to how standard deviation measures spread around a mean.

What is Standard Error of Estimate?

The Standard Error of Estimate (also called Standard Error of the Regression or Residual Standard Error) quantifies the typical prediction error when using a regression line to estimate Y values from X values.

Think of it this way: if standard deviation tells you how far data points spread around their mean, SEE tells you how far they spread around the regression line. A smaller SEE means more accurate predictions.

Calculation Methods

You can calculate SEE in two ways:

Method 1: From Residuals

SEE = √(Σ(y - ŷ)² / (n - 2))

Where (y - ŷ) are the residuals (differences between actual and predicted Y values), and n is the sample size.
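
As a minimal sketch of Method 1, the Python below fits a least-squares line and then applies the residual formula. The function names (fit_line, see_from_residuals) and the small hours/scores data set are made up for illustration; they are not taken from the calculator.

```python
import math

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def see_from_residuals(x, y):
    """Standard error of estimate: sqrt(sum of squared residuals / (n - 2))."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    slope, intercept = fit_line(x, y)
    # Residual = actual y minus the y predicted by the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

# Illustrative (made-up) data: study hours and test scores.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(see_from_residuals(hours, scores), 4))  # 0.8944
```

Dividing by n - 2 rather than n reflects the two parameters (slope and intercept) estimated from the data.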

Method 2: From Summary Statistics

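SEE = Sy × √((1 - r²) × (n - 1) / (n - 2))

Where Sy is the sample standard deviation of Y (computed with n - 1 in the denominator), r is the Pearson correlation coefficient, and n is the sample size. For large samples the factor (n - 1) / (n - 2) is close to 1, so SEE ≈ Sy × √(1 - r²).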
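
Method 2 can be sketched the same way. This example assumes Sy is the sample standard deviation (n - 1 denominator), in which case matching Method 1 exactly requires the factor (n - 1) / (n - 2) under the square root; the shorter form Sy × √(1 - r²) is then a large-sample approximation. The helper names and data are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def see_from_summary(sd_y, r, n):
    """SEE from the sample SD of Y, the correlation r, and the sample size n."""
    if n < 3:
        raise ValueError("need n >= 3")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return sd_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))

# Illustrative (made-up) data: study hours and test scores.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
n = len(scores)
sd_y = statistics.stdev(scores)  # sample SD, n - 1 denominator
r = pearson_r(hours, scores)

exact = see_from_summary(sd_y, r, n)
shortcut = sd_y * math.sqrt(1 - r ** 2)  # large-sample approximation
print(round(exact, 4))     # 0.8944
print(round(shortcut, 4))  # 0.7746
```

With only five observations the shortcut understates SEE noticeably; the gap shrinks as n grows.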

Requirements

Two continuous variables with a linear relationship
At least 3 pairs of observations (n ≥ 3)
For Method 1: Raw X and Y data values
For Method 2: Standard deviation of Y, correlation r, and sample size n

Interpretation

Smaller SEE = Better predictions — values cluster closely around the regression line
Larger SEE = Worse predictions — values are more spread out from the line
SEE is in Y's units — if Y is in dollars, SEE is in dollars too
Compare to SD of Y: SEE is typically smaller than the standard deviation of Y; the smaller it is relative to that standard deviation, the better the regression

Rule of Thumb for Prediction Intervals

If the residuals are approximately normally distributed:
Approximately 68% of observed values fall within ±1 SEE of the regression line
Approximately 95% of observed values fall within ±2 SEE of the regression line
Approximately 99.7% of observed values fall within ±3 SEE of the regression line
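
The 68/95/99.7 figures are those of a normal distribution, so they hold only to the extent that the residuals are roughly normal. A short sketch using Python's standard-library NormalDist reproduces the percentages and builds a rough band around a prediction. The function names and the predicted value of 70 are assumptions for illustration, and the band ignores the extra uncertainty in the fitted slope and intercept.

```python
from statistics import NormalDist

def coverage_within(k):
    """Share of a normal distribution lying within +/- k standard deviations."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

def approx_band(predicted, see, k):
    """Rough band of +/- k * SEE around a predicted value."""
    return predicted - k * see, predicted + k * see

for k in (1, 2, 3):
    print(k, round(coverage_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Hypothetical prediction of 70 with SEE = 8, using the +/- 2 SEE rule.
print(approx_band(70, 8, 2))  # (54, 86)
```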

Example

If you're predicting test scores (Y) from study hours (X):

SD of test scores = 15 points
SEE = 8 points
This means using the regression line cuts the typical prediction error from 15 points to 8 points, a reduction of about 47% compared to just guessing the mean
About 95% of students with a given number of study hours will score within ±16 points (2 × SEE) of the predicted score, assuming roughly normal residuals
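
The arithmetic behind this example can be checked in a few lines. The implied correlation below is an added illustration, not part of the original example: it assumes a sample large enough that SEE ≈ Sy × √(1 - r²).

```python
import math

sd_y = 15.0  # SD of test scores, from the example
see = 8.0    # standard error of estimate, from the example

# Relative reduction in typical error versus predicting the mean for everyone.
reduction = 1 - see / sd_y
print(round(reduction, 3))  # 0.467

# Half-width of the approximate 95% band (2 x SEE).
half_width = 2 * see
print(half_width)  # 16.0

# Correlation implied by these numbers under the large-sample approximation.
implied_r_squared = 1 - (see / sd_y) ** 2
print(round(implied_r_squared, 3))             # 0.716
print(round(math.sqrt(implied_r_squared), 3))  # 0.846
```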
