Standard Error of Estimate Calculator

The Standard Error of Estimate (SEE) measures the accuracy of predictions in linear regression. It tells you how far observed values typically deviate from the regression line — similar to how standard deviation measures spread around a mean.

What is Standard Error of Estimate?

The Standard Error of Estimate (also called Standard Error of the Regression or Residual Standard Error) quantifies the typical prediction error when using a regression line to estimate Y values from X values.

Think of it this way: if standard deviation tells you how far data points spread around their mean, SEE tells you how far they spread around the regression line. A smaller SEE means more accurate predictions.

Calculation Methods

You can calculate SEE in two ways:

Method 1: From Residuals

SEE = √(Σ(y - ŷ)² / (n - 2))

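Where (y - ŷ) are the residuals (differences between actual and predicted Y values), and n is the sample size. The divisor is n - 2 rather than n because fitting the line estimates two parameters (slope and intercept), which uses up two degrees of freedom.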
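
As a minimal sketch of Method 1, assuming the regression line is fitted by ordinary least squares, the calculation might look like the following. The function name see_from_residuals and the sample data are illustrative only and are not taken from the calculator itself.

```python
import math


def see_from_residuals(x, y):
    """Standard error of estimate from raw data (hypothetical helper).

    Fits y = intercept + slope * x by ordinary least squares, then
    returns sqrt(sum of squared residuals / (n - 2)).
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Sum of squared residuals (y - y_hat)^2
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


# Illustrative data: the fitted line is y_hat = 2.2 + 0.6 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(see_from_residuals(x, y), 4))  # 0.8944
```

Here the squared residuals sum to 2.4, so SEE = √(2.4 / 3) ≈ 0.894.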

Method 2: From Summary Statistics

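SEE = Sy × √((1 - r²) × (n - 1) / (n - 2))

Where Sy is the sample standard deviation of Y (computed with n - 1 in the denominator), r is the Pearson correlation coefficient, and n is the sample size. For large samples the factor (n - 1) / (n - 2) is close to 1, so SEE ≈ Sy × √(1 - r²).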
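
A short sketch of Method 2 follows. It assumes Sy is the sample standard deviation (n - 1 denominator) and applies the degrees-of-freedom factor (n - 1) / (n - 2), which is what makes the result agree exactly with Method 1; without that factor, Sy × √(1 - r²) is a large-sample approximation. The function name and its exact flag are hypothetical.

```python
import math


def see_from_summary(sd_y, r, n, exact=True):
    """Standard error of estimate from summary statistics (hypothetical helper).

    sd_y  : sample standard deviation of Y (n - 1 denominator)
    r     : Pearson correlation between X and Y
    n     : number of (X, Y) pairs
    exact : if True, include the (n - 1) / (n - 2) factor;
            if False, use the large-sample form sd_y * sqrt(1 - r^2)
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    if sd_y < 0:
        raise ValueError("sd_y must be non-negative")

    unexplained = 1.0 - r ** 2
    if exact:
        unexplained *= (n - 1) / (n - 2)
    return sd_y * math.sqrt(unexplained)


# Summary statistics for the illustrative data x = [1, 2, 3, 4, 5],
# y = [2, 4, 5, 4, 5]: Sy^2 = 1.5, r^2 = 0.6, n = 5
sd_y = math.sqrt(1.5)
r = math.sqrt(0.6)
print(round(see_from_summary(sd_y, r, 5), 4))               # 0.8944
print(round(see_from_summary(sd_y, r, 5, exact=False), 4))  # 0.7746
```

With only five observations the two forms differ noticeably; the gap shrinks as n grows.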

Requirements

  • Two continuous variables with a linear relationship
  • At least 3 pairs of observations (n ≥ 3)
  • For Method 1: Raw X and Y data values
  • For Method 2: Standard deviation of Y, correlation r, and sample size n

Interpretation

  • Smaller SEE = Better predictions: values cluster closely around the regression line
  • Larger SEE = Worse predictions: values are more spread out from the line
  • SEE is in Y's units: if Y is in dollars, SEE is in dollars too
  • Compare to SD of Y: SEE is normally smaller than the standard deviation of Y, and the smaller it is relative to that SD, the more the regression helps. With a very weak correlation and a small sample, SEE can come out slightly above the SD of Y.

Rule of Thumb for Prediction Intervals

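These figures assume the residuals are roughly normally distributed, and they ignore the uncertainty in the fitted line itself, so treat them as approximations:

  • Approximately 68% of observed values fall within ±1 SEE of the regression line
  • Approximately 95% of observed values fall within ±2 SEE of the regression line
  • Approximately 99.7% of observed values fall within ±3 SEE of the regression line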
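
The rule of thumb can be sketched in code as follows. Both helper names are hypothetical, and the band is only the rough ±k × SEE range described above, not a formal prediction interval.

```python
def rough_band(predicted_y, see, k=2.0):
    """Return (low, high) = predicted_y -/+ k * SEE (hypothetical helper)."""
    if see < 0:
        raise ValueError("see must be non-negative")
    half_width = k * see
    return predicted_y - half_width, predicted_y + half_width


def share_within(residuals, see, k=1.0):
    """Fraction of residuals whose absolute value is at most k * SEE."""
    if not residuals:
        raise ValueError("residuals must not be empty")
    inside = sum(1 for e in residuals if abs(e) <= k * see)
    return inside / len(residuals)


# A predicted value of 70 with SEE = 8 gives a rough 95% band of 54 to 86
print(rough_band(70.0, 8.0))  # (54.0, 86.0)
```

Running share_within on the residuals of your own fit is a quick way to see how closely the data follow the 68/95/99.7 pattern.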

Example

If you're predicting test scores (Y) from study hours (X):

  • SD of test scores = 15 points
  • SEE = 8 points
  • This means the regression line cuts the typical prediction error by about 47% (from 15 points to 8 points) compared with always guessing the mean
  • About 95% of students who study a given number of hours will score within ±16 points (2 × SEE) of the predicted score
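
The arithmetic behind this example can be checked with a few lines. The function names are hypothetical, and the implied correlation uses the large-sample form SEE ≈ Sy × √(1 - r²), since the example does not state a sample size.

```python
import math


def error_reduction(sd_y, see):
    """Relative drop in typical error versus always guessing the mean."""
    return 1.0 - see / sd_y


def implied_abs_r(sd_y, see):
    """|r| implied by SEE and SD of Y, large-sample approximation only."""
    ratio_sq = (see / sd_y) ** 2
    if ratio_sq > 1.0:
        raise ValueError("see exceeds sd_y; the approximation does not apply")
    return math.sqrt(1.0 - ratio_sq)


sd_y, see = 15.0, 8.0
print(round(error_reduction(sd_y, see), 3))  # 0.467
print(round(implied_abs_r(sd_y, see), 3))    # 0.846
```

So an SEE of 8 against an SD of 15 corresponds to a correlation of roughly 0.85 in absolute value, under the stated approximation.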