Social Science Statistics

The Problem with Finding Multiple Outliers

Outliers are data points that are unusually far from the rest of your observations, and they can seriously distort your statistical analyses. While there are tests designed to detect a single outlier — such as Grubbs' test — things get more complicated when your dataset might contain two, three, or even more outliers. The Generalized Extreme Studentized Deviate (ESD) test, developed by Rosner in 1983, was created specifically to handle this situation.

The Masking Effect: Why Single-Outlier Tests Can Fail

To understand why we need the Generalized ESD test, you first need to understand a phenomenon called the masking effect. Imagine a researcher measuring the resting heart rates of 40 participants. Most participants have heart rates between 60 and 80 beats per minute, but three participants have rates of 120, 125, and 130. If you apply a single-outlier test like Grubbs' test, those three extreme values collectively pull the mean and standard deviation upward. This makes each individual extreme value appear less unusual than it actually is, because the inflated mean is now closer to them and the inflated standard deviation makes the spread seem larger. The result? The test might conclude that none of the values are outliers, even though three of them clearly are. The outliers are effectively hiding behind each other.
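A small sketch of the masking arithmetic. The values are made up: 37 hypothetical "typical" heart rates cycling through 60 to 80 bpm, plus the three extreme values from the example. It compares how far 130 sits from the mean, in standard-deviation units, when the other two extreme values are present and when they are absent. It illustrates how the standardized distance shrinks. It does not by itself reproduce a Grubbs' test verdict for this particular data.

```python
import numpy as np

# Hypothetical resting heart rates: 37 typical values between 60 and 80 bpm
typical = [60 + (i % 21) for i in range(37)]
extremes = [120, 125, 130]


def z_score(value, sample):
    """Distance of `value` from the sample mean, in sample-SD units."""
    sample = np.asarray(sample, dtype=float)
    return abs(value - sample.mean()) / sample.std(ddof=1)


# 130 judged alongside the other two extreme values (they inflate mean and SD)
z_masked = z_score(130, typical + extremes)
# 130 judged with 120 and 125 removed from the data
z_alone = z_score(130, typical + [130])

print(f"z for 130 with all three extremes present: {z_masked:.2f}")
print(f"z for 130 with 120 and 125 removed:        {z_alone:.2f}")
```

The first z-score is noticeably smaller than the second, because the neighbouring extreme values pull the mean toward 130 and widen the standard deviation.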

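You might think the solution is to just apply Grubbs' test repeatedly: find one outlier, remove it, test again. But this sequential approach has two weaknesses. First, every additional test is another chance to flag an ordinary value by mistake, so the overall probability of a false positive across the whole sequence is no longer the significance level you set for each individual test. Second, the very first test is still run with all the outliers present, so masking can halt the sequence before it finds anything. The Generalized ESD test solves both of these problems elegantly.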

How the Generalized ESD Test Works

The Generalized ESD test requires you to specify an upper bound on the number of outliers you suspect might be in your data. You do not need to know the exact number — you just need a reasonable maximum. For instance, if you have 50 data points and think there could be anywhere from zero to five outliers, you would set the upper bound at 5.

The test then works through an iterative procedure. In the first step, it identifies the data point that is furthest from the mean and calculates a test statistic for it, similar to Grubbs' test statistic: the absolute distance of that point from the mean, divided by the sample standard deviation. It then temporarily removes that point and repeats the process on the remaining data, recomputing the mean and standard deviation, identifying the next most extreme point, and calculating a new test statistic. This continues until the procedure has been repeated as many times as the upper bound you specified.
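In symbols, writing r for the upper bound and letting x̄ᵢ and sᵢ denote the mean and sample standard deviation of the n − i + 1 observations still in the data at step i (the notation here is ours, chosen for this sketch), the step-i statistic is:

```latex
R_i = \frac{\max_{j} \lvert x_j - \bar{x}_i \rvert}{s_i}, \qquad i = 1, 2, \ldots, r
```

The maximum runs only over the observations remaining at step i, and the point that attains it is the one set aside before step i + 1.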

At each step, the test also computes a corresponding critical value, which depends on the sample size, the step number, and the chosen significance level. Once all the iterations are complete, the test compares each test statistic to its critical value, working from the last iteration backward. The number of outliers is the largest iteration number at which the test statistic exceeds its critical value, even if some earlier statistics fell short of theirs. All data points removed up to and including that iteration are declared outliers.
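For reference, the standard statement of the test uses Rosner's approximation to the critical values, based on Student's t distribution. Here n is the sample size, α the significance level, r the upper bound, R_i the step-i test statistic, and t_{p,ν} the 100p percentage point of the t distribution with ν degrees of freedom:

```latex
\lambda_i = \frac{(n - i)\, t_{p,\, n-i-1}}{\sqrt{\left(n - i - 1 + t_{p,\, n-i-1}^{2}\right)(n - i + 1)}},
\qquad p = 1 - \frac{\alpha}{2(n - i + 1)}, \qquad i = 1, \ldots, r

\hat{k} = \max\{\, i \le r : R_i > \lambda_i \,\}
```

If no R_i exceeds its λ_i, the test reports zero outliers. Otherwise the k̂ points removed in steps 1 through k̂ are declared outliers.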

A Concrete Example

Imagine a researcher studying the daily calorie intake of 30 participants in a nutrition study. Most participants report intakes between 1,800 and 2,500 calories, but three report values of 4,200, 4,500, and 5,100. The researcher suspects there could be up to four outliers, so she sets the upper bound at 4. The Generalized ESD test iterates four times, removing the most extreme value at each step and computing a test statistic and a critical value each time. The test identifies three outliers, namely the three extreme values: the third test statistic exceeds its critical value but the fourth does not, so three is the largest iteration with a significant result.
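The example above does not come with its raw data. The sketch below therefore uses hypothetical intakes: 27 made-up values spaced evenly from 1,800 to 2,450 calories, plus the three reported extreme values. It is a minimal implementation of the procedure as described, assuming the Rosner t-based critical values, SciPy's `scipy.stats.t.ppf` for t percentage points, and the sample standard deviation (`ddof=1`). The function name `generalized_esd` is our own, not a library API.

```python
import numpy as np
from scipy import stats


def generalized_esd(data, max_outliers, alpha=0.05):
    """Sketch of Rosner's generalized ESD test (hypothetical helper).

    Returns (outlier_indices, r_stats, critical_values); the indices refer
    to positions in `data`. Assumes the data are not all identical.
    """
    x = np.asarray(data, dtype=float)
    n = x.size
    if not 1 <= max_outliers <= n - 2:
        raise ValueError("max_outliers must be between 1 and n - 2")

    remaining = np.arange(n)  # positions still in the working sample
    removed, r_stats, crit = [], [], []
    for i in range(1, max_outliers + 1):
        sample = x[remaining]
        dev = np.abs(sample - sample.mean())
        j = int(np.argmax(dev))                      # most extreme remaining point
        r_stats.append(dev[j] / sample.std(ddof=1))  # test statistic R_i
        removed.append(int(remaining[j]))
        remaining = np.delete(remaining, j)          # set it aside for the next step

        # Rosner's t-based approximation to the critical value lambda_i
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        crit.append((n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1)))

    # Work backward: the outlier count is the largest i with R_i > lambda_i
    n_outliers = 0
    for i in range(max_outliers, 0, -1):
        if r_stats[i - 1] > crit[i - 1]:
            n_outliers = i
            break
    return removed[:n_outliers], r_stats, crit


# Hypothetical intakes: 27 typical values (1,800 to 2,450) plus three extremes
calories = [1800 + 25 * k for k in range(27)] + [4200, 4500, 5100]
outliers, r_stats, crit = generalized_esd(calories, max_outliers=4)
for step, (r, lam) in enumerate(zip(r_stats, crit), start=1):
    print(f"step {step}: R = {r:.3f}, lambda = {lam:.3f}")
print("outliers:", sorted(calories[k] for k in outliers))
```

The backward scan is what lets the test see past masking. A late significant step sets the count even when an earlier step was not significant on its own.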

Advantages Over Single-Outlier Tests

The Generalized ESD test has several important advantages:

It overcomes the masking effect. Because the test removes extreme values one at a time and recalculates the statistics on the data that remain, outliers cannot hide behind each other.
It maintains the correct significance level. Unlike running Grubbs' test multiple times, the Generalized ESD test is designed so that the overall chance of a false positive remains approximately at the level you chose (typically 0.05).
You do not need to know the exact number of outliers in advance. You only need to set a reasonable upper bound, and the test figures out the actual number.
It is straightforward to apply and interpret, making it practical even for researchers who are not specialists in statistics.
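One way to see the significance-level claim in practice is a small Monte Carlo check. The sketch simulates clean normal samples that contain no true outliers and counts how often the test flags at least one point anyway. That happens exactly when some R_i exceeds its λ_i. The settings (n = 50, upper bound 5, 2,000 simulations, seed 1) are arbitrary choices for illustration. The helper names are hypothetical, and the critical values are Rosner's t-based approximation, so the estimated rate should land near 0.05 rather than exactly on it.

```python
import numpy as np
from scipy import stats


def esd_critical_values(n, max_outliers, alpha=0.05):
    """Rosner's approximate critical values lambda_1 .. lambda_r."""
    i = np.arange(1, max_outliers + 1)
    p = 1 - alpha / (2 * (n - i + 1))
    t = stats.t.ppf(p, n - i - 1)
    return (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))


def esd_statistics(data, max_outliers):
    """R_1 .. R_r, removing the most extreme point after each step."""
    x = np.array(data, dtype=float)
    r_values = []
    for _ in range(max_outliers):
        dev = np.abs(x - x.mean())
        j = int(np.argmax(dev))
        r_values.append(dev[j] / x.std(ddof=1))
        x = np.delete(x, j)
    return np.array(r_values)


def false_positive_rate(n=50, max_outliers=5, alpha=0.05, n_sims=2000, seed=1):
    """Share of outlier-free normal samples in which any point is flagged."""
    rng = np.random.default_rng(seed)
    crit = esd_critical_values(n, max_outliers, alpha)  # depends only on n, r, alpha
    hits = 0
    for _ in range(n_sims):
        sample = rng.normal(size=n)  # clean data: no true outliers
        if np.any(esd_statistics(sample, max_outliers) > crit):
            hits += 1  # at least one ordinary point wrongly declared an outlier
    return hits / n_sims


rate = false_positive_rate()
print(f"estimated false-positive rate: {rate:.3f}")
```

The estimate varies a little with the seed and the number of simulations, so it is a sanity check rather than a proof of the claim.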

Key Assumptions

Like Grubbs' test, the Generalized ESD test assumes that the underlying data (without the outliers) follow an approximately normal distribution, the familiar bell-shaped curve. If your data are heavily skewed or come from a non-normal distribution, the test may not perform well, so it is wise to examine a histogram of your data before applying the test. Additionally, the test needs a reasonable sample size to work reliably: Rosner's simulations found his critical-value approximation to be very accurate for samples of at least 25 observations.
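If plotting software is not at hand, a crude text histogram is enough for the quick shape check suggested above. This minimal sketch uses NumPy's `numpy.histogram`. The helper name `text_histogram` and both demonstration samples are hypothetical: one roughly bell-shaped and one right-skewed, both generated at random.

```python
import numpy as np


def text_histogram(data, bins=10, width=40):
    """Return a crude text histogram, one line per bin, for a quick shape check."""
    counts, edges = np.histogram(data, bins=bins)
    scale = width / counts.max()  # the tallest bin gets `width` characters
    rows = []
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        bar = "#" * int(round(count * scale))
        rows.append(f"{lo:9.1f} to {hi:9.1f} | {bar} ({count})")
    return "\n".join(rows)


# Hypothetical samples: roughly bell-shaped versus clearly right-skewed
rng = np.random.default_rng(0)
print(text_histogram(rng.normal(loc=70, scale=6, size=200)))
print()
print(text_histogram(rng.exponential(scale=10, size=200)))
```

A long tail trailing off to one side, as in the second sample, is the kind of shape that should make you cautious about applying the test.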

As with any outlier detection method, finding a statistically significant outlier does not automatically mean you should remove it from your data. Always investigate why an extreme value exists. If it results from a data entry error or equipment malfunction, removal is justified. If it is a genuine but unusual observation, you should think carefully about whether removing it is appropriate, and report your reasoning transparently.