Spearman’s Rank Correlation

What Is Spearman’s Rank Correlation?

Spearman’s rank correlation coefficient, usually written as rs or the Greek letter rho, measures the strength and direction of the relationship between two variables using their ranks rather than their raw values. Instead of asking whether two variables follow a straight-line pattern (as the Pearson correlation does), Spearman’s correlation asks a broader question: when one variable increases, does the other tend to increase (or decrease) as well, regardless of whether the change is perfectly proportional?

Imagine a researcher who asks 12 judges to rank 10 wines from best to worst. She wants to know whether two of them, Judge A and Judge B, tend to agree. Because the data are already ranks, Spearman’s correlation is the natural choice: it tells her how closely the two judges’ rankings correspond.

How Does Ranking Work?

The first step in computing Spearman’s correlation is to convert each variable’s values into ranks. The smallest value gets rank 1, the next smallest gets rank 2, and so on. If two or more values are tied, they each receive the average of the ranks they would have occupied. For example, if two values are tied for 3rd and 4th place, both are assigned rank 3.5.
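The ranking step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name average_ranks and the sample scores are made up for the example.

```python
from bisect import bisect_left, bisect_right

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    # All copies of v occupy the 1-based positions bisect_left + 1 .. bisect_right
    # in the sorted list, so the average rank is the mean of those two endpoints.
    return [(bisect_left(ordered, v) + 1 + bisect_right(ordered, v)) / 2 for v in values]

# Hypothetical scores: the two 30s are tied for 3rd and 4th place.
scores = [10, 20, 30, 30, 50]
print(average_ranks(scores))  # [1.0, 2.0, 3.5, 3.5, 5.0]
```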

Once the data have been ranked, Spearman’s coefficient is simply the Pearson correlation calculated on the ranks. The result falls between −1 and +1, just like Pearson’s coefficient. A value of +1 means the ranks are in perfect agreement, −1 means they are in perfect reverse order, and a value near 0 means there is no consistent tendency for the ranks to rise or fall together.
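As a rough sketch of "Pearson on the ranks", the code below ranks each variable and then applies the ordinary Pearson formula. The helper names (average_ranks, pearson, spearman) and the judges' scores are illustrative assumptions, not taken from any particular library.

```python
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + 1 + bisect_right(ordered, v)) / 2 for v in values]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's coefficient: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up scores that two judges gave to the same five wines.
judge_a = [55, 62, 70, 81, 90]
judge_b = [64, 58, 85, 77, 93]

print(round(spearman(judge_a, judge_b), 3))        # 0.8
print(round(spearman(judge_a, judge_a), 3))        # 1.0  (perfect agreement)
print(round(spearman(judge_a, judge_a[::-1]), 3))  # -1.0 (perfect reverse order)
```

In practice a statistics library would normally be used instead of hand-written helpers; the sketch is only meant to make the two-step definition concrete.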

Monotonic Relationships

A key concept for understanding Spearman’s correlation is the idea of a monotonic relationship. A relationship is monotonic if, as one variable increases, the other consistently increases (or consistently decreases), but not necessarily at a constant rate. For example, the relationship between years of experience and salary might be monotonic: more experience generally means more pay, even though the pay increases may be larger early in a career and smaller later. This relationship is monotonic but not linear. Spearman’s correlation will capture it well, while Pearson’s correlation, which measures only how closely the points follow a straight line, might understate it.
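The experience-and-salary example can be imitated with invented numbers. In the sketch below, salary grows with the logarithm of experience, which is an assumption chosen only to produce a curve that always rises but flattens out; the figures are not real data, and the helper names are illustrative.

```python
from bisect import bisect_left, bisect_right
from math import log, sqrt

def average_ranks(values):
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + 1 + bisect_right(ordered, v)) / 2 for v in values]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: pay always rises with experience, but by less each year.
years = list(range(0, 21))
salary = [30000 + 20000 * log(1 + t) for t in years]

print("Spearman:", round(spearman(years, salary), 3))
print("Pearson: ", round(pearson(years, salary), 3))
```

Because the ordering of the salaries matches the ordering of the years exactly, Spearman’s coefficient reaches 1, while Pearson’s coefficient falls short of 1 because the points do not lie on a straight line.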

When to Use Spearman Instead of Pearson

Spearman’s correlation is a good alternative to Pearson correlation in several situations:

  • Ordinal data: When your variables are measured on a ranked or ordinal scale (e.g., satisfaction ratings from “very dissatisfied” to “very satisfied”), Spearman’s is more appropriate because Pearson assumes continuous, interval-level data.
  • Non-linear but monotonic relationships: If the relationship between two variables is consistently increasing or decreasing but not in a straight line, Spearman’s will detect this while Pearson may miss it.
  • Outliers: Because Spearman’s works with ranks, extreme values are reduced to their rank position, which limits their influence. Pearson correlation, by contrast, can be heavily distorted by a single outlier.
  • Non-normal data: Spearman’s does not assume the data are normally distributed, making it suitable when the normality assumption of Pearson’s test is violated.
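The point about outliers in the list above can be checked with a small experiment. The sketch below uses made-up data in which one value is replaced by an extreme one; the helper names are illustrative, and the numbers are chosen only to make the effect visible.

```python
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + 1 + bisect_right(ordered, v)) / 2 for v in values]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_clean = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]      # noisy but clearly increasing
y_outlier = [2, 1, 4, 3, 6, 5, 8, 7, 10, 900]  # last value replaced by an extreme one

for label, y in [("clean", y_clean), ("with outlier", y_outlier)]:
    print(label, "Pearson:", round(pearson(x, y), 3), "Spearman:", round(spearman(x, y), 3))
```

With these numbers Pearson’s coefficient drops sharply once the outlier is introduced, whereas Spearman’s barely moves: the value 900 is simply the largest rank, no matter how far it sits from the rest.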

Interpreting the Result

Like Pearson’s coefficient, Spearman’s rs ranges from −1 to +1. A positive value means that higher ranks on one variable tend to go with higher ranks on the other. A negative value means that higher ranks on one variable tend to go with lower ranks on the other.

For example, imagine a study in which students’ class rankings in mathematics are compared with their rankings in physics. A Spearman coefficient of 0.78 would suggest a strong positive relationship: students who rank highly in mathematics also tend to rank highly in physics. The test also produces a p-value, which is the probability of observing a correlation at least this strong if the two variables were in fact unrelated. By convention, a p-value below 0.05 is called statistically significant, meaning that a correlation of this size would be unlikely to arise by chance alone.
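One way to see what the p-value means is a permutation test: shuffle one ranking many times and count how often the shuffled data produce a correlation at least as strong as the observed one. This is a sketch of that idea, not necessarily the method a given statistics package uses. The rankings are invented so that the coefficient comes out near 0.78, and the function name permutation_p_value and its parameters are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + 1 + bisect_right(ordered, v)) / 2 for v in values]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value: share of shuffles with |rs| at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance so that exact ties are not lost to rounding error.
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Invented class rankings for ten students (1 = top of the class).
math_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
physics_rank = [4, 2, 3, 1, 8, 6, 7, 5, 9, 10]

rs = spearman(math_rank, physics_rank)
p = permutation_p_value(math_rank, physics_rank)
print("rs =", round(rs, 2), " p =", round(p, 4))
```

A small p-value here says only that shuffled rankings rarely agree this well; it says nothing about how large or important the relationship is.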

Key Assumptions

Spearman’s rank correlation has fewer assumptions than Pearson’s, which is one of its strengths:

  • Both variables should be at least ordinal, meaning their values can be meaningfully ranked from lowest to highest.
  • The relationship between the variables should be monotonic (consistently increasing or consistently decreasing). If the relationship changes direction, for instance rising and then falling, Spearman’s coefficient can be close to 0 even when the variables are strongly related, so it will not represent the relationship accurately.
  • The observations should be independent of each other; each pair of values should come from a different subject or unit.
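The monotonicity requirement above can be illustrated with a relationship that rises and then falls. In this sketch the y values follow an inverted parabola, an artificial choice made purely for demonstration; the helper names are illustrative.

```python
from bisect import bisect_left, bisect_right
from math import sqrt

def average_ranks(values):
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + 1 + bisect_right(ordered, v)) / 2 for v in values]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# y is completely determined by x, but it rises up to x = 0 and then falls.
x = list(range(-5, 6))
y = [25 - v * v for v in x]

print("Spearman:", round(spearman(x, y), 3))
```

Even though y depends entirely on x, the rising half and the falling half cancel out and the coefficient comes out at zero, which is why a scatter plot is worth checking before relying on the number.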

Because it makes no assumption about the shape of the data’s distribution and works with ranks rather than raw values, Spearman’s rank correlation is one of the most versatile and widely used tools for assessing the relationship between two variables. It is a dependable choice whenever the stricter requirements of the Pearson coefficient cannot be met, provided the relationship is monotonic.