Friedman Test
What Is the Friedman Test?
The Friedman test is a non-parametric statistical method used to detect differences across three or more related conditions. It is the non-parametric alternative to the repeated measures analysis of variance (ANOVA). "Repeated measures" means that the same participants are measured under every condition, so the observations are linked rather than independent. By working with ranks instead of raw scores, the Friedman test avoids the assumption that the data are normally distributed, making it well suited for ordinal data, small samples, or distributions with heavy skew.
Why Do We Need It?
Imagine a food scientist running a taste test. She asks 20 volunteers to rate three new flavours of ice cream on a scale from 1 (dislike strongly) to 7 (like strongly). Every volunteer tastes all three flavours, so the data are naturally paired — each person provides three scores. A repeated measures ANOVA could compare the flavours, but it assumes the scores are normally distributed and measured on an interval scale. Taste ratings on a 1-to-7 scale are ordinal: the difference between a 2 and a 3 may not mean the same thing as the difference between a 5 and a 6. The Friedman test respects this by analysing ranks rather than raw numbers.
More broadly, any time you measure the same group of people under three or more conditions and your data are not suitable for parametric analysis, the Friedman test is the standard choice.
How Does It Work?
The procedure begins by ranking the scores within each participant separately. In our ice-cream example, each volunteer's three ratings are ranked from 1 (lowest score) to 3 (highest score). If a volunteer gives the same rating to two flavours, those tied values receive the average of the ranks they would otherwise occupy. Once every participant's scores have been ranked, the test adds up the ranks for each condition across all participants.
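The within-participant ranking, including the averaging of tied ranks, can be sketched with SciPy's rankdata (a minimal illustration with made-up ratings):

```python
from scipy.stats import rankdata

# One volunteer's ratings for three flavours (hypothetical values).
# The two tied ratings of 5 share ranks 1 and 2, so each gets the
# average of those ranks, 1.5.
ratings = [5, 5, 7]
ranks = rankdata(ratings)  # averaging tied ranks is the default
# ranks -> [1.5, 1.5, 3.0]
```

Applying this row by row, one participant at a time, produces the rank table that the rest of the test works from.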
The test statistic, often denoted χ²F (chi-square subscript F), measures how much the column rank sums deviate from what we would expect if there were no difference among conditions. Under the null hypothesis — that all conditions are equivalent — each condition should receive roughly equal rank sums. A larger test statistic indicates greater departure from this expectation. For moderate-to-large samples, the statistic approximately follows a chi-square distribution with degrees of freedom equal to the number of conditions minus one, and this approximation is used to produce a p-value.
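The standard rank-sum formula is χ²F = 12 / (n·k·(k+1)) · Σ Rⱼ² − 3·n·(k+1), where n is the number of participants, k the number of conditions, and Rⱼ the rank sum for condition j. A minimal sketch with hypothetical ratings (5 participants, 3 conditions), cross-checked against SciPy's built-in implementation:

```python
import numpy as np
from scipy import stats

# Hypothetical ratings: 5 participants (rows) x 3 conditions (columns).
ratings = np.array([
    [4, 6, 5],
    [3, 7, 5],
    [2, 6, 4],
    [5, 7, 6],
    [4, 5, 3],
])
n, k = ratings.shape

# Step 1: rank within each participant (row); ties would get average ranks.
ranks = stats.rankdata(ratings, axis=1)

# Step 2: sum the ranks for each condition across participants.
R = ranks.sum(axis=0)

# Step 3: the Friedman statistic (form without a tie correction):
# chi2_F = 12 / (n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1)
chi2_f = 12.0 / (n * k * (k + 1)) * np.sum(R**2) - 3 * n * (k + 1)
p = stats.chi2.sf(chi2_f, df=k - 1)

# Cross-check against SciPy's implementation (one argument per condition).
stat, pval = stats.friedmanchisquare(*ratings.T)
```

With no ties in this toy data, the hand computation and SciPy agree exactly; SciPy additionally applies a tie correction when ties are present.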
What Does the Result Mean?
If the p-value is below your chosen significance level (commonly 0.05), you conclude that at least one condition differs from the others. Just like the one-way ANOVA, the Friedman test is an omnibus test: it tells you that a difference exists somewhere but not specifically where. To identify which conditions differ from each other, researchers typically conduct post-hoc pairwise comparisons — for example, a series of Wilcoxon signed-rank tests with a correction (such as the Bonferroni adjustment) to control for the increased risk of false positives that comes from making multiple comparisons.
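The follow-up described above — pairwise Wilcoxon signed-rank tests with a Bonferroni adjustment — can be sketched as follows, using hypothetical ratings:

```python
import numpy as np
from itertools import combinations
from scipy.stats import wilcoxon

# Hypothetical ratings: 8 participants (rows) under 3 conditions (columns).
scores = np.array([
    [3, 5, 4],
    [2, 6, 3],
    [4, 7, 5],
    [3, 6, 4],
    [2, 5, 3],
    [4, 6, 5],
    [3, 7, 4],
    [2, 5, 4],
])

pairs = list(combinations(range(scores.shape[1]), 2))
m = len(pairs)  # number of comparisons, used for the Bonferroni adjustment

results = []
for i, j in pairs:
    stat, p = wilcoxon(scores[:, i], scores[:, j])
    p_adj = min(p * m, 1.0)  # Bonferroni: multiply by m, cap at 1
    results.append((i, j, p_adj))
```

Each adjusted p-value is then compared against the original significance level (e.g. 0.05), keeping the family-wise false-positive rate under control.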
A non-significant result means you do not have sufficient evidence to claim that the conditions differ. It does not prove they are identical — the study may simply have lacked the statistical power to detect a real but small difference.
Key Assumptions
The Friedman test has relatively few assumptions, but they are important:
- Related samples: The same individuals (or matched sets) must be measured under every condition. If your groups consist of different people, the Kruskal-Wallis test is the appropriate non-parametric alternative.
- Ordinal or continuous data: The outcome variable must be at least ordinal so that ranking is meaningful.
- Independent blocks: Each participant (or matched set) should be independent of every other. One person's ratings should not influence another's.
- Three or more conditions: With only two related conditions, the Wilcoxon signed-rank test or the sign test is used instead.
When to Use It
Choose the Friedman test when you have a within-subjects design with three or more conditions and your data are ordinal or do not meet the normality assumption required by a repeated measures ANOVA. It appears frequently in consumer research (comparing product preferences), clinical trials (rating symptoms under different treatments), and educational studies (evaluating teaching methods experienced by the same students). If the normality assumption holds and the data are measured on an interval or ratio scale, a repeated measures ANOVA will offer slightly more statistical power.
A Quick Example
A music psychologist asks 12 listeners to rate the emotional intensity of three pieces of classical music, each on a scale from 1 to 10. Each listener's three scores are ranked within that listener. The rank sums for the three pieces are 31, 17, and 24. The Friedman test yields χ²F ≈ 8.17 with 2 degrees of freedom and a p-value of about 0.017. Because 0.017 is below 0.05, the psychologist concludes that emotional intensity ratings differ significantly across the three pieces and proceeds with pairwise follow-up tests to find out which pieces differ.
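The statistic can be recovered directly from rank sums. A minimal check assuming n = 12 listeners, k = 3 pieces, and illustrative rank sums of 31, 17, and 24:

```python
from scipy.stats import chi2

n, k = 12, 3        # listeners and pieces of music (assumed values)
R = [31, 17, 24]    # illustrative rank sums; must total n * k * (k + 1) / 2

# Friedman statistic from rank sums (no tie correction).
chi2_f = 12.0 / (n * k * (k + 1)) * sum(r**2 for r in R) - 3 * n * (k + 1)
p = chi2.sf(chi2_f, df=k - 1)
# chi2_f is about 8.17, p about 0.017
```

Note the built-in consistency check: with k = 3 each listener hands out ranks 1 + 2 + 3 = 6, so the three rank sums must total 6 × 12 = 72.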
The Friedman test is a practical, assumption-light tool for any repeated-measures design where the data are not well suited to parametric methods. Its ranking approach makes it both intuitive and reliable, even with small or messy datasets.

