Sign Test
What Is the Sign Test?
The sign test is the simplest non-parametric test for paired data. Given two measurements on the same individual (or matched pair), it asks one basic question: did the score go up or down? It ignores how much the score changed and focuses entirely on the direction of the change. Because of this extreme simplicity, the sign test makes almost no assumptions about your data. It does not require a normal distribution, it does not require a specific scale of measurement, and it is not thrown off by outliers. If you can say whether each person improved or worsened, you have enough information to run a sign test.
Why Do We Need It?
Imagine a researcher studying whether a brief public-speaking workshop reduces stage fright. She asks 18 participants to rate their anxiety on a scale from 1 to 5 before the workshop and again afterwards. With such a coarse scale, measuring the exact size of any change is unreliable — the difference between a 2 and a 3 may not mean the same thing as the difference between a 4 and a 5. What she can say is whether each person's score went down (improvement), went up (worsening), or stayed the same (no change). The sign test is designed precisely for this situation.
More generally, the sign test is useful whenever the magnitude of paired differences is unreliable or meaningless but the direction of change is clear. It is also a safe fallback when other paired tests (such as the paired t-test or the Wilcoxon signed-rank test) cannot be used because their assumptions are not met.
How Does It Work?
For each pair of observations, you note the sign of the difference: positive (+) if the second measurement is higher, negative (−) if it is lower. Pairs that show no change (a difference of zero) are excluded from the analysis because they provide no evidence either way. Let n be the number of remaining pairs, and for a two-tailed test let S be the smaller of the two sign counts.
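The counting step above can be sketched in a few lines. The before/after ratings here are made-up values for illustration, not data from the text's examples:

```python
# Hypothetical before/after anxiety ratings for eight participants.
before = [4, 3, 5, 2, 4, 3, 4, 5]
after  = [3, 3, 4, 2, 2, 4, 3, 4]

# Sign of each difference: positive if the second measurement is higher.
# Ties (zero differences) are dropped before counting.
diffs = [a - b for b, a in zip(before, after) if a != b]

n = len(diffs)                              # usable pairs
positives = sum(1 for d in diffs if d > 0)
negatives = n - positives
S = min(positives, negatives)               # smaller count, for a two-tailed test
print(n, positives, negatives, S)           # → 6 1 5 1
```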
Under the null hypothesis — that there is no consistent direction of change — each difference is equally likely to be positive or negative, just like a fair coin flip. The number of positive signs out of n pairs therefore follows a binomial distribution with success probability 0.5, so the probability of getting S or fewer of the rarer sign can be computed exactly. If this probability (the p-value) is very small, the lopsided split between positives and negatives is unlikely to be due to chance, and you reject the null hypothesis.
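The binomial tail probability can be computed directly with the standard library. This is a minimal sketch; `sign_test_p` is a hypothetical helper name, and doubling the tail for a two-tailed test is one common convention:

```python
from math import comb

def sign_test_p(S, n, two_tailed=True):
    """P-value for the sign test: probability of S or fewer of the
    rarer sign out of n pairs, under a fair-coin null (p = 0.5)."""
    tail = sum(comb(n, k) for k in range(S + 1)) / 2**n
    return min(1.0, 2 * tail) if two_tailed else tail

# The workshop example: 2 increases against 14 decreases out of 16 pairs.
print(round(sign_test_p(2, 16), 4))  # → 0.0042
```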
What Does the Result Mean?
A significant sign test result tells you that there is a consistent trend in one direction across your pairs. In the public-speaking example, if 14 out of 16 usable pairs showed a decrease in anxiety and only 2 showed an increase, the sign test would likely return a small p-value, leading you to conclude that the workshop is associated with reduced stage fright.
However, because the sign test discards all information about the size of each difference, it has relatively low statistical power. Power refers to a test's ability to detect a real effect when one exists. A test like the Wilcoxon signed-rank test, which accounts for the magnitude of differences, will generally be better at detecting subtle effects. The trade-off is that the sign test demands almost nothing of your data, making it applicable in situations where more powerful tests are not.
Key Assumptions
- Paired observations: Each individual (or matched unit) must provide a score under both conditions.
- Independent pairs: The outcome for one pair must not affect the outcome for another pair.
- Dichotomous classification: You must be able to classify each difference as either positive or negative. Ties (differences of zero) are excluded.
Notice what is not on this list: there is no requirement for normality, no requirement for symmetry, and no requirement for a particular level of measurement beyond the ability to determine direction. This makes the sign test one of the most widely applicable statistical methods available.
When to Use It
Use the sign test when you have paired data and the only reliable information is the direction of change — for instance, when data are measured on a very coarse ordinal scale, when differences are highly asymmetric, or when you simply want the most assumption-free test possible. If the magnitude of differences is reliable and the distribution of those differences is roughly symmetric, the Wilcoxon signed-rank test will give you more power. If the differences are approximately normal, the paired t-test is the most powerful option. Think of the sign test as your statistical safety net: it may not be the sharpest tool in the box, but it works in almost any situation.
A Quick Example
A teacher wants to know if a new reading programme improves comprehension. She tests 20 students before and after the programme. Of the 20 pairs, 2 show no change and are dropped, leaving 18 usable pairs. Of these, 14 students improved and 4 declined. Under the null hypothesis, the probability of 14 or more positives out of 18 (with p = 0.5 each) is approximately 0.015. Because 0.015 is below 0.05, she concludes that the reading programme is associated with a significant improvement in comprehension.
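The teacher's arithmetic can be checked with a short binomial calculation. By symmetry of the fair-coin null, the chance of 14 or more improvements out of 18 equals the chance of 4 or fewer declines:

```python
from math import comb

n, improved = 18, 14

# P(X >= 14) with n = 18, p = 0.5, computed as P(X <= 4) by symmetry.
p_one_tailed = sum(comb(n, k) for k in range(n - improved + 1)) / 2**n
print(round(p_one_tailed, 3))  # → 0.015
```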
The sign test is a humble but valuable tool. Its minimal assumptions make it applicable in a remarkably wide range of research scenarios, and its logic — counting positives versus negatives — is as intuitive as statistics gets.