Pearson Correlation

What Is Pearson Correlation?

The Pearson correlation coefficient, often written as r, is a number that measures the strength and direction of the linear relationship between two continuous variables. In plain terms, it tells you how closely two variables move together in a straight-line pattern.

Imagine a researcher who collects data on the number of hours students spend studying and their exam scores. If students who study more tend to score higher, and this relationship follows a roughly straight line when plotted on a graph, the Pearson correlation will capture both the direction of that trend (here, positive) and how tightly the points cluster around the line.
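As a rough illustration of what r measures, the sketch below computes it straight from the usual definition: the sum of the products of the paired deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the study-hours figures are made up for this example and do not come from a real study.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each observation from its variable's mean.
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]

    # Numerator: how much the two variables vary together.
    co_variation = sum(a * b for a, b in zip(dev_x, dev_y))
    # Denominator: how much each variable varies on its own.
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return co_variation / sqrt(ss_x * ss_y)


# Hypothetical data: hours studied and exam scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 70, 74, 79]

r = pearson_r(hours, scores)
print(f"r = {r:.2f}")  # about 0.99 for these made-up numbers
```

Because the made-up scores rise almost perfectly in step with hours, r comes out close to +1. Real data would usually be far noisier.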

What Does the Coefficient Tell You?

The value of r always falls between −1 and +1. A value of +1 means a perfect positive linear relationship: every data point falls exactly on a straight line that slopes upward, so as one variable increases, the other increases with it. A value of −1 means a perfect negative linear relationship: every point falls exactly on a downward-sloping line, so as one variable increases, the other decreases. A value of 0 means there is no linear relationship at all, although the variables may still be related in some nonlinear way.

In practice, perfect correlations almost never occur. Researchers generally use rough guidelines to describe the strength of a correlation by its absolute value: about 0.10 to 0.30 is considered weak, 0.30 to 0.50 moderate, and above 0.50 strong, whether the sign is positive or negative. These are only guidelines, however, and the practical importance of a correlation depends on the field of study.
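The rough guidelines above could be turned into a small helper like the one below. This is only a sketch: the function name describe_strength is hypothetical, the label "negligible" for values under 0.10 is an assumption (the guidelines do not name that range), and so is the choice of which label the exact boundary values 0.30 and 0.50 receive.

```python
def describe_strength(r):
    """Describe r using the rough weak / moderate / strong guidelines."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)  # strength depends on the magnitude, not the sign
    if size < 0.10:
        return "negligible"  # assumed label; not part of the guidelines
    if size < 0.30:
        label = "weak"
    elif size < 0.50:
        label = "moderate"
    else:
        label = "strong"

    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


for value in (0.60, -0.20, 0.35, 0.05):
    print(value, "->", describe_strength(value))
```

A label like this is a convenience for reporting, not a judgment about whether the correlation matters in a given field.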

R-Squared: Shared Variance

A closely related value is r-squared (written r²), which you obtain by squaring the correlation coefficient. R-squared tells you the proportion of variance in one variable that is accounted for by its linear relationship with the other. For example, if r = 0.60, then r² = 0.36, meaning that 36% of the variation in one variable can be explained by its linear relationship with the other. The remaining 64% is not accounted for by that relationship and reflects other factors, measurement error, or random variation.
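To see how quickly shared variance shrinks as r moves away from ±1, the short sketch below squares a few example values of r. The helper name shared_variance is made up for illustration.

```python
def shared_variance(r):
    """Return r squared: the proportion of variance the two variables share."""
    return r ** 2


# The sign of r disappears when squaring, so r = -0.60 and r = 0.60
# correspond to the same proportion of shared variance.
for r in (0.10, 0.30, 0.50, 0.60, -0.60, 0.90):
    print(f"r = {r:+.2f}   r^2 = {shared_variance(r):.2f}   "
          f"({shared_variance(r):.0%} of variance shared)")
```

Even a correlation of 0.50, which the usual guidelines treat as the threshold for "strong", corresponds to only 25% shared variance.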

Correlation Does Not Mean Causation

This is one of the most important principles in statistics. Finding a strong correlation between two variables does not prove that one causes the other. For example, there is a positive correlation between ice cream sales and the number of drownings each year. This does not mean ice cream causes drowning. Both variables are driven by a third factor: hot weather. People buy more ice cream and also swim more when temperatures rise.

To establish causation, you typically need a controlled experiment, not just a correlation. The Pearson coefficient is a tool for measuring association, and you should always interpret it with this limitation in mind.

When to Use Pearson Correlation

Use Pearson correlation when you have two continuous variables and you want to assess whether a linear relationship exists between them. Common examples include:

  • Examining whether hours of sleep are related to test performance.
  • Investigating whether advertising expenditure is associated with sales revenue.
  • Checking whether age is related to blood pressure.

Key Assumptions

Pearson correlation relies on several important assumptions. Violating these can lead to misleading results:

  • Linearity: The relationship between the two variables should be approximately linear. If the true relationship is curved, Pearson’s r can understate the strength of the association, or miss it entirely.
  • Continuous data: Both variables should be measured on a continuous scale (interval or ratio level).
  • Normality: For hypothesis testing (determining whether the correlation is statistically significant), both variables should be roughly normally distributed. Small departures from normality are usually acceptable with larger samples.
  • No extreme outliers: Pearson correlation is sensitive to outliers. A single unusual data point can dramatically inflate or deflate the value of r.
  • Independence: Each pair of observations should be independent of the others.

If your data do not meet these assumptions, consider an alternative such as Spearman’s rank correlation. It works on the ranks of the values rather than the raw values, so it suits variables measured on an ordinal (ranked) scale, is less sensitive to extreme outliers, and can capture a curved relationship as long as that relationship is monotonic (consistently increasing or consistently decreasing).