Point-Biserial Correlation

What Is the Point-Biserial Correlation?

The point-biserial correlation is a measure of the strength and direction of the relationship between two variables when one of them is dichotomous (binary) and the other is continuous. A dichotomous variable has exactly two categories — for example, pass or fail, male or female, treatment or control. A continuous variable is one that can take any value within a range, such as exam scores, reaction times, or income. The point-biserial correlation tells you whether membership in one category of the binary variable is associated with higher or lower values on the continuous variable.

Why Do We Need It?

Researchers frequently want to know whether a group difference exists. Does taking a new medication (yes or no) relate to blood pressure? Does being a first-generation college student (yes or no) relate to GPA? While you could compare group means using a t-test, the point-biserial correlation gives you something additional: a standardised measure of the strength of the relationship, on a scale from −1 to +1. This makes it easy to compare the strength of one association with another, even when the variables are measured in completely different units.
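The t-test and the correlation are two views of the same comparison. The pooled-variance independent-samples t statistic equals r × √(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below checks this on a small made-up data set. The data and the helper name pearson_r are illustrative assumptions, not part of any particular study or library.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up illustrative data: group membership (0/1) and a continuous score.
group = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
score = [52, 61, 58, 49, 66, 71, 59, 75, 68, 63]

ones = [s for g, s in zip(group, score) if g == 1]
zeros = [s for g, s in zip(group, score) if g == 0]
n1, n0 = len(ones), len(zeros)
m1, m0 = sum(ones) / n1, sum(zeros) / n0

# Pooled-variance t statistic for the difference between the group means.
ss1 = sum((s - m1) ** 2 for s in ones)
ss0 = sum((s - m0) ** 2 for s in zeros)
pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
t_from_means = (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))

# The same t statistic recovered from the correlation.
n = len(score)
r = pearson_r(group, score)
t_from_r = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"t from group means: {t_from_means:.4f}")
print(f"t from correlation: {t_from_r:.4f}")
```

Because the two statistics coincide, the p-value is the same whichever route you take. The correlation simply adds a unit-free measure of strength.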

Its Relationship to Pearson's r

Here is a fact that surprises many students: the point-biserial correlation is mathematically identical to Pearson's r when one variable is dichotomous and the other is continuous. If you code the two groups as 0 and 1 and then compute a standard Pearson correlation, you will get exactly the same value as the point-biserial correlation. The point-biserial formula is simply a computationally convenient way of expressing Pearson's r for this specific type of data. This means you can interpret it in exactly the same way: its square gives you the proportion of variance in the continuous variable that is associated with group membership.
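To see the equivalence in practice, the sketch below computes the correlation two ways: with the group-means form of the point-biserial formula, r_pb = (M1 − M0) / s × √(p × q), and with the general Pearson formula applied to the 0/1 codes. Here M1 and M0 are the group means, s is the standard deviation of all scores using n in the denominator, and p and q are the proportions of cases coded 1 and 0. The data are made up for illustration, and the function names are hypothetical helpers written for this example.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def point_biserial(group, score):
    """Point-biserial r from the group-means formula.

    r_pb = (M1 - M0) / s * sqrt(p * q), where s is the standard deviation
    of all scores with n (not n - 1) in the denominator.
    """
    n = len(score)
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    grand_mean = sum(score) / n
    sd_all = math.sqrt(sum((s - grand_mean) ** 2 for s in score) / n)
    p = len(ones) / n   # proportion coded 1
    q = len(zeros) / n  # proportion coded 0
    return (m1 - m0) / sd_all * math.sqrt(p * q)

# Made-up illustrative data: group membership (0/1) and a continuous score.
group = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
score = [52, 61, 58, 49, 66, 71, 59, 75, 68, 63]

r_formula = point_biserial(group, score)
r_pearson = pearson_r(group, score)

print(f"point-biserial formula: {r_formula:.6f}")
print(f"Pearson on 0/1 codes:   {r_pearson:.6f}")
```

The two printed values agree up to floating-point rounding. In practice this means any routine that computes Pearson's r can be used, provided the binary variable is coded numerically.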

A Concrete Example

Imagine a researcher studying whether gender is associated with performance on a spatial reasoning test. The researcher collects data from 60 participants. Gender is coded as a binary variable (0 = female, 1 = male), and the spatial reasoning score is a continuous variable measured out of 100. After computing the point-biserial correlation, the researcher obtains a value of r_pb = 0.35 with p < 0.01.

This means there is a moderate positive correlation between being coded as 1 (male) and higher spatial reasoning scores. The positive sign simply reflects the coding scheme: the group coded as 1 tends to score higher. If the researcher had reversed the coding (0 = male, 1 = female), the correlation would be −0.35 — the same strength, but with the sign flipped. This is important to remember: the sign of the point-biserial correlation depends entirely on which group you assign to 0 and which to 1.
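The effect of the coding scheme is easy to check. The sketch below computes the correlation on a small made-up data set, then swaps the 0 and 1 codes and computes it again. The data and the helper pearson_r are illustrative assumptions and are not the researcher's data from the example above.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up illustrative data: group membership (0/1) and a continuous score.
group = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
score = [52, 61, 58, 49, 66, 71, 59, 75, 68, 63]

# Reverse the coding: every 0 becomes 1 and every 1 becomes 0.
flipped = [1 - g for g in group]

r_original = pearson_r(group, score)
r_flipped = pearson_r(flipped, score)

print(f"original coding: {r_original:+.4f}")
print(f"reversed coding: {r_flipped:+.4f}")
```

The magnitude is unchanged and only the sign differs, so always report which group was coded 1 alongside the correlation.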

Interpreting the Values

Like Pearson's r, the point-biserial correlation ranges from −1 to +1. A value of 0 indicates no linear relationship between the binary grouping and the continuous variable, which here means the two groups have the same mean. General guidelines for interpreting the absolute value of the correlation are:

  • below 0.10: negligible relationship
  • 0.10 to under 0.30: small relationship
  • 0.30 to under 0.50: medium relationship
  • 0.50 to 1.00: large relationship

If you square the point-biserial correlation, you get the coefficient of determination (r²). In our example, 0.35² ≈ 0.12, meaning that about 12% of the variability in spatial reasoning scores is associated with gender. The remaining 88% is not accounted for by gender and reflects other factors, including measurement error.
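The interpretation steps above can be sketched as a small helper that labels the strength of a correlation and reports its square. The function name effect_size_label is hypothetical, and one detail is an assumption: a value that falls exactly on a boundary (for example 0.30) is placed in the higher band.

```python
def effect_size_label(r):
    """Label a correlation using the guideline bands on its absolute value.

    Assumption: a value exactly on a boundary goes to the higher band.
    """
    size = abs(r)
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        return "small"
    if size < 0.50:
        return "medium"
    return "large"

r_pb = 0.35            # value from the spatial reasoning example
r_squared = r_pb ** 2  # coefficient of determination

print(f"strength: {effect_size_label(r_pb)}")
print(f"r squared: {r_squared:.4f}")
```

The label depends only on the absolute value, so a correlation of −0.35 receives the same label as +0.35.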

Key Assumptions

To use the point-biserial correlation correctly, several conditions should be met:

  • One truly dichotomous variable: The binary variable must have exactly two naturally occurring categories. If you have artificially split a continuous variable into two groups (for example, splitting test scores into "high" and "low"), the biserial correlation is a more appropriate measure.
  • One continuous variable: The other variable should be measured on an interval or ratio scale with a reasonable range of values.
  • Normality: The continuous variable should be approximately normally distributed within each of the two groups. This is especially important for smaller samples.
  • Equal variances: Ideally, the variance of the continuous variable should be similar in both groups (this is called homogeneity of variance or homoscedasticity).
  • Independence: Each observation should be independent of the others. One participant's data should not influence another's.
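Before computing the correlation, it can help to look at each group's descriptive statistics. The sketch below is a rough screening only, not a formal test: it prints each group's size, mean, sample variance, and a simple moment-based skewness, then the ratio of the larger variance to the smaller. The data and the helper sample_skewness are illustrative assumptions. Formal checks such as Levene's test for equal variances or the Shapiro-Wilk test for normality are available in most statistical packages.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up illustrative data: group membership (0/1) and a continuous score.
group = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
score = [52, 61, 58, 49, 66, 71, 59, 75, 68, 63]

by_group = {
    0: [s for g, s in zip(group, score) if g == 0],
    1: [s for g, s in zip(group, score) if g == 1],
}

variances = {}
for code, values in by_group.items():
    variances[code] = statistics.variance(values)  # sample variance (n - 1)
    print(
        f"group {code}: n={len(values)}, "
        f"mean={statistics.fmean(values):.2f}, "
        f"variance={variances[code]:.2f}, "
        f"skewness={sample_skewness(values):.2f}"
    )

# A ratio near 1 suggests similar spread in the two groups.
variance_ratio = max(variances.values()) / min(variances.values())
print(f"variance ratio (larger / smaller): {variance_ratio:.2f}")
```

Skewness far from 0 or a variance ratio far from 1 is a prompt to look more closely, for example with plots or a formal test, rather than a verdict in itself.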

When to Use It

The point-biserial correlation is particularly useful in educational testing (for example, does getting a particular question right correlate with total test score?), in clinical research (does having a diagnosis correlate with a biological marker?), and in any situation where you want a single number that summarises how strongly a two-group distinction relates to a measured outcome. Because it is equivalent to Pearson's r, it slots neatly into broader analyses and is widely understood across disciplines.
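The educational testing case can be sketched as follows. Each row is one student's answers to four questions (1 = correct, 0 = incorrect), and each question is correlated with the total score. The response data are made up, and pearson_r is a hypothetical helper. Note one simplification: the total here includes the question being examined, whereas some analysts prefer to exclude it.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up responses: rows are students, columns are questions.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

# Total test score per student (includes every question).
totals = [sum(row) for row in responses]

# Point-biserial correlation of each question with the total score.
item_total_r = []
for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    item_total_r.append(pearson_r(item, totals))

for j, r in enumerate(item_total_r, start=1):
    print(f"question {j}: r_pb = {r:+.2f}")
```

A question with a higher positive correlation is one that students with high totals tend to answer correctly, which is why this statistic is often read as a measure of how well a question discriminates.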