Section 5.17 min read

Correlation: Pearson and Spearman

Core summary

Correlation measures the strength and direction of the relationship between two variables on a scale from -1 to +1. Pearson's r measures linear relationships between continuous variables; Spearman's rho works on ranks and suits monotonic relationships and skewed or ordinal data.

Detailed explanation

Correlation answers a simple question: as one variable changes, does the other tend to change too, and how strongly? The correlation coefficient, r, runs from -1 to +1. Its sign gives the direction: a positive r means both variables rise together, and a negative r means one rises as the other falls. Its magnitude gives the strength: a value close to +1 or -1 indicates a tight relationship, while a value near 0 indicates little or no linear relationship. As a rough guide, an absolute value of about 0.1 is weak, 0.3 is moderate, and 0.5 or more is strong, but context decides.

Pearson's r is the default for two continuous, roughly normally distributed variables and measures the linear relationship using the raw values. Spearman's rho is its non-parametric cousin: it correlates the ranks instead of the values, so it captures any monotonic relationship (one that consistently increases or consistently decreases), is robust to skew and outliers, and can be used with ordinal data. Choose Spearman when the data are skewed or ordinal, or when the relationship is monotonic but curved.

Three cautions are essential. First and most important, correlation is not causation: a strong r can arise from a confounder, from reverse causation, or from coincidence. Second, Pearson's r captures only linear trends and Spearman's rho only monotonic ones, so a strong curved relationship, such as a U-shape, can give a coefficient near 0; always inspect a scatterplot before trusting r. Third, a single outlier can dramatically inflate or deflate Pearson's r, which is another reason to look at the scatter.

Report r with its confidence interval and p-value, but remember that the p-value only tests whether r differs from 0; with a large sample, even a trivially weak correlation can be 'statistically significant'. The size of r, not the p-value, is what matters clinically.
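The cautions above are easier to see with numbers. The following is a minimal sketch in plain Python, not a reference implementation: the function names (pearson, average_ranks, spearman) and all data are made up for illustration. It computes Pearson's r from the raw values and Spearman's rho as Pearson's r on the ranks, then applies both to a curved monotonic series, a U-shaped series, and an unrelated series with one extreme point added.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: co-variation of x and y scaled by the spread of each."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative, made-up data (not taken from any study).
x = list(range(1, 11))
curved = [2 ** v for v in x]            # rises consistently, but not in a line
u_shape = [(v - 5.5) ** 2 for v in x]   # falls, then rises symmetrically

r_curved, rho_curved = pearson(x, curved), spearman(x, curved)
r_u, rho_u = pearson(x, u_shape), spearman(x, u_shape)

# Nine unrelated points, then the same points plus one extreme observation.
x_base = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_base = [5, 3, 6, 2, 7, 4, 6, 3, 5]
x_out, y_out = x_base + [30], y_base + [40]

r_base = pearson(x_base, y_base)
r_out = pearson(x_out, y_out)
rho_out = spearman(x_out, y_out)

print(f"curved monotonic: r = {r_curved:.2f}, rho = {rho_curved:.2f}")
print(f"U-shape:          r = {r_u:.2f}, rho = {rho_u:.2f}")
print(f"outlier added:    r {r_base:.2f} -> {r_out:.2f}, rho = {rho_out:.2f}")
```

In this sketch the curved series gives a rho of 1 with a noticeably smaller r, the U-shape gives both coefficients at essentially 0 despite a perfect relationship, and the single extreme point pushes Pearson's r from near 0 to a high value while Spearman's rho moves far less. In practice a statistics package would normally be used instead of hand-written functions.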

Clinical example

BMI and systolic blood pressure correlate with a Pearson r of 0.45 in a clinic sample, a moderate positive linear relationship, but this does not prove that a higher BMI causes higher blood pressure.
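To illustrate reporting r with a confidence interval and p-value, here is a hedged sketch using Fisher's z-transformation, a common large-sample approximation. The example above gives r = 0.45 but no sample size, so n = 100 is purely an assumption made for illustration, as are the function names fisher_ci and approx_p_value and the second scenario (r = 0.05 with n = 10,000).

```python
from math import atanh, erfc, sqrt, tanh


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via Fisher's z-transformation."""
    z = atanh(r)              # transform r to an approximately normal scale
    se = 1 / sqrt(n - 3)      # standard error of z
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


def approx_p_value(r, n):
    """Two-sided p-value for H0: r = 0, using the normal approximation on z."""
    z_stat = atanh(r) * sqrt(n - 3)
    return erfc(abs(z_stat) / sqrt(2))


# r from the clinical example; n = 100 is a hypothetical sample size.
low, high = fisher_ci(0.45, 100)
p_clinic = approx_p_value(0.45, 100)

# Hypothetical large study: a trivially weak r with a very large sample.
low_weak, high_weak = fisher_ci(0.05, 10_000)
p_weak = approx_p_value(0.05, 10_000)

print(f"r = 0.45, n = 100:    95% CI {low:.2f} to {high:.2f}, p = {p_clinic:.2g}")
print(f"r = 0.05, n = 10000:  95% CI {low_weak:.2f} to {high_weak:.2f}, p = {p_weak:.2g}")
```

Under these assumptions the interval for r = 0.45 is fairly wide, and the weak correlation of 0.05 is 'statistically significant' even though its whole interval stays below 0.1. That is the point made in the detailed explanation: the p-value reflects sample size as much as strength.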

Research example

A study relating a skewed biomarker to ordinal disease severity reports Spearman's rho (0.6) because the biomarker is not normally distributed, severity is ordinal, and the relationship is monotonic.

Knowledge check

Q1. A correlation coefficient of -0.8 indicates:

Q2. When should you use Spearman's rho instead of Pearson's r?

Q3. A strong correlation between two variables proves that: