Section 2.5

Effect Size: Beyond P-Values

Core summary

Effect size measures how big a difference or relationship actually is, independent of sample size. It answers the question a p-value cannot: not 'is there an effect?' but 'how much does it matter?'

Detailed explanation

A p-value tells you whether an effect is detectable; the effect size tells you whether it is worth caring about. These are different questions, and confusing them is one of the most common errors in reading research. Effect size is the magnitude of a finding (how far apart two groups are, or how strong an association is), expressed in a way that does not depend on how many people you studied.

Effect sizes come in two broad flavours. Some are in natural, clinical units: a mean difference of 8 mmHg in blood pressure, an absolute risk reduction of 4%, or a number needed to treat of 25. These are the easiest to interpret because they speak the language of practice. Others are standardized, unitless numbers that let you compare results across different scales and studies. The best known is Cohen's d, which expresses the difference between two means in standard-deviation units, with rough conventions of 0.2 (small), 0.5 (medium), and 0.8 (large). Correlation coefficients and odds ratios are also effect sizes. The point of all of them is to quantify 'how much', not merely 'whether'.

Why does this matter so much? Because statistical significance and practical importance can point in opposite directions. With a huge sample, a trivial effect (a third of a millimetre of mercury) can be highly significant; with a small sample, a large and important effect can fail to reach significance. A p-value alone cannot distinguish these cases, but the effect size reveals them immediately. This is why modern reporting standards expect each main result to be reported with an effect size and its confidence interval, not just a p-value.

Interpreting effect sizes well requires clinical judgment, not just labels. Cohen's thresholds are convenient but arbitrary; what counts as a meaningful effect depends entirely on context. A 'small' reduction in mortality from a cheap, safe drug can be enormously valuable across a population, while a 'large' improvement on a meaningless surrogate marker may be worthless.
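The arithmetic behind these ideas can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 15 mmHg standard deviation and the trial sizes are hypothetical numbers chosen for the demonstration, Cohen's d is computed from a mean difference and an assumed common SD, and the p-value comes from a two-sample z-test (a normal approximation) rather than the t-test a real analysis would more likely use. The helper names are illustrative, not taken from any library.

```python
import math

def cohens_d(mean_diff: float, common_sd: float) -> float:
    """Standardized mean difference: the raw difference expressed in SD units."""
    return mean_diff / common_sd

def nnt(absolute_risk_reduction: float) -> float:
    """Number needed to treat is the reciprocal of the absolute risk reduction."""
    return 1 / absolute_risk_reduction

def two_sided_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value from a z-test on two equal-sized groups.
    Assumes a known common SD (normal approximation, not a t-test)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_diff) / standard_error
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

SD_MMHG = 15.0  # hypothetical between-patient SD of blood pressure

# A trivial effect (a third of a mmHg) in a huge hypothetical trial...
tiny = {"d": cohens_d(0.33, SD_MMHG), "p": two_sided_p(0.33, SD_MMHG, 100_000)}
# ...versus a sizeable effect in a small hypothetical trial.
large = {"d": cohens_d(8.0, SD_MMHG), "p": two_sided_p(8.0, SD_MMHG, 10)}

print(f"ARR 4% -> NNT {nnt(0.04):.0f}")
print(f"0.33 mmHg, n=100,000 per arm: d = {tiny['d']:.3f}, p = {tiny['p']:.1e}")
print(f"8 mmHg, n=10 per arm: d = {large['d']:.2f}, p = {large['p']:.2f}")
```

With these assumed numbers, the 0.33 mmHg effect is highly significant despite a d of about 0.02, while the 8 mmHg effect (d of about 0.53) in a trial of 10 patients per arm does not reach p < 0.05. The NNT line also shows why an absolute risk reduction of 4% and an NNT of 25 describe the same benefit.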
The key reference point is always the minimal clinically important difference (MCID): the smallest effect that would actually change how you treat a patient. For the clinician, the discipline is to read every result in two layers: first the effect size, to see how big the difference is and whether it would matter at the bedside; then the p-value and confidence interval, to see how certain that estimate is. A finding that is both large and precise is compelling; one that is significant but tiny, or large but wildly imprecise, deserves caution. Effect size is what turns a statistical result back into a clinical decision.
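The two-layer reading can be written down as a small decision helper. This is a hypothetical sketch, not a validated rule: the function read_in_two_layers, its category wording, and the example MCID of 5 mmHg are all assumptions for illustration, and it assumes that larger values mean more benefit.

```python
from dataclasses import dataclass

@dataclass
class Result:
    estimate: float  # effect size in clinical units (e.g. mmHg reduction)
    ci_low: float    # lower bound of the 95% confidence interval
    ci_high: float   # upper bound of the 95% confidence interval

def read_in_two_layers(result: Result, mcid: float) -> str:
    """Hypothetical helper: judge magnitude against the MCID first,
    then certainty from the confidence interval. Assumes mcid > 0."""
    # Layer 1: is the point estimate big enough to matter at the bedside?
    important = result.estimate >= mcid
    # Layer 2: how certain is that estimate?
    excludes_zero = result.ci_low > 0       # 'significant' at the CI's level
    ci_clears_mcid = result.ci_low >= mcid  # even the pessimistic end matters

    if important and ci_clears_mcid:
        return "large and precise: compelling"
    if important and not excludes_zero:
        return "large but imprecise: promising, needs more data"
    if important:
        return "probably important, but the CI includes effects below the MCID"
    if excludes_zero and result.ci_high < mcid:
        return "significant but tiny: unlikely to change practice"
    return "small or uncertain: interpret with caution"

# Illustrative results, with a hypothetical MCID of 5 mmHg
for r in (Result(14, 10, 18), Result(12, -3, 27), Result(0.3, 0.2, 0.4)):
    print(r, "->", read_in_two_layers(r, mcid=5))
```

The order of the checks mirrors the text: magnitude is judged before certainty, so a tiny but significant result never reads as compelling.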

Clinical example

Two blood-pressure drugs are both 'significantly' better than placebo, but drug A lowers pressure by 2 mmHg and drug B by 14 mmHg. The p-values may look similar, yet the effect sizes show drug B offers a clinically meaningful benefit while drug A barely moves the needle.
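To see how the p-values can look similar while the effects differ sevenfold, here is a sketch under explicitly hypothetical assumptions: a between-patient SD of 15 mmHg, drug A tested in 1,000 patients per arm and drug B in 20 per arm, and a two-sample z-test standing in for whatever analysis the trials actually used.

```python
import math

SD_MMHG = 15.0  # hypothetical SD of blood-pressure change; not from the example

def cohens_d(mean_diff: float, sd: float) -> float:
    """Difference in SD units."""
    return mean_diff / sd

def two_sided_p(mean_diff: float, sd: float, n_per_arm: int) -> float:
    """Two-sided z-test p-value for two equal-sized arms (normal approximation)."""
    standard_error = sd * math.sqrt(2 / n_per_arm)
    return math.erfc(abs(mean_diff) / standard_error / math.sqrt(2))

# Hypothetical trial sizes: drug A in a large trial, drug B in a small one.
drugs = {
    "A": {"diff": 2.0, "n_per_arm": 1000},
    "B": {"diff": 14.0, "n_per_arm": 20},
}

for name, trial in drugs.items():
    trial["d"] = cohens_d(trial["diff"], SD_MMHG)
    trial["p"] = two_sided_p(trial["diff"], SD_MMHG, trial["n_per_arm"])
    print(f"Drug {name}: {trial['diff']:.0f} mmHg, d = {trial['d']:.2f}, p = {trial['p']:.4f}")
```

Under these assumptions both p-values land around 0.003, yet d is about 0.13 for drug A and about 0.93 for drug B: the p-value mostly reflects trial size, while the effect size reflects the benefit.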

Research example

A meta-analysis reports a standardized mean difference (Cohen's d) of 0.15 for an antidepressant versus placebo. The result is statistically significant across thousands of patients, but the effect is small enough that reviewers debate whether it is clinically meaningful. It is the effect size, not the significance, that drives the discussion.
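A few lines of arithmetic put d = 0.15 in perspective. The sketch assumes normally distributed outcomes with equal variances in both arms. It uses the common-language "probability of superiority", Phi(d / sqrt(2)), as one way to translate d into plain terms, and the standard normal-approximation sample-size formula for a two-sided test at alpha = 0.05 with 80% power. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def cohen_label(d: float) -> str:
    """Cohen's rough conventions; arbitrary cut-offs, not clinical verdicts."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below 'small'"

def probability_of_superiority(d: float) -> float:
    """Chance a random treated patient does better than a random control patient,
    assuming normal outcomes with equal variances: Phi(d / sqrt(2))."""
    return STD_NORMAL.cdf(d / math.sqrt(2))

def n_per_arm_for_power(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm for a two-sided z-test to detect effect d."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

d = 0.15
print(cohen_label(d))
print(f"P(treated beats placebo) = {probability_of_superiority(d):.2f}")
print(f"Patients per arm for 80% power: {n_per_arm_for_power(d)}")
```

Under those assumptions, d = 0.15 falls below Cohen's 'small' threshold, corresponds to roughly a 54% chance that a random treated patient does better than a random placebo patient (50% would mean no difference), and needs about 700 patients per arm for a single trial to have 80% power. That is consistent with significance emerging only once thousands of patients are pooled.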

Knowledge check

Q1. What does an effect size tell you that a p-value does not?

Q2. Cohen's d of 0.8 is conventionally considered:

Q3. A drug shows a statistically significant but very small effect size in a huge trial. The best interpretation is: