P-Values Explained Simply
Core summary
A p-value is the probability of seeing results at least as extreme as yours if there were truly no effect. It is not the probability that your hypothesis is true, and a small p-value does not mean a large or important effect.
Detailed explanation
The p-value is the most used and most misunderstood number in medical research. Here is the honest, plain-language definition: the p-value is the probability of observing a result at least as extreme as the one you got, assuming the null hypothesis (no real effect) is true. A p-value of 0.03 means that if the treatment truly did nothing, data like yours or more extreme would occur only about 3% of the time by chance. Because that is unlikely, we lean toward concluding the effect is real.
By convention, researchers compare p to a threshold called alpha, usually 0.05. If p is below 0.05 the result is called 'statistically significant'; if above, 'not significant'. But this cutoff is an arbitrary tradition, not a law of nature: p = 0.049 and p = 0.051 are practically identical, yet they fall on opposite sides of the line. Treating 0.05 as a magic boundary is one of the deepest problems in how research is reported.
It is just as important to know what a p-value is NOT. It is not the probability that the null hypothesis is true. It is not the probability that your result happened by chance. It does not tell you the size or importance of an effect: a tiny, clinically irrelevant difference can have a very small p-value if the sample is huge, and a large, important difference can be 'non-significant' in a small study. And a non-significant p-value does not prove there is no effect; absence of evidence is not evidence of absence. These misreadings appear constantly in published papers and press releases.
The deeper issue is that a p-value answers only one narrow question, 'are these data surprising if nothing is going on?', and says nothing about how big or meaningful the effect is. That is why modern statistical guidance urges researchers to report effect sizes and confidence intervals alongside (or instead of) bare p-values, and to avoid reducing a study to 'significant or not'. Some journals now discourage the word 'significant' entirely.
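To make the definition concrete, here is a minimal sketch that computes an exact p-value by hand for a made-up yes/no outcome. The scenario is an assumption chosen for illustration, not data from any study: 100 patients, the null hypothesis says each has a 50% chance of improving, and 60 actually improve. The function name `binomial_two_sided_p` is hypothetical. It simply adds up, under the null, the probability of every outcome that is no more likely than the one observed.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a binomial outcome.

    Sums the null-hypothesis probability of every possible count that is
    no more likely than the observed count ("at least as extreme").
    """
    def pmf(k: int) -> float:
        # Probability of exactly k successes in n trials if the null is true.
        return comb(n, k) * p_null**k * (1 - p_null) ** (n - k)

    observed = pmf(successes)
    # The small relative tolerance keeps floating-point ties on the "extreme" side.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical data: 60 of 100 patients improve; the null says 50% would.
p_value = binomial_two_sided_p(60, 100)
print(f"p = {p_value:.3f}")
```

With these illustrative numbers the p-value lands just above 0.05, which shows how a result can sit right next to the conventional cutoff without being meaningfully different from one just below it.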
For a clinician reading a paper, the practical habits are: never accept a p-value without also asking how large the effect was and how wide its confidence interval is; be suspicious of a result that is statistically significant but clinically trivial; and remember that one p-value below 0.05 is weak evidence on its own, especially when many comparisons were tested. The p-value is a useful screening signal, not a verdict, and certainly not a measure of truth or importance.
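The warning about many comparisons can be quantified with a short calculation. The sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function name `chance_of_false_positive` is hypothetical. Under those assumptions, the chance that at least one of k tests falls below alpha is 1 - (1 - alpha)^k.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests gives
    p < alpha when there is no real effect in any of them."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20):
    print(f"{k:>2} comparisons: {chance_of_false_positive(k):.3f}")
```

Under these assumptions, a paper that tests 20 outcomes has roughly a 64% chance of producing at least one 'significant' result by chance alone, which is why a single p-value below 0.05 among many comparisons is weak evidence.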
Clinical example
A large database study finds that a drug raises mean fasting glucose by 0.4 mg/dL with p < 0.001. The p-value is tiny because the sample is enormous, but a 0.4 mg/dL change is clinically meaningless. This is a textbook case of statistical significance without clinical significance.
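A rough sketch shows how sample size alone drives the p-value in an example like this one. The 0.4 mg/dL difference comes from the example; the standard deviation (25 mg/dL) and the group sizes are assumptions picked for illustration, and the calculation uses a simple normal approximation for the difference of two means. The function name `two_sample_z` is hypothetical.

```python
from math import erfc, sqrt


def two_sample_z(diff: float, sd: float, n_per_group: int):
    """Two-sided p-value and approximate 95% confidence interval for a
    difference in means, assuming equal group sizes and a common SD."""
    se = sd * sqrt(2 / n_per_group)        # standard error of the difference
    z = diff / se
    p = erfc(abs(z) / sqrt(2))             # two-sided normal tail probability
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return p, ci


# Same 0.4 mg/dL difference, assumed SD of 25 mg/dL, two assumed sample sizes.
for n in (200, 200_000):
    p, (low, high) = two_sample_z(diff=0.4, sd=25.0, n_per_group=n)
    print(f"n per group = {n:>7}: p = {p:.2g}, 95% CI = ({low:.2f}, {high:.2f}) mg/dL")
```

The effect is identical in both runs; only the p-value and the width of the confidence interval change. The interval is what tells the reader that even the upper end of the plausible range is clinically negligible.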
Research example
A small trial of a promising therapy reports a 20% reduction in mortality with p = 0.12. The result is 'not significant', but with only 40 patients the study was underpowered; the non-significant p does not prove the drug fails, only that this study could not confirm the effect.
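The idea of an underpowered study can be illustrated with an approximate power calculation. The mortality rates below (50% in the control arm, 40% with treatment, one possible reading of a 20% relative reduction) are hypothetical and are not meant to reproduce the trial's p-value; only the 20 patients per arm follows from the example's 40 patients. The sketch uses a normal approximation for two proportions, and the names `normal_cdf` and `approx_power` are illustrative.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def approx_power(p_control: float, p_treated: float, n_per_arm: int,
                 z_crit: float = 1.96) -> float:
    """Approximate probability that a two-sided test at alpha = 0.05
    detects the given true difference in event rates."""
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    shift = abs(p_control - p_treated) / se   # true effect in standard-error units
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


# Hypothetical rates: 50% control mortality vs 40% with treatment.
for n in (20, 400):
    print(f"{n:>3} per arm: power = {approx_power(0.50, 0.40, n):.2f}")
```

Under these assumed rates, a trial with 20 patients per arm would detect a real effect of this size only about one time in ten, while 400 per arm would detect it about four times in five. A non-significant result from the small trial is therefore close to uninformative about whether the drug works.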
Knowledge check
Q1. Which statement about a p-value of 0.04 is correct?
Q2. A study of 50,000 people finds a 0.2 mmHg blood-pressure difference with p < 0.001. Best interpretation?
Q3. A small trial reports p = 0.20. What is the correct conclusion?