Section 1.7 · 7 min read

The Normal Distribution

Core summary

The normal distribution is the symmetric, bell-shaped curve that many biological measurements approximately follow. Its predictable shape underlies laboratory reference ranges and most common statistical tests.

Detailed explanation

If statistics has a single most important picture, it is the normal distribution: the smooth, symmetric, bell-shaped curve. Many biological measurements approximate it, including height, blood pressure, hemoglobin, birth weight, and many lab values in healthy populations. Understanding its shape unlocks much of the reasoning in this level.

The normal distribution has three defining features. It is symmetric around its center, so the mean, median, and mode all sit at the same middle point. It is bell-shaped, with most values clustered near the center and progressively fewer toward the extremes in either direction. And it is completely described by just two numbers: the mean (which fixes where the peak sits) and the standard deviation (which fixes how wide and flat, or narrow and tall, the bell is). Change the mean and the whole curve slides left or right; change the SD and it stretches or squeezes.

Its great practical value is predictability. Because the shape is fixed, we know what fraction of values lies in any region. This is the empirical rule from the SD lesson: about 68% of values lie within one SD of the mean, about 95% within two, and about 99.7% within three. Laboratory reference ranges are typically built the same way. A 'normal' range is usually the central 95% of healthy people, which for approximately normal values is the mean plus or minus about two SDs (1.96, to be precise). That is also why, by definition, about 5% of perfectly healthy people fall outside the reference range and get flagged 'abnormal'. Knowing this prevents over-reacting to a mildly out-of-range result in a well patient.

The normal distribution also underpins much of inferential statistics. Many of the most common methods (the t-test, ANOVA, Pearson correlation, and linear regression) assume that the data, or more precisely the errors, are roughly normally distributed. Tests that rest on this kind of distributional assumption are called parametric tests.
When data are badly skewed and this normality assumption fails, you either transform the data or switch to non-parametric tests, which you will study in Module 3. Checking whether your data look approximately normal, usually with a quick histogram, is therefore a routine first step before choosing a test.

A crucial caution: 'normal' here is a mathematical term for the bell shape, not a judgment that the data are healthy or correct. Plenty of important medical variables are not normally distributed: length of hospital stay, viral load, triglycerides, and waiting times are all typically right-skewed, and forcing normal-based methods onto them can produce misleading results. The skill is recognizing when the bell shape applies and when it does not, because that judgment steers your choice of summary statistic and test.
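
The 68%, 95%, and 99.7% figures of the empirical rule can be checked numerically. The sketch below is one way to do it, using Python's standard-library statistics.NormalDist. The helper name fraction_within and the example mean and SD are illustrative assumptions, not part of the lesson.

```python
from statistics import NormalDist


def fraction_within(k_sd, mean=0.0, sd=1.0):
    """Fraction of a normal distribution lying within k_sd SDs of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k_sd * sd) - dist.cdf(mean - k_sd * sd)


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973

# The fractions depend only on the number of SDs, not on the particular
# mean and SD (the values 120 and 15 are hypothetical, for illustration).
shifted = fraction_within(2, mean=120.0, sd=15.0)
print(f"within 2 SD, mean=120, SD=15: {shifted:.4f}")
```

Changing the mean or the SD moves or rescales the curve, but the fraction within a given number of SDs stays the same, which is why two numbers are enough to describe the whole distribution.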

Clinical example

A laboratory sets its reference range for serum sodium as the central 95% of healthy donors (about mean plus or minus 2 SD). A patient whose value falls just outside that range is not necessarily ill; by construction, roughly 1 in 20 healthy people falls outside it, so the result must be read in clinical context, not treated as automatic proof of illness.
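
The arithmetic behind such a range can be sketched as follows, assuming the healthy values are approximately normal. The donor mean and SD used here (140 and 2.5 mmol/L) are hypothetical numbers chosen for illustration, not figures from any laboratory, and reference_range is an illustrative helper name.

```python
from statistics import NormalDist

# Hypothetical healthy-donor summary for serum sodium, in mmol/L.
HEALTHY_MEAN = 140.0
HEALTHY_SD = 2.5


def reference_range(mean, sd, coverage=0.95):
    """Central `coverage` interval of a normal distribution."""
    # z is about 1.96 for 95% coverage, the "about two SDs" of the text.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mean - z * sd, mean + z * sd


low, high = reference_range(HEALTHY_MEAN, HEALTHY_SD)
healthy = NormalDist(HEALTHY_MEAN, HEALTHY_SD)

# Share of healthy donors whose value falls below or above the range.
outside = healthy.cdf(low) + (1 - healthy.cdf(high))

print(f"reference range: {low:.1f} to {high:.1f} mmol/L")
print(f"healthy donors outside the range: {outside:.1%}")
# reference range: 135.1 to 144.9 mmol/L
# healthy donors outside the range: 5.0%
```

The 5% outside the range follows directly from how the range is defined, not from anything being wrong with those donors.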

Research example

Before comparing mean birth weight between two groups with a t-test, a researcher plots a histogram and sees the familiar bell shape, which supports approximate normality and justifies the parametric test. Had the data been strongly skewed, a non-parametric test would have been the honest choice.
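
A quick check of this kind can be sketched in a few lines. The example below uses simulated data only: the birth weights (mean 3400 g, SD 450 g) and the right-skewed lengths of stay are hypothetical values generated for illustration. The skewness calculation is an added numeric companion to the histogram, not a method named in the lesson, and text_histogram and sample_skewness are illustrative helper names.

```python
import random
from statistics import mean, median, pstdev

random.seed(1)  # fixed seed so the simulated data are reproducible

# Simulated, hypothetical data: roughly normal birth weights (grams)
# and right-skewed lengths of hospital stay (days).
birth_weights = [random.gauss(3400, 450) for _ in range(2000)]
stay_days = [random.lognormvariate(1.0, 0.8) for _ in range(2000)]


def text_histogram(values, bins=10, width=40):
    """Crude text histogram: one row of '#' per equal-width bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    rows = [f"{lo + i * step:9.1f} | " + "#" * round(width * c / peak)
            for i, c in enumerate(counts)]
    return "\n".join(rows)


def sample_skewness(values):
    """Average cubed z-score: near 0 if symmetric, positive if right-skewed."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


for label, data in (("birth weight", birth_weights), ("stay", stay_days)):
    print(label)
    print(text_histogram(data))
    print(f"mean={mean(data):.1f}  median={median(data):.1f}  "
          f"skewness={sample_skewness(data):.2f}\n")
```

For the simulated birth weights the histogram is bell-shaped, the mean and median nearly coincide, and the skewness is close to zero. For the simulated lengths of stay the histogram has a long right tail, the mean sits above the median, and the skewness is clearly positive, which is the pattern that would argue against a t-test on the raw values.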

Knowledge check

Q1. Which two numbers completely describe a normal distribution?

Q2. A lab reference range is usually set as the central 95% of healthy people. What follows from this?

Q3. Why does it matter whether data are approximately normal before choosing a test?