Section 1.4 (7 min read)

Standard Deviation and Variance

Core summary

The mean tells you the center; the standard deviation tells you how spread out values are around it. Two groups can share a mean yet behave completely differently.

Detailed explanation

A center alone never describes data fully. Two wards can both have a mean patient age of 50, yet one is all middle-aged adults while the other is half children and half elderly. To capture that difference we need a measure of spread, and the workhorse is the standard deviation (SD).

The standard deviation answers a simple question: roughly how far does a typical value fall from the mean? A small SD means the data cluster tightly around the mean (values are similar, the group is homogeneous); a large SD means they scatter widely (values are diverse, the group is heterogeneous). The variance is simply the standard deviation squared; it is used inside many statistical calculations, but because its units are squared (squared mmHg, squared kilograms) it is hard to interpret directly. The SD is preferred for reporting precisely because it is in the same units as the data, so 'mean blood pressure 130 mmHg with SD 12 mmHg' is immediately meaningful.

Here is the single most useful fact about the SD, and it links to the normal distribution you will meet shortly. When data follow the familiar bell shape, about 68% of values lie within one SD of the mean, about 95% within two SDs, and about 99.7% within three. So if mean systolic blood pressure is 120 mmHg with an SD of 10 mmHg, then roughly 95% of people fall between 100 and 140 mmHg. This 'empirical rule' turns the SD from an abstract number into an intuitive picture of where most patients lie, and it underlies many laboratory reference ranges.

Two cautions. First, the SD is only a clean summary when data are roughly symmetric. For skewed data, report the median with the interquartile range instead, because the SD, like the mean, is distorted by outliers. Second, do not confuse the SD with the standard error, which you will study later: the SD describes the spread of individual patients, while the standard error describes the precision of an estimate such as the mean. Mixing them up, typically by reporting the smaller standard error where the SD belongs, makes the data look far less variable than they are, a common and serious error in papers.

In practice you will almost never compute an SD by hand. The skill is interpretation: a small SD says your patients are alike on this measure; a large SD warns that an average may hide enormous individual variation, which can matter more for the patient in front of you than the group average ever does.
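
For readers who like to see the numbers, here is a minimal Python sketch of the ideas above. The ward ages are invented for illustration (they are not real data), the helper names summarize and share_within are hypothetical, and the SD is computed with the sample (n - 1) formula, which is an assumption about how it would usually be reported. The second part checks the empirical rule for a normal distribution with mean 120 and SD 10.

```python
from statistics import NormalDist, mean, stdev, variance

# Hypothetical ages, invented for illustration; both wards average 50 years.
ward_a = [44, 47, 49, 51, 53, 56]   # all middle-aged adults
ward_b = [4, 8, 12, 88, 92, 96]     # half children, half elderly


def summarize(values):
    """Return (mean, variance, SD) using the sample (n - 1) formulas."""
    return mean(values), variance(values), stdev(values)


def share_within(k, mu=120, sigma=10):
    """Share of a normal distribution lying within k SDs of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Same mean, very different spread. Note the variance is in squared years.
for name, ages in (("Ward A", ward_a), ("Ward B", ward_b)):
    m, var, sd = summarize(ages)
    print(f"{name}: mean = {m:.0f} years, "
          f"variance = {var:.1f} years^2, SD = {sd:.1f} years")

# Empirical rule: share of values within 1, 2 and 3 SDs of the mean.
for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Both wards print a mean of 50 years, but Ward B's SD is roughly ten times Ward A's, which is exactly the difference the mean alone hides.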

Clinical example

Two diabetes clinics both report a mean HbA1c of 8%. Clinic A has an SD of 0.5% (most patients clustered close to 8%), while Clinic B has an SD of 2.5% (some excellent, some dangerously high). The means are identical, but the spread reveals that Clinic B has patients at real risk hidden inside the average.
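
To put a rough number on 'patients at real risk', the sketch below estimates the share of each clinic's patients above a cutoff. It rests on two loud assumptions that are not part of the example itself: that HbA1c is approximately normally distributed in each clinic (real HbA1c data may well be skewed), and that 10% is the cutoff of interest, a value chosen here purely for illustration. The function name share_above is likewise hypothetical.

```python
from statistics import NormalDist

THRESHOLD = 10.0  # hypothetical HbA1c cutoff (%), chosen only for illustration


def share_above(mean, sd, cutoff=THRESHOLD):
    """Share of patients above the cutoff, ASSUMING a normal distribution."""
    return 1 - NormalDist(mean, sd).cdf(cutoff)


clinic_a = share_above(8.0, 0.5)  # cutoff is 4 SDs above the mean
clinic_b = share_above(8.0, 2.5)  # cutoff is only 0.8 SDs above the mean

print(f"Clinic A: {clinic_a:.2%} of patients above {THRESHOLD}%")
print(f"Clinic B: {clinic_b:.2%} of patients above {THRESHOLD}%")
```

Under these assumptions almost nobody in Clinic A exceeds the cutoff, while roughly one in five patients in Clinic B does, even though the two means are identical.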

Research example

A trial reports mean weight loss of 4 kg with an SD of 1 kg in one arm and 4 kg with an SD of 6 kg in another. The average effect is the same, but the large SD signals that the second treatment helps some participants a lot and others not at all, prompting a look at who responds.
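
One quick way to read these two arms is to apply the two-SD rule of thumb to each. The sketch below does that, assuming weight loss is roughly bell-shaped in both arms (an assumption, not something the trial example states); the helper name two_sd_range is hypothetical.

```python
def two_sd_range(mean, sd):
    """Interval covering about 95% of values if the data are roughly normal."""
    return mean - 2 * sd, mean + 2 * sd


# (mean weight loss in kg, SD in kg) for each arm, taken from the example.
arms = {"Arm 1": (4.0, 1.0), "Arm 2": (4.0, 6.0)}

for name, (mean_loss, sd) in arms.items():
    low, high = two_sd_range(mean_loss, sd)
    # A negative weight loss means the participant gained weight.
    print(f"{name}: about 95% of participants between "
          f"{low:+.0f} kg and {high:+.0f} kg of weight loss")
```

The first arm's range stays entirely on the weight-loss side, whereas the second arm's range stretches from substantial weight gain to very large losses.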

Knowledge check

Q1. Two groups have the same mean cholesterol, but Group A has SD 10 and Group B has SD 40. What does this tell you?

Q2. In a normal distribution with mean 120 and SD 10, roughly what percentage of values fall between 100 and 140?

Q3. Why is the standard deviation usually reported instead of the variance?