Section 1.3

Mean, Median, and Mode

Core summary

The mean, median, and mode each describe the 'typical' value differently. In skewed clinical data the median is often the honest choice, because the mean is pulled by extreme values.

Detailed explanation

When we summarize a set of numbers, the first question is usually: what is the typical value? Statistics gives three answers, called measures of central tendency. The mean is the ordinary average: add all the values and divide by how many there are. The median is the middle value when the data are sorted, with half the observations below it and half above. The mode is simply the value that occurs most often.

In a perfectly symmetric, single-peaked dataset these three coincide. Real clinical data, however, are often skewed, stretched out by a tail of unusually high or low values, and then the three diverge, sometimes dramatically. The crucial property is that the mean is sensitive to outliers while the median is resistant to them. Imagine ten patients whose hospital stays are mostly 3 to 5 days, except one who stays 90 days after a complication. That single value barely moves the median but drags the mean far upward, producing an 'average' stay that describes none of the actual patients.

This is why the choice matters clinically. For symmetric, bell-shaped data (such as height, or many lab values in healthy people) the mean is an excellent, efficient summary. For skewed data (length of stay, hospital cost, income, biomarker levels, waiting times) the median usually tells the more honest story, because it represents the experience of a typical patient rather than being inflated by a few extremes. A simple diagnostic: if the mean and median are far apart, your data are probably skewed and you should probably report the median.

The mode is used less often for numbers but is essential for categorical data, where 'most common' is exactly what you want: the most frequent diagnosis, or the modal blood type.

A famous trap is the 'mean income' problem: put a billionaire in a room of ordinary earners and the mean income suggests everyone is rich, while the median still reflects the typical person. The same logic applies to skewed medical variables.
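The three measures, and the outlier effect described above, can be seen in a minimal sketch using Python's standard `statistics` module. The stay lengths and blood types below are made-up values chosen to mirror the ten-patient example; they are not real patient data.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical lengths of stay (days) for ten patients: nine ordinary stays
# of 3 to 5 days, plus one 90-day stay after a complication.
stays = [3, 3, 4, 4, 4, 4, 5, 5, 5, 90]

# The same cohort with the 90-day stay replaced by an ordinary 4-day stay.
stays_no_outlier = [3, 3, 4, 4, 4, 4, 4, 5, 5, 5]

for label, data in [("with outlier", stays), ("without outlier", stays_no_outlier)]:
    # mean: sum / count; median: middle of the sorted values;
    # mode: most frequent value.
    print(f"{label:>16}: mean={mean(data):.1f}  "
          f"median={median(data):.1f}  mode={mode(data)}")

# For categorical data only the mode makes sense: there is no "average"
# blood type. Hypothetical blood types for a small sample:
blood_types = ["O+", "A+", "O+", "B+", "A+", "O+", "AB-", "O-"]
print("modal blood type:", mode(blood_types))
print("counts:", Counter(blood_types).most_common())
```

Swapping the single 90-day stay for a 4-day stay moves the mean from 12.7 to 4.1 days, while the median stays at 4 days and the mode at 4.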
Journals increasingly expect the median with the interquartile range (IQR) for skewed variables, and the mean with the standard deviation (SD) for symmetric ones. The practical takeaway is not a formula but a judgment: look at the shape of your data first (a quick histogram is enough), then pick the measure of center that genuinely represents your patients.
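A "quick histogram" does not need plotting software. The sketch below is one possible approach, not a standard tool: `text_histogram` is a hypothetical helper that prints one row per bin and one `#` per observation, applied to made-up stay lengths. Empty bins are kept so that a long tail shows up as a visible gap.

```python
from statistics import mean, median

def text_histogram(values, bin_width):
    """Return rows of a crude text histogram: one row per bin, one '#' per value."""
    lo = min(values)
    counts = {}
    for v in values:
        b = int((v - lo) // bin_width)  # index of the bin this value falls into
        counts[b] = counts.get(b, 0) + 1
    rows = []
    for b in range(max(counts) + 1):     # include empty bins so gaps are visible
        start = lo + b * bin_width
        rows.append(f"{start:>4} to <{start + bin_width:<4} {'#' * counts.get(b, 0)}")
    return rows

# Hypothetical lengths of stay (days), mirroring the ten-patient example.
stays = [3, 3, 4, 4, 4, 4, 5, 5, 5, 90]

for row in text_histogram(stays, bin_width=10):
    print(row)
print(f"mean = {mean(stays):.1f}, median = {median(stays):.1f}")
```

Nine observations pile up in the first bin and a single one sits far to the right. That shape, together with the gap between mean and median, points toward reporting the median with its IQR.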

Clinical example

An ICU reports a mean length of stay of 11 days, yet most patients leave the unit within about 4 days; a handful of very long stays pull the mean up. The median of 4 days describes the typical patient far better, and that difference changes how administrators should plan bed capacity.
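To make the ICU gap concrete, here is a small sketch with ten invented stay lengths, chosen only so that the mean (11 days) and median (4 days) match the figures above; they are not from any real unit. It also counts how many patients actually stayed longer than the reported "average".

```python
from statistics import mean, median

# Hypothetical ICU lengths of stay (days) for ten patients, invented so that
# the mean and median match the 11-day and 4-day figures in the example.
icu_stays = [3, 4, 4, 4, 4, 4, 4, 5, 28, 50]

avg = mean(icu_stays)    # pulled up by the 28- and 50-day stays
mid = median(icu_stays)  # middle of the sorted stays
longer_than_mean = sum(1 for s in icu_stays if s > avg)

print(f"mean = {avg} days, median = {mid} days")
print(f"{longer_than_mean} of {len(icu_stays)} patients stayed longer than the mean")
```

In this made-up cohort only 2 of the 10 patients stay longer than the 11-day mean, which is why the mean is a poor picture of a typical admission.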

Research example

A study of serum ferritin (a famously skewed marker) reports a median of 80 micrograms/L with its interquartile range, not a mean, because a few very high values would otherwise inflate the average and misrepresent the cohort.
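The reporting convention can be sketched with Python's `statistics` module. The eleven ferritin values below are invented for illustration and are NOT the study's data; the quartiles come from `statistics.quantiles` with its default `'exclusive'` method, and other quartile methods can give slightly different cut points.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical serum ferritin values (micrograms/L) for eleven patients.
# Invented for illustration; NOT data from the study described above.
ferritin = [35, 48, 60, 72, 80, 80, 95, 110, 140, 900, 1500]

med = median(ferritin)
q1, _, q3 = quantiles(ferritin, n=4)  # quartile cut points ('exclusive' method)
avg, sd = mean(ferritin), stdev(ferritin)

# Skewed variable: median (IQR) is the usual report.
print(f"median {med} micrograms/L (IQR {q1:g} to {q3:g})")
# For comparison, the mean is dragged upward by the two very high values.
print(f"mean {avg:.1f} micrograms/L (SD {sd:.1f})")
```

Here the mean (about 284 micrograms/L) lies above even the upper quartile, so it describes almost none of these invented patients, while the median and IQR summarize the bulk of the cohort.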

Knowledge check

Q1. Hospital stays: most patients 3 to 5 days, one patient 90 days. Which best describes the typical patient?

Q2. If the mean is much larger than the median, the data are probably:

Q3. For reporting the most common blood type in a sample, which measure fits?