Section 3.3 (7 min read)

Mann-Whitney U and Wilcoxon Signed-Rank

Core summary

These are the non-parametric counterparts of the t-tests. Mann-Whitney U compares two independent groups; Wilcoxon signed-rank compares two paired measurements; both work on ranks rather than raw values.

Detailed explanation

When data are skewed or ordinal, or when the sample is too small to check normality, the t-tests' normality assumption is doubtful, and you switch to their rank-based cousins. The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is the non-parametric version of the independent t-test: it pools all the values, ranks them from smallest to largest, and checks whether one group's ranks are systematically higher. In effect it compares the distributions, and often the medians, of two independent groups. The Wilcoxon signed-rank test is the non-parametric version of the paired t-test: it ranks the sizes of the paired differences and tests whether the positive and negative changes balance out, comparing two related measurements on the same subjects.

Because they use ranks, both tests are robust to outliers and need no normality assumption. They are well suited to ordinal outcomes (pain scores, Likert scales, tumor grade) and to skewed continuous data (biomarkers, length of stay).

The output gives a p-value. Report it alongside the matching descriptive statistics (medians with interquartile ranges) and an effect size: for two groups the rank-biserial correlation, derived from the U statistic, expresses how strongly the groups separate. Never pair these tests with a mean (SD); match the descriptive statistic (median, IQR) to the rank-based analysis.

One nuance: Mann-Whitney compares whole distributions, and interpreting it strictly as a 'median difference' assumes the two groups have similarly shaped distributions. If the shapes differ greatly, describe the result as a general tendency for one group to score higher. With very large samples these tests become sensitive enough to flag tiny, clinically trivial differences, so anchor your interpretation in the actual medians.

The practical habit is consistency: the descriptive statistics, the graph (a box plot), and the test should all speak the same rank-based language.
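
To make the ranking step concrete, here is a minimal sketch in plain Python of how the U statistic and the rank-biserial correlation can be computed by hand. The function names and the two small groups are hypothetical and chosen only for illustration. The sketch assumes ties receive the average of the ranks they span, and it uses the sign convention in which a negative rank-biserial value means the first group tends to score lower. It does not compute a p-value; in practice a statistics package (for example SciPy's `scipy.stats.mannwhitneyu`) would supply that.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group A, U for group B, rank-biserial correlation)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool both groups and rank them together, smallest to largest.
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    # U for A counts the pairs where A's value exceeds B's (ties count half).
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    # Rank-biserial: share of pairs favouring A minus share favouring B.
    rank_biserial = (u_a - u_b) / (n_a * n_b)
    return u_a, u_b, rank_biserial


# Hypothetical scores, for illustration only.
group_a = [1, 2, 3, 4]
group_b = [3, 5, 6, 7]

u_a, u_b, r = mann_whitney_u(group_a, group_b)
print(f"U_A = {u_a}, U_B = {u_b}, rank-biserial r = {r}")
```

With these made-up values only one pair has group A above group B (plus one tie counted as half), so the rank-biserial correlation is strongly negative: group B's ranks are systematically higher.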

Clinical example

Pain scores (0 to 10) compared between two analgesics, with 25 patients in each group, are ordinal data, so the Mann-Whitney U test is used and the results are reported with medians and IQRs.
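
As a sketch of the matching descriptive statistics, the snippet below computes a median and interquartile range for each group using Python's standard `statistics` module. The pain scores are hypothetical and shortened to seven patients per group for readability, and `median_iqr` is an illustrative helper name. The quartile cut points depend on the interpolation method chosen (here `method="inclusive"`), so other software may report slightly different IQR limits.

```python
from statistics import median, quantiles


def median_iqr(scores):
    """Return (median, first quartile, third quartile) for a list of scores."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Hypothetical pain scores (0 to 10), for illustration only.
drug_a = [2, 3, 3, 4, 4, 5, 6]
drug_b = [4, 5, 5, 6, 7, 7, 8]

for name, scores in (("Drug A", drug_a), ("Drug B", drug_b)):
    med, q1, q3 = median_iqr(scores)
    print(f"{name}: median {med} (IQR {q1} to {q3})")
```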

Research example

A study of serum ferritin (skewed) before and after iron therapy in the same patients uses the Wilcoxon signed-rank test, reporting the median change rather than a mean.
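
The mechanics of the signed-rank test can be sketched the same way. The code below takes hypothetical before and after ferritin values for six patients, ranks the absolute sizes of the paired differences, and sums the ranks of the positive and negative changes separately. It assumes the common convention of dropping zero differences before ranking and giving ties their average rank; `signed_rank_sums` is an illustrative name, and no p-value is computed (a package function such as SciPy's `scipy.stats.wilcoxon` would normally provide it).

```python
from statistics import median


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def signed_rank_sums(before, after):
    """Return (sum of positive ranks, sum of negative ranks, median change)."""
    diffs = [a - b for b, a in zip(before, after)]
    # Assumption: zero differences carry no direction and are dropped.
    nonzero = [d for d in diffs if d != 0]
    ranks = midranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    # The median change is reported over all pairs, including unchanged ones.
    return w_plus, w_minus, median(diffs)


# Hypothetical serum ferritin values for the same six patients.
before = [12, 20, 8, 15, 30, 10]
after = [25, 18, 40, 45, 90, 10]

w_plus, w_minus, median_change = signed_rank_sums(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}, median change = {median_change}")
```

If therapy had no effect, the positive and negative rank sums would be roughly equal; here almost all of the rank total sits on the positive side, which is the imbalance the test evaluates.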

Knowledge check

Q1. What is the non-parametric equivalent of the independent (two-sample) t-test?

Q2. For paired, skewed data on the same subjects, which test is appropriate?

Q3. When you use a Mann-Whitney U test, which descriptive statistics should you report?