Section 2.4 (7 min read)

Type I and Type II Errors

Core summary

Because we decide from samples, we can be wrong two ways: a Type I error is a false alarm (claiming an effect that is not real), and a Type II error is a missed finding (failing to detect a real effect).

Detailed explanation

Every time we use a sample to judge a hypothesis, we risk being wrong, and there are exactly two ways to err. Laying them out in a simple two-by-two table (the truth versus our decision) makes them easy to remember and is one of the most useful pictures in all of statistics.

A Type I error is a false positive: we reject the null hypothesis and declare an effect when in reality there is none. It is the statistical equivalent of a false alarm: concluding a drug works when it actually does not. When the null hypothesis is true, the probability of a Type I error equals alpha, the significance level we chose. Setting alpha at 0.05 means we accept a 5% chance of a false alarm on any single test of a treatment that has no effect. This is also why testing many outcomes is dangerous: run 20 independent tests on useless treatments and, on average, one will be 'significant' by chance alone.

A Type II error is a false negative: we fail to reject the null and miss an effect that is genuinely there. It is the missed diagnosis of statistics: concluding a drug does not work when it actually does. Its probability is called beta. The flip side of beta is power (1 minus beta), the probability of correctly detecting a real effect when it exists. A study with 80% power has a 20% chance of missing a true effect of the size it was designed to find.

The two errors trade off against each other and are governed by different forces. For a fixed sample size and effect size, lowering alpha to make false alarms rarer makes the test more conservative, which raises the chance of missing real effects (more Type II errors), and vice versa. The most reliable way to reduce Type II errors without inflating Type I errors is to increase the sample size: bigger studies have more power. This is the core reason sample-size planning exists, which is the focus of the next level.

The clinical stakes of each error differ by context. A Type I error might launch a useless or harmful drug into practice; a Type II error might cause a genuinely effective therapy to be abandoned. Which is worse depends on the situation, and good study design weighs them deliberately rather than defaulting to habit.

For a clinician, the takeaway is to read results with both errors in mind. A single 'significant' finding could be a Type I false alarm, especially if many things were tested or the study was small. A 'non-significant' finding in an underpowered study could easily be a Type II miss. Neither result is the final word, and knowing which error is more likely in a given study sharpens how much you should trust its conclusion.
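
The arithmetic behind these claims can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a sample-size planning tool: the function names are made up for this example, the tests are assumed independent, and power is approximated for a one-sided z-test with a known standard deviation and a hypothetical true effect of 0.3 standard deviations.

```python
from statistics import NormalDist


def expected_false_alarms(alpha: float, n_tests: int) -> float:
    """Average number of 'significant' results when every null is true."""
    return alpha * n_tests


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false alarm across independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def power_one_sided_z(effect_size: float, n: int, alpha: float) -> float:
    """Approximate power of a one-sided, one-sample z-test.

    effect_size is the true shift in standard-deviation units
    (a hypothetical value chosen for illustration).
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # rejection threshold
    return 1.0 - NormalDist().cdf(z_crit - effect_size * n ** 0.5)


# Twenty independent tests of useless treatments at alpha = 0.05.
print(expected_false_alarms(0.05, 20))       # about 1 false alarm on average
print(round(familywise_error(0.05, 20), 3))  # 0.642

# Trade-off: a stricter alpha lowers power; a larger sample raises it.
for alpha in (0.05, 0.01):
    for n in (30, 100):
        power = power_one_sided_z(0.3, n, alpha)
        print(f"alpha={alpha}, n={n}: power={power:.2f}, beta={1 - power:.2f}")
```

Under these assumptions, the chance of at least one false alarm in 20 tests is roughly 64%, even though each individual test carries only a 5% risk. The loop shows both forces at work: at a fixed sample size, moving alpha from 0.05 to 0.01 lowers power, while moving from 30 to 100 observations raises it at either alpha.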

Clinical example

A screening test can err in two ways: it can flag a healthy patient as having disease (a false positive, analogous to a Type I error), or it can miss disease in a sick patient (a false negative, analogous to a Type II error). Clinicians already reason this way every day; the same logic applies to interpreting study results.
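
The screening analogy maps directly onto the two-by-two table of truth versus decision. The sketch below tallies that table for a small set of made-up patients; the data and the function names are hypothetical and chosen only to make the counts easy to follow.

```python
from collections import Counter


def classify(has_disease: bool, test_positive: bool) -> str:
    """Name the cell of the two-by-two table for one patient."""
    if test_positive:
        return "true positive" if has_disease else "false positive"
    return "false negative" if has_disease else "true negative"


def error_rates(patients):
    """Return (false positive rate, false negative rate)."""
    counts = Counter(classify(d, t) for d, t in patients)
    healthy = counts["false positive"] + counts["true negative"]
    sick = counts["false negative"] + counts["true positive"]
    return counts["false positive"] / healthy, counts["false negative"] / sick


# Hypothetical screening results as (has_disease, test_positive) pairs.
patients = (
    [(True, True)] * 3      # sick, correctly flagged
    + [(True, False)] * 1   # sick, missed (Type II analogue)
    + [(False, True)] * 1   # healthy, flagged (Type I analogue)
    + [(False, False)] * 5  # healthy, correctly cleared
)

fpr, fnr = error_rates(patients)
print(f"false positive rate: {fpr:.2f}")  # share of healthy patients flagged
print(f"false negative rate: {fnr:.2f}")  # share of sick patients missed
print(f"sensitivity: {1 - fnr:.2f}")      # the analogue of power
```

In this analogy the false positive rate plays the role of alpha, the false negative rate plays the role of beta, and sensitivity (1 minus the false negative rate) plays the role of power.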

Research example

A trial with only 30 patients finds no significant benefit from a drug that truly works. This is a likely Type II error caused by low power; a larger trial later detects the same effect as significant, confirming the first study simply missed it.
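
A small simulation makes the scenario concrete. It is a sketch under assumptions that are not in the example itself: outcomes are normally distributed with a known standard deviation of 1, the drug's true benefit is a hypothetical 0.5 standard deviations, patients are split evenly between two arms, and the analysis is a two-sided z-test at alpha = 0.05. The function names are illustrative.

```python
import random
from statistics import NormalDist


def trial_is_significant(n_per_arm, true_effect, rng, alpha=0.05):
    """Simulate one two-arm trial and report whether p < alpha."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
    std_error = (2.0 / n_per_arm) ** 0.5  # known SD of 1 in each arm
    z = diff / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def estimated_power(n_per_arm, true_effect, n_sims=2000, seed=0):
    """Fraction of simulated trials that reach significance."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = sum(
        trial_is_significant(n_per_arm, true_effect, rng)
        for _ in range(n_sims)
    )
    return hits / n_sims


small = estimated_power(n_per_arm=15, true_effect=0.5)   # 30 patients total
large = estimated_power(n_per_arm=150, true_effect=0.5)  # 300 patients total
print(f"30 patients:  power ~ {small:.2f}, Type II risk ~ {1 - small:.2f}")
print(f"300 patients: power ~ {large:.2f}, Type II risk ~ {1 - large:.2f}")
```

With these assumed numbers, the 30-patient trial misses the real effect in most simulated runs, while the 300-patient trial almost always detects it. The drug is identical in both cases; only the power of the study changes.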

Knowledge check

Q1. A study concludes a drug works when in truth it does not. This is:

Q2. What is statistical power?

Q3. A small, underpowered trial finds 'no significant effect' for a drug that truly works. This is most likely: