Section 2.1 (10 min read)

Case-Control Studies

Core summary

A case-control study identifies people with a disease (cases) and similar people without it (controls), then looks backward to compare their past exposures. It is efficient for studying rare diseases and uses the odds ratio as its measure of association.

Detailed explanation

Case-control studies work backward from outcome to exposure, the opposite direction of cohort studies. You recruit cases (people with the disease) and controls (people without it, often matched to cases on age, sex, or other key variables). Then you determine each group's past exposures through medical records, interviews, or biological samples.

Because you start with a fixed number of cases and controls (not a natural population), you cannot calculate risk directly. Instead, you calculate the odds ratio (OR): the odds of exposure among cases divided by the odds of exposure among controls. A critical statistical point: when the disease is rare in the source population (risk below approximately 10%), the OR approximates the relative risk (RR). But when the disease is common, the OR and RR can diverge substantially: the OR lies farther from 1 than the RR, so it overstates the strength of the association.

There are several important variants. In a matched case-control study, each case is paired with one or more controls matched on specific variables (age, sex, institution). This improves efficiency but requires an analysis that accounts for the matching, typically conditional logistic regression. In a nested case-control study, cases and controls are drawn from within an existing cohort, combining the efficiency of the case-control design with the data quality of a cohort study. In a case-crossover study, each patient serves as their own control at different time points, which removes confounding by characteristics that stay fixed within a person (such as sex or genetics), though not by factors that change over time.

The major biases are recall bias (cases tend to search harder for explanations, remembering exposures that controls forget), selection bias (the source of controls fundamentally affects results, because hospital-based controls may have different exposure patterns than population controls), and other forms of information bias (systematic errors in how exposure is measured or recorded). Choosing controls is often the most critical and debated decision in case-control design.

Despite these limitations, case-control studies are uniquely suited for rare diseases (where cohort studies would need impossibly large samples), diseases with long latency periods (where prospective follow-up is impractical), outbreak investigations (you start with the cases and work backward rapidly), and preliminary investigations of multiple exposures for a single outcome. In later levels of this course, you will learn how to conduct each of these study designs step by step from zero: from writing the protocol to collecting data to analyzing results and writing the manuscript.
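To make the rare-disease point concrete, the short Python sketch below compares the odds ratio with the relative risk in two hypothetical populations that share the same relative risk of 2. The risks (2% versus 1%, and 60% versus 30%) and the function names are illustrative assumptions, not values from any study.

```python
# Illustrative sketch: how the OR tracks the RR when disease is rare
# and drifts away from it when disease is common.
# All risks below are hypothetical numbers chosen for illustration.

def odds(p: float) -> float:
    """Convert a probability (risk) into odds."""
    return p / (1.0 - p)

def relative_risk(risk_exposed: float, risk_unexposed: float) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return risk_exposed / risk_unexposed

def odds_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return odds(risk_exposed) / odds(risk_unexposed)

# (risk in exposed, risk in unexposed); both scenarios have RR = 2
scenarios = {
    "rare disease": (0.02, 0.01),
    "common disease": (0.60, 0.30),
}

for label, (r1, r0) in scenarios.items():
    rr = relative_risk(r1, r0)
    or_ = odds_ratio(r1, r0)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
```

In the rare scenario the two measures are nearly identical (OR about 2.02), while in the common scenario the OR is 3.5 although the RR is still 2. This is why an OR from a study of a common outcome should not be read as "times the risk".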

Clinical example

You want to study whether a rare industrial chemical used in plastic manufacturing causes hepatocellular carcinoma (liver cancer). Finding enough liver cancer patients in a prospective cohort would require following tens of thousands of factory workers for decades, which is prohibitively expensive and slow. Instead, you take a case-control approach: you identify 200 patients diagnosed with hepatocellular carcinoma at regional cancer centers (cases) and 400 patients admitted for unrelated conditions like elective surgery (controls), matched by age (within 5 years), sex, and geographic region. You review their detailed occupational histories, going back 20 to 30 years, looking specifically for exposure to the industrial chemical. If 30% of cases (60 of 200) report significant occupational exposure versus 10% of controls (40 of 400), the crude odds ratio is (30/70) / (10/90) = 3.86, suggesting that exposed workers have nearly 4 times the odds of developing liver cancer. Because the controls were matched, the final analysis should also account for the matching. This result would justify a larger cohort study to confirm the finding.
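The arithmetic in this example can be checked with a few lines of Python. The sketch below uses the counts implied by the percentages (60 of 200 cases exposed, 40 of 400 controls exposed) and adds a 95% confidence interval using the standard log-based (Woolf) approximation. The confidence interval is an addition for illustration, the function name is hypothetical, and the calculation is a crude one that ignores the matching.

```python
import math
from statistics import NormalDist

def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls,
                       level=0.95):
    """Crude odds ratio from a 2x2 table with a Woolf (log-based) CI.

    Assumes every cell count is greater than zero and ignores matching.
    """
    or_ = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of ln(OR): square root of the summed reciprocal cell counts
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Counts implied by the example: 30% of 200 cases, 10% of 400 controls
or_, lower, upper = odds_ratio_with_ci(60, 140, 40, 360)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate matches the hand calculation (3.86), and the interval of roughly 2.5 to 6.0 excludes 1, which is consistent with a real association under these assumed counts.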

Research example

The landmark 1961 investigation by Widukind Lenz linking thalidomide to birth defects, reported at about the same time as independent observations by William McBride in Australia, is one of the most famous case-control studies in medical history. Lenz identified infants born with phocomelia (limb malformations with short or absent limbs) in Germany. He then compared the medication histories of mothers of affected infants (cases) with those of mothers of healthy infants (controls). The association was staggering: nearly all mothers of affected infants had taken thalidomide during the first trimester, while almost none of the control mothers had. Within months of these case-control results, thalidomide was withdrawn from the market. This study demonstrated two key strengths of the case-control design: speed (the answer came in weeks, not years) and efficiency for rare outcomes (you do not need to follow thousands of pregnancies prospectively). It also demonstrates a potential limitation: a case-control result alone cannot prove causation. However, the strength of the association, the dose-response relationship, and the biological plausibility made the causal link highly convincing even before confirmatory studies were completed.

Knowledge check

Q1. In a case-control study, what is the correct measure of association?

Q2. Which bias is MOST characteristic of case-control studies?

Q3. When is a case-control study MOST appropriate?