Logistic Regression
Core summary
Logistic regression predicts a binary (yes/no) outcome from one or more predictors. Its effect measure is the odds ratio, and it is one of the most widely used models in clinical research.
Detailed explanation
When the outcome is binary (disease or no disease, survived or died, readmitted or not), a straight line is the wrong tool: it can predict values below 0 or above 1, and a probability must stay between 0 and 1. Logistic regression solves this by modeling the log of the odds of the outcome (the logit) as a linear function of the predictors, then reporting each predictor's effect as an odds ratio, which is the exponentiated coefficient. An odds ratio above 1 means the predictor raises the odds of the outcome, below 1 means it lowers them, and exactly 1 means no association. For example: 'each 10-year increase in age carries an odds ratio of 1.4 for the complication'.
Like linear regression, its great value is the multivariable, adjusted version: you include several predictors and obtain an adjusted odds ratio for each, holding the others constant. This is the standard way to adjust for confounders when the outcome is yes/no, which is why logistic regression dominates clinical research and is the engine behind many risk scores and prediction models.
The interpretation cautions from the odds-ratio lesson carry over. The odds ratio approximates the relative risk only when the outcome is rare; for a common outcome it lies further from 1 than the relative risk and so overstates the effect. Be careful translating an adjusted odds ratio into everyday risk. Report each odds ratio with its 95% confidence interval; an interval that includes 1 means the association is not statistically significant at the 5% level.
Assumptions include independent observations, enough outcome events per predictor (a common rule of thumb is at least 10 events per predictor to avoid overfitting), and no extreme correlation among the predictors. Pitfalls include cramming too many predictors into a model with few events, reading an adjusted odds ratio for a common outcome as if it were a risk ratio, and forgetting that, like all regression, it adjusts only for the confounders you actually measured and included.
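
The arithmetic behind these points can be sketched in a few lines of Python. This is a minimal illustration, not a fitted model: the function names are made up for this lesson, the baseline risks of 1% and 40% are hypothetical, and the confidence interval uses the usual Wald approximation (coefficient plus or minus 1.96 standard errors, then exponentiated).

```python
import math

def inverse_logit(log_odds):
    """Map a value on the log-odds scale back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a coefficient and its Wald limits: (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def risk_after_odds_ratio(baseline_risk, odds_ratio):
    """Apply an odds ratio to a baseline risk and return the resulting risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Same odds ratio of 2.0 applied to a rare and a common outcome
# (both baseline risks are hypothetical, chosen only for illustration).
for baseline in (0.01, 0.40):
    exposed = risk_after_odds_ratio(baseline, 2.0)
    print(f"baseline risk {baseline:.2f}: risk ratio {exposed / baseline:.2f}")
```

With a 1% baseline risk, the odds ratio of 2.0 corresponds to a risk ratio of about 1.98, so the two measures nearly agree. With a 40% baseline risk, the same odds ratio corresponds to a risk ratio of only about 1.43, which is the overstatement the text warns about.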
Clinical example
A model predicts 30-day readmission from age, comorbidity count, and length of stay; each additional comorbidity has an adjusted odds ratio of 1.3 (95% CI 1.1 to 1.6).
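
To see what the reported odds ratio implies, the sketch below scales it to several comorbidities and converts it to a risk. The odds ratio and interval come from the example above. The three extra comorbidities and the 10% baseline readmission risk are assumptions made up for illustration, as are the function names. Because the model is linear on the log-odds scale, the odds ratio and both interval limits for a k-unit change are the per-unit values raised to the power k.

```python
def scale_odds_ratio(odds_ratio, units):
    """Odds ratio for a change of `units` in the predictor (odds ratios multiply)."""
    return odds_ratio ** units

def risk_from_baseline(baseline_risk, odds_ratio):
    """Apply an odds ratio to a baseline risk and return the resulting risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Values reported in the clinical example.
or_per_comorbidity, ci_low, ci_high = 1.3, 1.1, 1.6

# Hypothetical inputs, not taken from the example.
extra_comorbidities = 3
baseline_risk = 0.10

or_3 = scale_odds_ratio(or_per_comorbidity, extra_comorbidities)
low_3 = scale_odds_ratio(ci_low, extra_comorbidities)
high_3 = scale_odds_ratio(ci_high, extra_comorbidities)
risk_3 = risk_from_baseline(baseline_risk, or_3)

print(f"OR for {extra_comorbidities} extra comorbidities: "
      f"{or_3:.2f} (95% CI {low_3:.2f} to {high_3:.2f})")
print(f"Risk rises from {baseline_risk:.0%} to {risk_3:.1%}")
```

Under these assumptions the odds ratio for three extra comorbidities is about 2.20 (95% CI 1.33 to 4.10), and the readmission risk rises from 10% to roughly 19.6%. That is a risk ratio of about 1.96, already smaller than the odds ratio.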
Research example
A case-control study uses logistic regression to estimate the adjusted odds ratio of an exposure for a rare cancer, controlling for smoking and age.
Knowledge check
Q1. Logistic regression is the appropriate model when the outcome is:
Q2. The effect measure reported by logistic regression is the:
Q3. A common rule of thumb to avoid overfitting a logistic model is at least: