Linear Regression
Core summary
Linear regression predicts a continuous outcome from one or more predictors by fitting a straight-line relationship. The slope (coefficient) shows how much the outcome changes per one-unit increase in the predictor, and with several predictors in the model it can adjust for measured confounders.
Detailed explanation
Where correlation only measures association, regression builds a predictive model. Simple linear regression fits a straight line to predict a continuous outcome from one predictor: outcome = intercept + slope × predictor. The slope, the regression coefficient often labeled B, is the key number: it tells you how much the outcome changes, on average, for each one-unit increase in the predictor, for example 'systolic blood pressure rises by 0.8 mmHg per kilogram of weight'. The intercept is the predicted outcome when the predictor is zero, which is often not clinically meaningful.
The real power comes from multiple, or multivariable, linear regression, which includes several predictors at once. This lets you estimate the effect of each predictor while holding the others constant, which is the essential trick for adjusting for confounders. So you can ask 'what is the effect of weight on blood pressure after adjusting for age and sex?' Each predictor gets its own adjusted coefficient with a confidence interval and p-value.
The model assumes a roughly linear relationship, independent observations, roughly normal residuals (the leftover errors), and constant variance of those residuals. Always report the coefficient with its 95% confidence interval, and judge clinical meaning by the size of the coefficient, not the p-value alone. R-squared describes the proportion of the outcome's variation that the model explains, from 0 to 1, but a high R-squared does not prove the model is correct or causal.
Common pitfalls include extrapolating beyond the range of the data, assuming a coefficient proves causation (regression adjusts only for the confounders you measured and included), cramming in too many predictors for the sample size (overfitting), and ignoring nonlinearity or influential outliers. A good habit is to inspect residual plots before trusting the model.
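The fitting and reporting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended analysis pipeline: the data are simulated, the "true" coefficients (including the 0.8 mmHg per kilogram borrowed from the example sentence) are assumptions chosen for the demonstration, and the 95% confidence intervals use the large-sample multiplier 1.96 rather than an exact t quantile.

```python
import numpy as np

# Simulated data for illustration only; every number here is made up.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(30, 70, n)      # years
sex = rng.integers(0, 2, n)       # 0/1 indicator
weight = rng.normal(80, 12, n)    # kilograms

# Assumed "true" model used to simulate systolic blood pressure (mmHg).
sbp = 60 + 0.8 * weight + 0.5 * age + 3.0 * sex + rng.normal(0, 8, n)

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), weight, age, sex])
names = ["intercept", "weight", "age", "sex"]

# Ordinary least squares: coefficients that minimise the squared residuals.
beta, *_ = np.linalg.lstsq(X, sbp, rcond=None)

# Residuals are the leftover errors; they also drive the standard errors.
residuals = sbp - X @ beta
dof = n - X.shape[1]                      # observations minus parameters
sigma2 = residuals @ residuals / dof      # estimated residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Approximate 95% confidence intervals (normal approximation, large n).
ci_low = beta - 1.96 * se
ci_high = beta + 1.96 * se

# R-squared: proportion of the outcome's variation explained by the model.
r_squared = 1 - (residuals @ residuals) / np.sum((sbp - sbp.mean()) ** 2)

for name, b, lo, hi in zip(names, beta, ci_low, ci_high):
    print(f"{name:>9}: {b:7.3f}  (95% CI {lo:7.3f} to {hi:7.3f})")
print(f"R-squared: {r_squared:.3f}")
```

The weight coefficient printed here is the adjusted one: the estimated change in blood pressure per kilogram with age and sex held constant. Plotting `residuals` against the fitted values `X @ beta` is one way to do the residual check mentioned above.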
Clinical example
A regression predicts birth weight from gestational age and maternal BMI; the gestational-age coefficient, say +180 g per week, is interpreted while holding BMI constant.
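To make 'holding BMI constant' concrete, here is a small sketch of what the fitted equation implies. Only the +180 g per week coefficient comes from the example; the intercept and the BMI coefficient are hypothetical values invented so the arithmetic can run.

```python
# Hypothetical fitted model for birth weight in grams.
INTERCEPT_G = -4200.0   # hypothetical, not from the example
B_GEST_AGE = 180.0      # grams per week of gestation (from the example)
B_BMI = 15.0            # grams per BMI unit, hypothetical


def predict_birth_weight(gest_weeks: float, bmi: float) -> float:
    """Predicted birth weight (g) from the linear model above."""
    return INTERCEPT_G + B_GEST_AGE * gest_weeks + B_BMI * bmi


# Two pregnancies with the same maternal BMI, one week apart in gestation.
same_bmi = 24.0
diff = predict_birth_weight(39, same_bmi) - predict_birth_weight(38, same_bmi)
print(diff)  # 180.0

# The one-week difference is the same at any other fixed BMI.
diff_other = predict_birth_weight(39, 30.0) - predict_birth_weight(38, 30.0)
print(diff_other)  # 180.0
```

Because the model is additive, the BMI term cancels whenever BMI is the same in both predictions, which is exactly what the adjusted coefficient describes.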
Research example
A multivariable linear model estimates a drug's effect on cholesterol adjusted for age, sex, and baseline cholesterol, so the drug coefficient reflects its effect independent of those measured variables.
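A simulation can show why the adjustment matters. The sketch below is an assumption-laden toy, not a reanalysis of any study: patients with higher baseline cholesterol are made more likely to receive the drug, the drug's true effect is set to -30 mg/dL, and all other numbers are invented. The helper `ols` is a hypothetical convenience function defined here, not a library call.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(40, 75, n)          # years
sex = rng.integers(0, 2, n)           # 0/1 indicator
baseline = rng.normal(220, 30, n)     # baseline cholesterol, mg/dL

# Confounding by indication: higher baseline -> more likely to get the drug.
p_drug = 1 / (1 + np.exp(-(baseline - 220) / 15))
drug = (rng.uniform(size=n) < p_drug).astype(float)

TRUE_DRUG_EFFECT = -30.0              # assumed true effect, mg/dL
followup = (20 + 0.9 * baseline + 0.2 * age + 2.0 * sex
            + TRUE_DRUG_EFFECT * drug + rng.normal(0, 10, n))


def ols(predictors, y):
    """Least-squares coefficients; index 0 is the intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Crude model: drug only. Adjusted model: drug plus the measured confounders.
crude = ols([drug], followup)[1]
adjusted = ols([drug, age, sex, baseline], followup)[1]

print(f"crude drug coefficient:    {crude:6.1f} mg/dL")
print(f"adjusted drug coefficient: {adjusted:6.1f} mg/dL")
```

In this setup the crude coefficient is pulled toward zero because treated patients started with higher cholesterol, while the adjusted coefficient lands near the assumed true effect. The adjustment works here only because the confounder was measured and included in the model.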
Knowledge check
Q1. In a linear regression, the slope (coefficient) tells you:
Q2. What is the main purpose of multivariable (multiple) linear regression?
Q3. An R-squared of 0.30 means: