Hypothesis Testing Logic
Core summary
Hypothesis testing is a structured way to decide between 'no effect' (null) and 'some effect' (alternative). You assume no effect, then check whether the data are surprising enough to reject that assumption.
Detailed explanation
Almost every statistical test follows the same logical recipe, and once you see the pattern, most of the analysis in any paper becomes easier to follow. The process is a bit like a courtroom trial, where the treatment is 'presumed innocent' of having any effect until the evidence is strong enough to convict.
Step one: state two competing hypotheses. The null hypothesis (H0) is the default claim of no effect, no difference, or no association; for example, 'the new drug and placebo produce the same mean blood pressure'. The alternative hypothesis (H1) is what you actually suspect: 'the two means differ'. Crucially, the test always starts by assuming the null is true, exactly as a court presumes innocence.
Step two: collect data and compute a test statistic, which measures how far your observed result sits from what the null predicts.
Step three: convert that statistic into a p-value, the probability of obtaining data at least this extreme if the null were true.
Step four: compare p to your pre-chosen significance level, alpha (usually 0.05). If p is below alpha, you reject the null hypothesis and say the result is statistically significant; if not, you fail to reject the null.
The language here is deliberately careful and worth getting right. We never 'accept' or 'prove' the null hypothesis; we only 'fail to reject' it, just as a court returns 'not guilty' rather than 'innocent'. Failing to find evidence of an effect is not the same as proving there is none. Likewise, rejecting the null does not prove the alternative with certainty; it only says the data are hard to explain by chance alone.
A few principles keep this from being mechanical. The hypotheses and alpha must be set before looking at the data; otherwise you can fool yourself by moving the goalposts. The test addresses only the narrow question of whether there is evidence of an effect, not how big or important that effect is, which is why effect sizes and confidence intervals must accompany it. And a single test rarely settles a question; reproducibility across studies matters far more than crossing the 0.05 line once.
For a clinician, the value of understanding this logic is that it lets you read any 'we tested whether...' sentence and know exactly what was and was not shown. A significant result means the data would be surprising if nothing were happening; a non-significant result means the data would not be especially surprising. Neither is a final truth; each is one structured step in weighing evidence, to be combined with effect size, study quality, and biological plausibility.
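The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended analysis: the blood pressure readings are invented for the example, a permutation test stands in for whichever test a real study would pre-specify, and `permutation_p_value` and `decide` are hypothetical helper names.

```python
import random

# Fixed BEFORE looking at the data (step one also fixes H0 and H1):
# H0: drug and placebo have the same mean blood pressure; H1: the means differ.
ALPHA = 0.05

# Invented systolic readings in mmHg, for illustration only.
drug = [128, 131, 125, 122, 130, 127, 124, 126]
placebo = [135, 138, 132, 129, 140, 134, 131, 136]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    # Step two: the test statistic is the absolute difference in means.
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Step three: share of shuffles at least as extreme as the observed data
    # (the +1 terms count the observed arrangement itself).
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha):
    """Step four: note the wording, never 'accept H0'."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p_value = permutation_p_value(drug, placebo)
decision = decide(p_value, ALPHA)
print(f"difference in means = {mean(drug) - mean(placebo):.2f} mmHg")
print(f"p = {p_value:.4f} -> {decision}")
```

The shuffle makes the definition of the p-value concrete: it counts how often relabelled data look at least as extreme as the real data when the null is true by construction.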
Clinical example
Researchers compare infection rates between two surgical-prep solutions. H0: the rates are equal; H1: they differ. They find p = 0.01 and reject H0, concluding that a difference this large would be unlikely if the two solutions truly performed the same. They still report the actual difference in infection rates so that clinicians can judge whether it matters.
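Reporting the size of the difference alongside the p-value might look like the sketch below. The counts are hypothetical (the example above gives none), the normal-approximation z-test is one common choice rather than the method the researchers necessarily used, and `two_proportion_summary` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(events_a, n_a, events_b, n_b, confidence=0.95):
    """Risk difference, its confidence interval, and a two-sided z-test p-value."""
    std_normal = NormalDist()
    rate_a, rate_b = events_a / n_a, events_b / n_b
    diff = rate_a - rate_b  # the effect size clinicians need to see

    # Test statistic: standard error computed under H0 (equal rates, pooled).
    pooled = (events_a + events_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    # Confidence interval: standard error without assuming equal rates.
    se_diff = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, ci, p_value


# Hypothetical counts: 40 infections in 500 patients vs 20 in 500.
diff, ci, p_value = two_proportion_summary(40, 500, 20, 500)
print(f"risk difference = {diff:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"p = {p_value:.4f}")
```

The p-value says only whether the data are surprising under H0; the risk difference and its interval are what let a reader judge clinical relevance.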
Research example
A trial testing whether a drug improves survival finds p = 0.30 and fails to reject the null. The authors correctly write that they 'found no significant difference', not that the drug 'has no effect'; the study may simply have been too small to detect a real benefit.
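Why a small study can miss a real benefit can be shown with a rough power calculation. The sketch below assumes a true improvement in survival from 60% to 70% and uses a normal approximation for a two-sided test of two proportions. Those survival rates and the sample sizes are invented for illustration and are not taken from the trial above; `approx_power` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate chance of rejecting H0 when the stated difference is real."""
    std_normal = NormalDist()
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true difference sits away from zero.
    shift = abs(p_treated - p_control) / se
    # Probability the test statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Assumed true survival: 60% on control, 70% on the drug.
power_small = approx_power(0.60, 0.70, n_per_arm=50)
power_large = approx_power(0.60, 0.70, n_per_arm=500)
print(f"power with  50 per arm: {power_small:.2f}")
print(f"power with 500 per arm: {power_large:.2f}")
```

Under these assumptions the smaller trial would usually fail to reject the null even though the benefit is real, which is why 'no significant difference' must not be read as 'no effect'.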
Knowledge check
Q1. In hypothesis testing, what does the null hypothesis state?
Q2. A test yields p = 0.40 with alpha set at 0.05. The correct decision is to:
Q3. Why must the hypotheses and alpha be set before collecting data?