Statistically significant — what it really means
The p-value is one of the most misunderstood concepts in medical literature. A clarification in five points.
Almost every headline about a new study ends with "statistically significant", as if that were a seal of quality. Most readers take it to mean "the effect is real" or "the effect is large", and both readings are wrong. A p-value measures something else.
1. What the p-value actually says
A p-value is the probability of observing the result (or a more extreme one), ASSUMING the null hypothesis is true. The null hypothesis is typically: "there is no effect".
So a p-value of 0.03 means: if the substance actually had no effect, a result at least as extreme as the observed one would arise by chance in 3% of comparable studies. That is a conditional probability. It is not the probability that the effect is real.
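To make the definition concrete, here is a minimal sketch of one way to compute a p-value: an exact one-sided permutation test. It assumes two small groups of hypothetical measurements. It asks how often a random relabelling of patients, which is exactly what "no effect" would allow, produces a difference in means at least as large as the observed one. The function name `permutation_p_value` and the data are illustrative. They are not taken from any particular study or library.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """One-sided exact permutation p-value for 'treated has the higher mean'.

    Under the null hypothesis (no effect), group labels are interchangeable,
    so every split of the pooled values into groups of the original sizes
    is equally likely.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = mean(treated) - mean(control)

    at_least_as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(group_a) - mean(group_b)
        # Count relabellings at least as extreme as the observed split;
        # the small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


# Hypothetical blood-pressure reductions in mmHg
treated = [12, 9, 15, 7, 11]
control = [4, 8, 2, 6, 5]
print(f"p = {permutation_p_value(treated, control):.4f}")
```

The result is the share of "no effect" worlds that look at least this favourable for treatment. Nothing in the calculation estimates how likely the treatment is to work.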
2. The 0.05 threshold is a convention
Calling p < 0.05 "significant" goes back to Ronald Fisher in the 1920s. It is an arbitrary convention, not a law of nature. p = 0.049 and p = 0.051 represent practically identical evidence, yet one result is celebrated and the other ignored.
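One way to see how close 0.049 and 0.051 really are is to convert each back to the test statistic that produced it. The sketch below assumes a two-sided test with a normally distributed statistic (a z-test). That assumption is chosen for illustration.

```python
from statistics import NormalDist


def z_from_two_sided_p(p):
    """Return the |z| statistic that yields a given two-sided p-value
    for a standard normal test statistic."""
    return NormalDist().inv_cdf(1 - p / 2)


z_049 = z_from_two_sided_p(0.049)
z_051 = z_from_two_sided_p(0.051)
print(f"p = 0.049 -> |z| = {z_049:.3f}")
print(f"p = 0.051 -> |z| = {z_051:.3f}")
print(f"difference in |z|: {z_049 - z_051:.3f}")
```

Both statistics come out at roughly 1.95 to 1.97, on either side of the 1.96 cut-off. The underlying data are essentially equally convincing.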
Good practice in medical statistics increasingly calls for p-values to be reported together with effect sizes and confidence intervals, so that a p-value is never the sole basis for a claim.
3. Effect size vs. significance
A tiny, clinically irrelevant difference can be statistically significant if the sample is large enough. A 0.02-percentage-point reduction in HbA1c with p < 0.001 is statistically impressive and clinically irrelevant.
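A minimal sketch of why sample size drives significance. It assumes a two-sample z-test and a hypothetical standard deviation of 1.0 percentage point for HbA1c. Both are illustrative choices, not figures from a real trial. The 0.02-point difference stays the same in every run; only the number of patients per group changes.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value of a two-sample z-test for a difference in means,
    assuming the same known standard deviation in both groups."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: 0.02 percentage-point HbA1c difference, SD of 1.0 point
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_p_value(0.02, 1.0, n)
    print(f"n per group = {n:>9,}: p = {p:.4g}")
```

The clinical relevance of the effect is identical in all three lines. The p-value mainly reflects how many patients were enrolled.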
Conversely, a 15% reduction in a clinically important endpoint can come out at p = 0.12 simply because the study was too small. "Not significant" does not mean "no effect". It means "the data are not precise enough to distinguish the effect from chance".
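The reverse case can be sketched with a pooled two-proportion z-test. The event rates are hypothetical: 20% on control versus 17% on treatment, a 15% relative reduction. The same true effect is tested once with 500 and once with 5,000 patients per group.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value of the pooled two-proportion z-test."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical trials: 20% vs 17% event rate (15% relative reduction)
small = two_proportion_p_value(100, 500, 85, 500)
large = two_proportion_p_value(1000, 5000, 850, 5000)
print(f"500 per group:   p = {small:.3f}")
print(f"5,000 per group: p = {large:.5f}")
```

The small trial misses the 0.05 threshold even though the effect is exactly the same as in the large one.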
4. Multiple tests and p-hacking
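When a study tests 20 different endpoints and none of them has a real effect, on average one of them will still reach p < 0.05 purely by chance. If the tests are independent, the probability that at least one does is about 64%. Such a hit is not proof of anything. It is exactly what chance predicts.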
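The arithmetic behind these figures can be checked directly. This sketch assumes independent tests with all null hypotheses true. It also relies on the fact that, for a continuous test statistic, p-values are uniformly distributed between 0 and 1 under the null. The simulation uses a fixed seed so that it is reproducible.

```python
import random


def familywise_error(m, alpha=0.05):
    """Probability of at least one p < alpha among m independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def simulate_familywise_error(m, alpha=0.05, studies=50_000, seed=1):
    """Monte Carlo check: draw m uniform p-values per simulated study
    and count studies with at least one 'significant' result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        if min(rng.random() for _ in range(m)) < alpha:
            hits += 1
    return hits / studies


m = 20
print(f"expected false positives: {m * 0.05:.1f}")
print(f"P(at least one p < 0.05), exact:     {familywise_error(m):.3f}")
print(f"P(at least one p < 0.05), simulated: {simulate_familywise_error(m):.3f}")
```

Both routes give roughly 0.64: likely, but far from certain.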
p-hacking is the practice of adjusting an analysis until a significant p-value emerges: analysing different subgroups, trying different statistical tests, removing individual outliers. Two safeguards are meant to prevent this. Pre-registered study protocols fix the endpoints and analyses in advance. "Intention-to-treat" analyses keep every randomised patient in the group they were assigned to.
5. Confidence intervals as the better measure
Rather than a bare "significant: yes/no", a 95% confidence interval shows effect size AND precision in a single statement. "Hazard ratio (HR) 0.80, 95% CI 0.72–0.90" is more informative than "HR 0.80, p < 0.001":
- Point estimate: the best single estimate of the effect (HR 0.80 = 20% reduction in hazard).
- Interval width: how precise this estimate is (0.72–0.90 is fairly narrow, i.e. precise).
- Whether 1.0 (no effect) lies inside the interval: shows whether the result is statistically significant at the 5% level (1.0 NOT included = significant).
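The interval also carries the p-value, which can be recovered from it. The sketch below assumes the CI was computed symmetrically on the log scale as log(HR) ± 1.96 × SE, a common convention for hazard ratios. The back-calculated p-value is therefore only an approximation. The function name `summarize_hazard_ratio` is hypothetical.

```python
from math import log
from statistics import NormalDist


def summarize_hazard_ratio(hr, lower, upper, level=0.95):
    """Back-calculate the log-scale standard error and an approximate
    two-sided p-value from a hazard ratio and its confidence interval.

    Assumes the interval is log(HR) +/- z_crit * SE on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se_log = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se_log
    p_approx = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "reduction_pct": (1 - hr) * 100,     # HR 0.80 -> 20% lower hazard
        "se_log": se_log,                    # precision on the log scale
        "p_approx": p_approx,                # approximate two-sided p-value
        "includes_1": lower <= 1.0 <= upper, # True = not significant
    }


print(summarize_hazard_ratio(0.80, 0.72, 0.90))
```

For HR 0.80 (0.72–0.90) this gives a p-value well below 0.001, matching "p < 0.001". The interval adds information that the p-value alone does not give: how large the effect plausibly is.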
A wide CI signals a lot of uncertainty. "HR 0.80, 95% CI 0.35–1.80" means the true effect could lie anywhere between a 65% reduction and an 80% increase. Such a result says almost nothing about whether the treatment helps or harms.
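The translation from interval bounds to percentages is simple arithmetic: percentage change = (HR − 1) × 100. The sketch below applies it to both intervals from this section. As a rough width measure it also prints the ratio of the upper to the lower bound. That ratio is an illustrative choice here, not a standard reporting metric.

```python
def hr_to_percent_change(hr):
    """Convert a hazard ratio into a percentage change in hazard
    (negative = reduction, positive = increase)."""
    return (hr - 1) * 100


for lower, upper in ((0.72, 0.90), (0.35, 1.80)):
    print(
        f"CI {lower}-{upper}: "
        f"{hr_to_percent_change(lower):+.0f}% to {hr_to_percent_change(upper):+.0f}%, "
        f"upper/lower ratio {upper / lower:.2f}"
    )
```

The narrow interval spans a 28% to 10% reduction. The wide one runs from a 65% reduction to an 80% increase, and its bounds are about five times apart rather than 1.25 times.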