Primer · Methodology · 9 min read

How to read a randomised controlled trial

An RCT is methodologically the strongest single-study design for clinical evidence, but only if you know what to look at. A six-step guide.

Published: 2026-05-21

When someone says "studies show X works", the follow-up question matters: what kind of study, how many people, how long observed, and who paid for it? The randomised controlled trial — RCT — is the gold standard when these questions have good answers. It is not automatic proof of truth.

1. What randomisation actually does

Randomisation assigns participants to treatment and control groups by chance. The point: known and unknown confounders, such as age, comorbidities, lifestyle and genetics, are balanced across both groups on average. Any systematic difference in outcomes can then be attributed to the intervention.

This works less well in small studies. With 20 people, chance can make the groups unequal. That is why every RCT's "baseline characteristics" table at the beginning is worth checking: do age, sex, BMI, comorbidities look similar across groups? If not, randomisation may have failed — or the study was too small.
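A small simulation can make the sample-size point concrete. The sketch below is illustrative only: the age distribution (mean 55, standard deviation 12) and the helper names are assumptions, not taken from any real trial. It randomises simulated participants 1:1 many times and reports the typical gap in mean age between the two arms.

```python
import random
import statistics

def mean_age_gap(n_participants, rng):
    """Randomise one simulated trial 1:1; return |mean age difference| between arms."""
    # Hypothetical age distribution: mean 55 years, SD 12 years.
    ages = [rng.gauss(55, 12) for _ in range(n_participants)]
    rng.shuffle(ages)  # simple randomisation: random order, then split in half
    half = n_participants // 2
    treatment, control = ages[:half], ages[half:]
    return abs(statistics.mean(treatment) - statistics.mean(control))

def typical_gap(n_participants, n_trials=500, seed=1):
    """Median absolute age gap over many simulated trials of the same size."""
    rng = random.Random(seed)
    return statistics.median(mean_age_gap(n_participants, rng) for _ in range(n_trials))

small = typical_gap(20)
large = typical_gap(2000)
print(f"n=20:   typical age gap {small:.1f} years")
print(f"n=2000: typical age gap {large:.1f} years")
```

Under these assumptions the 20-person trials typically end up with arms that differ by several years in mean age, while the 2,000-person trials differ by a fraction of a year. Randomisation was carried out correctly in both cases; only the sample size differs.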

2. Blinding — and why it is hard

Blinding means: participants, researchers and ideally the analysing statistician do not know who is in which group. This prevents expectation effects — both in those treated (placebo) and in those observing (bias in endpoint assessment).

Double-blind (participants + treating doctors) is standard. Triple-blind (plus the statistician) is better. Open-label trials — where everyone knows who gets what — are methodologically weaker, especially when endpoints are subjective (e.g. "well-being", "pain").

What if blinding is impossible?

Some treatments can hardly be blinded — surgical interventions, or substances with distinctive side effects. Then harder, more objective endpoints become important (mortality, lab values) rather than subjective ones.

3. Primary endpoint — and what doesn't count

An RCT selects ONE primary endpoint before it starts and calculates the sample size needed to test exactly that endpoint with adequate statistical power. All other measured quantities are secondary or exploratory.
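To illustrate how the sample size is tied to the primary endpoint, here is a minimal sketch of the textbook normal-approximation formula for comparing two means. The function name, the default 5% significance level and the 80% power are assumptions for illustration; real trials often use more elaborate calculations (dropout allowances, t-distribution corrections, other endpoint types).

```python
import math
from statistics import NormalDist

def sample_size_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Participants needed per arm to detect a mean difference `delta`.

    Normal approximation for a two-sided test with equal arms:
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)

# Hypothetical endpoint: expected difference of 0.5 units, SD of 1.0 units.
print(sample_size_per_arm(delta=0.5, sd=1.0))
# Halving the expected effect roughly quadruples the required sample.
print(sample_size_per_arm(delta=0.25, sd=1.0))
```

The calculation is specific to one endpoint and one expected effect. A secondary endpoint with a smaller effect or more variability may be badly underpowered in the same trial.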

This matters: when a study misses its primary endpoint but the discussion gives prominence to a secondary endpoint, that is a warning sign. It does not necessarily mean the secondary finding is wrong, but it was not the pre-specified test, so it carries less evidential weight.
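One reason unplanned findings carry less weight is multiplicity: the more endpoints are tested, the more likely at least one crosses the significance threshold by chance. A rough sketch, assuming the endpoints are independent and the treatment has no true effect on any of them (both simplifications, and the function name is hypothetical):

```python
def chance_of_false_positive(n_endpoints, alpha=0.05):
    """Probability that at least one of `n_endpoints` null tests is 'significant'."""
    # Each test stays below the threshold with probability (1 - alpha);
    # all of them do so with probability (1 - alpha) ** n_endpoints.
    return 1 - (1 - alpha) ** n_endpoints

for k in (1, 5, 10, 20):
    print(f"{k:2d} endpoints: {chance_of_false_positive(k):.0%} chance of a spurious hit")
```

With a single pre-specified endpoint the false-positive risk is the nominal 5%. With ten endpoints it is about 40% under these assumptions, which is why a lone significant secondary result deserves caution.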

4. Effect size vs. statistical significance

With a large enough sample, an RCT can show a tiny effect to be statistically significant. Example: a study with 10,000 participants finds an HbA1c reduction of 0.02 percentage points with p < 0.001. Statistically significant, yet clinically meaningless.
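The arithmetic behind this can be sketched with a simple z-test for a difference in means. The standard deviation of 0.25 percentage points and the equal split into two arms of 5,000 are assumptions chosen for illustration; the article's example does not state them.

```python
import math

def two_sided_p(mean_diff, sd, n_per_arm):
    """Two-sided p-value for a difference in means (normal approximation, equal arms)."""
    se = sd * math.sqrt(2 / n_per_arm)        # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))

effect = 0.02   # HbA1c reduction in percentage points
sd = 0.25       # assumed spread of HbA1c change (hypothetical)

for n in (100, 1000, 5000):
    print(f"n per arm = {n:5d}: p = {two_sided_p(effect, sd, n):.5f}")
```

The effect is identical in every row. Only the sample size changes, and with it the p-value moves from clearly non-significant to below 0.001. The p-value therefore says how precisely the effect was measured, not whether it matters clinically.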

Conversely, a clinically meaningful effect can appear non-significant in a small study. That is why effect size and confidence interval belong in the foreground, not the p-value. A 20% reduction with 95% CI from 12% to 28% says more than "p = 0.03".
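As a sketch of how an effect size and its confidence interval are obtained from raw results, the following computes an absolute risk reduction with a Wald interval. The event counts are invented for illustration, and the Wald method is only one of several interval methods; it is used here because it is the simplest.

```python
import math
from statistics import NormalDist

def risk_difference_ci(events_control, n_control, events_treated, n_treated, level=0.95):
    """Absolute risk reduction (control minus treated) with a Wald confidence interval."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    reduction = risk_control - risk_treated
    se = math.sqrt(risk_control * (1 - risk_control) / n_control
                   + risk_treated * (1 - risk_treated) / n_treated)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return reduction, reduction - z * se, reduction + z * se

# Hypothetical trial: 300 of 1000 events in the control arm, 240 of 1000 under treatment.
estimate, lower, upper = risk_difference_ci(300, 1000, 240, 1000)
print(f"absolute risk reduction: {estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The interval shows both the direction of the effect and the range of sizes compatible with the data. A reader can judge whether even the lower bound would be clinically worthwhile, which a bare p-value does not allow.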

5. Who funded it?

Industry-funded studies are not automatically unreliable; many excellent RCTs are paid for by pharmaceutical companies. But the funding source is relevant for interpretation: positive results are more likely to be published than negative ones (publication bias), and study designs can carry subtle assumptions that favour the sponsor's product.

Look at the "Funding" section and the "Conflict of Interest" statements. Both are mandatory in quality-assured journals. Independent replications by other research groups are the strongest trust signal.

6. What a small RCT CANNOT tell you

Rare adverse events — an RCT with 200 people cannot reliably detect a 1-in-1000 risk.
Long-term safety — a 12-week study says nothing about effects after 5 years.
Real-world effect — RCT populations are often selected (young, healthy, compliant). Real patients have comorbidities, forget doses, combine with other substances.
Subgroup effects — secondary analyses by sex, age or weight are exploratory, not conclusive.
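The first point in this list can be checked with basic probability. The sketch below assumes adverse events occur independently and, generously, that all 200 participants receive the treatment (in a 1:1 trial only about half would). The function names are hypothetical.

```python
import math

def prob_at_least_one_event(risk, n_exposed):
    """Chance of observing one or more events among `n_exposed` people."""
    return 1 - (1 - risk) ** n_exposed

def n_needed(risk, confidence=0.95):
    """Smallest number of exposed people giving `confidence` of seeing at least one event."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - risk))

risk = 1 / 1000
print(f"200 people: {prob_at_least_one_event(risk, 200):.0%} chance of seeing even one case")
print(f"needed for a 95% chance of one case: {n_needed(risk)} people")
```

Under these assumptions a 200-person study has less than a one-in-five chance of observing even a single case, and roughly 3,000 exposed participants are needed before one case becomes likely. Seeing one case is still far from demonstrating an increased risk, since it must be distinguished from the background rate in the control group.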

The most honest question

A test worth running on every "studies show" claim: would the effect still be there after three replications in independent labs with larger samples? If the answer is "probably", the evidence is robust. If it is "let's see", it is provisional.