“Randomized Controlled Trial” is good but NOT gold-standard!

The randomized controlled trial (RCT) is a good experimental design for examining the effect of an intervention. The design was first described by Austin Bradford Hill in the 1950s [1–3]. The introduction of RCTs into clinical research prompted governmental drug agencies to amend their requirements, obliging pharmaceutical companies to provide results from well-designed prospective studies as evidence of a drug’s efficacy, instead of testimonials, case reports, and expert opinions.

In the 1980s, amid calls for a shift towards evidence-based medicine, epidemiologists described the RCT as the gold-standard research design. Since then, this claim has spread widely and is now repeated by many researchers, instructors, and advocates.

In this article, I will argue on historical, scientific, and philosophical grounds that the RCT should not be regarded as the gold-standard research design. I decided to discuss this problem because the claim affects the decisions of funding agencies, giving RCTs priority over all other research designs, which is not always correct. The claim also affects the evidence synthesis process of systematic reviews and meta-analyses, as explained later in this article.

Is the randomized controlled trial the best scientific method?

The answer to this question is no. Randomized controlled trials, like all other research designs, can be good or bad, simply because there are many other quality measures to consider before judging that a study is good. In addition, each study has its own environment, setting, time, population, and circumstances; therefore, the quality of studies should be judged individually rather than collectively.

That said, let’s make it clear from the beginning that a randomized controlled trial, if well performed, has the potential to evaluate the effect of an intervention and to examine the exposure-risk relationship.

The randomized controlled design is equipped with several methodological safeguards: random allocation, allocation concealment, blinding, and intention-to-treat (ITT) analysis. These are the features that allow it to reduce bias.

Therefore, if two research studies are similar in sample size, data quality, population definition, and outcome measures, the randomized controlled design is expected to be better than the observational design.

However, this similarity is only hypothetical; it is not what we see when assessing protocols for funding eligibility or studies for inclusion in systematic reviews and meta-analyses.

The meaning of “gold-standard design”

As used by earlier epidemiologists and advocates, “gold standard” means that the RCT design is superior to all other research designs, as reflected in all evidence hierarchies. Moreover, some evidence-based medicine advocates have stated that a single RCT outweighs 13 observational studies that all contradict its findings [4]. In what follows, I argue that RCTs should not be considered superior to all other research designs.

(1)   Practically, there is no standard

At the opening of my medical statistics sessions, I always explain the term “probability” and emphasize the importance of understanding probability during the analysis, interpretation, and translation of clinical research data. Because it is impossible to include the whole target population in a clinical trial, we usually enroll a sample from the population; therefore, the results will never give a 100% accurate estimate of the true effect size.

Therefore, a standard test or design with 100% accuracy exists only in theory and is impossible in practice. This principle applies to diagnostic methods, statistical tests, and research methods alike. Even with the most careful design, and under the most controlled conditions, there is always some error. It is the duty of trialists to reduce that error to a minimum.
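The point about sampling can be illustrated with a small simulation. This is only a sketch under assumptions of my own: the outcome is normally distributed, the true effect is fixed at 1.0 with a standard deviation of 2.0, and the function names are invented for the illustration. It repeats the same hypothetical two-arm trial many times and reports how far the estimated effect scatters around the true value at different sample sizes.

```python
import random
import statistics


def simulate_trial(n_per_arm, true_effect, sd, rng):
    """Run one hypothetical two-arm trial and return the estimated effect
    (difference in mean outcome, treated minus control)."""
    control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_effect, sd) for _ in range(n_per_arm)]
    return statistics.fmean(treated) - statistics.fmean(control)


def spread_of_estimates(n_per_arm, true_effect=1.0, sd=2.0, n_trials=1000, seed=0):
    """Repeat the same trial n_trials times and return the mean and the
    standard deviation of the estimated effects.

    All numbers here (effect, sd, seed) are assumptions for illustration.
    """
    rng = random.Random(seed)
    estimates = [simulate_trial(n_per_arm, true_effect, sd, rng)
                 for _ in range(n_trials)]
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        mean_est, sd_est = spread_of_estimates(n)
        print(f"n per arm = {n:4d}: mean estimate = {mean_est:.2f}, "
              f"spread (SD) = {sd_est:.2f}")
```

The spread shrinks as the sample grows, but it never reaches zero: any single trial, randomized or not, gives an estimate that differs somewhat from the true effect.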

Based on this, the faith of researchers and advocates in the RCT as a standard research design is unreasonable, simply because the RCT, like all other research designs, can give wrong effect estimates under some conditions:

  1. RCTs with small sample sizes are underpowered: they are more liable to type II error, and the significant results they do report are more likely to be chance findings
  2. Randomization cannot balance variables within a small sample
  3. High-risk randomization techniques might not balance variables between groups
  4. Inadequate blinding, unconcealed allocation, and improper handling of missing data might influence study findings
  5. RCTs with insensitive outcome measures are more liable to type II error
  6. A placebo effect in the control group might lead to underestimating drug efficacy
  7. Financial interests and industry sponsorship might influence the reporting of trial results

In our systematic review of Coenzyme Q10 for Parkinson’s disease [5], the RCT of Müller et al. [6] showed a mild symptomatic benefit from Coenzyme Q10 supplementation compared with placebo. These results were contradicted by subsequent studies. Although the study of Müller et al. was described as an RCT with adequate blinding, the randomization failed to balance the baseline values of the study groups on the Unified Parkinson’s Disease Rating Scale, which was the primary outcome measure. This example shows that even with adequate random allocation and blinding, other methodological issues, such as a small sample size, can affect RCT results, which contradicts the claim that the RCT design is a gold standard.

If randomized controlled studies are the gold standard, why do we need to review and aggregate RCTs in systematic reviews and meta-analyses? Moreover, why do we need to critically appraise these RCTs and assess their risk of bias in those reviews?

(2)   RCTs are not valid for every research question

It is well known that the experimental design is not valid for every research question. Ethical, religious, social, and cultural considerations can all limit its use. In these cases, the RCT design cannot examine the effect of the intervention, and an observational design is used instead. Yet if you look at the evidence hierarchies, you will find observational studies ranked as level 2 or level 3 evidence; they do not qualify as level 1 evidence, which is usually reserved for the RCT design.

Although many advocates have suggested that public health and community interventions should be assessed with the RCT design, exactly as in pharmaceutical research, this approach has not proved practical for public health over recent decades.

Many surgical interventions and psychotherapies cannot be assessed in a randomized controlled design with a sham control group. In addition, RCTs might not be practical for chronic diseases or for urgent conditions in which accelerated drug approval is warranted because no other treatment is available (for example, the RCTs of antiviral drugs for the treatment of AIDS [7,8]). In such cases, observational studies are more suitable designs [9].

Because funding personnel are often unaware of the other factors that affect research quality, the first factor that comes to mind is the study design. By continuing to consider the RCT the gold-standard research design, a funding agency will, for example, prefer an RCT over an observational study assessing factors affecting malnutrition in an African population, which is not fair.

(3)   Randomization

It is widely accepted that randomization is the only scientific method that can balance differences in unknown variables between study groups.

  • Randomization with a small sample size

Randomization cannot balance variables if the sample size is small. Therefore, describing a small study as an RCT does not make it superior to a non-randomized study, particularly if the latter has a larger sample size.
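To make the small-sample problem concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: a hypothetical baseline severity score (mean 30, SD 10), simple 1:1 random allocation, and an arbitrary imbalance threshold of 0.25 standard deviations. It estimates how often the two arms end up noticeably different at baseline purely by chance.

```python
import random
import statistics


def baseline_difference(n_total, rng):
    """Randomize n_total hypothetical patients 1:1 and return the
    standardized difference in mean baseline score between the arms."""
    if n_total < 4 or n_total % 2:
        raise ValueError("n_total must be an even number of at least 4")
    # Hypothetical baseline scores; mean 30 and SD 10 are assumed values.
    scores = [rng.gauss(30.0, 10.0) for _ in range(n_total)]
    rng.shuffle(scores)  # simple random allocation
    half = n_total // 2
    arm_a, arm_b = scores[:half], scores[half:]
    diff = statistics.fmean(arm_a) - statistics.fmean(arm_b)
    return diff / statistics.stdev(scores)


def imbalance_rate(n_total, threshold=0.25, n_trials=1000, seed=1):
    """Fraction of simulated trials whose arms differ at baseline by more
    than `threshold` standard deviations (the threshold is arbitrary)."""
    rng = random.Random(seed)
    count = sum(abs(baseline_difference(n_total, rng)) > threshold
                for _ in range(n_trials))
    return count / n_trials


if __name__ == "__main__":
    for n in (20, 100, 400):
        print(f"total n = {n:3d}: imbalanced in {imbalance_rate(n):.0%} of trials")
```

Under these assumptions, chance imbalance on a single baseline variable is common in very small trials and becomes rare only as the sample grows, even though the allocation is perfectly random in every run.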

  • High-risk randomization techniques

We should also consider the quality of randomization techniques. Although many randomization procedures carry a high risk of bias, investigators can still call their design a “randomized controlled trial” and can still say “… we will randomly assign …”. Schulz et al. showed that the quality of randomization techniques can affect study outcomes, which means that studies described as randomized controlled trials will not always give the same effect size [10].

  • Non-random allocation techniques might be good

In special cases, non-random allocation techniques might be better than the randomization used in RCTs, which means that it is a mistake to assume that every RCT is the best evidence in the hierarchy.

  • Randomized versus non-randomized studies

Schulz et al. [10] and Jaeschke et al. [11] reported that non-randomized studies are more likely to overestimate treatment effects. However, this conclusion cannot be reached without first assuming that the RCT is the more accurate design, which means that the logic behind the conclusion is circular.

(4)   The potential to be superior does not necessarily mean they are always superior

As mentioned earlier, blinding, allocation concealment, and ITT analysis are accompanying features that might give the RCT design an advantage over other designs. However, not all RCTs include proper blinding, allocation concealment, and handling of missing data. Therefore, describing a study as an RCT does not make it better than a non-randomized study, even one with the same sample size, without looking at the many other quality factors in the research design. Some advocates consider blinding, allocation concealment, and ITT analysis to be part of the randomization process; in their view, even if patients were randomly allocated to the treatment groups, the randomization process is not complete unless the investigators ensure proper blinding, allocation concealment, and ITT analysis.

How systematic reviews are affected

According to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the Cochrane Handbook for Systematic Reviews of Interventions, review authors set the eligibility criteria that determine which studies will be included in the evidence synthesis process [12,13]. These criteria are classified into five domains: Population, Intervention, Comparator, Outcome, and Study Design, summarized as PICOS. After applying these criteria to screen out ineligible studies (on the principle of “garbage in, garbage out”), the review authors have a set of included studies that meet the eligibility criteria. These studies are then processed by qualitative and quantitative evidence synthesis methods.

The problem here is that quality assessment is done after studies have been selected for eligibility, and the selection criteria include the “study design” domain. In other words, studies are first filtered by design, and only the eligible studies are then assessed for quality. I think the process should be reversed so that we avoid excluding studies because of their design: first assess the quality of all relevant studies, then select the high-quality studies for evidence synthesis based on that assessment.
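The difference between the two orderings can be sketched in a few lines of code. This is a toy illustration, not a review protocol: the study records, the 0-to-1 quality score, and the 0.7 cut-off are all hypothetical values that I made up to show the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    name: str
    design: str     # "RCT" or "observational"
    relevant: bool  # meets the Population/Intervention/Comparator/Outcome criteria
    quality: float  # hypothetical 0-1 score from critical appraisal


# Made-up records, for illustration only.
EXAMPLE_STUDIES = [
    Study("small_rct", "RCT", True, 0.40),
    Study("large_rct", "RCT", True, 0.90),
    Study("large_cohort", "observational", True, 0.85),
    Study("weak_case_series", "observational", True, 0.30),
    Study("off_topic_rct", "RCT", False, 0.90),
]


def conventional_selection(studies):
    """Current practice: filter on PICOS, including study design.
    Quality is appraised afterwards, so it plays no part in selection."""
    return [s for s in studies if s.relevant and s.design == "RCT"]


def proposed_selection(studies, min_quality=0.7):
    """Proposed order: keep every relevant study regardless of design,
    appraise quality, then select on quality (cut-off is an assumption)."""
    relevant = [s for s in studies if s.relevant]
    return [s for s in relevant if s.quality >= min_quality]


if __name__ == "__main__":
    print("conventional:", [s.name for s in conventional_selection(EXAMPLE_STUDIES)])
    print("proposed:    ", [s.name for s in proposed_selection(EXAMPLE_STUDIES)])
```

With these made-up records, the conventional order keeps the weak small RCT and drops the strong cohort study, while the proposed order does the opposite. How quality should be scored in practice is a separate question that this sketch does not address.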


In conclusion: (1) the randomized controlled trial is a good scientific research method but should not be regarded as the “gold standard”; (2) the superiority of a research study should be determined in the context of many methodological issues, including but not restricted to the study design; (3) funders and governmental agencies should consider quality indicators of research design other than the RCT/non-RCT classification; and (4) when conducting systematic reviews, the type of study design should not be among the eligibility criteria, to avoid filtering studies by design; all relevant studies should be assessed for quality of design and then selected for the evidence synthesis process.

Conflict of Interest: None to declare


[1]      V. Farewell, T. Johnson, Woods and Russell, Hill, and the emergence of medical statistics, Stat. Med. (2010). doi:10.1002/sim.3893.

[2]      R. Doll, A.B. Hill, Smoking and carcinoma of the lung; preliminary report., Br. Med. J. 2 (1950) 739–48. http://www.ncbi.nlm.nih.gov/pubmed/14772469.

[3]      R. Doll, A.B. Hill, The mortality of doctors in relation to their smoking habits; a preliminary report., Br. Med. J. 1 (1954) 1451–5. http://www.ncbi.nlm.nih.gov/pubmed/13160495.

[4]      E. Fineout-Overholt, Users’ Guides to the Medical Literature, Evid. Based. Nurs. 5 (2002) 8–8. doi:10.1136/ebn.5.1.8.

[5]      A. Negida, A. Menshawy, G. El Ashal, Y. Elfouly, Y. Hani, Y. Hegazy, et al., Coenzyme Q10 for Patients with Parkinson’s Disease: A Systematic Review and Meta-Analysis., CNS Neurol. Disord. Drug Targets. 15 (2016) 45–53. doi:10.2174/1871527314666150821103306.

[6]      T. Müller, T. Büttner, A.-F. Gholipour, W. Kuhn, Coenzyme Q10 supplementation provides mild symptomatic benefit in patients with Parkinson’s disease, Neurosci. Lett. 341 (2003) 201–204. doi:10.1016/S0304-3940(03)00185-X.

[7]      S. Epstein, Impure science: AIDS, activism, and the politics of knowledge, Univ of California Press, 1996.

[8]      S. Hellman, D.S. Hellman, Of mice but not men: problems of the randomized clinical trial, N Engl J Med. 324 (1991) 1585–1589.

[9]      A.N. Phillips, S. Grabar, J.-M. Tassie, D. Costagliola, J.D. Lundgren, M. Egger, Use of observational databases to evaluate the effectiveness of antiretroviral therapy for HIV infection: comparison of cohort studies with randomized trials, AIDS. 13 (1999) 2075–2082.

[10]    K.F. Schulz, I. Chalmers, R.J. Hayes, D.G. Altman, Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials, JAMA. 273 (1995) 408–412.

[11]    R. Jaeschke, D.L. Sackett, Research methods for obtaining primary evidence, Int. J. Technol. Assess. Health Care. 5 (1989) 503–519.

[12]    D. Moher, A. Liberati, J. Tetzlaff, D.G. Altman, Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement., PLoS Med. 6 (2009) e1000097. doi:10.1371/journal.pmed.1000097.

[13]    C.B. Series, J.P.T. Higgins, S. Green, Cochrane handbook for systematic reviews of interventions, Wiley Online Library, 2008. http://books.google.com/books?id=NKMg9sMM6GUC&pgis=1 (accessed June 1, 2014).


