Systematic reviews (SR) are commonly defined as “a summary of studies addressing a clear question, using systematic and explicit methods to identify, select, and critically appraise relevant studies, and to collect and analyse data from them” [1]. One of the essential tenets of evidence-based medicine is that optimal care requires up-to-date, rigorous summaries of evidence [2]. Without such summaries, clinicians and patients are vulnerable to unrepresentative samples of the evidence and to misinterpreted and biased estimates of the benefits and harms of interventions. For instance, consider the evolution of expert views regarding thrombolysis for acute myocardial infarction (AMI) [3]. An SR found 33 trials evaluating streptokinase in AMI patients, 25 suggesting mortality reduction and six reaching conventional levels of statistical significance. A cumulative meta-analysis found that by 1973, after eight trials (2432 patients), results showed a mortality reduction with streptokinase. None of the subsequent trials changed the direction or magnitude of the odds ratio of dying. Yet, 34,542 AMI patients continued to be exposed to placebo in subsequent trials, including so-called mega-trials. These unnecessary trials were conducted because experts, without the benefit of an SR and meta-analysis, could not grasp the big picture of the benefits of thrombolytic therapy, forcing trialists to conduct study after study until the message finally got through. In the meantime, patients died unnecessarily.

A fundamental standard for trustworthy clinical practice guidelines is that they rest on a systematic summary of the highest quality evidence; among those offering standards for trustworthy guidelines, the need for such summaries is not in dispute [4, 5]. Moreover, those who have written compellingly regarding waste in medical research have emphasized the needless duplication and misguided research directions that result when new studies are not preceded by systematic summaries of current knowledge [6].

We consider these tenets indisputable: optimal clinical practice, optimal guidelines to inform clinical practice, and the avoidance of research waste all require systematic summaries of evidence. Indeed, although SR were first introduced in medical science [7], other fields, including the social [8] and business [9] sciences, have recognized their crucial role.

The issue then arises of how to produce the most efficient, useful and informative SR. Arguing that the problems of duplication and poor conduct of SR cast doubt on their usefulness—indeed, their necessity—is as facile and imprudent as arguing that the profusion of poorly conducted randomized trials fundamentally challenges their usefulness.

In the 30 years since authorities suggested initial standards for the conduct of SR [10], their methodology has advanced enormously, including insights into the importance of appropriately chosen eligibility criteria and of partnership of methodologists and clinicians, greater sophistication in assessing risk of bias in individual studies, insights into use of optimal statistical models, and the advent of network meta-analysis for simultaneous consideration of multiple interventions. The GRADE system is the most recent and possibly most relevant development.

The GRADE methodology for rating the quality (otherwise known as certainty or confidence) of the evidence has now been adopted by over 100 organizations worldwide including the Cochrane Collaboration, the World Health Organization, UpToDate, and most critical societies. That methodology, extensively described for clinicians and other users [11], and for SR authors, guideline developers and health technology assessment practitioners [12], allows the classification of evidence as high, moderate, low or very low quality, considering study design, risk of bias, precision, consistency, directness, publication bias and magnitude of effect. The detailed guidance for making quality judgements provides a transparent, straightforward approach that has facilitated a new rigour for the SR process. Not surprisingly, the widespread adoption of GRADE has not ushered in an era of uniformly exemplary SR, conducted efficiently and with successful efforts to avoid duplication. Systematic reviews continue to share a number of problems with other forms of research: poorly designed questions, questionable populations, interventions and outcomes, conduct full of lapses in rigour at the level of both the individual studies and the SR itself, and non-transparent academic and financial conflicts of interest.

About 11 SR are published daily in the field of medicine [13], with a high risk of redundancy, flawed methodology, conflict of interest-driven biases and misinterpretation of evidence [14]. Journals should refuse to publish SR that do not meet rigorous standards, including duplicate assessment of eligibility and risk of bias, explanation of heterogeneity and consideration of conflicts of interest. Application of GRADE in further use of SR reinforces adherence to optimal standards. Stemming the flow of poor-quality, misleading research will require a change in academic medical culture. As long as job security, promotion and recognition require high volumes of publications, academics will put a premium on volume rather than quality, and there will always be journals that accommodate that culture. Editors should no longer consider SR for publication without an open access protocol, such as one registered on PROSPERO [15]. Fortunately, there are many rigorous SR, as well as guides that help clinicians and other users distinguish the credible from the fundamentally flawed [16]. There is simply no alternative to using systematic reviews (Fig. 1).

Fig. 1 Cartoon highlighting that in modern medicine, expensive interventions are rapidly increasing and delivered at the patient’s bedside. Evidence-based medicine remains the most efficient approach in the management of patients.