Background

Rheumatoid arthritis (RA) is a systemic inflammatory autoimmune disease that may trouble patients as a result of morning stiffness, painful joints, chronic inflammation, synovitis, irrecoverable joint damage, and the presence of autoantibodies [1, 2]. The prevalence of RA in adults worldwide is 0.04–1.6%, with significant national differences [3]. In China, RA has an estimated prevalence of 0.42% and affected more than 5 million patients in 2018 [4]. The pathogenesis of RA is complex, and the course of RA is lingering; RA is characterized by symmetrical, chronic, and progressive polyarthritis, which, as the disease progresses, leads to the destruction of articular cartilage, bone, and capsule, resulting in irreversible joint deformity and incapacitation [5, 6]. At present, the common medications for RA include glucocorticoids (GCs), nonsteroidal anti-inflammatory drugs (NSAIDs), and disease-modifying antirheumatic drugs (DMARDs) [7,8,9]. Some studies have reported that sinomenine (SIN), Tripterygium wilfordii Hook, Simiao pill, Wang-bi tablet, total glucosides of paeony (TGP) [10,11,12,13,14,15] and other traditional Chinese medicines and their related prescriptions possess beneficial effects and show good clinical efficacy in the treatment of RA, supporting why traditional Chinese medicines and prescriptions have received increasing attention [16,17,18].

Zheng Qing Feng Tong Ning (ZQFTN) is a SIN preparation that has been used in clinical practice; SIN itself is an alkaloid monomer extracted from the traditional Chinese herb Sinomenium acutum [19]. Some studies have shown that SIN may have a good effect on the treatment of RA (e.g., less pain and an improvement in physical function or morning stiffness) [20, 21]. Mechanistic studies have indicated that SIN can alleviate collagen-induced arthritis (CIA) via the inhibition of angiogenesis [22], induce the generation of intestinal Treg cells and relieve arthritis by activating the aryl hydrocarbon receptor [23], and suppress RA progression by modulating the secretion of various inflammatory cytokines and the monocyte/macrophage subpopulation [24]. Currently, ZQFTN series products are one of the Chinese medicine varieties used for the domestic treatment of RA, and ZQFTN is a modern Chinese medicine preparation [25]. Studies have shown that SIN has anti-inflammatory, analgesic and immunosuppressive effects [26], which indicates that it may play a crucial role in the treatment of RA. A multitude of clinical trials on the efficacy and safety of ZQFTN in the treatment of RA have been performed in mainland China and other countries. Analyses of the methods and quality of these reports may promote the evidence-based clinical treatment of RA, because systematic limitations or deficiencies in the design, conduct, or reporting of articles may bias the results.

A Measurement Tool to Assess Systematic Reviews (AMSTAR) is a tool used for the rigorous evaluation of systematic reviews of randomized controlled clinical trials that explicitly focuses on assessing risk of bias (RoB) and internal validity when appraising the methodological quality of intervention-related systematic reviews [27]; the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) is a reporting guideline that reflects advances in the concepts and methods used to conduct and report systematic reviews of randomized trials [28]; and the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach is more reliable than intuitive judgments when assessing the quality of evidence on outcomes of health care interventions [29]. However, until now, there has been no systematic review that explored the characteristics associated with the methodological quality of controlled trials (random or nonrandom) that evaluated the effectiveness and safety of ZQFTN in the treatment of RA. Therefore, we searched for all systematic reviews and meta-analyses of SIN and its preparations in RA published up to 2019 and applied three tools, AMSTAR 2, PRISMA and GRADE, to evaluate the quality of these studies. Ultimately, the aim of our study was to provide better evidence-based medical support for the clinical application of SIN in RA.

Methods

Search strategy

Systematic searches were carried out in the China National Knowledge Infrastructure (CNKI), Wanfang, VIP database for Chinese technical periodicals (VIP), Cochrane Library and PubMed databases through the end of July 2019. The Medical Subject Headings (MeSH) items included “sinomenine”, “sinomenine preparation”, “Zhengqing Fengtongning”, “RA”, “rheumatoid arthritis”, “meta-analysis” and “systematic review”. The keywords contained “Qing teng jian”, “Qing teng jian zhi ji”, “Zheng qing feng tong ning”, “Lei feng shi guan jie yan”, “Lei feng shi xing guan jie yan”, “meta fen xi”, “Xi tong ping jia” and “Hui cui fen xi” (in Chinese). The detailed search strategy is shown in supplementary Tables 1 and 2.

Selection of reviews

The inclusion criteria were as follows: (1) article types were systematic reviews and meta-analyses; (2) the drug intervention was SIN, SIN preparations, ZQFTN, or ZQFTN sustained-release tablets; (3) studies that utilized the RA classification standards established by the American College of Rheumatology (ACR) in 1987; (4) articles published in English or Chinese; and (5) studies published in journals.

The exclusion criteria were as follows: (1) studies were neither systematic reviews nor meta-analyses; (2) the drug intervention was neither SIN nor ZQFTN; (3) the sample included patients with other diseases; (4) the article concerned systematic review/meta-analysis theory or literature quality; (5) a republished article or an article not published in full; and (6) academic dissertations or conference papers.

Document selection and data extraction

Excel 2010 software was used to establish AMSTAR 2, PRISMA and GRADE evaluation scales. Two reviewers completed the literature retrieval independently, screening according to the inclusion and exclusion criteria, and extracted the data according to the preestablished forms. The extracted data were as follows: basic information (studies, publication year, language, publication form, number of documents, and number of cases), intervention measures (experimental group vs. control group), outcome, and conclusion. Any disagreement was resolved by discussion with a third party (Zhitao Feng).

Quality assessment

The AMSTAR 2 scale and PRISMA statement were used for the methodological and reporting evaluation, respectively, and the GRADE was used for the evidence quality evaluation [27,28,29]. The evaluation scales were prepared in advance in Excel 2010. Two reviewers independently evaluated the quality of the literature. The rating criteria were as follows.

The AMSTAR 2 scale comprises 16 items. If the item is adequately answered and correct, it is judged as “Yes”. If the item is answered correctly but the evidence is insufficient, it is judged as “Partial Yes”. If there is no information in the article, it is judged as “No”. Answers of “Yes” are scored as 1 point, and answers of “No” and “Partial Yes” receive no score; the maximum total score is therefore 16 points.
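The scoring rule above can be illustrated with a minimal Python sketch. The function name amstar2_score and the example answer sheet are hypothetical and are not taken from any included review; only the rule itself (16 items, 1 point per “Yes”, no points for “Partial Yes” or “No”) comes from the text.

```python
from typing import Iterable

# Permitted judgments for each AMSTAR 2 item, as described in the text.
VALID_ANSWERS = {"Yes", "Partial Yes", "No"}
N_ITEMS = 16  # number of AMSTAR 2 items


def amstar2_score(answers: Iterable[str]) -> int:
    """Return the number of items judged "Yes" (1 point each).

    "Partial Yes" and "No" contribute no points, following the
    scoring rule stated in the Methods.
    """
    answers = list(answers)
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    for answer in answers:
        if answer not in VALID_ANSWERS:
            raise ValueError(f"unknown judgment: {answer!r}")
    return sum(1 for answer in answers if answer == "Yes")


# Hypothetical answer sheet: 7 "Yes", 3 "Partial Yes", 6 "No".
example = ["Yes"] * 7 + ["Partial Yes"] * 3 + ["No"] * 6
print(amstar2_score(example))
```

Under this rule, a “Partial Yes” is treated the same as a “No”, so the score counts only fully satisfied items.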

The PRISMA statement contains 27 items, and each item is scored as follows: a complete report scores 1 point, a partial report scores 0.5 points, and no report scores 0 points. When the score is 21–27, the report is considered relatively complete; when the score is 15–21, the report is considered to have certain defects; and when the score is below 15, relatively serious information is considered to be missing.
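As a rough sketch, the PRISMA scoring and banding described above could be computed as follows. The function names and rating labels are hypothetical. Because the stated bands share the boundary value 21, this sketch assumes that a score of exactly 21 falls in the “relatively complete” band; that choice is an assumption, not something stated in the text.

```python
from typing import Iterable

# Points per item: complete = 1, partial = 0.5, not reported = 0.
POINTS = {"complete": 1.0, "partial": 0.5, "none": 0.0}
N_ITEMS = 27  # number of PRISMA items


def prisma_score(ratings: Iterable[str]) -> float:
    """Sum the per-item points for one review."""
    ratings = list(ratings)
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    return sum(POINTS[r] for r in ratings)


def prisma_band(score: float) -> str:
    """Map a total score to the qualitative band used in the text.

    Assumption: a score of exactly 21 is assigned to the upper band.
    """
    if score >= 21:
        return "relatively complete"
    if score >= 15:
        return "certain defects"
    return "relatively serious information missing"


# Hypothetical review: 15 complete, 5 partial and 7 unreported items.
example = ["complete"] * 15 + ["partial"] * 5 + ["none"] * 7
total = prisma_score(example)
print(total, prisma_band(total))
```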

The five downgrading elements of the GRADE were as follows: RoB (unrepresentative sample, lack of allocation concealment, lack of blinding, incomplete reporting of patient and outcome events, selective outcome reporting, and other limitations), indirectness (indirect comparison of the population, intervention, comparator, and outcome (PICO)), inconsistency (similarity of point estimates, degree of overlap of confidence intervals (CIs), heterogeneity test P < 0.05, and heterogeneity I² > 50%), imprecision (small sample size and a wide 95% CI) and publication bias (funnel plots, Egger test, and inclusion of unpublished research and gray literature). The quality of evidence is divided into four levels by the GRADE: high (we have great confidence that the real effect is close to the estimated effect), moderate (we have moderate confidence that the real effect is close to the estimated effect), low (we have limited confidence in the effect estimate), and very low (we have little confidence that the real effect is close to the estimated effect). Initially, each result defaults to “high” quality and is classified into one of the above 4 levels after a judgment of the 5 downgrading factors. Two reviewers carefully studied each evaluation scale and agreed on the evaluation criteria, and then each reviewer performed an independent literature evaluation. In the case of a disagreement, the decision was discussed with a third party (Zhitao Feng) until an agreement was reached.
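The downgrading logic can be sketched in Python as below. This is an illustration under stated assumptions rather than the procedure actually applied: the text does not say how many levels each factor removes, so the sketch assumes the common GRADE convention that each factor lowers the rating by 0, 1 or 2 levels and that the rating cannot fall below “very low”. The function and key names are hypothetical.

```python
from typing import Dict

# Evidence levels from best to worst, as listed in the text.
LEVELS = ["high", "moderate", "low", "very low"]

# The five downgrading factors named in the text.
FACTORS = (
    "risk_of_bias",
    "indirectness",
    "inconsistency",
    "imprecision",
    "publication_bias",
)


def grade_level(downgrades: Dict[str, int]) -> str:
    """Return the evidence level for one outcome.

    Each outcome starts at "high". Assumption: every factor may
    remove 0, 1 or 2 levels, and the floor is "very low".
    """
    total = 0
    for factor, steps in downgrades.items():
        if factor not in FACTORS:
            raise ValueError(f"unknown factor: {factor!r}")
        if steps not in (0, 1, 2):
            raise ValueError(f"invalid downgrade for {factor}: {steps}")
        total += steps
    return LEVELS[min(total, len(LEVELS) - 1)]


# Hypothetical outcome downgraded once for RoB and once for inconsistency.
print(grade_level({"risk_of_bias": 1, "inconsistency": 1}))
```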

Results

Results of the search strategy

The initial search yielded 180 articles, of which 15 were excluded because they were duplicates, and 14 were excluded after reading the titles and abstracts. Of the remaining 151 articles, 143 were excluded because they did not meet the inclusion criteria after the full-text screen. Finally, 8 articles were accepted: 5 published in Chinese and 3 published in English. The screening process is summarized in a flow diagram in Fig. 1, and the basic information of the included studies is shown in Table 1.

Fig. 1

Flow chart of literature search. Abbreviation: CNKI, China National Knowledge Infrastructure; VIP, VIP Database for Chinese Technical Periodicals

Table 1 The general information of the included studies

Literature analysis

AMSTAR 2

The average AMSTAR 2 score was 6.625 (full score 16); the highest score was 10, and the lowest score was 4 (Table 1). Only two included studies achieved a good overall AMSTAR 2 score (“Y” ≥ 50% of the items) [21, 33], and the optimal items (8/8) were item 1, item 5, item 6 and item 8. All of the studies adequately used the PICO components. Five (5/8) [20, 21, 32, 33, 35] reviews appropriately explained the sources of funding. Five (5/8) [20, 21, 31, 34, 35] reviews accounted for RoB in the primary studies. Three (3/8) [21, 33, 35] studies assessed the potential impact of RoB in individual studies on the results and investigated the publication bias sufficiently. Only two (2/8) [30, 32] studies evaluated RoB using an acceptable technique and provided a satisfactory explanation for and discussion of any observed heterogeneity. A comprehensive literature search strategy is necessary; however, it appeared in only one (1/8) [21] review. A missing conflict of interest statement could mislead researchers, yet only one (1/8) [20] study addressed this topic. None (0/8) of the reviews fulfilled item 2, item 3, item 7 or item 11: a statement that the review methods were established a priori, an explanation of the selection of study designs for inclusion, appropriate methods for the statistical combination of results, and a list of excluded studies were all lacking (Table 2).

Table 2 AMSTAR 2 scores for the methodology of the reviews included in the study

PRISMA

The average PRISMA score was 17.69 (maximum score 27). The maximum score of the eight included articles was 20.5, and the minimum score was only 15.5, as shown in Table 1. None of the articles reported the 27 items completely. (1) Title: All articles reported the title (8/8). (2) Structured summary: Two papers did not meet the criteria of providing structured abstracts; neither reported the background of the study or the registration number of the study [20, 34]. (3) Introduction: All the studies described the theoretical basis in detail and reported the purpose completely, but no complete report on previous reviews was provided. (4) Methods: None of the documents reported registration information or complete report plans. None of the corresponding gray literature was selected. Only one of the studies completely reported a database search strategy [21]. In the course of describing the selected studies, 4 studies reported a PRISMA literature screening flow chart [21, 33,34,35]. Four papers reported RoB in individual studies but did not describe how bias was used to evaluate the results or its impact on outcomes in further studies [21, 31, 33, 35]. Only 3 studies reported publication bias (i.e., funnel plots were drawn) [21, 33, 35]. All of the studies listed the characteristics of the included studies in detail and tested for homogeneity and heterogeneity. (5) Results: None of the articles fully described the characteristics of the studies or reported the follow-up time, funding resources, etc. Two papers did not fully report the study selection [20, 30] and failed to provide the reasons for excluding the literature at each step. All eight papers described the results of individual studies and the results of the synthesis and carried out homogeneity and heterogeneity tests. Only 1 article [21] explained other analyses, such as subgroup analysis and sensitivity analysis. (6) Discussion: Five articles [20, 30, 31, 33, 35] used graphs to demonstrate each major result, and only 1 article [32] did not report the limitations of the systematic review. (7) Funding: Five articles reported funding sources [20, 21, 32, 33, 35], but only 1 mentioned the role of the funders [20] (Table 3).

Table 3 Reporting quality analysis of Meta-analyses of SIN treatment of RA

GRADE

Sixty-one outcomes were measured by the 8 included reviews. Among these outcomes, high quality of evidence was found for none (0.0%), moderate quality of evidence was found for 15 outcomes (25%), low quality of evidence was found for 34 outcomes (55%), and very low quality of evidence was found for 12 outcomes (20%). Regarding the five downgrading elements, the most common items were RoB (n = 61, 100%), inconsistency (n = 30, 50%), publication bias (n = 17, 28%), imprecision (n = 11, 18%) and indirectness (n = 0, 0%) (Table 4).
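For readers who wish to recompute the distribution, the short Python sketch below derives the share of each evidence level from the counts reported above. The function name level_shares is hypothetical; the counts are those given in the text. The unrounded shares may differ by up to about one percentage point from the rounded figures quoted above.

```python
from typing import Dict


def level_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each level's share of all outcomes, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes to summarize")
    return {level: 100.0 * n / total for level, n in counts.items()}


# Outcome counts per GRADE level, as reported in the Results.
counts = {"high": 0, "moderate": 15, "low": 34, "very low": 12}
shares = level_shares(counts)

for level, share in shares.items():
    print(f"{level}: {counts[level]} outcomes ({share:.1f}%)")

# Combined share of low and very low quality evidence.
low_or_very_low = shares["low"] + shares["very low"]
print(f"low or very low: {low_or_very_low:.1f}%")
```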

Table 4 GRADE for quality of evidence profile

Discussion

It is important to assess the methodological quality and quality of evidence of systematic reviews/meta-analyses in the field of evidence-based medicine before any conclusions can be reached for clinical decision making [36, 37]. Reviews with qualified methodologies and high quality of evidence can provide comprehensive and reliable evidence for decision-makers [38]. This study is the first to evaluate the methodological and reporting quality of meta-analyses or systematic reviews on SIN and its preparation, ZQFTN, in the treatment of RA, intending to improve the quality of systematic reviews and better guide clinical decisions. In addition to AMSTAR 2, PRISMA was also used, and the GRADE was used to assess the quality of evidence for the outcome of RA interventions with SIN or ZQFTN. This study will help improve the quality of systematic reviews/meta-analyses and provide an intuitive judgment on the clinical efficacy of SIN and ZQFTN on RA. Concerning the quality of the eight articles we included, unfortunately, the results revealed some limitations in the quality of methodology and reporting, suggesting the need for an improvement in quality in the future.

In summary, a mean of only 42% of AMSTAR 2 items were fulfilled across all articles. The major defects were as follows: first, there was no mention of whether the systematic review methods were predetermined, there was no complete explanation of the type of study design, and the list of excluded studies was not provided, which may be related to layout restrictions; second, appropriate statistical methods were not used for the combined analysis of the results; and third, more than half of the reviews mentioned financial support, but only a small proportion explained its function and clarified conflicts of interest in detail. In addition, assessment of the impact of the RoB of each included study on outcomes, of the heterogeneity of the results, and of publication bias was limited. All of these are important for readers to accurately assess the methods and results.

However, we found that the reporting was of poor quality, and the Chinese literature scores were generally lower than those of the English literature; some of these low scores were the result of underreporting or a lack of information. No registration number was provided, and only one of the studies provided a complete report of the database search strategy used [21]. The individual research bias of four studies was absent [20, 30, 32, 34], the publication bias of four studies was absent [20, 30,31,32], and the selection bias of three studies was absent [21, 33, 35], all of which should be described and analyzed. There was a lack of detailed information on financial support [30, 31, 34] and the role of the funder in the study [21, 32, 33, 35]. A failure to report such information may increase bias and reduce the authenticity and reliability of the research. Therefore, the results of this study may have been underestimated due to a lack of important information. We strongly recommend that editors and authors recognize and promote the use of reporting guidelines in their publications.

In addition, we found that 75% of the outcome indicators had a low or very low quality of evidence in the GRADE table, indicating that the true effect might be substantially different from the estimated effect in these reviews. Of the five downgrading factors, RoB was the most common factor that reduced the level of evidence. This indicates that we should pay close attention to allocation concealment, blinding methods and selective reporting to reduce the impact of limitations on outcome indicators. Because the CIs of different studies overlapped poorly and I² was > 50%, the result indicators were downgraded for inconsistency. Imprecision was mostly due to insufficient sample sizes and wide 95% CIs, which indicates that the sample size and the appropriateness of the sample should receive more attention. Regarding publication bias, most of the included literature did not carry out specific tests or analyses; the lack of gray literature and the insufficient power of the statistical tests resulted in reduced quality. Therefore, in future research on ZQFTN or SIN for the treatment of RA, researchers need to pay close attention to the quality of evidence of outcome indicators and provide readers with the highest possible quality of evidence indicators.

Research has revealed that SIN may aid in the relief of the clinical symptoms of RA. Guo et al. explored the potential targets underlying the effect of SIN on RA by utilizing a network pharmacology approach; sixty-seven potential targets of SIN and 3797 related targets involved in RA were subjected to network analysis, and the 20 intersection targets indicated the principal pathways linked to RA [39]. In vitro and in vivo studies by Shen et al. have shown that thermosensitive liposomes loaded with sinomenine hydrochloride (SIN-TSL) combined with microwave thermotherapy have superior anti-RA effects [40]. In our research, almost 60% of the systematic reviews were found to have good methodological quality, and these reviews showed that ZQFTN or SIN could improve clinical symptoms and delay disease progression in patients with RA. These findings suggest that clinical trials on SIN for the treatment of RA may prove its effectiveness.

The following are strengths of our overview. On the one hand, we used well-validated and accepted guidelines to assess both reporting and methodological quality. With the completion of a comprehensive and detailed plan, a rigorous and clear search strategy, and a widely adopted assessment guideline, we identified systematic reviews on the use of ZQFTN or SIN for the treatment of RA efficiently and reliably. On the other hand, we used the AMSTAR 2 system for appraising systematic reviews; AMSTAR 2 is an updated version of the classical AMSTAR instrument, and it conforms well to the PICO framework on research issues, controls the details of included studies more strictly, and considers RoB in more detail [27]. Furthermore, the GRADE system is a validated scientific approach used to evaluate the quality of evidence.

Although we followed strict procedures in this overview, it still has some limitations. First, although a predefined search strategy was used, we cannot guarantee that all relevant articles were included due to language limitations, which might have an effect on publication bias. Second, the methodological tools and reporting guidelines adopted in our study might not cover all details specific to systematic reviews and meta-analyses regarding RA. Third, the overall quality was not evaluated because we believed it would be sufficient to reflect the quality of each item instead of the overall quality. In addition, we used AMSTAR 2, released in 2017, whereas the included studies were published between 2008 and 2016, and no new study has been reported in the past 3 years, which may lead to bias. Last but not least, there are many other approaches that can be used to identify quality metrics, such as the journal impact factor, h-index, and other indicator systems [41, 42]. The impact factors of the eight studies were not satisfactory, which may also lead to certain publication bias and partiality.

Conclusion

We collected 8 systematic reviews and meta-analyses published from database inception to July 2019 and assessed their methodological and reporting quality and quality of evidence. The average methodological quality score was 6.625, and the average reporting score was 17.69. In addition, 58% (n = 35, 35/61) of the outcome indicators had limitations based on the GRADE table. The reporting and methodological quality of the included meta-analyses and systematic reviews were less than optimal, which indicates that researchers should undergo additional training and follow the AMSTAR 2 scale, PRISMA statement and GRADE to design high-quality studies in the future. This procedure will provide better suggestions for the clinical treatment of RA.