Background

Leptospirosis is an underdiagnosed infectious disease, with an estimated global burden of more than one million illnesses per year from 1970 to 2008 [1], 60,000 estimated annual deaths [1], and a mortality ratio ranging from 2% to as high as 60% among older patients with icteric disease or renal failure [2]. Although tropical regions have the highest incidence of disease, climate change and massive urbanization of frequently flooded areas in low-income countries are changing the epidemiology of this zoonosis, and it is a growing global public health problem [3,4,5]. In tropical and subtropical settings, the symptoms and signs of leptospirosis overlap with those of many other acute febrile illnesses including malaria, arboviral, and rickettsial diseases, and thus require laboratory confirmation for diagnosis [6,7,8].

Numerous diagnostic tests based on nucleic acid or antibody detection have been developed for early diagnosis of leptospirosis [9], but the serologic reference standard remains the microscopic agglutination test (MAT) on paired samples with a four-fold or greater rise, or seroconversion, confirming the diagnosis [10, 11]. Nevertheless, reported estimates of sensitivity vary [12, 13]. The clinical characteristics of the populations studied, including days post-onset of symptoms and prior use of antibacterials, the serovars included in the MAT panel in relation to the epidemiology of the disease in the geographic region studied, as well as the laboratory performance, contribute to heterogeneous estimates of MAT sensitivity in paired samples [11,12,13].

Because MAT is an imperfect reference test, accuracy evaluations that do not account for the imperfect nature of the test are biased [13, 14]. To explore this, Bayesian latent class analysis can be used to estimate the accuracy of a test, without assuming that any test is 100% accurate [15]. To our knowledge there is no published systematic review regarding MAT diagnostic accuracy using latent class analysis.

The Febrile Illness Evaluation in a Broad Range of Endemicities (FIEBRE) study is a prospective observational study of the infectious causes of fever at four sites in Africa and Asia, collecting data and samples from adult and paediatric outpatients, inpatients, and community controls [16]. FIEBRE tests for preventable and treatable infections, including leptospirosis, using reference standard diagnostic tests performed at specialised laboratory centres of excellence. The approach for the diagnosis of leptospirosis used in FIEBRE was an initial IgM ELISA screen using Leptospira fainei serovar Hurstbridge antigen on participants’ convalescent sera, or for participants who did not provide convalescent serum, screening of acute serum from the day of clinical presentation. For IgM ELISA positive samples, MAT using a globally representative panel of Leptospira serovars enriched when possible with local strains was performed on acute and, when available, convalescent sera. MAT was also performed on all acute plasma samples positive by SYBR Green based real-time polymerase chain reaction (PCR) assay targeting the Lfb1 gene [17, 18].

We conducted a systematic review and meta-analysis to assess the accuracy of the index tests: MAT, PCR with the pathogenic Leptospira target gene Lfb1, and ELISA IgM with the target antigen Leptospira fainei serovar Hurstbridge. We compared the index tests with reference standard diagnostic tests for leptospirosis diagnosis [10]: blood culture and/or PCR and/or MAT (comparator tests). We used a Bayesian latent class model to evaluate the sensitivity and specificity of MAT on single acute-phase samples and MAT on paired samples.

Methods

PROSPERO protocol

The protocol of our systematic review was developed prior to conducting the review, and was registered in the International Prospective Register of Systematic Reviews (PROSPERO) at https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=285773, registration number CRD42021285773.

Search strategy

The original searches were conducted by a library information specialist (JF) on 9 September 2020 for PCR, 10 September 2020 for MAT, and 30 November 2020 for IgM ELISA, and all searches were updated on 16 August 2022. Databases searched included OvidSP Medline, OvidSP Embase, OvidSP Global Health, Wiley Cochrane Central Register of Controlled Trials, Clarivate Analytics Web of Science (Science Citation Index Expanded and Social Sciences Citation Index only), Elsevier Scopus, Ebsco Africa-Wide Information, World Health Organization (WHO) Latin American and Caribbean Health Sciences Literature, and WHO Global Index Medicus.

The search included strings of terms, synonyms, and controlled vocabulary terms to reflect two concepts: leptospirosis, and either MAT, PCR, or IgM ELISA, hereafter referred to as the index test of each search. The exact search terms used for each search are shown in the Supplementary material (Appendix S1). Animal studies were excluded, and the search was limited by date of publication from 1950 when MAT protocols were initially published [19] through 16 August 2022. Duplicates were removed. Additional eligible studies were found by manually searching the reference lists of relevant manuscripts and by contacting authors.

Selection criteria

The selection criteria applied to all studies found in the search are detailed in Table 1.

Table 1 Selection criteria applied to studies found in the systematic review of studies evaluating the diagnostic accuracy of MAT, PCR, and IgM ELISA, published globally between 1950 and 2022

For the MAT systematic review, we included the threshold of single acute-phase sample in the selection criteria. Since leptospirosis case definitions for single acute-phase samples vary according to background seroprevalence [10], we sub-classified the study settings considering where leptospirosis is endemic and non-endemic based on national level assessments. In line with Costa et al. [1] we considered non-endemic settings to be countries with 10 or fewer leptospirosis cases per 100,000 population per year, and endemic settings to be countries with more than 10 cases per 100,000 population per year. Costa’s review [1] identified 80 studies from 34 countries that fulfilled the selection and quality criteria for a disease incidence study with a defined study period of leptospirosis endemic transmission, and developed a multivariable regression model to estimate leptospirosis incidence for each country and territory.

Following this rationale, we set as selection criteria the titre cut-off for a positive MAT in a single acute-phase sample of ≥ 1:400 for endemic settings, and ≥ 1:100 for non-endemic settings. For all settings, the criterion for a serologically confirmed case of leptospirosis was seroconversion or a four-fold or greater rise in MAT antibody titre between paired samples from a person with a history of measured or reported fever, or with suspected leptospirosis [10].
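The case definitions above can be expressed as a few simple rules. The following is a minimal sketch in Python rather than part of the review's analysis code: the function names are hypothetical, titres are written as reciprocals (400 for 1:400), and the convalescent titre required to count as seroconversion (`seroconversion_titre`, default 100) is an assumption, since the text does not state it.

```python
ENDEMIC_INCIDENCE_THRESHOLD = 10.0  # cases per 100,000 population per year


def is_endemic(cases_per_100k: float) -> bool:
    """More than 10 cases per 100,000 per year is treated as endemic."""
    return cases_per_100k > ENDEMIC_INCIDENCE_THRESHOLD


def single_sample_positive(titre: int, endemic: bool) -> bool:
    """Single acute-phase sample: >= 1:400 if endemic, >= 1:100 otherwise."""
    cutoff = 400 if endemic else 100
    return titre >= cutoff


def paired_sample_confirmed(
    acute_titre: int,
    convalescent_titre: int,
    seroconversion_titre: int = 100,  # assumption, not stated in the text
) -> bool:
    """Paired samples: seroconversion or a four-fold or greater rise.

    A titre of 0 denotes a non-reactive sample (a convention of this sketch).
    """
    if acute_titre == 0:
        return convalescent_titre >= seroconversion_titre
    return convalescent_titre >= 4 * acute_titre
```

Under these rules a single titre of 1:200 is positive in a non-endemic setting but negative in an endemic one, whereas the paired-sample rule is the same in every setting.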

Study selection and data extraction

Two reviewers (JB, MV) screened and selected all studies independently and in duplicate, using two separate Excel spreadsheets (Authors, Title, Abstract, Journal, Year, Volume, Issue, Pages, DOI) for MAT and PCR studies, and for IgM ELISA studies using the online tool Cadima (https://www.cadima.info/) [20].

The initial eligibility assessment of all titles and abstracts identified by the search strategy was performed using the predetermined selection criteria (Table 1). Full-text copies of all potentially eligible reports were retrieved and reviewed, independently and in duplicate by JB and MV. Any disagreements about eligibility were resolved through discussion between JB and MV, leading to the inclusion of reports meeting all selection criteria and exclusion of those not meeting criteria. For each included report, JB and MV independently abstracted data using a standardized data abstraction sheet that was first piloted on fifteen studies (see Supplementary material, Table S1). We contacted study investigators when a report appeared to meet selection criteria, but data reported were unclear or insufficient to abstract a 2 × 2 contingency table comparing one or more index tests with another test. If sufficient data were not available or there was no reply from the authors, the study was excluded.

Bias assessment

We assessed study quality using the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria, which assesses both the risk of bias and applicability to the review question for four domains: participant selection, index test, reference standard, and flow and timing of participants [21]. Each included article was graded as ‘low risk’ or ‘high risk.’ Each category was defined according to the criteria included in the manuscript, as shown in Tables 2 and 3.

Table 2 Criteria for assessing bias in the systematic review of studies evaluating the diagnostic accuracy of MAT, PCR, and IgM ELISA, published globally between 1950 and 2022
Table 3 Criteria for assessing applicability in the systematic review of studies evaluating the diagnostic accuracy of MAT, PCR, and IgM ELISA, published globally between 1950 and 2022

Data analysis

For analysis we required data from each study in the form of a 2 × 2 contingency table showing results of the index test and a comparator test. The index test was any of the tests of interest for each systematic review: single acute-phase MAT, paired MAT, PCR with target gene Lfb1, or ELISA IgM with target antigen Hurstbridge. The comparator tests were pre-determined before beginning the review according to the reference standard diagnostic tests for leptospirosis diagnosis [10]. When MAT (on either a single sample or paired sera) was the index test, the comparator tests were blood culture and/or PCR to any target gene; when PCR with target gene Lfb1 was the index test, the comparator test was MAT (on either a single sample or paired sera) and/or blood culture and/or PCR (with other target genes); when ELISA IgM was the index test, the comparator test was MAT (on either a single sample or paired sera) and/or PCR (with any target gene) and/or blood culture.
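To make the data requirement concrete, the sketch below shows one way a 2 × 2 contingency table could be represented, together with the apparent sensitivity and specificity that would be obtained if the comparator test were treated as perfect. The class and field names are hypothetical and the counts in the usage note are invented for illustration; they are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of index test and comparator test results."""

    both_positive: int    # index +, comparator +
    index_only: int       # index +, comparator -
    comparator_only: int  # index -, comparator +
    both_negative: int    # index -, comparator -

    def apparent_sensitivity(self) -> float:
        """Index positives among comparator positives (comparator assumed perfect)."""
        return self.both_positive / (self.both_positive + self.comparator_only)

    def apparent_specificity(self) -> float:
        """Index negatives among comparator negatives (comparator assumed perfect)."""
        return self.both_negative / (self.both_negative + self.index_only)


# Invented counts, for illustration only.
example_table = TwoByTwo(
    both_positive=12, index_only=20, comparator_only=48, both_negative=120
)
```

For the invented table, `example_table.apparent_sensitivity()` is 12/60 = 0.20 and `example_table.apparent_specificity()` is 120/140, roughly 0.86. These naive figures are only valid if the comparator never errs, which is the assumption the latent class approach relaxes.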

Regarding MAT (on either a single sample or paired sera) meta-analysis, when a study reported data on multiple comparator tests, we created separate 2 × 2 contingency tables comparing the index test with each comparator test. In these cases, without individual level data we were unable to include all data in the meta-analyses without introducing bias. To systematically ensure only one 2 × 2 table from each study was included in the meta-analyses, we chose to include the 2 × 2 table where the comparator test was blood culture. This choice was made because more accuracy data on the specificity of blood culture are available than data on the sensitivity or specificity of PCR [22].

We implemented a Bayesian random-effect latent class meta-analysis, which is an extension to the Hierarchical Summary Receiver Operating Characteristic (HSROC) Model [18] to estimate the sensitivity and specificity of index tests. This framework took into account the imperfect nature of all tests included, as well as accounting for within- and between-study variability.
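The reason an imperfect comparator matters can be illustrated with a deliberately simplified two-class latent class calculation. The sketch below is not the model fitted in this review: it assumes conditional independence between the two tests given true disease status, uses hypothetical function names, and takes invented accuracy values as inputs. It computes the expected 2 × 2 cell probabilities and the apparent accuracy that a naive analysis would report.

```python
def cell_probabilities(prevalence, se_index, sp_index, se_comp, sp_comp):
    """Expected 2 x 2 cell probabilities under a two-class latent class model.

    Assumes the index and comparator tests are conditionally independent
    given true disease status (a simplifying assumption of this sketch).
    Keys are (index result, comparator result).
    """

    def joint(index_pos, comp_pos):
        # Probability of this result pair among diseased individuals.
        p_diseased = (se_index if index_pos else 1 - se_index) * (
            se_comp if comp_pos else 1 - se_comp
        )
        # Probability of this result pair among disease-free individuals.
        p_healthy = ((1 - sp_index) if index_pos else sp_index) * (
            (1 - sp_comp) if comp_pos else sp_comp
        )
        return prevalence * p_diseased + (1 - prevalence) * p_healthy

    return {
        ("+", "+"): joint(True, True),
        ("+", "-"): joint(True, False),
        ("-", "+"): joint(False, True),
        ("-", "-"): joint(False, False),
    }


def apparent_accuracy(probs):
    """Sensitivity and specificity of the index test if the comparator is taken as truth."""
    sens = probs[("+", "+")] / (probs[("+", "+")] + probs[("-", "+")])
    spec = probs[("-", "-")] / (probs[("-", "-")] + probs[("+", "-")])
    return sens, spec
```

With invented inputs of 30% prevalence, an index test with 70% sensitivity and 95% specificity, and a comparator with 30% sensitivity and 100% specificity, `apparent_accuracy(cell_probabilities(0.3, 0.7, 0.95, 0.3, 1.0))` returns approximately 0.70 and 0.80. The true specificity of 95% is understated because diseased individuals missed by the comparator are counted as false positives of the index test.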

We fitted separate meta-analyses for MAT single acute-phase and paired sera, and for each analysis calculated the median and 95% credible interval (CrI) for the estimated sensitivity and specificity of the index test in each study. Importantly, we also calculated both the estimated median and 95% CrI for sensitivity and specificity across studies, known as pooled accuracy, as well as the predicted sensitivity and specificity. These predicted values estimate the sensitivity and specificity that would be expected if the test were to be used in a hypothetical future study. These pooled and predicted estimates of accuracy are presented through summary Receiver Operating Characteristic (ROC) curves which represent the 95% credible region for the joint estimate of the index test's sensitivity and specificity. If a meta-analysis could not be performed due to scarcity of data, as was the case with PCR and ELISA reviews, we estimated accuracy of the index test in individual studies using latent class analysis [23].
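The distinction between pooled and predicted estimates can be sketched as follows. This illustration assumes a generic logit-normal random-effects structure, in which posterior draws of a mean (`mu`) and a between-study standard deviation (`tau`) on the logit scale are already available; it is not the HSROC-based parameterisation used in the review, and all names are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def summarise(draws):
    """Median and 95% interval (2.5th and 97.5th percentiles) of a list of draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1

    def percentile(q):
        return ordered[round(q * last)]

    return percentile(0.5), percentile(0.025), percentile(0.975)


def pooled_and_predicted(mu_draws, tau_draws, rng):
    """Summaries of pooled accuracy and of accuracy predicted for a new study.

    Pooled: transform each draw of the mean to the probability scale.
    Predicted: for each posterior draw, sample a new study-level logit
    from Normal(mu, tau), so between-study heterogeneity widens the interval.
    """
    pooled = [expit(mu) for mu in mu_draws]
    predicted = [
        expit(rng.gauss(mu, abs(tau))) for mu, tau in zip(mu_draws, tau_draws)
    ]
    return summarise(pooled), summarise(predicted)


def demo(seed=1, n=4000):
    """Run the comparison on invented posterior draws (illustration only)."""
    rng = random.Random(seed)
    mu_draws = [rng.gauss(0.8, 0.3) for _ in range(n)]
    tau_draws = [abs(rng.gauss(1.5, 0.2)) for _ in range(n)]
    return pooled_and_predicted(mu_draws, tau_draws, rng)
```

Calling `demo()` returns two `(median, lower, upper)` tuples. The medians are similar, but the predicted interval is much wider than the pooled interval, which mirrors the pattern of pooled and predicted credible intervals reported in the Results.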

All analyses were carried out in R using Stan [24]. A full model specification, including a sensitivity analysis investigating the impact on estimates of accounting for conditional dependence between tests within a disease class, as well as results where non-endemic studies are excluded, can be found in the Supplementary material (Appendix 2). All code can be found at: https://github.com/shk313/diagnostic-test-metaanalysis/tree/main/Leptospirosis.

Results

Study selection

Single acute-phase and paired MAT

Our systematic review of MAT performed on single acute-phase and paired samples identified 691 reports. Of these, 58 (8.4%) were identified as potentially relevant on the basis of the title and abstract and underwent full-text review. Of these, 15 (25.9%) met our selection criteria and were included [25,26,27,28,29,30,31,32,33,34,35,36,37,38,39]; 12 (80%) [25,26,27,28,29,30,31,32,33,34,35,36] tested samples from endemic countries and three (20%) [37,38,39] from non-endemic countries. Of the 12 studies in endemic countries, nine studies (75%) [25,26,27,28,29,30, 35, 36] reported data from single acute-phase samples and ten studies (83.3%) [25,26,27,28,29, 31,32,33,34] reported data from paired samples. Of the three studies in non-endemic countries, two (66.7%) [37, 38] reported data from single acute-phase samples and two (66.7%) [38, 39] from paired samples. We excluded results of single acute-phase samples from three studies [32, 33, 39] because the threshold of detection used was different from our national leptospirosis endemicity-based selection criteria (Fig. 1).

Fig. 1 Study flow diagram for the systematic review of studies evaluating the diagnostic accuracy of MAT, PCR, and IgM ELISA, published globally between 1950 and 2022. A Flow diagram of the selection process of MAT studies. B Flow diagram of the selection process of PCR studies. C Flow diagram of the selection process of IgM ELISA studies

The studies that were not included due to having insufficient data available to create a 2 × 2 contingency table for single acute-phase samples and/or paired samples are detailed in Appendix S3.

PCR target gene lfb1

Our PCR review identified 1,094 reports. Of these, 18 (1.6%) were identified as potentially relevant on the basis of the title and abstract and underwent full-text review. Of these 18 reports, two (11.1%) articles [27, 40] met our selection criteria and were included (Fig. 1).

ELISA IgM target antigen Leptospira fainei serovar Hurstbridge

Our IgM ELISA review identified 5,092 reports. Of these, 58 (1.1%) were identified as potentially relevant on the basis of title and abstract and underwent full-text review. Of these 58 reports, one (1.7%) article [41] met our selection criteria and was included (Fig. 1).

Study characteristics

Single acute-phase and paired MAT

The characteristics of all included studies are detailed in Table 4. The 15 studies included for MAT (11 (73%) studies were of single-sample MAT, 12 (80%) studies of paired MAT and 8 (53%) studies were of both) were conducted from 2000 through 2020. Of these studies, 14 (93%) of 15 [25,26,27,28,29,30,31,32,33,34,35,36,37,38] included participants with suspected leptospirosis and one (7%) of 15 [39] included participants with fever. Of studies from endemic regions, recruitment occurred in Brazil [28, 29]; Japan [34]; Pacific Island Countries and Territories such as Marquesas Islands, Society Islands, Wallis and Futuna, and New Caledonia [27]; India [32, 33]; Laos [25, 28]; Malaysia [30, 35]; and Thailand [31, 36]. In non-endemic countries, recruitment occurred in New Zealand [39] and Slovenia [37, 38]. All studies were prospective. The MAT panel comprised 20 to 22 serovars in five studies [25, 26, 29, 30, 35], 13 to 15 serovars in three studies [34, 37, 38], and 8 to 11 serovars in three studies [32, 33, 39]. The MAT panel was not described in four studies [27, 28, 31, 36]. The comparator test was blood culture in five studies [29, 32, 33, 36, 37], PCR in four studies [26, 27, 30, 35], and both were used as comparators in six studies [25, 28, 31, 34, 38, 39]. Of studies with PCR as a comparator test, three studies used serum samples [26,27,28], five used whole blood samples [31, 34, 35, 38, 39], one used both [30], and one study used serum and buffy coat [25]. Recruitment of individuals varied in relation to time of illness onset across studies. The number of days post-onset (DPO) of symptoms at recruitment was reported as 0 to 14 days [34], 1 to 30 days [25, 27], a mean of 6 days [29], and an interquartile range of 2 to 5 [36], 2 to 6 [31], and 3 to 7 days [28]. The DPO of symptoms was not detailed in eight studies [26, 30, 32, 33, 35, 37,38,39].
The number of days between acute and convalescent samples also varied: reported intervals were 7 to 15 days [25, 31, 32] and more than 15 days [29, 35, 38], and the interval was not detailed in nine studies [26,27,28, 30, 33, 34, 36, 37, 39].

Table 4 Characteristics of studies selected in the systematic review of studies evaluating the diagnostic accuracy of MAT, PCR, and IgM ELISA, published globally between 1950 and 2022

PCR target gene lfb1

The two studies included for PCR accuracy analysis were conducted in 2004–2005 [27] and 2015–2016 [40]. Both studies included patients with suspected leptospirosis, were prospective, and enrolled participants in endemic settings: the Azores [40], and the Pacific Island Countries and Territories of Marquesas Islands, New Caledonia, Society Islands, and Wallis and Futuna [27]. In one study [27] the comparator test was MAT, for which the MAT panel was not described, and 10 (24%) of 41 patients had paired samples. In the other study [40] the comparator test was PCR targeting the rrs gene in serum samples. The DPO of symptoms was 1 to 30 days in one study [27] and was not described in the other study [40].

ELISA IgM with antigen Leptospira fainei serovar Hurstbridge

The eligible study included for IgM ELISA accuracy analysis [41] was conducted in France, French Polynesia, Guadeloupe, Guyana, and Martinique, and was a two-gate design study that included patients with suspected leptospirosis and controls from patients with evidence of recent infection for dengue and syphilis, or from healthy blood donors. IgM ELISA was performed in serum samples and the comparator test was MAT. The MAT panel included 22 serovars, and it was not mentioned how many participants had paired samples.

Study quality

The results of bias assessment are shown in Table 5.

Table 5 Bias assessment in the systematic review of studies evaluating the diagnostic accuracy of MAT, PCR, and IgM ELISA, published globally between 1950 and 2022

Single acute-phase and paired MAT

In the patient domain, all studies were graded as low risk for bias and applicability concerns, because they were all prospective and with a population of suspected leptospirosis or febrile patients. In the index test domain, when studies used single acute-phase samples for a confirmatory diagnosis of leptospirosis [25,26,27,28,29,30,31, 35,36,37,38], they were graded as high risk of bias. When studies used paired samples for a confirmatory diagnosis of leptospirosis [25,26,27,28,29,30,31,32,33,34, 38, 39], they were graded low risk of bias on the basis that the positivity criteria included a four-fold rise or greater, or seroconversion, between samples. Regarding applicability, nine studies were graded low risk because they used a globally representative panel of 20 to 22 serovars [25, 26, 29, 30, 35], or used 10 to 15 locally known circulating serovars [32, 33, 37, 38]. Two studies [34, 39] were graded high risk since the MAT panels were composed of 13 serogroups and eight serovars, respectively, and they were not mentioned as being locally representative of the study setting. Finally, four studies [27, 28, 31, 36] were graded high risk because MAT panel composition was not described.

In the comparator test domain, regarding bias and applicability, 14 studies [25,26,27,28, 30,31,32,33,34,35,36,37,38,39] were graded low risk because the comparator tests were performed in recruitment samples and according to standard methodology. One study [29] was graded high risk because laboratory procedures were not described or referenced. For the timing and flow domain, all studies were graded low risk of bias because patients were subject to the same comparator tests, and comparator tests and index test were performed on samples taken at the same time for acute phase.

PCR target gene lfb1

In the patient and index test domains, both PCR studies [27, 40] were graded low risk for quality concerns because they were prospective, in patients suspected of leptospirosis, and the index test was performed in recruitment samples and according to standard methodology. In the comparator test domain, one study [27] was graded high risk of bias because MAT was the comparator test and fewer than 75% of the samples were paired samples, and graded as high risk for applicability concerns because the MAT panel composition was not described. The second study [40] was graded low risk for quality concerns since the comparator test was performed according to standard methodology. For the timing and flow domain, both studies were graded low risk of bias because patients were subject to the same comparator tests, and comparator tests and index test were performed on samples taken at the same time for acute phase.

ELISA IgM target antigen Leptospira fainei serovar Hurstbridge

The single IgM ELISA study [41] was graded high risk of bias and high risk for applicability concerns in the patient domain, because it was a two-gate design study and controls were healthy blood donors or patients with other diseases. In the index test domain, it was graded low risk for quality concerns since it was performed according to detailed standard methodology and the threshold for positivity was defined a priori. In the comparator test domain, it was graded as high risk of bias because MAT was the comparator test and there was no information regarding the use of paired samples for a confirmatory case. For the timing and flow domain, it was graded as low risk of bias since patients were subject to the same comparator tests, and comparator tests and index test were performed on samples taken at the same time for acute phase.

Sensitivity and specificity estimates

Single acute-phase and paired MAT

Overall, 11 studies with data on single acute-phase samples representing 2,625 individuals and 12 studies on paired samples representing 1,721 individuals were included in a meta-analysis for MAT. Abstracted data are detailed in Supplementary material, Table S2.

For single acute-phase samples, the pooled sensitivity and specificity of MAT were 14% (95% CrI 3–38%) and 86% (95% CrI 59–96%), respectively, and the predicted sensitivity and specificity were 14% (95% CrI 0–90%) and 86% (95% CrI 9–100%). The estimates for the sensitivity and specificity of MAT in each individual study can be found in Fig. 2 and the summary receiver operating characteristic (SROC) curves representing the pooled and predicted estimates in Fig. 3.

Fig. 2 Forest plot of estimated and pooled sensitivity and specificity of studies evaluating the diagnostic accuracy of MAT in single acute-phase samples, published globally between 1950 and 2022

Fig. 3 ROC curve of pooled and predicted sensitivity and specificity of studies evaluating the diagnostic accuracy of MAT in single acute-phase samples, published globally between 1950 and 2022

Among paired samples, the pooled sensitivity and specificity of MAT were 68% (95% CrI 32–92%) and 75% (95% CrI 45–93%) respectively, and the predicted sensitivity and specificity were 69% (95% CrI 2–100%) and 75% (95% CrI 2–100%). The estimates for individual studies can be found in Fig. 4 and the SROC curves for pooled and predicted estimates in Fig. 5.

Fig. 4 Forest plot of estimated and pooled sensitivity and specificity of studies evaluating the diagnostic accuracy of MAT in paired samples, published globally between 1950 and 2022

Fig. 5 ROC curve of pooled and predicted sensitivity and specificity of studies evaluating the diagnostic accuracy of MAT in paired samples, published globally between 1950 and 2022

PCR targeting lfb1

Two studies were included in our review of PCR diagnosis, including a total of 253 individuals. The estimated median sensitivity of PCR in Merien, et al. [27] was 92% (95% CrI 72–100%) and median specificity was 66% (95% CrI 49–91%). In Esteves, et al. [40] the median sensitivity of PCR was 98% (95% CrI 90–100%) and the median specificity was 99% (95% CrI 98–100%) (Table 6).

Table 6 Extracted data, sensitivity and specificity estimates in the systematic review of studies evaluating the diagnostic accuracy of PCR and IgM ELISA, published globally between 1950 and 2022

ELISA IgM target antigen Leptospira fainei serovar Hurstbridge

A single study that included 519 individuals was identified in our review of IgM ELISA. The estimated median sensitivity of IgM ELISA was 97% (95% CrI 93–100%) and the median specificity was 99% (95% CrI 97–100%) (Table 6).

Discussion

We carried out a systematic review of the sensitivity and specificity of MAT, PCR with the target gene Lfb1, and IgM ELISA with the antigen Leptospira fainei serovar Hurstbridge for diagnosis of human leptospirosis. Our meta-analysis of 15 studies, including 3,188 participants, found that MAT on single acute-phase samples had a predicted median sensitivity and specificity of 14% and 86%, respectively, for detecting leptospirosis, and using paired samples MAT had a predicted median sensitivity and specificity of 69% and 75%, respectively.

Our estimates of the sensitivity of MAT in single acute-phase samples were low across all studies, but specificity was generally high. These findings are in line with the dynamics of the humoral immune response and with previous work from studies in a variety of countries including Barbados [42], the Netherlands [15], and Sri Lanka [43]. Moreover, numerous studies have shown the value of adding culture, nucleic acid amplification, or antigen detection to MAT serology during the early phase of the disease [44,45,46,47,48,49,50].

In paired samples, we estimated that MAT correctly identifies just over two-thirds of true leptospirosis cases and correctly rejects the diagnosis for three-quarters of suspected cases who do not have leptospirosis. We found a more heterogeneous picture of estimated accuracy, but our median estimates of 69% sensitivity and 75% specificity were also in line with previous findings in Barbados [42], Brazil [51], and Thailand [52]. Conversely, another study in Thailand [13] that also used a latent class model estimated sensitivity to be lower than previous studies, at 49.9% (95% CI 37.6 to 60.8%). However, the authors stated that this could have been the result of convalescent-phase samples being collected only ten DPO of symptoms, allowing insufficient time for the antibody response to develop, and of 34% of participants not having convalescent-phase serum specimens collected. Importantly, the estimate of MAT sensitivity in paired samples, 70.3%, was consistent with our analysis.

Heterogeneity among studies is reflected in the wide credible intervals for the predicted sensitivity and specificity in this meta-analysis, particularly among the paired samples. The variability in estimates from single acute-phase samples could be explained by the heterogeneity of DPO of fever in the studies included, as shown by Goris et al. [12]. Single acute-phase samples may have been collected early in the illness, less than seven DPO of fever [11], too early in the humoral immune response for MAT to reliably detect infection. The high variability in the sensitivity of MAT in paired samples could be partially explained by the inclusion of patients with a brief interval, less than 14 days [11], between samples, and thus not reaching seroconversion or a four-fold rise or greater between titers [13]. It also could be attributed to failure to consider patients’ use of antimicrobials before testing, particularly relevant when culture was used as a comparator test. It also could be due to MAT panel composition not representing the locally circulating strains [53,54,55].

Our meta-analysis had several limitations. Firstly, a key assumption of the Bayesian latent class model used is that there exist only two disease classes in the underlying population: diseased and disease-free. If in fact more than two classes exist, this assumption can result in biased estimates of test sensitivity and specificity when conditional independence between tests is assumed [56]. While the results presented in the main text of this paper do not make the assumption of conditional independence between tests, two disease classes are assumed. Further limitations include low geographical diversity, since included studies were conducted in only eight endemic countries, the majority in Southeast Asia, so that our estimates are not representative of all leptospirosis endemic countries. Moreover, our classification of a country’s endemicity followed Costa, et al. [1], but these estimates are based on limited data and do not account for sub-national variation in leptospirosis incidence. Our bias assessment (Table 5) highlights the high risk of bias of all studies using single acute-phase samples as a confirmatory test for leptospirosis, and also that some studies do not describe or account for a globally or locally representative MAT panel, an important quality concern. Moreover, data on DPO of symptoms, the interval between paired samples, and the use of antimicrobials prior to testing were widely heterogeneous or unknown. This information was not included in the quality assessment but could be an important source of bias in some of our studies, interfering with the proportion of positive and negative test results that correctly identify the infection status of individuals. Also, the low number of positive MAT results in the majority of selected studies compromised power.
Another limitation was not finding studies that reported titres on acute and convalescent samples that would have allowed the direct evaluation of single sample MAT in the context of paired MAT. A final limitation was the difficulty in assessing QUADAS-2, due to the lack of detailed data reported in the selected studies and due to the heterogeneity in MAT procedure and panel composition, since laboratories use diverse antigen panels and every setting has different endemic local Leptospira serovars, sometimes unstated.

Our review also has many strengths. To our knowledge, this is the first meta-analysis of MAT accuracy for human leptospirosis diagnosis, and the first using Bayesian latent class modelling to account for the imperfect comparator tests. Our approach took into account different case definitions according to endemicity, and evaluated test results from single acute-phase samples separately from paired samples results. Importantly, we used an extensive search strategy, contacted authors for additional data where necessary to complete a 2 × 2 table, and performed the entire process from study screening to data extraction independently and in duplicate.

Regarding our review of PCR targeting lfb1 and ELISA IgM targeting antigen Leptospira fainei serovar Hurstbridge, due to the scarcity of data available, no meta-analysis could be performed. Instead, we report the estimated accuracy of each test within the included studies only. These results are not generalizable to other studies but suggest that both IgM ELISA and PCR had a high sensitivity in the included studies (median sensitivity: 92%, 98%, and 97%). Specificity varied in the two studies included for PCR (median specificity: 66% and 99%) and was high for IgM ELISA (99%). A 2017 systematic review of IgM ELISA for leptospirosis diagnosis not specifically targeting the antigen Leptospira fainei serovar Hurstbridge found similar results [57].

Conclusions

To our knowledge, this is the first meta-analysis estimating the accuracy of MAT in paired samples for diagnosis of human leptospirosis. Our study found that the sensitivity and specificity of MAT in paired samples were not high. However, MAT on paired sera remains the reference standard until a more accurate diagnostic strategy is developed. A key challenge for our review was the scarcity of high-quality studies, driven by a low proportion of participants with paired serum samples and a lack of detailed reporting of the timing of sample collection and panel composition. Future studies that use paired samples and that report in detail the timing of sample collection and MAT panel composition will improve the certainty of accuracy estimates.