Use of artificial intelligence (AI) in the interpretation of intrapartum fetal heart rate (FHR) tracings: a systematic review and meta-analysis
To determine the degree of inter-rater reliability (IRR) between human and artificial intelligence (AI) interpretation of fetal heart rate (FHR) tracings, and to determine whether AI-assisted interpretation of electronic fetal monitoring (EFM) improves neonatal outcomes amongst laboring women.
We searched Medline, EMBASE, Google Scholar, Scopus, ISI Web of Science and the Cochrane database, as well as PubMed (www.pubmed.gov) and the RCT registry (www.clinicaltrials.gov), through the end of October 2018 to conduct a systematic review and meta-analysis comparing visual and AI interpretation of EFM in labor. We also sought all studies evaluating the IRR between AI and expert interpretation of EFM.
Tabulation, integration and results
A weighted mean Cohen's kappa was calculated to assess the global IRR. Risk of bias was assessed following the Cochrane Handbook for Systematic Reviews of Interventions. We used relative risks (RR) and a random effects (RE) model to calculate weighted estimates. Statistical homogeneity was checked with the χ² test and the I² statistic using Review Manager 5.3.5 (The Cochrane Collaboration, 2014). We obtained 201 records, of which 9 met the inclusion criteria. Three RCTs were used to compare neonatal outcomes, and 6 cohort studies were used to establish the degree of IRR between the two approaches to EFM evaluation. With regard to neonatal outcomes, a total of 55,064 patients were included in the analysis. Relative to clinical (visual) evaluation of the FHR, the use of AI did not change the incidence of neonatal acidosis, cord pH < 7.20, 5-min APGAR scores < 7, NICU admission, neonatal seizures, or perinatal death, nor did it change the mode of delivery. With regard to inter-rater reliability, a weighted mean Cohen's kappa of 0.49 [0.32–0.66] indicates moderate agreement between expert observers and computerized systems.
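The calculations named above can be illustrated with a minimal Python sketch. It is not the authors' analysis, which was run in Review Manager. It rests on stated assumptions: the kappa values are weighted by the number of tracings per study (the abstract does not say which weights were used), and the relative risks are pooled with an inverse-variance DerSimonian-Laird random effects estimator, which may differ from the Review Manager settings the authors chose. All function names are hypothetical, and no study data are reproduced here.

```python
import math


def cohen_kappa(table):
    """Cohen's kappa from a square agreement table.

    table[i][j] = number of tracings rated category i by rater A
    (e.g. the expert) and category j by rater B (e.g. the computer).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


def weighted_mean(values, weights):
    """Weighted mean; the choice of weights is an assumption of this sketch."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def pooled_rr_random_effects(studies):
    """Pool relative risks with a DerSimonian-Laird random effects model.

    studies: list of (events_ai, n_ai, events_visual, n_visual), all
    event counts non-zero. Returns (RR, ci_low, ci_high, I2_percent).
    """
    log_rr, var = [], []
    for a, n1, c, n2 in studies:
        log_rr.append(math.log((a / n1) / (c / n2)))
        var.append(1 / a - 1 / n1 + 1 / c - 1 / n2)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q
    w = [1 / v for v in var]
    fixed = weighted_mean(log_rr, w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = len(studies) - 1

    # Between-study variance tau^2, truncated at zero
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0

    # Random effects weights and pooled estimate on the log scale
    w_re = [1 / (v + tau2) for v in var]
    pooled = weighted_mean(log_rr, w_re)
    se = math.sqrt(1 / sum(w_re))

    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
        i2,
    )
```

With per-study kappa values and tracing counts in hand, a call such as weighted_mean(kappas, n_tracings) would give a global kappa under the weighting assumed here, and pooled_rr_random_effects would take one tuple of event counts per RCT for a given neonatal outcome.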
The use of AI and computer analysis for the interpretation of EFM during labor does not improve neonatal outcomes. Inter-rater reliability between experts and computer systems is moderate at best. Future studies should aim at further elucidating these findings.
Keywords: Artificial intelligence; Computer; Fetal monitoring; Fetal heart rate; Inter-rater reliability; Neonatal outcomes
Both authors (JB, GS) contributed equally to all tasks: protocol/project development; data collection or management; data analysis; manuscript writing/editing.
This study was not funded.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants performed by any of the authors.