A Critical Look at Studies Applying Over-Sampling on the TPEHGDB Dataset

  • Conference paper
Artificial Intelligence in Medicine (AIME 2019)

Abstract

Preterm birth is the leading cause of death among young children and is highly prevalent globally. Machine learning models based on features extracted from clinical sources, such as electronic patient files, yield promising results. In this study, we review similar studies that constructed predictive models based on a publicly available dataset, the Term-Preterm EHG Database (TPEHGDB), which contains electrohysterogram signals in addition to clinical data. These studies often report near-perfect prediction results by applying over-sampling as a means of data augmentation. We reconstruct these results to show that they can only be achieved when data augmentation is applied to the entire dataset prior to partitioning it into a training and a testing set. This results in (i) artificial samples that are highly correlated with data points from the test set being added to the training set, and (ii) artificial samples that are highly correlated with points from the training set being added to the test set. Many previously reported results therefore carry little meaning in terms of the actual effectiveness of the model in making predictions on unseen data in a real-world setting. After focusing on the danger of applying over-sampling strategies before data partitioning, we present a realistic baseline for the TPEHGDB dataset and show how the predictive performance and clinical use can be improved by incorporating features from electrohysterogram sensors and by applying over-sampling to the training set only.
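The leakage described in the abstract can be illustrated with a minimal sketch. It does not use TPEHGDB or the authors' code: the data are synthetic pure-noise features (so no classifier should beat chance), the over-sampler is a simplified SMOTE-style interpolation, and the classifier is a 1-nearest-neighbour rule. All function names, sizes, and parameters below are assumptions chosen for illustration.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like(minority, n_new, rng, k=3):
    """Simplified SMOTE: interpolate between a minority point and one of
    its k nearest minority neighbours (illustrative, not a library API)."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not a),
                            key=lambda p: dist2(a, p))[:k]
        b = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

def oversample(X, y, rng):
    """Balance the classes by adding synthetic minority (label 1) samples."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_new = (len(X) - len(minority)) - len(minority)
    new = smote_like(minority, n_new, rng)
    return X + new, y + [1] * len(new)

def split(X, y, rng, test_fraction=0.3):
    """Random train/test partition; returns (train, test) as (X, y) pairs."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    pick = lambda ids: ([X[i] for i in ids], [y[i] for i in ids])
    return pick(idx[n_test:]), pick(idx[:n_test])

def predict_pairs(train, test):
    """Classify each test point with 1-NN; return (true, predicted) pairs."""
    (train_X, train_y), (test_X, test_y) = train, test
    pairs = []
    for x, label in zip(test_X, test_y):
        nearest = min(range(len(train_X)), key=lambda j: dist2(train_X[j], x))
        pairs.append((label, train_y[nearest]))
    return pairs

def balanced_accuracy(pairs):
    """Mean of the per-class recalls."""
    recalls = []
    for cls in (0, 1):
        preds = [p for t, p in pairs if t == cls]
        recalls.append(sum(p == cls for p in preds) / len(preds))
    return sum(recalls) / 2

def run(n_repeats=20, n_major=200, n_minor=30, dim=5, seed=0):
    rng = random.Random(seed)
    leaky_pairs, honest_pairs = [], []
    for _ in range(n_repeats):
        # Labels are independent of the features: there is nothing to learn.
        X = [[rng.gauss(0, 1) for _ in range(dim)]
             for _ in range(n_major + n_minor)]
        y = [0] * n_major + [1] * n_minor
        # Leaky protocol: over-sample the whole dataset, then partition.
        X_all, y_all = oversample(X, y, rng)
        train, test = split(X_all, y_all, rng)
        leaky_pairs += predict_pairs(train, test)
        # Sound protocol: partition first, over-sample the training set only.
        (train_X, train_y), test = split(X, y, rng)
        honest_pairs += predict_pairs(oversample(train_X, train_y, rng), test)
    return balanced_accuracy(leaky_pairs), balanced_accuracy(honest_pairs)

leaky_score, honest_score = run()
print(f"over-sample before split: {leaky_score:.2f}")
print(f"over-sample after split:  {honest_score:.2f}")
```

Because the features carry no signal, the second score should stay near chance level, while the first is inflated purely by synthetic points sitting next to their parents or siblings on the other side of the partition.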

Reproducibility and Dataset Availability

To allow reproduction of the reported results on this public dataset, we host all code required to reproduce the results in this paper in a public GitHub repository (Note 1). The dataset is available from that repository or from its original hosting location (Note 2).

Notes

  1. https://github.com/IBCNServices/TPEHGDB-Experiments/

  2. https://physionet.org/physiobank/database/tpehgdb/

Acknowledgements

Gilles Vandewiele is funded by an FWO scholarship (1S31417N). This study has been performed in the context of the ‘Predictive health care using text analysis on unstructured data’ project, funded by imec, and the PRETURN (PREdiction Tool for prematUre laboR and Neonatal outcome) clinical trial (EC/2018/0609) of Ghent University Hospital.

Corresponding author

Correspondence to Gilles Vandewiele.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Vandewiele, G. et al. (2019). A Critical Look at Studies Applying Over-Sampling on the TPEHGDB Dataset. In: Riaño, D., Wilk, S., ten Teije, A. (eds) Artificial Intelligence in Medicine. AIME 2019. Lecture Notes in Computer Science, vol 11526. Springer, Cham. https://doi.org/10.1007/978-3-030-21642-9_45

  • DOI: https://doi.org/10.1007/978-3-030-21642-9_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21641-2

  • Online ISBN: 978-3-030-21642-9

  • eBook Packages: Computer Science, Computer Science (R0)
