Abstract
Preterm birth is the leading cause of death among young children and is highly prevalent worldwide. Machine learning models based on features extracted from clinical sources, such as electronic patient files, yield promising results in predicting it. In this study, we review studies that constructed predictive models on a publicly available dataset, the Term-Preterm EHG Database (TPEHGDB), which contains electrohysterogram signals in addition to clinical data. These studies often report near-perfect prediction results, obtained by applying over-sampling as a means of data augmentation. We reproduce these results to show that they can only be achieved when data augmentation is applied to the entire dataset before it is partitioned into training and test sets. As a consequence, (i) samples that are highly correlated with data points from the test set are added to the training set, and (ii) artificial samples that are highly correlated with points from the training set are added to the test set. Many previously reported results therefore say little about how effectively the model predicts unseen data in a real-world setting. After demonstrating the danger of applying over-sampling strategies before data partitioning, we present a realistic baseline for the TPEHGDB dataset and show how predictive performance and clinical use can be improved by incorporating features from electrohysterogram sensors and by applying over-sampling on the training set only.
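The leakage mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it uses a hypothetical toy dataset with unique feature identifiers and plain random duplication of minority samples as a simpler stand-in for the over-sampling techniques used in the reviewed studies. All function names are assumptions introduced for illustration.

```python
import random


def oversample(rows, rng):
    """Random over-sampling: duplicate minority rows until both classes are equal in size."""
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


def split(rows, rng, test_fraction=0.3):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leaked(train, test):
    """Count test rows whose feature value also occurs in the training set."""
    train_features = {x for x, _ in train}
    return sum(1 for x, _ in test if x in train_features)


# Hypothetical toy data: 100 rows (feature_id, label), 10 of them in the minority class.
rows = [(i, 1 if i < 10 else 0) for i in range(100)]

# Flawed pipeline: over-sample the whole dataset, then partition.
rng = random.Random(0)
train_a, test_a = split(oversample(rows, rng), rng)
leak_before_split = leaked(train_a, test_a)

# Sound pipeline: partition first, then over-sample the training set only.
rng = random.Random(0)
train_b, test_b = split(rows, rng)
train_b = oversample(train_b, rng)
leak_after_split = leaked(train_b, test_b)

print("test rows also present in train (over-sample, then split):", leak_before_split)
print("test rows also present in train (split, then over-sample):", leak_after_split)
```

In the flawed pipeline, copies of the same minority sample end up on both sides of the partition, so the test set no longer measures performance on unseen data. Interpolating over-samplers would produce near-duplicates rather than exact copies, but the same reasoning applies.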
Reproducibility and Dataset Availability
To allow reproduction of the results reported on this public dataset, we host all code required to reproduce them in a public GitHub repository (Footnote 1). The dataset is available from that repository or from the original hosting location (Footnote 2).
Acknowledgements
Gilles Vandewiele is funded by a scholarship of FWO (1S31417N). This study has been performed in the context of the ‘Predictive health care using text analysis on unstructured data project’, funded by imec, and the PRETURN (PREdiction Tool for prematUre laboR and Neonatal outcome) clinical trial (EC/2018/0609) of Ghent University Hospital.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vandewiele, G. et al. (2019). A Critical Look at Studies Applying Over-Sampling on the TPEHGDB Dataset. In: Riaño, D., Wilk, S., ten Teije, A. (eds) Artificial Intelligence in Medicine. AIME 2019. Lecture Notes in Computer Science, vol 11526. Springer, Cham. https://doi.org/10.1007/978-3-030-21642-9_45
DOI: https://doi.org/10.1007/978-3-030-21642-9_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21641-2
Online ISBN: 978-3-030-21642-9
eBook Packages: Computer Science (R0)