A Critical Look at Studies Applying Over-Sampling on the TPEHGDB Dataset

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11526)


Preterm birth is the leading cause of death among young children and is highly prevalent worldwide. Machine learning models based on features extracted from clinical sources, such as electronic patient files, yield promising results. In this study, we review similar studies that constructed predictive models based on a publicly available dataset, the Term-Preterm EHG Database (TPEHGDB), which contains electrohysterogram signals in addition to clinical data. These studies often report near-perfect prediction results by applying over-sampling as a means of data augmentation. We reconstruct these results to show that they can only be achieved when data augmentation is applied to the entire dataset prior to partitioning it into training and test sets. This results in (i) artificial samples that are highly correlated with data points from the test set being added to the training set, and (ii) artificial samples that are highly correlated with points from the training set being added to the test set. Many previously reported results therefore carry little meaning in terms of the actual effectiveness of the model in making predictions on unseen data in a real-world setting. After focusing on the danger of applying over-sampling strategies before data partitioning, we present a realistic baseline for the TPEHGDB dataset and show how the predictive performance and clinical utility can be improved by incorporating features from electrohysterogram sensors and by applying over-sampling to the training set only.
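The leakage mechanism described above can be illustrated with a small, self-contained Python sketch. It is not the authors' code and does not use TPEHGDB. The data are hypothetical two-dimensional Gaussian points with a 270/30 class split. `smote_like` is a simplified, hand-written interpolation in the spirit of SMOTE (each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours), not the imbalanced-learn implementation. All function names are hypothetical. Every sample records which original points it was built from, so the two protocols can be compared directly: over-sampling the whole dataset before splitting versus splitting first and over-sampling only the training set.

```python
import math
import random

# Hypothetical helpers for illustration only; not the paper's code or data.


def make_data(rng, n_major=270, n_minor=30):
    """Toy imbalanced 2-D dataset: class 0 around (0, 0), class 1 around (1, 1)."""
    data = []
    for i in range(n_major + n_minor):
        y = 1 if i >= n_major else 0
        x = (rng.gauss(float(y), 1.0), rng.gauss(float(y), 1.0))
        # 'parents' records which original samples a point was derived from.
        data.append({"id": i, "x": x, "y": y, "synthetic": False, "parents": frozenset({i})})
    return data


def k_nearest(point, candidates, k):
    """The k candidates closest to `point` (Euclidean), excluding `point` itself."""
    others = [c for c in candidates if c is not point]
    others.sort(key=lambda c: math.dist(point["x"], c["x"]))
    return others[:k]


def smote_like(minority, n_new, k, rng, first_id):
    """Simplified SMOTE-style over-sampling: each synthetic point is a random
    interpolation between a minority sample and one of its k nearest minority
    neighbours."""
    synthetic = []
    for j in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(k_nearest(base, minority, k))
        u = rng.random()  # interpolation factor in [0, 1)
        x = tuple(b + u * (n - b) for b, n in zip(base["x"], neighbour["x"]))
        synthetic.append({"id": first_id + j, "x": x, "y": 1, "synthetic": True,
                          "parents": base["parents"] | neighbour["parents"]})
    return synthetic


def split(samples, test_frac, rng):
    """Random train/test partition."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def balance(samples, rng, first_id, k=5):
    """Add synthetic minority samples until both classes are equally large."""
    minority = [s for s in samples if s["y"] == 1]
    n_new = (len(samples) - len(minority)) - len(minority)
    return samples + smote_like(minority, n_new, k, rng, first_id)


def leaky_protocol(data, rng):
    # Flawed: over-sample the entire dataset, then partition it.
    return split(balance(data, rng, len(data)), 0.3, rng)


def sound_protocol(data, rng):
    # Partition first, then over-sample the training set only.
    train, test = split(data, 0.3, rng)
    return balance(train, rng, len(data)), test


def leakage(train, test):
    """(i) synthetic training samples derived from a test point and
    (ii) synthetic test samples derived from a training point."""
    train_ids = {s["id"] for s in train if not s["synthetic"]}
    test_ids = {s["id"] for s in test if not s["synthetic"]}
    from_test = sum(1 for s in train if s["synthetic"] and s["parents"] & test_ids)
    from_train = sum(1 for s in test if s["synthetic"] and s["parents"] & train_ids)
    return from_test, from_train


def mean_nn_distance(train, test):
    """Mean distance from each real minority test point to its closest training point."""
    targets = [t for t in test if t["y"] == 1 and not t["synthetic"]]
    if not targets:
        return float("nan")
    return sum(min(math.dist(t["x"], s["x"]) for s in train) for t in targets) / len(targets)


if __name__ == "__main__":
    for name, protocol in [("over-sample, then split", leaky_protocol),
                           ("split, then over-sample", sound_protocol)]:
        rng = random.Random(0)
        train, test = protocol(make_data(rng), rng)
        from_test, from_train = leakage(train, test)
        print(f"{name}: (i) {from_test}, (ii) {from_train}, "
              f"mean NN distance {mean_nn_distance(train, test):.3f}")
```

Under the first protocol, both leakage counts are positive, mirroring points (i) and (ii) above, and real minority test points typically have a much closer neighbour in the training set. Under the second protocol, both counts are zero by construction, because synthetic samples are generated from training points only and none reach the test set. Exact numbers depend on the random seed.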


Keywords: Preterm birth, Electrohysterogram (EHG), Imbalanced data, Over-sampling



Gilles Vandewiele is funded by a scholarship of FWO (1S31417N). This study has been performed in the context of the ‘Predictive health care using text analysis on unstructured data’ project, funded by imec, and the PRETURN (PREdiction Tool for prematUre laboR and Neonatal outcome) clinical trial (EC/2018/0609) of Ghent University Hospital.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. IDLab, Ghent University – imec, Ghent, Belgium
  2. Department of Gynaecology and Obstetrics, Ghent University Hospital, Ghent, Belgium
