Next-generation heartbeat classification with a column-store DBMS and UDFs

  • Oscar Castro-Lopez
  • Daniel E. Lopez-Barron
  • Ines F. Vega-Lopez


We live in a digital world where data is generated at an ever-increasing rate. This creates the need for new technology not only to store these vast amounts of data, but also to manipulate and analyze them. It is through data analysis that we make decisions and generate knowledge. The medical field is no exception: healthcare and biomedical data must be stored and analyzed to gain insights that help in disease prevention and diagnosis. Electrocardiograms (ECGs) are one example of such data, and their careful analysis has proven to be of significant help in diagnosing cardiovascular abnormalities. ECG recording devices can produce a very large amount of data in a short period of time. Usually abstracted as unstructured data, digital ECG signals have traditionally been stored in files and analyzed with ad hoc programs. We favor the idea that ECG signals can instead be abstracted as sets of tuples and stored in database relations. In this paper we present a proposal to store, manage, and analyze ECG data in a column-store database management system (DBMS). We provide extensive empirical evidence showing that incorporating complex analytical tasks, such as ECG transformation and classification, into a DBMS is not only feasible but also efficient and scalable. For this, we rely on the Structured Query Language (SQL) provided by relational DBMSs and on the implementation of user-defined functions (UDFs).
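
To make the idea of "signals as tuples plus UDFs" concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It uses Python's built-in sqlite3 module as a stand-in for a column-store DBMS (SQLite is a row store). The table name ecg_sample, the function adc_to_mv, and the gain and baseline values are all hypothetical and chosen only for illustration. Each ECG sample is stored as one tuple, and a scalar UDF registered with the engine converts raw ADC counts to millivolts inside an SQL query.

```python
import sqlite3


def adc_to_mv(adc: int, gain: float, baseline: int) -> float:
    """Hypothetical transformation UDF: raw ADC counts -> millivolts."""
    return (adc - baseline) / gain


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a tiny, made-up ECG fragment."""
    conn = sqlite3.connect(":memory:")
    # One tuple per sample: (record, position in the signal, raw ADC value).
    conn.execute(
        "CREATE TABLE ecg_sample (record_id INTEGER, sample_idx INTEGER, adc INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ecg_sample VALUES (?, ?, ?)",
        [(1, 0, 1024), (1, 1, 1224), (1, 2, 924)],  # illustrative values only
    )
    # Register the Python function as a 3-argument scalar UDF callable from SQL.
    conn.create_function("adc_to_mv", 3, adc_to_mv)
    return conn


def transform(conn: sqlite3.Connection) -> list:
    """Apply the UDF to every sample of record 1, in signal order."""
    return conn.execute(
        "SELECT sample_idx, adc_to_mv(adc, 200.0, 1024) "
        "FROM ecg_sample WHERE record_id = 1 ORDER BY sample_idx"
    ).fetchall()


if __name__ == "__main__":
    for sample_idx, mv in transform(build_demo_db()):
        print(sample_idx, mv)
```

In this sketch the transformation runs where the data lives, so the signal never has to be exported to an external program. A classification step could be exposed to SQL in the same way, under the same assumptions.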


User defined functions, Machine learning deployment, Data management, Signal database



The authors would like to acknowledge the funding provided for this research by the Mexican Council of Science and Technology (CONACYT) and the Autonomous University of Sinaloa (UAS).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Oscar Castro-Lopez (1)
  • Daniel E. Lopez-Barron (2)
  • Ines F. Vega-Lopez (3)

  1. Facultad de Informatica, Universidad Autonoma de Sinaloa, Culiacan, Mexico
  2. Computing Science and Electrical Engineering, University of Missouri Kansas City, UMKC, Kansas City, USA
  3. Parque de Innovacion Tecnologica, Universidad Autonoma de Sinaloa, Culiacan, Mexico
