Clustering based imputation algorithm using unsupervised neural network for enhancing the quality of healthcare data

Shobha, K.; Savarimuthu, Nickolas

doi:10.1007/s12652-020-02250-1

Original Research
Published: 30 June 2020

Volume 12, pages 1771–1781, (2021)
Journal of Ambient Intelligence and Humanized Computing

Abstract

Historical and real-time healthcare data sets are valuable sources of information for predictive data analytics. However, most historical healthcare data sets are beset with challenges. One of the most frequent is missing values, which arise from inaccuracies in data transmission or data entry. An appropriate technique for handling missing values is required to generate good-quality data sets and achieve better prediction results. Removing the records with missing values, known as marginalization, offers an easy way out of this challenge. However, it reduces the volume of the historical data set and disturbs its class balance. An alternative to marginalization is imputation, which replaces missing values with plausible values. This paper proposes a missing value imputation technique, CLUSTIMP, that uses an unsupervised neural network, Adaptive Resonance Theory 2 (ART2). The efficiency of the proposed imputation method is evaluated on the incomplete Mammographic Mass data set and the Hepatocellular Carcinoma (HCC) data set from the UCI repository, with Root Mean Squared Error (RMSE) and classification accuracy as the evaluation metrics. The proposed CLUSTIMP imputation algorithm outperforms existing state-of-the-art imputation methods, reducing classifier error rates by between 2 and 11%.
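
The abstract does not give the internals of CLUSTIMP, so the following is only a minimal sketch of the general idea of clustering-based imputation scored with RMSE, not the authors' algorithm. It swaps in a plain k-means loop as a stand-in for ART2, and the function names `cluster_mean_impute` and `rmse`, the parameters, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def rmse(true_values, imputed_values):
    """Root mean squared error between held-out true values and imputed ones."""
    diff = np.asarray(true_values, dtype=float) - np.asarray(imputed_values, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def cluster_mean_impute(X, n_clusters=2, n_iter=20, seed=0):
    """Fill NaN cells with the matching feature of the record's cluster centroid.

    Assumption: k-means is used here as a simple stand-in clusterer.
    It is NOT ART2 and NOT the CLUSTIMP procedure from the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from a column-mean fill so every record has a full feature vector.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Pick initial centroids from the (temporarily filled) records.
    rng = np.random.default_rng(seed)
    centroids = filled[rng.choice(len(filled), size=n_clusters, replace=False)]

    for _ in range(n_iter):
        # Assign each record to its nearest centroid.
        dists = np.linalg.norm(filled[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update each non-empty cluster's centroid.
        for k in range(n_clusters):
            members = filled[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)

        # Re-impute only the missing cells; observed cells are never altered.
        filled = np.where(missing, centroids[labels], X)

    return filled, labels


# Toy example: hide two known cells, impute them, and score the result.
complete = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.1],
                     [8.0, 9.0], [8.1, 9.2], [7.9, 8.8]])
incomplete = complete.copy()
incomplete[1, 1] = np.nan
incomplete[4, 0] = np.nan
hidden = np.isnan(incomplete)

imputed, _ = cluster_mean_impute(incomplete, n_clusters=2)
print("RMSE on hidden cells:", rmse(complete[hidden], imputed[hidden]))
```

Unlike marginalization, this keeps every record, so the data volume and class proportions of the original set are preserved; only the hidden cells are estimated.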

Author information

Authors and Affiliations

Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India
K. Shobha & Nickolas Savarimuthu

Corresponding author

Correspondence to K. Shobha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shobha, K., Savarimuthu, N. Clustering based imputation algorithm using unsupervised neural network for enhancing the quality of healthcare data. J Ambient Intell Human Comput 12, 1771–1781 (2021). https://doi.org/10.1007/s12652-020-02250-1

Received: 03 December 2019
Accepted: 17 June 2020
Published: 30 June 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s12652-020-02250-1
