Abstract
Imputing missing data plays a pivotal role in reducing bias in the knowledge drawn from computational data. The principal purpose of this paper is to establish a better approach to dealing with missing data. Clinical data often contain erroneous entries, which seriously hamper analysis. In this paper, we present a new dynamic approach for managing missing data in biomedical databases in order to improve overall modeling accuracy. We propose a reinforcement Bayesian regression model. Furthermore, we compare Bayesian regression and random forest dynamically under a reinforcement approach to minimize the ambiguity of knowledge. Our results indicate that random forest imputation scores better than Bayesian regression in several cases. At best, the reinforcement Bayesian regression scores over 85% under the condition of 5% missing data. Overall, the reinforcement Bayesian regression achieves over 70% accuracy in imputing missing medical data. Moreover, in over 70% of cases the values imputed by the proposed reinforcement Bayesian regression model are exactly identical to the missing values, which is a notable advantage of the study. This approach significantly improves the accuracy of imputing missing data for clinical research.
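The evaluation setting mentioned in the abstract (hide a fixed share of known values, impute them, and count how many imputed values exactly match the hidden ones) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names `mask_values`, `impute_column_mode`, and `exact_match_rate` are hypothetical, the toy table is made up, and the column-mode imputer is a simple stand-in, not the reinforcement Bayesian regression or random forest procedure studied in the chapter.

```python
import random


def mask_values(rows, rate, seed=0):
    """Return a copy of `rows` with a share `rate` of cells set to None,
    together with the (row, column) positions that were hidden."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_hidden = max(1, round(rate * len(cells)))
    hidden = rng.sample(cells, n_hidden)
    masked = [list(row) for row in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_column_mode(masked):
    """Stand-in imputer (an assumption, not the chapter's method):
    fill each gap with the most frequent observed value in its column."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        mode = max(set(observed), key=observed.count) if observed else None
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled


def exact_match_rate(original, imputed, hidden):
    """Share of hidden cells whose imputed value equals the true value."""
    hits = sum(1 for i, j in hidden if imputed[i][j] == original[i][j])
    return hits / len(hidden)


if __name__ == "__main__":
    # Made-up categorical table: 100 records, 4 attributes.
    rng = random.Random(42)
    table = [[rng.choice([0, 0, 0, 1]) for _ in range(4)] for _ in range(100)]
    masked, hidden = mask_values(table, rate=0.05)
    imputed = impute_column_mode(masked)
    print("exact-match rate:", exact_match_rate(table, imputed, hidden))
```

Any other imputer with the same input and output shape could be substituted for `impute_column_mode`, and the run repeated at different missing rates, so that methods are compared on the same hidden cells.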
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sumit, S.S., Watada, J., Nasrin, F., Ahmed, N.I., Rambli, D.R.A. (2021). Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest. In: Kreinovich, V., Hoang Phuong, N. (eds) Soft Computing for Biomedical Applications and Related Topics. Studies in Computational Intelligence, vol 899. Springer, Cham. https://doi.org/10.1007/978-3-030-49536-7_8
DOI: https://doi.org/10.1007/978-3-030-49536-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49535-0
Online ISBN: 978-3-030-49536-7
eBook Packages: Intelligent Technologies and Robotics (R0)