Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest

Sumit, Shahriar Shakir; Watada, Junzo; Nasrin, Fatema; Ahmed, Nafiz Ishtiaque; Rambli, D. R. A.

doi:10.1007/978-3-030-49536-7_8

Part of the book series: Studies in Computational Intelligence (SCI, volume 899)

Abstract

Imputing missing data plays a pivotal role in minimizing bias in the knowledge derived from computational data. The principal purpose of this paper is to establish a better approach to dealing with missing data. Clinical data often contain erroneous entries, which cause major drawbacks for analysis. In this paper, we present a new dynamic approach for managing missing data in biomedical databases in order to improve overall modeling accuracy. We propose a reinforcement Bayesian regression model. Furthermore, we compare Bayesian regression and random forest dynamically under a reinforcement approach to minimize the ambiguity of knowledge. Our results indicate that random forest imputation scores better than Bayesian regression in several cases. At best, the reinforcement Bayesian regression scores over 85% under the condition of 5% missing data. Overall, it achieves over 70% accuracy in imputing missing medical data. Moreover, in over 70% of cases the values imputed by the proposed reinforcement Bayesian regression model are exactly identical to the true missing values, which is a notable advantage of the study. This approach significantly improves the accuracy of imputing missing data for clinical research.
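
The evaluation idea in the abstract (hide a share of known values, impute them, and count how many imputed values exactly match the hidden ones) can be illustrated with a minimal sketch. This is not the authors' reinforcement model and it does not include the random forest comparison. It assumes a plain conjugate Bayesian linear regression with one predictor, and the function names, the prior precision `alpha`, the noise precision `beta`, and the toy data are all hypothetical choices made for illustration.

```python
import random


def fit_bayes_linear(xs, ys, alpha=1.0, beta=1.0):
    """Posterior mean (intercept, slope) for y = w0 + w1 * x + noise.

    Assumes a zero-mean Gaussian prior with precision `alpha` on both
    weights and Gaussian noise with precision `beta` (illustrative values).
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision A = alpha * I + beta * X^T X, a 2x2 matrix.
    a11 = alpha + beta * n
    a12 = beta * sx
    a22 = alpha + beta * sxx
    det = a11 * a22 - a12 * a12
    # Posterior mean m = beta * A^{-1} X^T y, using the 2x2 inverse.
    b1 = beta * sy
    b2 = beta * sxy
    w0 = (a22 * b1 - a12 * b2) / det
    w1 = (a11 * b2 - a12 * b1) / det
    return w0, w1


def exact_match_rate(rows, missing_rate=0.05, seed=0):
    """Hide a fraction of targets, impute them, return the exact-match rate."""
    rng = random.Random(seed)
    n_hidden = max(1, round(missing_rate * len(rows)))
    hidden = set(rng.sample(range(len(rows)), n_hidden))
    # Fit only on the rows whose target is still observed.
    observed = [row for i, row in enumerate(rows) if i not in hidden]
    w0, w1 = fit_bayes_linear([x for x, _ in observed],
                              [y for _, y in observed])
    # Integer-valued targets are assumed, so predictions are rounded
    # before being compared with the hidden true value.
    hits = sum(1 for i in hidden
               if round(w0 + w1 * rows[i][0]) == rows[i][1])
    return hits / n_hidden


# Toy data: an exactly linear, integer-valued relationship (hypothetical).
rows = [(x, 2 * x + 1) for x in range(40)]
score = exact_match_rate(rows)
print(f"exact-match rate at 5% missing: {score:.2f}")
```

Rounding before comparison is a design choice that only makes sense for discrete or coded values; for continuous measurements a tolerance-based or error-based score would be the more natural measure.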

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, 32610, Perak, Malaysia
Shahriar Shakir Sumit, Junzo Watada & D. R. A. Rambli
Department of Computer Science, American International University-Bangladesh, Dhaka, 1229, Bangladesh
Fatema Nasrin & Nafiz Ishtiaque Ahmed

Corresponding author

Correspondence to Junzo Watada.

Editor information

Editors and Affiliations

Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Vladik Kreinovich
Informatics Division, Thang Long University, Hanoi, Vietnam
Nguyen Hoang Phuong

About this chapter

Cite this chapter

Sumit, S.S., Watada, J., Nasrin, F., Ahmed, N.I., Rambli, D.R.A. (2021). Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest. In: Kreinovich, V., Hoang Phuong, N. (eds) Soft Computing for Biomedical Applications and Related Topics. Studies in Computational Intelligence, vol 899. Springer, Cham. https://doi.org/10.1007/978-3-030-49536-7_8

DOI: https://doi.org/10.1007/978-3-030-49536-7_8
Published: 30 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49535-0
Online ISBN: 978-3-030-49536-7
eBook Packages: Intelligent Technologies and Robotics (R0)