Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest

Soft Computing for Biomedical Applications and Related Topics

Part of the book series: Studies in Computational Intelligence ((SCI,volume 899))

Abstract

Imputing missing data plays a pivotal role in reducing bias in computational data analysis. The principal purpose of this paper is to establish a better approach to handling missing data. Clinical data often contain erroneous entries, which cause major problems for analysis. We present a new dynamic approach for managing missing data in biomedical databases in order to improve overall modeling accuracy. We propose a reinforcement Bayesian regression model and compare Bayesian regression and random forest dynamically under a reinforcement approach to reduce the ambiguity of knowledge. Our results indicate that random forest imputation scores better than Bayesian regression in several cases. At best, the reinforcement Bayesian regression scores over 85% when 5% of the data are missing, and it achieves over 70% accuracy for imputing missing medical data overall. Notably, in over 70% of cases the values imputed by the proposed reinforcement Bayesian regression model are exactly identical to the true missing values, which is a notable advantage of the approach. This approach significantly improves the accuracy of imputing missing data for clinical research.
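
The abstract reports accuracy as the share of imputed values that exactly match the hidden ones at a given missing-data rate. The sketch below illustrates that kind of evaluation only; it is not the authors' reinforcement procedure, which the abstract does not specify. It assumes a single predictor, integer-valued targets, and a plain Bayesian linear regression with a zero-mean Gaussian prior and unit noise variance. The function names `bayes_linreg_posterior_mean` and `exact_match_rate`, the prior precision, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def bayes_linreg_posterior_mean(xs, ys, prior_precision=1.0):
    """Posterior mean of (intercept, slope) for y = w0 + w1 * x.

    Assumes a zero-mean Gaussian prior with the given precision and unit
    noise variance, so the posterior mean solves (X^T X + alpha I) w = X^T y.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Entries of the symmetric 2x2 matrix X^T X + alpha I.
    a11 = n + prior_precision
    a12 = sx
    a22 = sxx + prior_precision
    det = a11 * a22 - a12 * a12
    intercept = (a22 * sy - a12 * sxy) / det
    slope = (a11 * sxy - a12 * sy) / det
    return intercept, slope


def exact_match_rate(rows, missing_rate=0.05, seed=0):
    """Hide a fraction of targets, impute them, return the exact-match share."""
    rng = random.Random(seed)
    n_missing = max(1, round(missing_rate * len(rows)))
    hidden = set(rng.sample(range(len(rows)), n_missing))
    # Fit only on rows whose target is still observed.
    observed = [row for i, row in enumerate(rows) if i not in hidden]
    intercept, slope = bayes_linreg_posterior_mean(
        [x for x, _ in observed], [y for _, y in observed]
    )
    exact = 0
    for i in hidden:
        x, y_true = rows[i]
        y_imputed = round(intercept + slope * x)  # targets are integers
        exact += int(y_imputed == y_true)
    return exact / n_missing


if __name__ == "__main__":
    # Synthetic, integer-valued data with small noise (illustrative only).
    rng = random.Random(42)
    data = [(x, 2 * x + 3 + rng.choice([-1, 0, 1])) for x in range(200)]
    print("exact-match rate at 5% missing:", exact_match_rate(data))
```

A random forest imputer could be scored with the same masking loop by swapping the regression step; only the model fitted on the observed rows would change.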

Corresponding author

Correspondence to Junzo Watada.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Sumit, S.S., Watada, J., Nasrin, F., Ahmed, N.I., Rambli, D.R.A. (2021). Imputing Missing Values: Reinforcement Bayesian Regression and Random Forest. In: Kreinovich, V., Hoang Phuong, N. (eds) Soft Computing for Biomedical Applications and Related Topics. Studies in Computational Intelligence, vol 899. Springer, Cham. https://doi.org/10.1007/978-3-030-49536-7_8
