Imputation of Missing Values in the Fundamental Data: Using MICE Framework

  • Balasubramaniam Meghanadh
  • Lagesh Aravalath
  • Bhupesh Joshi
  • Raghunathan Sathiamoorthy
  • Manish Kumar
Original Article

Abstract

Revolutionary developments in big data analytics and machine learning algorithms have transformed the business strategies of industries such as banking, financial services, asset management, and e-commerce. One of the most common problems these firms face when using data is the presence of missing values in the dataset. The objective of this study is to impute fundamental data that are missing from financial statements. The study uses the ‘Multiple Imputation by Chained Equations’ (MICE) framework, exploiting the interdependency among variables that wholly comply with accounting rules. The proposed framework has two stages: the first performs an initial imputation based on predictive mean matching, and the second resolves financial constraints. The MICE framework allows accounting constraints to be incorporated into the imputation process. Performance tests conducted on the imputed dataset indicate that the imputed values for the 177 line items are good and in line with the expectations of subject matter experts.
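
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pmm_impute` and `reconcile_to_total` are hypothetical, the first stage is a simplified single-variable predictive mean matching step (a full MICE run would cycle over all incomplete variables and draw the regression coefficients from their posterior), and the second stage assumes, purely for illustration, that the accounting constraint is a simple "components sum to a reported total" identity resolved by proportional rescaling.

```python
import numpy as np


def pmm_impute(X, y, k=5, rng=None):
    """Fill NaNs in y by (simplified) predictive mean matching on predictors X."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    miss = np.isnan(y)
    # Design matrix with an intercept column; fit on observed rows only.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[~miss], y[~miss], rcond=None)
    pred = A @ beta
    obs_pred, obs_val = pred[~miss], y[~miss]
    for i in np.flatnonzero(miss):
        # Donors: the k observed rows whose predicted mean is closest.
        donors = np.argsort(np.abs(obs_pred - pred[i]))[:k]
        # Impute with the *observed* value of one randomly chosen donor.
        y[i] = obs_val[rng.choice(donors)]
    return y


def reconcile_to_total(components, total, imputed_mask):
    """Rescale only the imputed components so each row sums to its total.

    Assumption for illustration: the constraint is sum(components) == total.
    """
    components = np.asarray(components, dtype=float).copy()
    total = np.asarray(total, dtype=float)
    imputed_mask = np.asarray(imputed_mask, dtype=bool)
    for r in range(components.shape[0]):
        m = imputed_mask[r]
        if not m.any():
            continue  # nothing was imputed in this row; leave it untouched
        # Amount the imputed items must add up to, given the reported items.
        gap = total[r] - components[r, ~m].sum()
        s = components[r, m].sum()
        if s != 0:
            components[r, m] *= gap / s
        else:
            components[r, m] = gap / m.sum()
    return components
```

Reported (non-imputed) values are never altered in the second stage; only imputed entries absorb the adjustment, which is one plausible way to keep the filled-in statement consistent with an accounting identity.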

Keywords

Multiple imputation; MICE; Fundamental data; Accounting and financial statement

JEL Classification

C13 C32 C51 C53 G20 

Copyright information

© The Indian Econometric Society 2018

Authors and Affiliations

  • Balasubramaniam Meghanadh (1)
  • Lagesh Aravalath (1)
  • Bhupesh Joshi (1)
  • Raghunathan Sathiamoorthy (1)
  • Manish Kumar (1)

  1. CRISIL GR&A, Chennai, India
