A Study of High-Dimensional Data Imputation Using Additive LASSO Regression Model

  • K. Lavanya
  • L. S. S. Reddy
  • B. Eswara Reddy
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 711)


Fast-growing computational domains such as bioinformatics, finance, engineering, biometrics, and neuroimaging increasingly need to analyze high-dimensional data. Many real-world datasets contain hundreds or thousands of features. Most knowledge-based classification problems are limited by the quality and quantity of the available data. In particular, high-dimensional data samples often contain missing or unknown attribute values, incomplete feature vectors, and uncertain or vague data, all of which must be handled carefully. Because a large share of the values in such datasets may be missing, refined multiple imputation methods are required to estimate them so that a fair and more consistent analysis can be achieved. In this paper, three multiple imputation (MI) methods are applied in a cloud setting: mean imputation, predictive mean imputation, and imputation by additive LASSO. The results show that imputation by additive LASSO is the preferred MI method.
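To make the contrast between the simplest and the regression-based approach concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain linear LASSO fitted by coordinate descent (the paper's additive model may differ), fills each gap with a single prediction rather than drawing multiple imputations, and assumes that only one column has missing values. The function names, the toy data, and the penalty value lam=0.01 are all hypothetical.

```python
def mean_impute(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1/2n)*RSS + lam*||beta||_1 on centred data (illustrative)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            norm = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if norm == 0.0:
                continue  # a constant column carries no information
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(
                Xc[i][j]
                * (yc[i] - sum(Xc[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / norm
    intercept = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return intercept, beta


def lasso_impute(rows, target, lam=0.01):
    """Fill None in column `target` with LASSO predictions from the other
    columns. Assumes every other column is fully observed."""
    def features(r):
        return [v for k, v in enumerate(r) if k != target]

    complete = [r for r in rows if r[target] is not None]
    intercept, beta = lasso_fit(
        [features(r) for r in complete], [r[target] for r in complete], lam
    )
    filled = []
    for r in rows:
        r = list(r)
        if r[target] is None:
            r[target] = intercept + sum(b * v for b, v in zip(beta, features(r)))
        filled.append(r)
    return filled


# Toy data (hypothetical): column 2 equals 2*x + 1, column 1 is irrelevant.
rows = [[float(x), 0.5 if x % 2 else -0.5, 2.0 * x + 1.0] for x in range(1, 7)]
rows.append([7.0, 0.5, None])  # one missing value; the true value would be 15

mean_filled = mean_impute([r[2] for r in rows])
lasso_filled = lasso_impute(rows, target=2)
print(mean_filled[-1], lasso_filled[-1][2])
```

On this toy data the mean fill ignores the other columns entirely, while the LASSO fill uses the relevant predictor and shrinks the irrelevant one to zero. A multiple imputation procedure would additionally repeat the fill with random draws and pool the resulting analyses.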


High-dimensional data, Multiple imputations, Regression, Missing data


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, JNTUA, Anantapur, India
  2. Department of Computer Science and Engineering, KLU, Guntur, India
  3. Department of Computer Science, JNTUA, Anantapur, India
