Abstract
Real-world datasets, especially high-dimensional ones, commonly contain missing entries. Since most machine learning, data analysis, and statistical methods cannot handle missing values gracefully, these values must be filled in before such methods can be applied; it is therefore no surprise that there is a long-standing interest in methods for the imputation of missing values. One recent, popular, and effective approach, the IRMI stepwise regression imputation method, models each feature as a linear combination of all other features: a linear regression model is fitted for each real-valued feature on the basis of all other features in the dataset, and the resulting predictions are used as imputation values. However, the proposed iterative formulation lacks a convergence guarantee. Here we propose a closely related method, stated as a single optimization problem, together with a block coordinate-descent solution that is guaranteed to converge to a local minimum. Experimental results on both synthetic and benchmark datasets are comparable to those of IRMI whenever it converges; however, IRMI often diverges in the experiments described here, in which case the performance of our method is markedly superior to that of competing methods.
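For concreteness, the single optimization problem can be written as follows. This is a reconstruction from the notes below, not a verbatim statement from the paper: \(X\) denotes the data matrix with missing entries zero-filled, \(m\) the missing-value indicator matrix, \(M\) the matrix of imputation values (non-zero only where \(m_{ij}=1\), cf. note 3), and \(A\) the matrix of regression coefficients; the zero-diagonal constraint on \(A\) ensures that each feature is modeled using all other features only.

\[
\min_{A,\,M}\ \left\| (X+M)A - (X+M) \right\|_F^2
\quad \text{s.t.} \quad \operatorname{diag}(A) = 0,
\qquad M_{ij} = 0 \ \text{whenever}\ m_{ij} = 0.
\]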
Invited extension of [29] – presented at the 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2017).
Notes
- 1.
In this case it would perhaps be more natural to train the model on data pooled over the various copies of the completed data, rather than to train separate models and average the resulting parameters and structure. This is indeed done, artificially, in methods such as denoising neural nets [37], and has been known to be useful for data imputation [6].
- 2.
Alternatively, in order to stay close in spirit to the linear IRMI method, we may prefer to use \((X+M)A\) as the imputed data, meaning that the imputed values are in fact derived from all other features using a linear model. Clearly, at the point of convergence of the algorithm the two are identical.
- 3.
Note that this is not a projection step. Recall that the optimization problem is defined only over the elements \(M_{ij}\) for which \(x_{ij}\) is a missing value, encoded by \(m_{ij}=1\). The element-wise multiplication of \(M\) by \(m\) guarantees that all other elements of \(M\) are assigned 0; effectively, the gradient descent procedure does not treat them as independent variables, as required (see the sketch following these notes).
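As a concrete illustration of notes 2 and 3, the following minimal NumPy sketch implements a block coordinate-descent scheme for the reconstructed objective given after the abstract. It is not the authors' implementation: the function name `oli_impute`, the step size `lr`, and the iteration counts `n_iters` and `gd_steps` are illustrative assumptions; only the least-squares fit of each feature on all others, the masked gradient step of note 3, and the equivalence of note 2 follow the text.

```python
import numpy as np

def oli_impute(X, m, n_iters=100, gd_steps=20, lr=1e-3):
    """Block coordinate descent for min_{A,M} ||(X+M)A - (X+M)||_F^2
    with diag(A) = 0 and M supported only on the missing entries.

    X -- (n, d) data matrix, missing entries pre-filled with zeros
    m -- (n, d) binary indicator, m[i, j] = 1 iff x[i, j] is missing
    (function name, lr, and iteration counts are illustrative choices)
    """
    d = X.shape[1]
    M = np.zeros_like(X, dtype=float)
    A = np.zeros((d, d))
    for _ in range(n_iters):
        Z = X + M  # current completed data
        # A-block: fit each feature as a linear combination of all
        # other features (ordinary least squares); column j of A is
        # never fit against feature j itself, so diag(A) stays 0.
        for j in range(d):
            others = [k for k in range(d) if k != j]
            coef, *_ = np.linalg.lstsq(Z[:, others], Z[:, j], rcond=None)
            A[others, j] = coef
        # M-block: gradient descent on the missing entries only.
        # For f(M) = ||(X+M)A - (X+M)||_F^2 the gradient is
        # 2 * R @ (A - I).T with R = (X+M)A - (X+M); multiplying the
        # step by m (note 3) keeps the observed entries fixed at 0.
        for _ in range(gd_steps):
            R = (X + M) @ A - (X + M)
            G = 2.0 * (R @ A.T - R)
            M -= lr * G * m
    # Note 2: at convergence, (X + M) @ A agrees with X + M on the
    # missing entries, so either may serve as the completed data.
    return X + M
```

Since the \(A\)-block is solved exactly and, for a sufficiently small step size, each masked gradient step on \(M\) does not increase the objective, the objective value is non-increasing across iterations; this is the structure underlying the convergence guarantee.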
References
1. Buuren, S., Groothuis-Oudshoorn, K.: MICE: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3) (2011)
2. Comon, P., Luciani, X., De Almeida, A.L.: Tensor decompositions, alternating least squares and other tales. J. Chemometr. 23(7–8), 393–405 (2009)
3. Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47(4), 547–553 (2009)
4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)
5. Donders, A.R.T., van der Heijden, G.J., Stijnen, T., Moons, K.G.: Review: a gentle introduction to imputation of missing values. J. Clin. Epidemiol. 59(10), 1087–1091 (2006)
6. Duan, Y., Yisheng, L., Kang, W., Zhao, Y.: A deep learning based approach for traffic data imputation. In: 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pp. 912–917. IEEE (2014)
7. Engels, J.M., Diehr, P.: Imputation of missing longitudinal data: a comparison of methods. J. Clin. Epidemiol. 56(10), 968–976 (2003)
8. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(2), 179–188 (1936)
9. García-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19(2), 263–282 (2010)
10. Harrison, D., Rubinfeld, D.L.: Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 5(1), 81–102 (1978)
11. Heitjan, D.F., Basu, S.: Distinguishing missing at random and missing completely at random. Am. Stat. 50(3), 207–213 (1996)
12. Hope, T., Shahaf, D.: Ballpark learning: estimating labels from rough group comparisons. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9852, pp. 299–314. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46227-1_19
13. Horton, N.J., Kleinman, K.P.: Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am. Stat. 61(1), 79–90 (2007)
14. Horton, P., Nakai, K.: A probabilistic classification system for predicting the cellular localization sites of proteins. In: ISMB, vol. 4, pp. 109–115 (1996)
15. Jacobusse, G.: WinMICE users manual. TNO Quality of Life, Leiden (2005). http://www.multiple-imputation.com
16. Kandaswamy, C., Silva, L.M., Alexandre, L.A., Sousa, R., Santos, J.M., de Sá, J.M.: Improving transfer learning accuracy by reusing stacked denoising autoencoders. In: 2014 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 1380–1387. IEEE (2014)
17. Kim, H., Park, H.: Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J. Matrix Anal. Appl. 30(2), 713–730 (2008)
18. Kroonenberg, P.M., De Leeuw, J.: Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 45(1), 69–97 (1980)
19. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
20. Little, R.J.: A test of missing completely at random for multivariate data with missing values. J. Am. Stat. Assoc. 83(404), 1198–1202 (1988)
21. Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, Hoboken (2014)
22. Lu, X., Tsao, Y., Matsuda, S., Hori, C.: Speech enhancement based on deep denoising autoencoder. In: Interspeech, pp. 436–440 (2013)
23. Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) ICANN 2011. LNCS, vol. 6791, pp. 52–59. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21735-7_7
24. Pigott, T.D.: A review of methods for missing data. Educ. Res. Eval. 7(4), 353–383 (2001)
25. Raghunathan, T.E., Lepkowski, J.M., Van Hoewyk, J., Solenberger, P.: A multivariate technique for multiply imputing missing values using a sequence of regression models. Surv. Methodol. 27(1), 85–96 (2001)
26. Resheff, Y.S., Rotics, S., Harel, R., Spiegel, O., Nathan, R.: AcceleRater: a web application for supervised learning of behavioral modes from acceleration measurements. Mov. Ecol. 2(1), 25 (2014)
27. Resheff, Y.S., Rotics, S., Nathan, R., Weinshall, D.: Matrix factorization approach to behavioral mode analysis from acceleration data. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–6. IEEE (2015)
28. Resheff, Y.S., Rotics, S., Nathan, R., Weinshall, D.: Topic modeling of behavioral modes using sensor data. Int. J. Data Sci. Anal. 1(1), 51–60 (2016)
29. Resheff, Y.S., Weinshall, D.: Optimized linear imputation. In: Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods, ICPRAM, vol. 1, pp. 17–25 (2017)
30. Rotics, S., Kaatz, M., Resheff, Y.S., Turjeman, S.F., Zurell, D., Sapir, N., Eggers, U., Flack, A., Fiedler, W., Jeltsch, F., et al.: The challenges of the first migration: movement and behaviour of juvenile vs. adult white storks with insights regarding juvenile mortality. J. Anim. Ecol. 85(4), 938–947 (2016)
31. Rubin, D.B.: Multiple imputation after 18+ years. J. Am. Stat. Assoc. 91(434), 473–489 (1996)
32. Schmitt, P., Mandel, J., Guedj, M.: A comparison of six methods for missing data imputation. J. Biom. Biostat. (2015)
33. Takane, Y., Young, F.W., De Leeuw, J.: Nonmetric individual differences multidimensional scaling: an alternating least squares method with optimal scaling features. Psychometrika 42(1), 7–67 (1977)
34. Templ, M., Kowarik, A., Filzmoser, P.: Iterative stepwise regression imputation using standard and robust methods. Comput. Stat. Data Anal. 55(10), 2793–2806 (2011)
35. Tüfekci, P.: Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. Int. J. Electr. Power Energy Syst. 60, 126–140 (2014)
36. Van Buuren, S., Oudshoorn, K.: Flexible multivariate imputation by MICE. TNO Prevention Center, Leiden, The Netherlands (1999)
37. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
38. Wagner, A., Zuk, O.: Low-rank matrix recovery from row-and-column affine measurements. arXiv preprint arXiv:1505.06292 (2015)
39. Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Advances in Neural Information Processing Systems, pp. 341–349 (2012)
40. Zhou, G., Sohn, K., Lee, H.: Online incremental feature learning with denoising autoencoders. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2012)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Resheff, Y.S., Weinshall, D. (2018). Optimal Linear Imputation with a Convergence Guarantee. In: De Marsico, M., di Baja, G., Fred, A. (eds.) Pattern Recognition Applications and Methods. ICPRAM 2017. Lecture Notes in Computer Science, vol. 10857. Springer, Cham. https://doi.org/10.1007/978-3-319-93647-5_4
DOI: https://doi.org/10.1007/978-3-319-93647-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93646-8
Online ISBN: 978-3-319-93647-5