Abstract
Handling missing data is a challenging problem in machine learning; filling in the missing values is known as imputation. Simple imputation techniques replace missing values with the mean or median of the observed values. A more sophisticated approach uses regression to predict the missing values from the complete input columns. When the input data is high-dimensional, dimensionality reduction methods may be applied to describe the complete input compactly. A regression from the low-dimensional space to the incomplete data column can then be constructed for imputation. In this work, we propose a two-step algorithm for data completion. The first step uses a non-linear manifold learning technique, diffusion maps, to reduce the dimension of the data. This method faithfully embeds complex data while preserving its geometric structure. The second step applies Laplacian pyramids, a multi-scale method, for regression. Laplacian pyramids construct kernels of decreasing scales to capture finer modes of the data. Experimental results demonstrate the efficiency of our approach on a publicly available dataset.
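The two-step idea described in the abstract can be sketched roughly as follows. This is a minimal illustration written for this summary, not the authors' implementation: the function names (`diffusion_map`, `laplacian_pyramid_predict`, `impute_column`), the median-distance bandwidth, the choice of the embedding diameter as the coarsest scale, and the fixed number of pyramid levels are all assumptions. In particular, an adaptive stopping rule for the number of levels is not shown here.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

def diffusion_map(X, n_coords=2, eps=None):
    """Step 1: embed the rows of X with the leading non-trivial diffusion coordinates."""
    D = pairwise_sq_dists(X, X)
    if eps is None:
        eps = np.median(D)  # assumed bandwidth heuristic
    K = np.exp(-D / eps)
    d = K.sum(axis=1)
    # Symmetric matrix with the same spectrum as the Markov matrix P = D^{-1} K.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Skip the first (constant) eigenvector, whose eigenvalue is 1.
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]

def row_normalize(W):
    """Scale each row to sum to one, guarding against all-zero rows."""
    return W / np.maximum(W.sum(axis=1, keepdims=True), np.finfo(float).tiny)

def laplacian_pyramid_predict(Z_train, f_train, Z_new, sigma0, n_levels=8):
    """Step 2: multi-scale kernel regression.

    Each level smooths the residual left by the coarser levels with a
    Gaussian kernel whose scale is half that of the previous level.
    """
    D_tt = pairwise_sq_dists(Z_train, Z_train)
    D_nt = pairwise_sq_dists(Z_new, Z_train)
    residual = np.asarray(f_train, dtype=float).copy()
    pred = np.zeros(len(Z_new))
    sigma = sigma0
    for _ in range(n_levels):
        W_tt = row_normalize(np.exp(-D_tt / sigma**2))
        W_nt = row_normalize(np.exp(-D_nt / sigma**2))
        pred += W_nt @ residual               # extend this level to the new points
        residual = residual - W_tt @ residual  # what is still unexplained on the training set
        sigma /= 2.0
    return pred

def impute_column(X_complete, y, n_coords=2, n_levels=8):
    """Fill the NaN entries of y using the complete columns in X_complete."""
    missing = np.isnan(y)
    Z = diffusion_map(X_complete, n_coords)
    sigma0 = np.sqrt(pairwise_sq_dists(Z, Z).max())  # assumed coarsest scale
    y_filled = y.copy()
    y_filled[missing] = laplacian_pyramid_predict(
        Z[~missing], y[~missing], Z[missing], sigma0, n_levels)
    return y_filled
```

Calling `impute_column(X_complete, y)` with `y` holding `np.nan` at the missing positions returns a copy of `y` in which only those positions are replaced. Because the number of levels is fixed here, noisy data may be overfitted at the finest scales; choosing the stopping level from the data would be the natural refinement.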
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Rabin, N., Fishelov, D. (2017). Missing Data Completion Using Diffusion Maps and Laplacian Pyramids. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_21
DOI: https://doi.org/10.1007/978-3-319-62392-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science, Computer Science (R0)