Abstract
Principal component analysis (PCA) and its dual, principal coordinate analysis (PCO), are widely applied to unsupervised dimensionality reduction. In this paper, we show that PCA and PCO can be carried out under regression frameworks, which makes it convenient to incorporate sparse techniques. In particular, we propose a sparse PCA model and a sparse PCO model. The former finds sparse principal components, while the latter directly calculates sparse principal coordinates in a low-dimensional space. Both models can be solved by simple and efficient iterative procedures. Finally, we discuss how our models relate to existing sparse PCA methods and present empirical comparisons of these sparse unsupervised dimensionality reduction methods. The experimental results are encouraging.
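The duality between PCA and PCO mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' sparse models: it assumes Euclidean data, uses NumPy, and the helper names `pca_scores` and `pco_coords` are hypothetical. It checks that the scores from the p x p scatter matrix (PCA) and the coordinates from the n x n Gram matrix (PCO) agree up to the sign of each column.

```python
import numpy as np

def pca_scores(X, k):
    """Project centered data onto the top-k eigenvectors of the scatter matrix."""
    Xc = X - X.mean(axis=0)
    # Eigen-decomposition of the p x p matrix Xc^T Xc (ascending eigenvalues).
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(evals)[::-1][:k]
    return Xc @ evecs[:, order]

def pco_coords(X, k):
    """Principal coordinates from the n x n Gram matrix of the centered data."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)
    order = np.argsort(evals)[::-1][:k]
    # Scale each eigenvector by the square root of its eigenvalue.
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))

# Illustrative random data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

Z_pca = pca_scores(X, 2)
Z_pco = pco_coords(X, 2)

# Eigenvectors are defined only up to sign, so compare absolute values.
same = np.allclose(np.abs(Z_pca), np.abs(Z_pco), atol=1e-6)
print("PCA scores match PCO coordinates up to sign:", same)
```

PCA works with a matrix whose size depends on the number of variables, while PCO works with one whose size depends on the number of samples. In this sketch, which of the two is cheaper therefore depends on the shape of the data.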
Keywords
- Singular Value Decomposition
- Synthetic Dataset
- Sparse Principal Component Analysis
- Coordinate Matrix
- Procrustes Problem
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dou, W., Dai, G., Xu, C., Zhang, Z. (2010). Sparse Unsupervised Dimensionality Reduction Algorithms. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science(), vol 6321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15880-3_29
DOI: https://doi.org/10.1007/978-3-642-15880-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15879-7
Online ISBN: 978-3-642-15880-3
eBook Packages: Computer Science, Computer Science (R0)