Abstract
Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain because each principal component (PC) is a linear combination of, in general, all the original variables. Sparse PCA (SPCA) aims to balance statistical fidelity and interpretability by finding sparse PCs whose projections capture as much of the variance of the original data as possible. In this paper we present an efficient parallel implementation of SPCA on graphics processing units (GPUs), which can process large blocks of data in parallel. Specifically, we construct parallel GPU implementations of the four optimization formulations of the generalized power method for SPCA (GP-SPCA), one of the most efficient and effective SPCA approaches. The parallel GPU implementation of GP-SPCA (using CUBLAS) is up to eleven times faster than the corresponding CPU implementation (using CBLAS), and up to 107 times faster than a MATLAB implementation. Extensive comparative experiments on several real-world datasets confirm that SPCA offers a practical advantage.
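To make the idea concrete, the following is a minimal NumPy sketch of one of the four formulations, the single-unit variant with an l1 penalty, as the generalized power method is commonly described. It is not the GPU code evaluated in the paper. The function name `gpower_l1_single_unit`, the parameters `gamma`, `max_iter`, and `tol`, and the convention that columns of `A` are variables are all assumptions made for illustration. Each iteration consists of two matrix-vector products and an elementwise soft-threshold, which is the kind of workload that BLAS-style libraries parallelize well.

```python
import numpy as np


def gpower_l1_single_unit(A, gamma, max_iter=200, tol=1e-8):
    """Sketch of a single-unit, l1-penalized generalized power iteration.

    A        : (samples x variables) data matrix; columns are variables (assumed).
    gamma    : sparsity penalty; larger values zero out more loadings.
    Returns a unit-norm sparse loading vector, or all zeros when gamma is
    at least as large as the largest column norm.
    """
    A = np.asarray(A, dtype=float)
    col_norms = np.linalg.norm(A, axis=0)
    if gamma >= col_norms.max():
        # The penalty dominates every variable, so the solution is trivial.
        return np.zeros(A.shape[1])

    # Start from the normalized column with the largest norm.
    x = A[:, np.argmax(col_norms)] / col_norms.max()

    for _ in range(max_iter):
        s = A.T @ x                                          # correlations with x
        w = np.sign(s) * np.maximum(np.abs(s) - gamma, 0.0)  # soft-threshold
        x_new = A @ w                                        # ascent direction
        x_new /= np.linalg.norm(x_new)                       # back onto the unit sphere
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break

    # Recover the sparse loading vector from the final iterate.
    s = A.T @ x
    z = np.sign(s) * np.maximum(np.abs(s) - gamma, 0.0)
    return z / np.linalg.norm(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 20))
    loading = gpower_l1_single_unit(data, gamma=0.5 * np.linalg.norm(data, axis=0).max())
    print("non-zero loadings:", np.count_nonzero(loading))
```

With `gamma=0` the soft-threshold has no effect and the loop reduces to ordinary power iteration, so the sparsity of the result is controlled entirely by `gamma`.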
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61271407, 61301242), the Shandong Provincial Natural Science Foundation, China (ZR2011FQ016), and the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (13CX02096A, CX2013057, 27R1105019A).
Cite this article
Liu, W., Zhang, H., Tao, D. et al. Large-scale paralleled sparse principal component analysis. Multimed Tools Appl 75, 1481–1493 (2016). https://doi.org/10.1007/s11042-014-2004-4
DOI: https://doi.org/10.1007/s11042-014-2004-4