Large-scale paralleled sparse principal component analysis

Multimedia Tools and Applications

Abstract

Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain because each principal component (PC) is typically a linear combination of all the original variables. Sparse PCA (SPCA) aims to balance statistical fidelity and interpretability by approximating sparse PCs whose projections capture as much of the variance of the original data as possible. In this paper we present an efficient parallel method for SPCA using graphics processing units (GPUs), which can process large blocks of data in parallel. Specifically, we construct parallel GPU implementations of the four optimization formulations of the generalized power method for SPCA (GP-SPCA), one of the most efficient and effective SPCA approaches. The parallel GPU implementation of GP-SPCA (using CUBLAS) is up to 11 times faster than the corresponding CPU implementation (using CBLAS), and up to 107 times faster than a MATLAB implementation. Extensive comparative experiments on several real-world datasets confirm that SPCA offers a practical advantage.
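To make the generalized power iteration concrete, the following is a minimal CPU-only sketch of what is commonly described as the single-unit, l1-penalized formulation of the generalized power method (Journée et al. 2010). It is not the implementation evaluated in this paper. The function name `gpower_l1_single`, the parameters `gamma`, `iters` and `tol`, and the toy matrix `A_demo` are assumptions introduced purely for illustration. Plain Python lists are used so the sketch has no dependencies; the two loops marked in the comments are matrix-vector products, which is the kind of operation a BLAS library on the CPU or GPU would typically handle.

```python
import math

def gpower_l1_single(A, gamma, iters=200, tol=1e-10):
    """Illustrative single-unit l1-penalized power iteration (assumed form).

    A     : list of rows, shape (p samples) x (n variables)
    gamma : sparsity penalty; larger values zero out more loadings
    Returns a unit-norm loading vector z of length n.
    """
    p, n = len(A), len(A[0])
    cols = [[A[r][j] for r in range(p)] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    j0 = max(range(n), key=norms.__getitem__)
    if gamma >= norms[j0]:
        raise ValueError("gamma must be below the largest column norm")

    # Start from the normalized column with the largest norm.
    x = [v / norms[j0] for v in cols[j0]]

    def soft_threshold(x):
        # Matrix-vector product A^T x, followed by soft-thresholding.
        out = []
        for c in cols:
            d = sum(ci * xi for ci, xi in zip(c, x))
            out.append(math.copysign(max(abs(d) - gamma, 0.0), d))
        return out

    for _ in range(iters):
        w = soft_threshold(x)
        # Matrix-vector product A w, then projection onto the unit sphere.
        x_new = [sum(w[j] * cols[j][r] for j in range(n)) for r in range(p)]
        nrm = math.sqrt(sum(v * v for v in x_new))
        x_new = [v / nrm for v in x_new]
        converged = max(abs(a - b) for a, b in zip(x, x_new)) < tol
        x = x_new
        if converged:
            break

    # Recover the sparse loading vector from the final iterate.
    z = soft_threshold(x)
    zn = math.sqrt(sum(v * v for v in z))
    return [v / zn for v in z]

# Toy data (hypothetical): two strongly correlated variables and one weak one.
A_demo = [[2.0, 2.0, 0.3],
          [-2.0, -2.0, 0.1],
          [1.0, 1.0, -0.1],
          [-1.0, -1.0, -0.1]]
z_demo = gpower_l1_single(A_demo, gamma=0.5)
```

On this toy matrix the weak third variable falls below the threshold and receives a loading of exactly zero, while the two correlated variables share the remaining weight equally. Because each iteration consists of two matrix-vector products and an elementwise threshold, most of the work maps naturally onto BLAS-style routines, which is presumably what makes the method a good candidate for GPU parallelization.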

Acknowledgments

This work was supported in part by the following projects: the National Natural Science Foundation of China (61271407, 61301242), Shandong Provincial Natural Science Foundation, China (ZR2011FQ016), the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (13CX02096A, CX2013057, 27R1105019A).

Author information

Corresponding author

Correspondence to D. Tao.

About this article

Cite this article

Liu, W., Zhang, H., Tao, D. et al. Large-scale paralleled sparse principal component analysis. Multimed Tools Appl 75, 1481–1493 (2016). https://doi.org/10.1007/s11042-014-2004-4

  • DOI: https://doi.org/10.1007/s11042-014-2004-4
