GPU-accelerated Large-Scale Non-negative Matrix Factorization Using Spark

Tang, Bing; Kang, Linyao; Xia, Yanmin; Zhang, Li

doi:10.1007/978-3-030-12981-1_13

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 268)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing

Abstract

Non-negative matrix factorization (NMF) has been introduced as an efficient way to compress data and reduce its complexity, and it is valued for its ability to extract highly interpretable parts from data sets. It has been applied in various fields, such as recommendation, image analysis, and text clustering. However, as the size of the matrix increases, NMF algorithms become very slow. To solve this problem, this paper proposes a GPU-based parallel algorithm for NMF on the Spark platform, which makes full use of the advantages of Spark's in-memory computation model and the GPU's single-instruction, multiple-data (SIMD) execution mode. The new GPU-accelerated NMF on Spark is evaluated on a 4-node heterogeneous Spark cluster on Google Compute Engine, with each node configured with an NVIDIA K80 GPU card. Experimental results indicate that it is competitive in computational time with existing solutions across a variety of matrix orders. It achieves a high speed-up and can effectively handle the non-negative decomposition of higher-order matrices, which greatly improves computational efficiency.
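
As a point of reference, the following minimal sketch illustrates NMF itself: approximating a non-negative matrix V by the product W H of two smaller non-negative matrices. It assumes the widely used multiplicative update rules of Lee and Seung with a Frobenius-norm objective; the abstract does not state which update rule the paper uses. The sketch runs on a single CPU in plain Python, not on Spark or a GPU, and all function names (matmul, transpose, frobenius_error, nmf) are hypothetical. The dense matrix products in each update are the kind of operation that a GPU-accelerated, distributed version would most plausibly partition and offload.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor an m x n non-negative matrix V into W (m x rank) and H (rank x n).

    Assumption: Lee-Seung multiplicative updates, chosen for illustration only.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random initialization keeps every later entry non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return w, h


# Usage: a small 4 x 3 matrix that has an exact non-negative rank-2 factorization.
v = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 3.0, 3.0], [2.0, 5.0, 3.0]]
w, h = nmf(v, rank=2)
print("residual:", frobenius_error(v, w, h))
```

Every iteration consists of a handful of dense matrix multiplications followed by element-wise scaling, which is why the cost grows quickly with matrix size and why these steps are natural candidates for parallel execution.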

Notes

1. http://www.jcuda.org

Acknowledgements

This work is supported by the National Natural Science Foundation of China under grant nos. 61602169 and 61702181, the Natural Science Foundation of Hunan Province under grant nos. 2018JJ2135 and 2018JJ3190, and the Scientific Research Fund of Hunan Provincial Education Department under grant no. 16C0643.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
Bing Tang, Linyao Kang, Yanmin Xia & Li Zhang

Corresponding author

Correspondence to Bing Tang.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
University of West London, London, UK
Xinheng Wang
Hangzhou Dianzi University, Hangzhou Shi, Zhejiang, China
Yuyu Yin
London South Bank University, London, UK
Muddesar Iqbal

About this paper

Cite this paper

Tang, B., Kang, L., Xia, Y., Zhang, L. (2019). GPU-accelerated Large-Scale Non-negative Matrix Factorization Using Spark. In: Gao, H., Wang, X., Yin, Y., Iqbal, M. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 268. Springer, Cham. https://doi.org/10.1007/978-3-030-12981-1_13

DOI: https://doi.org/10.1007/978-3-030-12981-1_13
Published: 07 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12980-4
Online ISBN: 978-3-030-12981-1
eBook Packages: Computer Science, Computer Science (R0)
