Abstract
Non-negative matrix factorization (NMF) has been introduced as an efficient technique for data compression, valued for its ability to extract highly interpretable parts from data sets, and it has been applied in various fields, such as recommendation, image analysis, and text clustering. However, as the size of the matrix increases, NMF algorithms become very slow. To solve this problem, this paper proposes a GPU-based parallel algorithm for NMF on the Spark platform that makes full use of Spark's in-memory computation mode and the GPU's Single-Instruction Multiple-Data (SIMD) mode. The new GPU-accelerated NMF on Spark is evaluated on a 4-node heterogeneous Spark cluster built on Google Compute Engine, with each node equipped with an NVIDIA K80 GPU card. Experimental results indicate that it is competitive in computational time with existing solutions across a variety of matrix orders. It achieves a high speed-up and can also effectively handle the non-negative decomposition of higher-order matrices, greatly improving computational efficiency.
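To make the computation being accelerated more concrete, here is a minimal NumPy sketch of the classic Lee and Seung multiplicative update rules for NMF under the Frobenius-norm objective, V ≈ WH with W and H non-negative. This is an illustration only, not the authors' GPU/Spark implementation: the abstract does not say which update rule the paper uses, so choosing the multiplicative rules here is an assumption. The hypothetical helper `h_update_partitioned` shows why such updates parallelize well. The products WᵀV and WᵀW are sums of per-row-block products, so workers holding row blocks of W and V could compute partial products locally, and a single reduce step could add them up.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (m x n) as W (m x rank) @ H (rank x n).

    Uses the Lee-Seung multiplicative updates for ||V - WH||_F^2.
    Illustrative sketch; parameter names and defaults are assumptions.
    """
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random non-negative initialization.
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # W <- W * (V H^T) / (W H H^T)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def h_update_partitioned(W_blocks, V_blocks, H, eps=1e-10):
    """One H update computed from row blocks (hypothetical helper).

    Each (W_p, V_p) pair stands for the rows a single worker might hold.
    W^T V and W^T W are sums of per-block products, which mirrors a
    map (local products) followed by a reduce (summation).
    """
    WtV = sum(Wp.T @ Vp for Wp, Vp in zip(W_blocks, V_blocks))
    WtW = sum(Wp.T @ Wp for Wp in W_blocks)
    return H * WtV / (WtW @ H + eps)
```

For example, `W, H = nmf_multiplicative(V, rank=10)` factors a non-negative `V`. Splitting `W` and `V` by rows with `np.array_split` and passing the blocks to `h_update_partitioned` yields the same H update as the single-node formula, up to floating-point rounding.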
Acknowledgements
This work is supported by the National Natural Science Foundation of China under grant nos. 61602169 and 61702181, the Natural Science Foundation of Hunan Province under grant nos. 2018JJ2135 and 2018JJ3190, and the Scientific Research Fund of Hunan Provincial Education Department under grant no. 16C0643.
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Tang, B., Kang, L., Xia, Y., Zhang, L. (2019). GPU-accelerated Large-Scale Non-negative Matrix Factorization Using Spark. In: Gao, H., Wang, X., Yin, Y., Iqbal, M. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 268. Springer, Cham. https://doi.org/10.1007/978-3-030-12981-1_13
DOI: https://doi.org/10.1007/978-3-030-12981-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12980-4
Online ISBN: 978-3-030-12981-1
eBook Packages: Computer Science, Computer Science (R0)