Abstract
In the big data era, users can obtain massive amounts of information from the Internet, but its value density is very low. To help users find the information they need more quickly, this paper presents a mechanism for estimating and ranking diverse demands based on user behaviors. First, a classification system for user query intent is defined. Second, to mine the documents on websites that belong to a specific classification, an LDA model is used to cluster and annotate the websites. To speed up LDA inference, we use hybrid MPI and OpenMP parallelism to reduce both inter-node and intra-node communication cost. Finally, according to the historical behaviors of users and the results returned by the search engine, we rank the classifications on a MapReduce platform and present the top-ranked ones to users.
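The final ranking step can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the linear mix of click history and result distribution below, the weight `alpha`, and all function names are assumptions made purely for illustration. The code imitates a map step and a reduce step in a single process; it is not an actual MapReduce job and not the authors' implementation.

```python
from collections import Counter


def map_scores(user_clicks, result_categories, alpha=0.5):
    """Map step (hypothetical): emit (category, partial_score) pairs.

    user_clicks: categories of pages the user clicked in the past.
    result_categories: categories of the results returned for the query.
    alpha: assumed weight of user history versus search results.
    """
    pairs = []
    for cat in user_clicks:
        # Each historical click contributes an equal share of alpha.
        pairs.append((cat, alpha / len(user_clicks)))
    for cat in result_categories:
        # Each returned result contributes an equal share of 1 - alpha.
        pairs.append((cat, (1.0 - alpha) / len(result_categories)))
    return pairs


def reduce_scores(pairs):
    """Reduce step: sum the partial scores for each category."""
    totals = Counter()
    for cat, score in pairs:
        totals[cat] += score
    return totals


def rank_categories(user_clicks, result_categories, top_k=3, alpha=0.5):
    """Return the top_k categories by combined score (ties broken by name)."""
    totals = reduce_scores(map_scores(user_clicks, result_categories, alpha))
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ordered[:top_k]]


if __name__ == "__main__":
    clicks = ["video", "video", "news"]
    results = ["news", "shopping", "news", "video"]
    print(rank_categories(clicks, results, top_k=2))
```

Because both steps operate on independent key-value pairs, the same logic could in principle be distributed by category key, which is the property a MapReduce platform relies on.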
Acknowledgements
This work is jointly supported by Grant 2017YFB0203504 in the National Major Research High Performance Computing Program of China, and the State Key Program of National Natural Science Foundation of China (No. 91530324).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, L. et al. (2019). Diverse Demands Estimation and Ranking Based on User Behaviors. In: Hu, C., Yang, W., Jiang, C., Dai, D. (eds) High-Performance Computing Applications in Numerical Simulation and Edge Computing. HPCMS 2018, HiDEC 2018. Communications in Computer and Information Science, vol 913. Springer, Singapore. https://doi.org/10.1007/978-981-32-9987-0_7
DOI: https://doi.org/10.1007/978-981-32-9987-0_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9986-3
Online ISBN: 978-981-32-9987-0
eBook Packages: Computer Science, Computer Science (R0)