Abstract
Acquiring content from online big graphs (OBGs), such as linked Web pages, social networks, and knowledge graphs, is critical data infrastructure for Web applications and massive data analysis. However, effective data acquisition is challenging because OBGs are massive, heterogeneous, and dynamically evolving, and their global topological structure is unknown. In this paper, we present an adaptive and parallel approach to effective data acquisition from OBGs. Drawing on the ideas of Quasi-Monte Carlo (QMC) and branch-and-bound methods, we propose an adaptive Web-scale sampling algorithm for parallel data collection, implemented on Spark. Experimental results show the effectiveness and efficiency of our method.
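As a rough illustration of the QMC idea mentioned above, the sketch below generates a low-discrepancy (Halton, base 2) sequence and maps it onto positions in a list of candidate nodes. The abstract does not say which sequence or mapping the method actually uses, so the Halton choice and the names `radical_inverse` and `halton_indices` are assumptions made for this example only, not the paper's algorithm.

```python
def radical_inverse(index: int, base: int) -> float:
    """Return the radical inverse of a positive integer in the given base.

    The digits of `index` in `base` are mirrored around the radix point,
    which yields the index-th point of a Halton sequence in [0, 1).
    """
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_indices(count: int, population: int, base: int = 2) -> list[int]:
    """Map the first `count` Halton points onto positions 0..population-1.

    Hypothetical helper: `population` stands for the number of candidate
    nodes currently known to the sampler.
    """
    return [int(radical_inverse(i, base) * population)
            for i in range(1, count + 1)]


if __name__ == "__main__":
    # Pick 4 of 8 candidate positions; the picks spread over the range
    # instead of clustering as pseudo-random draws sometimes do.
    print(halton_indices(4, 8))
```

Unlike pseudo-random draws, successive points of such a sequence fill the interval evenly, which is the property QMC sampling relies on.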
Acknowledgment
This paper was supported by the National Natural Science Foundation of China (Nos. 61472345, 61562090), Program for Excellent Young Talents of Yunnan University (No. WX173602), Research Foundation of Yunnan University (No. 2017YDJQ06), and Research Foundation of Educational Department of Yunnan Province (No. 2017ZZX228).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Yin, Z., Yue, K., Wu, H., Su, Y. (2018). Adaptive and Parallel Data Acquisition from Online Big Graphs. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_21
DOI: https://doi.org/10.1007/978-3-319-91452-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science (R0)