Abstract
Many join-based graph mining algorithms, such as triangle listing and clique enumeration, produce large volumes of intermediate or final output that sometimes dominate the mining cost. A few studies have addressed the size of the output data. However, those techniques are limited in that they are highly specific to their corresponding graph mining algorithms. In this paper, based on careful observation of the output patterns, we propose a general compression solution that can be applied to any join-based graph algorithm. It first classifies the vertices in the set of resulting subgraphs of a join-based graph mining algorithm as overlapping or non-overlapping. It then compresses the output data by removing the redundancy in the overlapping vertices and by encoding the non-overlapping vertices with a non-aligned hybrid bit vector compression technique. The proposed technique performs compression on the fly and can easily be adopted by join-based graph mining algorithms. Experiments on real datasets show that, when adopted in a triangle listing algorithm, our technique reduces the size of the output data by a factor of three and the running time by more than a factor of two. The proposed technique also reduces the I/O cost of a maximal clique listing algorithm.
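The split into overlapping and non-overlapping vertices can be sketched for triangle listing output. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each triangle is emitted as (u, v, w) with u < v < w and that a lister emits triangles sharing the same (u, v) pair consecutively. It treats (u, v) as the overlapping vertices stored once per group and packs the non-overlapping third vertices into a plain Python integer used as an uncompressed bit vector (offset by v + 1), in place of the paper's non-aligned hybrid encoding. The function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

Triangle = Tuple[int, int, int]
Group = Tuple[Tuple[int, int], int]  # ((u, v), bit vector of third vertices)


def compress_triangles(triangles: Iterable[Triangle]) -> Iterator[Group]:
    """Yield ((u, v), bits) groups on the fly from a stream of triangles.

    Assumption (hypothetical scheme): (u, v) are the overlapping vertices,
    stored once per run of consecutive triangles that share them; bit
    (w - v - 1) of `bits` is set iff the triangle (u, v, w) is present.
    """
    for (u, v), group in groupby(triangles, key=lambda t: (t[0], t[1])):
        bits = 0
        for _, _, w in group:
            if not u < v < w:
                raise ValueError(f"triangle not ordered u < v < w: {(u, v, w)}")
            bits |= 1 << (w - v - 1)  # offset by v + 1 since w > v
        yield (u, v), bits


def decompress_triangles(groups: Iterable[Group]) -> Iterator[Triangle]:
    """Expand each ((u, v), bits) group back into triangles in ascending w."""
    for (u, v), bits in groups:
        pos = 0
        while bits:
            if bits & 1:
                yield (u, v, v + 1 + pos)
            bits >>= 1
            pos += 1
```

Because the encoder only inspects consecutive triangles, it can run while triangles are being produced. If the stream is not grouped by (u, v), the same prefix simply appears in several groups, which remains lossless but compresses less.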
Notes
1. A bit vector represents the neighbors of a vertex in a graph as a sequence of bits, where bit i is 1 if vertex i is a neighbor and 0 otherwise.
2. A 0-fill, 1-fill, or literal word is a word that contains only 0-bits, only 1-bits, or a mix of 0- and 1-bits, respectively.
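The two notes can be illustrated together. The sketch below is a simplified word-based scheme in the spirit of word-aligned hybrid (WAH) bitmap compression. It is not the non-aligned hybrid encoding proposed in the paper. It builds a neighbor bit vector, splits it into fixed-size words, classifies each word as a 0-fill, 1-fill, or literal word, and merges consecutive fill words of the same kind into a run. The word size `WORD_BITS = 8`, the zero-padding of the last word, and the tuple output format are illustrative assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple, Union

WORD_BITS = 8  # illustrative word size (assumption, chosen for readability)

Token = Tuple[str, Union[int, Tuple[int, ...]]]


def neighbor_bit_vector(neighbors: List[int], num_vertices: int) -> List[int]:
    """Return a list of bits where bit i is 1 iff vertex i is a neighbor."""
    bits = [0] * num_vertices
    for n in neighbors:
        bits[n] = 1
    return bits


def encode_words(bits: List[int], word_bits: int = WORD_BITS) -> List[Token]:
    """Split bits into fixed-size words and run-length encode fill words.

    Produces ("fill0", run) for consecutive all-0 words, ("fill1", run) for
    consecutive all-1 words, and ("literal", word) for mixed words. The last
    word is zero-padded when len(bits) is not a multiple of word_bits.
    """
    padded = bits + [0] * (-len(bits) % word_bits)
    out: List[Token] = []
    for i in range(0, len(padded), word_bits):
        word = tuple(padded[i:i + word_bits])
        if all(b == 0 for b in word):
            kind = "fill0"
        elif all(b == 1 for b in word):
            kind = "fill1"
        else:
            out.append(("literal", word))  # mixed word: stored verbatim
            continue
        if out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + 1)  # extend the current fill run
        else:
            out.append((kind, 1))  # start a new fill run
    return out
```

Practical encoders pack the tokens into machine words and store the run length inside the fill word itself; the tuple list here only makes the classification visible.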
Acknowledgments
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST; No. 2015R1A2A2A01008209).
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rasel, M.K., Lee, YK. (2018). An Efficient Subgraph Compression-Based Technique for Reducing the I/O Cost of Join-Based Graph Mining Algorithms. In: Lee, W., Choi, W., Jung, S., Song, M. (eds) Proceedings of the 7th International Conference on Emerging Databases. Lecture Notes in Electrical Engineering, vol 461. Springer, Singapore. https://doi.org/10.1007/978-981-10-6520-0_9
DOI: https://doi.org/10.1007/978-981-10-6520-0_9
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6519-4
Online ISBN: 978-981-10-6520-0
eBook Packages: Engineering, Engineering (R0)