An Efficient Subgraph Compression-Based Technique for Reducing the I/O Cost of Join-Based Graph Mining Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 461)

Abstract

Many join-based graph mining algorithms, such as triangle listing and clique enumeration, produce a large volume of intermediate or final output data that sometimes dominates the mining cost. A few studies have addressed the size of the output data. However, those techniques are limited in that they are highly specific to their corresponding graph mining algorithms. In this paper, through careful observation of the output patterns, we propose a general compression solution that can be applied to any join-based graph algorithm. It first categorizes the vertices in the resultant subgraph set of a join-based graph mining algorithm as overlapping or non-overlapping. It then compresses the output data by removing the redundancy from the overlapping vertices and by encoding the non-overlapping vertices using a non-aligned hybrid bit vector compression technique. The proposed technique performs the compression on the fly and can easily be adopted by join-based graph mining algorithms. Experiments on real datasets show that the proposed technique, when adopted in a triangle listing algorithm, reduces the size of the output data by a factor of three and the running time by a factor of more than two. The proposed technique also reduces the I/O cost of a maximal clique listing algorithm.
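The two steps described in the abstract can be illustrated with a minimal sketch for triangle output. It rests on assumptions that the abstract does not confirm: triangles are taken to arrive as tuples (u, v, w), the shared edge (u, v) is treated as the overlapping part, and the third vertex w as the non-overlapping part. The function names are hypothetical, and a plain Python integer stands in for the bit vector; the paper's non-aligned hybrid encoding is not reproduced here.

```python
from collections import defaultdict


def compress_triangles(triangles):
    """Store each shared edge (u, v) once, with a bit vector of third vertices.

    Assumption for illustration: (u, v) is the overlapping part of the output
    and w is the non-overlapping part. A Python int serves as the bit vector.
    """
    groups = defaultdict(int)
    for u, v, w in triangles:
        groups[(u, v)] |= 1 << w  # set bit w for the third vertex
    return dict(groups)


def decompress_triangles(groups):
    """Expand the grouped form back into explicit (u, v, w) triangles."""
    out = []
    for (u, v), bits in groups.items():
        w = 0
        while bits:
            if bits & 1:
                out.append((u, v, w))
            bits >>= 1
            w += 1
    return out


triangles = [(0, 1, 2), (0, 1, 3), (0, 1, 5), (2, 3, 4)]
compressed = compress_triangles(triangles)
restored = decompress_triangles(compressed)
```

In this toy input, three triangles share the edge (0, 1), so that edge is written once instead of three times and the third vertices 2, 3, and 5 are recorded as set bits.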

Notes

  1. A bit vector represents the neighbors of a vertex in a graph using the bits 0 and 1.

  2. A 0-fill word, a 1-fill word, or a literal word refers to a word that contains only 0-bits, only 1-bits, or both 0-bits and 1-bits, respectively.
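The two notes above can be made concrete with a short sketch. The 8-bit word size, the function names, and the example neighbor list are assumptions chosen for illustration; the page does not state the word size or layout used in the paper.

```python
def neighbor_bit_vector(neighbors, num_vertices):
    """Return a list of bits where bit i is 1 if vertex i is a neighbor."""
    bits = [0] * num_vertices
    for n in neighbors:
        bits[n] = 1
    return bits


def classify_words(bits, word_size=8):
    """Split the bit vector into words and label each one.

    word_size=8 is an illustrative assumption, not a value from the paper.
    """
    kinds = []
    for i in range(0, len(bits), word_size):
        word = bits[i:i + word_size]
        if all(b == 0 for b in word):
            kinds.append("0-fill")
        elif all(b == 1 for b in word):
            kinds.append("1-fill")
        else:
            kinds.append("literal")
    return kinds


# Hypothetical vertex with neighbors 8..15 and 17 in a 24-vertex graph.
bits = neighbor_bit_vector([8, 9, 10, 11, 12, 13, 14, 15, 17], num_vertices=24)
kinds = classify_words(bits)
```

Here the first word (vertices 0 to 7) contains no neighbors, the second (vertices 8 to 15) contains only neighbors, and the third mixes both, so the three word types each appear once.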

Acknowledgments

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST; No. 2015R1A2A2A01008209).

Corresponding author

Correspondence to Young-Koo Lee.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rasel, M.K., Lee, YK. (2018). An Efficient Subgraph Compression-Based Technique for Reducing the I/O Cost of Join-Based Graph Mining Algorithms. In: Lee, W., Choi, W., Jung, S., Song, M. (eds) Proceedings of the 7th International Conference on Emerging Databases. Lecture Notes in Electrical Engineering, vol 461. Springer, Singapore. https://doi.org/10.1007/978-981-10-6520-0_9

  • DOI: https://doi.org/10.1007/978-981-10-6520-0_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6519-4

  • Online ISBN: 978-981-10-6520-0

  • eBook Packages: Engineering, Engineering (R0)
