A Fast and Compact Web Graph Representation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4726)

Abstract

Compressed graph representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In this paper we show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential to adapt well to secondary memory. In addition, we introduce an approximate Re-Pair version that works efficiently with limited main memory.

Partially funded by a grant from Yahoo! Research Latin America.
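As an illustration only, here is a minimal Python sketch of the general idea named in the abstract: concatenate the adjacency lists into one sequence and compress it with Re-Pair, which repeatedly replaces the most frequent pair of adjacent symbols with a new symbol. Several details are assumptions not taken from this page: node ids are 0..n-1, each list is preceded by a unique negative separator -(v + 1) so that no repeated pair can span two lists, and the helper names (`build_sequence`, `repair`, `expand`, `neighbors`, `compress_graph`) are hypothetical. The paper's actual layout and data structures may differ.

```python
from collections import Counter


def build_sequence(adj):
    """Concatenate adjacency lists, each preceded by a unique separator.

    Hypothetical encoding: node v's separator is -(v + 1). It occurs exactly
    once, so no pair that touches a separator can ever repeat.
    """
    seq = []
    for v, neighbors in enumerate(adj):
        seq.append(-(v + 1))
        seq.extend(neighbors)
    return seq


def repair(seq, first_symbol):
    """Naive Re-Pair: replace the most frequent adjacent pair until none repeats.

    New (nonterminal) symbols start at first_symbol, which must exceed every
    node id. Returns the compressed sequence and the rules {symbol: (a, b)}.
    """
    rules = {}
    next_symbol = first_symbol
    while len(seq) > 1:
        pair, freq = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so stop
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):
            # Greedy left-to-right replacement of non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
    return seq, rules


def expand(symbol, rules):
    """Expand one symbol back into original node ids (iterative, left to right)."""
    out, stack = [], [symbol]
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # pushed first so the left half is emitted first
            stack.append(left)
        else:
            out.append(s)
    return out


def compress_graph(adj):
    """Build the sequence, run Re-Pair, and record where each separator landed."""
    seq = build_sequence(adj)
    compressed, rules = repair(seq, first_symbol=len(adj))
    # Separators are never replaced and keep their order, so offsets[v]
    # is the position of node v's separator in the compressed sequence.
    offsets = [i for i, s in enumerate(compressed) if s < 0]
    return compressed, rules, offsets


def neighbors(v, compressed, rules, offsets):
    """Decompress only the symbols between v's separator and the next one."""
    start = offsets[v] + 1
    end = offsets[v + 1] if v + 1 < len(offsets) else len(compressed)
    result = []
    for s in compressed[start:end]:
        result.extend(expand(s, rules))
    return result


if __name__ == "__main__":
    adj = [[1, 2, 3], [2, 3], [1, 2, 3], [], [1, 2, 3]]
    compressed, rules, offsets = compress_graph(adj)
    print(compressed)  # [-1, 6, -2, 5, -3, 6, -4, -5, 6]
    print(rules)       # {5: (2, 3), 6: (1, 5)}
    print(neighbors(0, compressed, rules, offsets))  # [1, 2, 3]
```

Because the separators survive compression, their positions act as pointers into the compressed sequence, so the neighbors of one node are recovered by expanding only the few symbols of its own list. This naive loop recounts every pair after each replacement, which costs time and memory proportional to the whole sequence; the approximate Re-Pair version mentioned in the abstract targets the limited-memory setting and is not reproduced here.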

References

  1. Adler, M., Mitzenmacher, M.: Towards compressing Web graphs. In: Proc. IEEE DCC, pp. 203–212. IEEE Computer Society Press, Los Alamitos (2001)

  2. Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: Proc. ACM STOC, pp. 171–180. ACM Press, New York (2000)

  3. Bharat, K., Broder, A., Henzinger, M., Kumar, P., Venkatasubramanian, S.: The Connectivity Server: Fast access to linkage information on the web. In: Proc. WWW, pp. 469–477 (1998)

  4. Blandford, D.: Compact data structures with fast queries. PhD thesis, School of Computer Science, Carnegie Mellon University (2006). Also as TR CMU-CS-05-196

  5. Blandford, D., Blelloch, G., Kash, I.: Compact representations of separable graphs. In: Proc. SODA, pp. 579–588 (2003)

  6. Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proc. WWW, pp. 595–602 (2004)

  7. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J.: Graph structure in the web. J. Computer Networks 33(1–6), 309–320 (2000). Also in Proc. WWW9

  8. Chakrabarti, D., Papadimitriou, S., Modha, D., Faloutsos, C.: Fully automatic cross-associations. In: Proc. ACM SIGKDD, ACM Press, New York (2004)

  9. Chuang, R., Garg, A., He, X., Kao, M.-Y., Lu, H.-I.: Compact encodings of planar graphs with canonical orderings and multiple parentheses. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) ICALP 1998. LNCS, vol. 1443, pp. 118–129. Springer, Heidelberg (1998)

  10. Deo, N., Litow, B.: A structural approach to graph compression. In: Brim, L., Gruska, J., Zlatuška, J. (eds.) MFCS 1998. LNCS, vol. 1450, pp. 91–101. Springer, Heidelberg (1998)

  11. González, R., Navarro, G.: Compressed text indexes with fast locate. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 216–227. Springer, Heidelberg (2007)

  12. Gulli, A., Signorini, A.: The indexable web is more than 11.5 billion pages. In: Proc. WWW (2005)

  13. He, X., Kao, M.-Y., Lu, H.-I.: Linear-time succinct encodings of planar graphs via canonical orderings. SIAM J. Discrete Mathematics 12(3), 317–325 (1999)

  14. He, X., Kao, M.-Y., Lu, H.-I.: A fast general methodology for information-theoretically optimal encodings of graphs. SIAM J. Comput. 30, 838–846 (2000)

  15. Itai, A., Rodeh, M.: Representation of graphs. Acta Informatica 17, 215–219 (1982)

  16. Jacobson, G.: Space-efficient static trees and graphs. In: Proc. FOCS, pp. 549–554 (1989)

  17. Keeler, K., Westbrook, J.: Short encodings of planar graphs and maps. Discrete Applied Mathematics 58, 239–252 (1995)

  18. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large scale knowledge bases from the Web. In: Proc. VLDB (1999)

  19. Larsson, J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88(11), 1722–1732 (2000)

  20. Lu, H.-I.: Linear-time compression of bounded-genus graphs into information-theoretically optimal number of bits. In: Proc. SODA, pp. 223–224 (2002)

  21. Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) Foundations of Software Technology and Theoretical Computer Science. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

  22. Munro, I., Raman, V.: Succinct representation of balanced parentheses, static trees and planar graphs. In: Proc. FOCS, pp. 118–126 (1997)

  23. Naor, M.: Succinct representation of general unlabeled graphs. Discrete Applied Mathematics 28, 303–307 (1990)

  24. Navarro, G.: Compressing web graphs like texts. Technical Report TR/DCC-2007-2, Dept. of Computer Science, University of Chile (2007)

  25. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1), article 2 (2007)

  26. Raghavan, S., Garcia-Molina, H.: Representing Web graphs. In: Proc. ICDE (2003)

  27. Randall, K., Stata, R., Wickremesinghe, R., Wiener, J.: The LINK database: Fast access to graphs of the Web. Technical Report 175, Compaq Systems Research Center, Palo Alto, CA (2001)

  28. Rossignac, J.: Edgebreaker: Connectivity compression for triangle meshes. IEEE Transactions on Visualization and Computer Graphics 5(1), 47–61 (1999)

  29. Suel, T., Yuan, J.: Compressing the graph structure of the Web. In: Proc. IEEE DCC, pp. 213–222. IEEE Computer Society Press, Los Alamitos (2001)

  30. Turán, G.: Succinct representations of graphs. Discrete Applied Mathematics 8, 289–294 (1984)

  31. Wan, R.: Browsing and Searching Compressed Documents. PhD thesis, Dept. of Computer Science and Software Engineering, University of Melbourne (2003)

Editor information

Nivio Ziviani, Ricardo Baeza-Yates

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Claude, F., Navarro, G. (2007). A Fast and Compact Web Graph Representation. In: Ziviani, N., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2007. Lecture Notes in Computer Science, vol 4726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75530-2_11

  • DOI: https://doi.org/10.1007/978-3-540-75530-2_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75529-6

  • Online ISBN: 978-3-540-75530-2

  • eBook Packages: Computer Science, Computer Science (R0)
