Efficient Compression of Web Graphs

  • Yasuhito Asano
  • Yuya Miyawaki
  • Takao Nishizeki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5092)

Abstract

Several methods have been proposed for compressing the linkage data of a Web graph. Among them, the method proposed by Boldi and Vigna is known as the most efficient. In this paper, we propose a new method for compressing a Web graph that is more efficient than theirs with respect to the size of the compressed data. For example, our method needs only 1.99 bits per link to compress a Web graph containing 3,216,152 links connecting 325,557 pages, while the method of Boldi and Vigna needs 2.84 bits per link for the same graph.
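
To put the two bits-per-link figures in perspective, the following minimal sketch converts them into approximate total sizes for the graph quoted above. The function names are hypothetical, and the calculation assumes that the compressed size is simply the bits-per-link rate times the number of links, ignoring any headers or offset tables that a real format might add.

```python
# Figures taken from the abstract.
LINKS = 3_216_152
PAGES = 325_557
BITS_PER_LINK_PROPOSED = 1.99
BITS_PER_LINK_BOLDI_VIGNA = 2.84


def compressed_bytes(bits_per_link: float, links: int) -> float:
    """Approximate compressed size in bytes (assumption: rate x links, no overhead)."""
    return bits_per_link * links / 8


def relative_saving(new_rate: float, old_rate: float) -> float:
    """Fraction of space saved by new_rate relative to old_rate."""
    return 1 - new_rate / old_rate


proposed_size = compressed_bytes(BITS_PER_LINK_PROPOSED, LINKS)
boldi_vigna_size = compressed_bytes(BITS_PER_LINK_BOLDI_VIGNA, LINKS)
saving = relative_saving(BITS_PER_LINK_PROPOSED, BITS_PER_LINK_BOLDI_VIGNA)

if __name__ == "__main__":
    print(f"average links per page: {LINKS / PAGES:.2f}")
    print(f"proposed method:  {proposed_size / 1e6:.2f} MB")
    print(f"Boldi and Vigna:  {boldi_vigna_size / 1e6:.2f} MB")
    print(f"relative saving:  {saving:.1%}")
```

Under these assumptions the linkage data takes about 0.80 MB with the proposed method against about 1.14 MB with the method of Boldi and Vigna, a reduction of roughly 30%.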

Keywords

Adjacency Matrix; Local Index; Retrieval Time; Rectangular Block; Adjacency List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Asano, Y., Ito, T., Imai, H., Toyoda, M., Kitsuregawa, M.: Compact Encoding of the Web Graph Exploiting Various Power Laws: Statistical Reason Behind Link Database. In: Dong, G., Tang, C.-j., Wang, W. (eds.) WAIM 2003. LNCS, vol. 2762, pp. 37–46. Springer, Heidelberg (2003)
  2. Asano, Y., Nishizeki, T., Toyoda, M., Kitsuregawa, M.: Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework. IEICE Trans. Inf. Syst. E89-D(10), 2606–2615 (2006)
  3. Asano, Y., Tezuka, Y., Nishizeki, T.: Improvements of HITS Algorithms for Spam Links. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds.) APWeb/WAIM 2007. LNCS, vol. 4505, pp. 479–490. Springer, Heidelberg (2007)
  4. Bharat, K., Broder, A., Henzinger, M., Kumar, P., Venkatasubramanian, S.: The Connectivity Server: Fast Access to Linkage Information on the Web. In: Proc. of the 7th WWW, pp. 469–477 (1998)
  5. Blandford, D.K., Blelloch, G.E., Kash, I.A.: Compact Representation of Separable Graphs. In: Proc. of the 14th SODA, pp. 679–688 (2003)
  6. Brin, S., Page, L.: The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Proc. of the 7th WWW, pp. 14–18 (1998)
  7. Boldi, P., Vigna, S.: The Web Graph Framework I: Compression Techniques. In: Proc. of the 13th WWW, pp. 595–601 (2004)
  8. Boldi, P., Vigna, S.: Codes for the World Wide Web. Internet Mathematics 2(4), 405–427 (2005)
  9. Claude, F., Navarro, G.: A Fast and Compact Web Graph Representation. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 118–129. Springer, Heidelberg (2007)
  10. Cormen, T.H., Leiserson, C.E., Rivest, R., Stein, C.: Introduction to Algorithms. 2nd edn. MIT Press, Cambridge (2001)
  11. Elias, P.: Universal Codeword Sets and Representations of the Integers. IEEE Transactions on Information Theory 21, 194–203 (1975)
  12. Flake, G.W., Lawrence, S., Giles, C.L.: Efficient Identification of Web Communities. In: Proc. of the 6th KDD, pp. 150–160 (2000)
  13. Guillaume, J.L., Latapy, M., Viennot, L.: Efficient and Simple Encodings for the Web Graph. In: Meng, X., Su, J., Wang, Y. (eds.) WAIM 2002. LNCS, vol. 2419, pp. 328–337. Springer, Heidelberg (2002)
  14. Kleinberg, J.: Authoritative Sources in a Hyperlinked Environment. In: Proc. of the 9th SODA, pp. 668–677 (1998)
  15. Kou, W.: Digital Image Compression: Algorithms and Standards. Springer, Heidelberg (1995)
  16. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the Web for Emerging Cyber-Communities. Computer Networks 31(11-16), 1481–1493 (1999)
  17. Larsson, N.J., Moffat, A.: Off-Line Dictionary-Based Compression. Proc. IEEE 88(11), 1722–1732 (2000)
  18. Levenstein, V.E.: On the Redundancy and Delay of Separable Codes for the Natural Numbers. Problems of Cybernetics 20, 173–179 (1968)
  19. Randall, K., Stata, R., Wickremesinghe, R., Wiener, J.L.: The Link Database: Fast Access to Graphs of the Web. Research Report 175, Compaq Systems Research Center, Palo Alto, CA (2001)
  20. Suel, T., Yuan, J.: Compressing the Graph Structure of the Web. In: Proc. of the Data Compression Conference, pp. 213–222 (2001)
  21. WebGraph Homepage, http://webgraph.dsi.unimi.it/
  22. Wickremesinghe, R., Stata, R., Wiener, J.: Link Compression in the Connectivity Server. Technical Report, Compaq Systems Research Center, Palo Alto, CA (2000)
  23. Zhang, Y., Yu, J.X., Hou, J.: Web Communities: Analysis and Construction. Springer, Berlin (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yasuhito Asano (1)
  • Yuya Miyawaki (2)
  • Takao Nishizeki (2)
  1. Graduate School of Informatics, Kyoto University, Yoshidahonmachi, Sakyo-ku, Kyoto, Japan
  2. Graduate School of Information Sciences, Tohoku University, Aramaki, Aoba-ku, Japan