Abstract
Compressed graph representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In this paper we show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential of adapting well to secondary memory. In addition, we introduce an approximate Re-Pair version that works efficiently with limited main memory.
Partially funded by a grant from Yahoo! Research Latin America.
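The abstract names Re-Pair compression without describing it. As a rough illustration only, the sketch below applies the textbook Re-Pair idea (repeatedly replace the most frequent adjacent pair of symbols with a fresh symbol) to a concatenation of adjacency lists. The function names, the `BARRIER` separator, and the toy graph are assumptions made for this example; this is not the paper's actual data structure, and it is a naive quadratic version rather than the approximate, memory-limited variant the abstract mentions.

```python
from collections import Counter

BARRIER = -1  # hypothetical separator placed before each adjacency list


def repair_compress(seq, first_new_symbol, barrier=BARRIER):
    """Naive Re-Pair: replace the most frequent adjacent pair until none repeats.

    Pairs touching the barrier are never merged, so every rule stays
    inside a single adjacency list.
    """
    seq = list(seq)
    rules = {}  # new symbol -> (left symbol, right symbol)
    next_sym = first_new_symbol
    while True:
        # Simple count of adjacent pairs; overlapping repeats may be
        # overcounted, which costs compression but not correctness.
        counts = Counter(
            (a, b) for a, b in zip(seq, seq[1:])
            if a != barrier and b != barrier
        )
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_sym += 1
    return seq, rules


def expand(symbol, rules):
    """Recursively expand one symbol back into original node ids."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def neighbors(node, compressed, rules, barrier=BARRIER):
    """Return the adjacency list of `node` by scanning to its barrier.

    A real structure would keep a pointer to each list instead of scanning.
    """
    seen, result = -1, []
    for sym in compressed:
        if sym == barrier:
            seen += 1
            if seen > node:
                break
        elif seen == node:
            result.extend(expand(sym, rules))
    return result


# Toy graph (an assumption for illustration): lists share the run 1, 2, 3.
adjacency = [[1, 2, 3], [1, 2, 3, 4], [2, 3], [1, 2, 3]]
flat = [x for lst in adjacency for x in [BARRIER] + lst]
compressed, rules = repair_compress(flat, first_new_symbol=5)
```

Calling `neighbors(1, compressed, rules)` expands only the symbols of the second list, which hints at why navigation can be fast: a neighbor query touches one list and the rules it uses, not the whole graph.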
References
Adler, M., Mitzenmacher, M.: Towards compressing Web graphs. In: Proc. IEEE DCC, pp. 203–212. IEEE Computer Society Press, Los Alamitos (2001)
Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: Proc. ACM STOC, pp. 171–180. ACM Press, New York (2000)
Bharat, K., Broder, A., Henzinger, M., Kumar, P., Venkatasubramanian, S.: The Connectivity Server: Fast access to linkage information on the web. In: Proc. WWW, pp. 469–477 (1998)
Blandford, D.: Compact data structures with fast queries. PhD thesis, School of Computer Science, Carnegie Mellon University, Also as TR CMU-CS-05-196 (2006)
Blandford, D., Blelloch, G., Kash, I.: Compact representations of separable graphs. In: Proc. SODA, pp. 579–588 (2003)
Boldi, P., Vigna, S.: The webgraph framework I: Compression techniques. In: Proc. WWW, pp. 595–602 (2004)
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J.: Graph structure in the web. J. Computer Networks 33(1–6), 309–320 (2000) Also in Proc. WWW9
Chakrabarti, D., Papadimitriou, S., Modha, D., Faloutsos, C.: Fully automatic cross-associations. In: Proc. ACM SIGKDD, ACM Press, New York (2004)
Chuang, R., Garg, A., He, X., Kao, M.-Y., Lu, H.-I.: Compact encodings of planar graphs with canonical orderings and multiple parentheses. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) ICALP 1998. LNCS, vol. 1443, pp. 118–129. Springer, Heidelberg (1998)
Deo, N., Litow, B.: A structural approach to graph compression. In: Brim, L., Gruska, J., Zlatuška, J. (eds.) MFCS 1998. LNCS, vol. 1450, pp. 91–101. Springer, Heidelberg (1998)
González, R., Navarro, G.: Compressed text indexes with fast locate. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 216–227. Springer, Heidelberg (2007)
Gulli, A., Signorini, A.: The indexable web is more than 11.5 billion pages. In: Proc. WWW (2005)
He, X., Kao, M.-Y., Lu, H.-I.: Linear-time succinct encodings of planar graphs via canonical orderings. J. Discrete Mathematics 12(3), 317–325 (1999)
He, X., Kao, M.-Y., Lu, H.-I.: A fast general methodology for information-theoretically optimal encodings of graphs. SIAM J. Comput. 30, 838–846 (2000)
Itai, A., Rodeh, M.: Representation of graphs. Acta Informatica 17, 215–219 (1982)
Jacobson, G.: Space-efficient static trees and graphs. In: Proc. FOCS, pp. 549–554 (1989)
Keeler, K., Westbrook, J.: Short encodings of planar graphs and maps. Discrete Applied Mathematics 58, 239–252 (1995)
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large scale knowledge bases from the Web. In: Proc. VLDB (1999)
Larsson, J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88(11), 1722–1732 (2000)
Lu, H.-I.: Linear-time compression of bounded-genus graphs into information-theoretically optimal number of bits. In: Proc. SODA, pp. 223–224 (2002)
Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) Foundations of Software Technology and Theoretical Computer Science. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Munro, I., Raman, V.: Succinct representation of balanced parentheses, static trees and planar graphs. In: Proc. FOCS, pp. 118–126 (1997)
Naor, M.: Succinct representation of general unlabeled graphs. Discrete Applied Mathematics 28, 303–307 (1990)
Navarro, G.: Compressing web graphs like texts. Technical Report TR/DCC-2007-2, Dept. of Computer Science, University of Chile (2007)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1) article 2 (2007)
Raghavan, S., Garcia-Molina, H.: Representing Web graphs. In: Proc. ICDE (2003)
Randall, K., Stata, R., Wickremesinghe, R., Wiener, J.: The LINK database: Fast access to graphs of the Web. Technical Report 175, Compaq Systems Research Center, Palo Alto, CA (2001)
Rossignac, J.: Edgebreaker: Connectivity compression for triangle meshes. IEEE Transactions on Visualization 5(1), 47–61 (1999)
Suel, T., Yuan, J.: Compressing the graph structure of the Web. In: Proc. IEEE DCC, pp. 213–222. IEEE Computer Society Press, Los Alamitos (2001)
Turán, G.: Succinct representations of graphs. Discrete Applied Mathematics 8, 289–294 (1984)
Wan, R.: Browsing and Searching Compressed Documents. PhD thesis, Dept. of Computer Science and Software Engineering, University of Melbourne (2003)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Claude, F., Navarro, G. (2007). A Fast and Compact Web Graph Representation. In: Ziviani, N., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2007. Lecture Notes in Computer Science, vol 4726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75530-2_11
DOI: https://doi.org/10.1007/978-3-540-75530-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75529-6
Online ISBN: 978-3-540-75530-2
eBook Packages: Computer Science, Computer Science (R0)