Abstract
New techniques for compressing and storing inverted lists are presented. Unlike previous research, these techniques are designed specifically for volatile inverted lists. They combine different types of compression (including prefix compression) with block segmentation to allow easy insertion and deletion of pointers and, most importantly, to significantly reduce execution times while keeping storage requirements close to those of a baseline monolithic inverted list implementation based on Elias's codes. Inverted lists for information retrieval are addressed and experiments are reported. The best method uses an optimized block-oriented evaluation that efficiently skips irrelevant pointers; its observed average execution time is less than 65% of that of the baseline implementation.
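As a rough illustration of the general idea of block segmentation with skipping, the following minimal Python sketch splits a sorted posting list into fixed-size blocks, stores gaps inside each block, and decodes a block only when a queried identifier could fall inside it. This is not the paper's method: the block layout, the function names (`build_blocks`, `decode_block`, `intersect`), the block size, and the use of plain integer gaps in place of a bit-level code are all assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical block layout: (last document id in the block, gap-encoded ids).
Block = Tuple[int, List[int]]


def build_blocks(postings: List[int], block_size: int = 4) -> List[Block]:
    """Split a sorted posting list into blocks and gap-encode each block."""
    blocks = []
    for start in range(0, len(postings), block_size):
        chunk = postings[start:start + block_size]
        # The first id of a block is stored as-is; the rest are stored as
        # differences from their predecessor, so each block decodes on its own.
        gaps = [chunk[0]] + [b - a for a, b in zip(chunk, chunk[1:])]
        blocks.append((chunk[-1], gaps))
    return blocks


def decode_block(block: Block) -> List[int]:
    """Rebuild the document ids of one block from its gaps."""
    _, gaps = block
    ids, current = [], 0
    for gap in gaps:
        current += gap
        ids.append(current)
    return ids


def intersect(query_ids: List[int], blocks: List[Block]) -> Tuple[List[int], int]:
    """Intersect sorted query ids with a blocked list.

    Returns the matching ids and the number of blocks that had to be decoded.
    """
    last_ids = [last for last, _ in blocks]
    matches, decoded = [], 0
    cached_index, cached_ids = -1, []
    for doc_id in query_ids:
        # Find the first block whose last id is >= doc_id; earlier blocks
        # cannot contain doc_id and are skipped without being decoded.
        i = bisect_left(last_ids, doc_id)
        if i == len(blocks):
            break  # doc_id is larger than every stored id
        if i != cached_index:
            cached_index, cached_ids = i, decode_block(blocks[i])
            decoded += 1
        j = bisect_left(cached_ids, doc_id)
        if j < len(cached_ids) and cached_ids[j] == doc_id:
            matches.append(doc_id)
    return matches, decoded


postings = [2, 5, 9, 14, 20, 21, 30, 41, 55, 60, 72, 80]
blocks = build_blocks(postings, block_size=4)
matches, decoded = intersect([9, 60, 61, 99], blocks)
print(matches, decoded)
```

In this toy run the middle block is never decoded, because no queried identifier can fall inside it. Since each block decodes independently, an insertion or deletion in such a layout would only require re-encoding the affected block rather than the whole list.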
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sacco, G.M. (2012). Fast Block-Compressed Inverted Lists. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_30
DOI: https://doi.org/10.1007/978-3-642-32600-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32599-1
Online ISBN: 978-3-642-32600-4
eBook Packages: Computer Science (R0)