Skip to main content

Dual-Sorted Inverted Lists in Practice

  • Conference paper
String Processing and Information Retrieval (SPIRE 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7608))

Included in the following conference series:

Abstract

We implement a recent theoretical proposal to represent inverted lists in memory, in a way that docid-sorted and weight-sorted lists are simultaneously represented in a single wavelet tree data structure. We compare our implementation with classical representations, where the ordering favors either bag-of-word queries or Boolean and weighted conjunctive queries, and demonstrate that the new data structure is faster than the state of the art for conjunctive queries, while it offers an attractive space/time tradeoff when both kinds of queries are of interest.

Partially funded by Fondecyt grant 1-110066 and by the Conicyt PhD Scholarship Program, Chile.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Anh, V., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf. Retr. 8(1), 151–166 (2005)

    Article  Google Scholar 

  2. Anh, V., Moffat, A.: Pruned query evaluation using pre-computed impacts. In: Proc. SIGIR, pp. 372–379 (2006)

    Google Scholar 

  3. Arroyuelo, D., González, S., Oyarzún, M.: Compressed Self-indices Supporting Conjunctive Queries on Document Collections. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 43–54. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  4. Metzler, D., Croft, B., Strohman, T.: Search Engines: Information Retrieval in Practice. Pearson Education (2009)

    Google Scholar 

  5. Baeza-Yates, R., Moffat, A., Navarro, G.: Searching Large Text Collections, pp. 195–244. Kluwer Academic Publishers (2002)

    Google Scholar 

  6. Baeza-Yates, R., Ribeiro, B.: Modern Information Retrieval, 2nd edn. Addison-Wesley (2011)

    Google Scholar 

  7. Baeza-Yates, R., Salinger, A.: Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences. In: Consens, M.P., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 13–24. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  8. Barbay, J., López-Ortiz, A., Lu, T., Salinger, A.: An experimental investigation of set intersection algorithms for text searching. ACM J. Exp. Alg. 14, art. 7 (2009)

    Google Scholar 

  9. Brisaboa, N., Fariña, A., Ladra, S., Navarro, G.: Reorganizing compressed text. In: Proc. SIGIR, pp. 139–146 (2008)

    Google Scholar 

  10. Broder, A., Carmel, D., Herscovici, M., Soffer, A., Zien, J.: Efficient query evaluation using a two-level retrieval process. In: Proc. CIKM, pp. 426–434 (2003)

    Google Scholar 

  11. Claude, F., Navarro, G.: Practical Rank/Select Queries over Arbitrary Sequences. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 176–187. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  12. Culpepper, J.S., Moffat, A.: Compact Set Representation for Information Retrieval. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 137–148. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  13. Culpepper, J.S., Navarro, G., Puglisi, S.J., Turpin, A.: Top-k Ranked Document Search in General Text Databases. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part II. LNCS, vol. 6347, pp. 194–205. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  14. Demaine, E., Munro, I.: Adaptive set intersections, unions, and differences. In: Proc. SODA, pp. 743–752 (2000)

    Google Scholar 

  15. Ding, S., Suel, T.: Faster top-k document retrieval using block-max indexes. In: Proc. SIGIR, pp. 993–1002 (2011)

    Google Scholar 

  16. Robertson, S., et al.: Okapi at TREC-3. In: Proc. 3rd TREC, pp. 109–126 (1994)

    Google Scholar 

  17. Gagie, T., Puglisi, S.J., Turpin, A.: Range Quantile Queries: Another Virtue of Wavelet Trees. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 1–6. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  18. González, R., Grabowski, S., Mäkinen, V., Navarro, G.: Practical implementation of rank and select queries. In: Proc. Posters WEA, pp. 27–38 (2005)

    Google Scholar 

  19. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. SODA, pp. 841–850 (2003)

    Google Scholar 

  20. Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  21. Navarro, G., Puglisi, S.J.: Dual-Sorted Inverted Lists. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 309–321. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  22. Persin, M., Zobel, J., Sacks-Davis, R.: Filtered document retrieval with frequency-sorted indexes. J. Amer. Soc. Inf. Sci. 47(10), 749–764 (1996)

    Article  Google Scholar 

  23. Clarke, C., Büttcher, S., Cormack, G.: Information Retrieval: Implementing and Evaluating Search Engines. MIT Press (2010)

    Google Scholar 

  24. Sanders, P., Transier, F.: Intersection in integer inverted indices. In: Proc. ALENEX (2007)

    Google Scholar 

  25. Scholer, F., Williams, H., Yiannis, J., Zobel, J.: Compression of inverted indexes for fast query evaluation. In: Proc. SIGIR, pp. 222–229 (2002)

    Google Scholar 

  26. Strohman, T., Croft, B.: Efficient document retrieval in main memory. In: Proc. SIGIR, pp. 175–182 (2007)

    Google Scholar 

  27. Witten, I., Moffat, A., Bell, T.: Managing Gigabytes, 2nd edn. Morgan Kaufmann (1999)

    Google Scholar 

  28. Yan, H., Ding, S., Suel, T.: Inverted index compression and query processing with optimized document ordering. In: Proc. WWW, pp. 401–410 (2009)

    Google Scholar 

  29. Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comp. Surv. 38(2), art. 6 (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Konow, R., Navarro, G. (2012). Dual-Sorted Inverted Lists in Practice. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-34109-0_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34108-3

  • Online ISBN: 978-3-642-34109-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics