Abstract
We show how a half-inverted index can be constructed twice as fast as an ordinary inverted index. As a series of recent works has shown, the half-inverted index enables very fast prefix search, which in turn is the basis for very fast processing of many other types of advanced queries. Our construction algorithm is truly single-pass: because it avoids an expensive merge of the index, every posting (word occurrence) is touched (read and written) only once during the whole construction. The algorithm has been carefully engineered, with special attention paid to cache efficiency and disk cost. We compared our algorithm against the state-of-the-art index construction from Zettair.
This work was partially supported by DFG-SPP 1307, project Efficient Search in Very Large Text Collections, Databases, and Ontologies.
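To make the idea of prefix search on a half-inverted index more concrete, here is a minimal sketch. It is not the construction algorithm of the paper. It assumes a block-based layout in which the sorted vocabulary is cut into blocks of consecutive words, and each block keeps one list of (document ID, word ID) postings. The function names `build_half_inverted` and `prefix_search`, the `block_size` parameter, and the in-memory layout are all hypothetical. Unlike the single-pass algorithm described above, this toy version reads the input twice (once to build the vocabulary, once to emit postings).

```python
from bisect import bisect_left
from collections import defaultdict


def build_half_inverted(docs, block_size=2):
    """Toy builder (assumed layout, not the paper's algorithm).

    The sorted vocabulary is split into blocks of `block_size` consecutive
    words; each block stores (doc_id, word_id) postings in document order.
    """
    # First pass: collect and sort the vocabulary so that word IDs follow
    # lexicographic order.
    vocab = sorted({w for text in docs for w in text.split()})
    word_id = {w: i for i, w in enumerate(vocab)}

    # Second pass: append each posting to the block that covers its word.
    blocks = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for w in text.split():
            wid = word_id[w]
            blocks[wid // block_size].append((doc_id, wid))
    return vocab, dict(blocks)


def prefix_search(vocab, blocks, prefix, block_size=2):
    """Return sorted (doc_id, word) pairs for all words starting with `prefix`.

    `block_size` must match the value used when building the index.
    """
    # Words sharing a prefix are contiguous in the sorted vocabulary.
    lo = bisect_left(vocab, prefix)
    hi = lo
    while hi < len(vocab) and vocab[hi].startswith(prefix):
        hi += 1
    if lo == hi:
        return []

    # Scan only the blocks that overlap the word-ID range [lo, hi).
    hits = set()
    for b in range(lo // block_size, (hi - 1) // block_size + 1):
        for doc_id, wid in blocks.get(b, []):
            if lo <= wid < hi:
                hits.add((doc_id, vocab[wid]))
    return sorted(hits)
```

In this sketch a prefix query touches only the few blocks covering the matching word range, instead of one inverted list per matching word. The block size trades off the number of lists read against the number of postings that have to be filtered out.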
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Celikik, M., Bast, H. (2009). Fast Single-Pass Construction of a Half-Inverted Index. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_19
DOI: https://doi.org/10.1007/978-3-642-03784-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03783-2
Online ISBN: 978-3-642-03784-9
eBook Packages: Computer Science, Computer Science (R0)