Synonyms
Compressed and searchable data format; Compressed full-text indexing; Compressed suffix array; Compressed suffix tree
Definition
Given a text T[1, n], the Compressed Text Indexing problem requires building an indexing data structure over T that takes space close to the empirical entropy of T and answers queries on the occurrences of an arbitrary pattern P[1, p] in T without any significant slowdown with respect to uncompressed indexes. There are three main queries: count(P), which returns the number of occurrences of P in T; locate(P), which returns the starting positions of all occurrences of P in T; and extract(i, j), which retrieves the substring T[i, j].
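To make the semantics of the three queries concrete, here is a minimal Python sketch of an uncompressed baseline built on a plain suffix array. It is not a compressed index and is not taken from this entry: the class name NaiveTextIndex and the helper _range are hypothetical, and the construction by direct sorting is only suitable for toy inputs. A compressed index would answer the same queries while using space close to the empirical entropy of T. Positions are 1-based, as in the definition above.

```python
class NaiveTextIndex:
    """Uncompressed baseline (hypothetical name): a plain suffix array over T."""

    def __init__(self, text: str):
        self.text = text
        # Suffix array: 0-based starting offsets of all suffixes of T,
        # listed in lexicographic order. Direct sorting is for illustration only.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])

    def _range(self, pattern: str):
        """Return the half-open suffix-array range [start, end) of suffixes prefixed by pattern."""
        p = len(pattern)

        def prefix(k: int) -> str:
            # First p characters of the k-th smallest suffix.
            return self.text[self.sa[k]:self.sa[k] + p]

        # First suffix whose length-p prefix is >= pattern.
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if prefix(mid) < pattern:
                lo = mid + 1
            else:
                hi = mid
        start = lo

        # First suffix whose length-p prefix is > pattern.
        hi = len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if prefix(mid) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return start, lo

    def count(self, pattern: str) -> int:
        """Number of occurrences of pattern in T."""
        start, end = self._range(pattern)
        return end - start

    def locate(self, pattern: str) -> list:
        """1-based starting positions of all occurrences, in increasing order."""
        start, end = self._range(pattern)
        return sorted(self.sa[k] + 1 for k in range(start, end))

    def extract(self, i: int, j: int) -> str:
        """Substring T[i, j], with 1-based inclusive bounds."""
        return self.text[i - 1:j]


index = NaiveTextIndex("abracadabra")
print(index.count("abra"))    # 2
print(index.locate("abra"))   # [1, 8]
print(index.extract(1, 4))    # abra
```

The two binary searches rely on the fact that truncating lexicographically sorted suffixes to their first p characters keeps them in sorted order, so all suffixes prefixed by P occupy a contiguous range of the suffix array.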
Historical Background
String processing and searching tasks are at the core of modern web search, information retrieval (IR), database, and data mining applications. Most of the text manipulations required by these applications involve, sooner or later, searching those (long) texts for (short) patterns...
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Ferragina, P., Venturini, R. (2018). Indexing Compressed Text. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1144
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1144
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering