Indexing Compressed Text

Reference work entry

Synonyms

Compressed and searchable data format; Compressed full-text indexing; Compressed suffix array; Compressed suffix tree

Definition

Given a text T[1, n], the Compressed Text Indexing problem requires building an indexing data structure over T that takes space close to the empirical entropy of the input text and answers queries on the occurrences of an arbitrary pattern P[1, p] in T with no significant slowdown compared with uncompressed indexes. There are three main queries: count(P), which returns the number of occurrences of P in T; locate(P), which returns the starting positions of all occurrences of P in T; and extract(i, j), which retrieves the substring T[i, j].
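
To make the three queries concrete, here is a minimal Python sketch of an FM-index-style structure, the approach introduced by Ferragina and Manzini. It is deliberately uncompressed: it assumes the text does not contain the character `$` (used as a unique terminator), builds the suffix array by naive sorting, and stores plain rank tables, so it illustrates the query logic rather than entropy-bounded space. The class name `ToyFMIndex`, its `sample_rate` parameter, and the 1-based position convention are illustrative assumptions, not the API of any published index.

```python
class ToyFMIndex:
    """Uncompressed FM-index sketch: BWT + rank table + sampled suffix array.

    Hypothetical illustration. Assumes the text does not contain '$'.
    Positions are 1-based, matching the notation T[1, n].
    """

    def __init__(self, text: str, sample_rate: int = 4) -> None:
        if "$" in text:
            raise ValueError("text must not contain the terminator '$'")
        t = text + "$"
        self.n = n = len(t)
        # Suffix array by naive sorting; practical indexes use fast construction.
        sa = sorted(range(n), key=lambda i: t[i:])
        # BWT: character preceding each sorted suffix (t[-1] == '$' for i == 0).
        self.bwt = "".join(t[i - 1] for i in sa)
        # C[c] = number of characters in t strictly smaller than c.
        self.C = {}
        total = 0
        for c in sorted(set(t)):
            self.C[c] = total
            total += t.count(c)
        # occ[c][r] = occurrences of c in bwt[:r]. A compressed index would
        # answer these rank queries from a compressed representation instead.
        self.occ = {c: [0] * (n + 1) for c in self.C}
        for r, ch in enumerate(self.bwt):
            for c in self.C:
                self.occ[c][r + 1] = self.occ[c][r] + (ch == c)
        # Sampled SA (row -> text position) and inverse SA (position -> row).
        # Position n-1 (the terminator) is always sampled so extract can start there.
        self.sa_samples = {r: p for r, p in enumerate(sa) if p % sample_rate == 0}
        self.isa_samples = {p: r for r, p in enumerate(sa)
                            if p % sample_rate == 0 or p == n - 1}

    def _lf(self, r: int) -> int:
        """LF-mapping: row of the suffix starting one text position earlier."""
        c = self.bwt[r]
        return self.C[c] + self.occ[c][r]

    def _range(self, pattern: str) -> tuple[int, int]:
        """Backward search: half-open row range of suffixes prefixed by pattern."""
        lo, hi = 0, self.n
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: str) -> int:
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern: str) -> list[int]:
        lo, hi = self._range(pattern)
        positions = []
        for r in range(lo, hi):
            steps = 0
            # Walk backwards through the text until reaching a sampled row.
            while r not in self.sa_samples:
                r = self._lf(r)
                steps += 1
            positions.append(self.sa_samples[r] + steps + 1)  # 1-based
        return sorted(positions)

    def extract(self, i: int, j: int) -> str:
        """Return T[i, j] (1-based, inclusive; requires 1 <= i <= j <= len(text))."""
        start, end = i - 1, j  # 0-based half-open [start, end)
        # Start from the nearest sampled position at or after the range end.
        p = min(q for q in self.isa_samples if q >= end)
        r = self.isa_samples[p]
        out = []
        while p > start:
            if p <= end:
                out.append(self.bwt[r])  # bwt[r] == t[p - 1]
            r = self._lf(r)
            p -= 1
        return "".join(reversed(out))
```

On T = mississippi, for instance, `count("ssi")` returns 2, `locate("ssi")` returns [3, 6], and `extract(3, 7)` returns "ssiss". Raising `sample_rate` stores fewer suffix-array samples at the cost of longer LF walks in `locate` and `extract`; a compressed index makes the same space/time trade-off while also replacing `bwt` and `occ` with compressed rank structures.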

Historical Background

String processing and searching tasks are at the core of modern web search, information retrieval (IR), database, and data mining applications. Most text manipulations required by these applications involve, sooner or later, searching those (long) texts for (short) patterns...

Recommended Reading

  1. Ferragina P. String search in external memory: data structures and algorithms. In: Handbook of computational molecular biology. London: Chapman & Hall; 2005.
  2. Ferragina P, Manzini G. Indexing compressed text. J ACM. 2005;52(4):552–81.
  3. Grossi R, Vitter JS. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J Comput. 2005;35(2):378–407.
  4. Navarro G, Mäkinen V. Compressed full-text indexes. ACM Comput Surv. 2007;39(1).
  5. Ferragina P, Manzini G, Mäkinen V, Navarro G. Compressed representations of sequences and full-text indexes. ACM Trans Algorithms. 2007;3(2).
  6. Grossi R, Gupta A, Vitter JS. High-order entropy-compressed text indexes. In: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms; 2003. p. 841–50.
  7. Arroyuelo D, Navarro G, Sadakane K. Stronger Lempel-Ziv based compressed text indexing. Algorithmica. 2012;62(1–2):54–101.
  8. Belazzougui D. Linear time construction of compressed text indices in compact space. In: Proceedings of the 46th Annual ACM Symposium on Theory of Computing; 2014. p. 148–93.
  9. Burrows M, Wheeler D. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation; 1994.
  10. Belazzougui D, Navarro G. Alphabet-independent compressed text indexing. ACM Trans Algorithms. 2014;10(4):Article 23.
  11. Sadakane K. New text indexing functionalities of the compressed suffix arrays. J Algorithms. 2007;48(2):294–413.
  12. Sadakane K. Compressed suffix trees with full functionality. Theory Comput Syst. 2007;41(4):589–607.
  13. Ferragina P, Venturini R. Compressed cache-oblivious string B-tree. In: Proceedings of the 21st Annual European Symposium on Algorithms; 2013. p. 469–80.
  14. Ferragina P, Venturini R. The compressed permuterm index. ACM Trans Algorithms. 2010;7(1):10.
  15. Sadakane K. Succinct data structures for flexible text retrieval systems. J Discrete Algorithms. 2007;5(1):12–22.
  16. Ferragina P, Sirén J, Venturini R. Distribution-aware compressed full-text indexes. Algorithmica. 2013;67(4):529–46.
  17. Ferragina P, Grossi R. The string B-tree: a new data structure for string search in external memory and its applications. J ACM. 1999;46(2):236–80.
  18. Ferragina P, González R, Navarro G, Venturini R. Compressed text indexes: from theory to practice. J Exp Algorithmics. 2009;13:1.12–31.

Author information

Corresponding author

Correspondence to Paolo Ferragina.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Ferragina, P., Venturini, R. (2018). Indexing Compressed Text. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1144