
Computing Matching Statistics and Maximal Exact Matches on Compressed Full-Text Indexes

  • Conference paper
String Processing and Information Retrieval (SPIRE 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6393)

Abstract

Exact string matching is a problem that computer programmers face on a regular basis, and full-text indexes like the suffix tree or the suffix array provide fast string search over large texts. In the last decade, research on compressed indexes has flourished because the main problem in large-scale applications is the space consumption of the index. Nowadays, the most successful compressed indexes achieve almost optimal space and search time simultaneously. It is known that a myriad of sequence analysis and comparison problems can be solved efficiently with established data structures like the suffix tree or the suffix array, but algorithms on compressed indexes that solve these problems are still lacking. Here, we show that matching statistics and maximal exact matches between two strings S1 and S2 can be computed efficiently by matching S2 backwards against a compressed index of S1.
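To make the idea of "matching S2 backwards" concrete, the following is a minimal Python sketch of computing matching statistics by backward search. It is not the authors' implementation: it assumes a plain, uncompressed suffix array, LCP array and Burrows-Wheeler transform built naively, whereas the paper works on a compressed index. The function name `matching_statistics`, the helpers `backward_step` and `parent`, and the use of `"\0"` as an end marker are all hypothetical choices made for illustration.

```python
def matching_statistics(s1, s2):
    """ms[i] = length of the longest prefix of s2[i:] that occurs in s1."""
    text = s1 + "\0"  # assumption: "\0" occurs in neither s1 nor s2
    n = len(text)
    # Naive suffix array (fine for a sketch, far too slow for real data).
    sa = sorted(range(n), key=lambda i: text[i:])

    def common_prefix(a, b):
        k = 0
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        return k

    # lcp[i] compares the suffixes at ranks i-1 and i; -1 marks both ends.
    lcp = [-1] + [common_prefix(sa[i - 1], sa[i]) for i in range(1, n)] + [-1]
    bwt = [text[i - 1] for i in sa]  # text[-1] is the end marker

    # counts[c] = number of characters in text that are smaller than c.
    counts, total = {}, 0
    for ch in sorted(set(text)):
        counts[ch] = total
        total += text.count(ch)

    def backward_step(c, lb, rb):
        """Map the interval of w to the interval of c + w, or None if empty."""
        if c not in counts:
            return None
        new_lb = counts[c] + bwt[:lb].count(c)
        new_rb = counts[c] + bwt[:rb + 1].count(c) - 1
        return (new_lb, new_rb) if new_lb <= new_rb else None

    def parent(lb, rb):
        """Smallest enclosing interval with a shorter common prefix."""
        depth = max(lcp[lb], lcp[rb + 1])
        while lcp[lb] >= depth:
            lb -= 1
        while lcp[rb + 1] >= depth:
            rb += 1
        return lb, rb, depth

    ms = [0] * len(s2)
    lb, rb, q = 0, n - 1, 0  # interval of the empty match, match length 0
    i = len(s2) - 1
    while i >= 0:
        step = backward_step(s2[i], lb, rb)
        if step is not None:          # match can be extended to the left
            lb, rb = step
            q += 1
            ms[i] = q
            i -= 1
        elif (lb, rb) == (0, n - 1):  # s2[i] does not occur in s1 at all
            ms[i] = 0
            i -= 1
        else:                         # shorten the match from the right, retry
            lb, rb, q = parent(lb, rb)
    return ms


print(matching_statistics("aba", "bab"))  # [2, 2, 1]
```

The sketch scans S2 from right to left and keeps the suffix-array interval of the current match. When a left extension fails, it moves to the parent interval, which shortens the match at its right end, and tries again. The per-step counting over the Burrows-Wheeler transform is linear here; a real index would answer it with rank queries.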

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ohlebusch, E., Gog, S., Kügel, A. (2010). Computing Matching Statistics and Maximal Exact Matches on Compressed Full-Text Indexes. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_36

  • DOI: https://doi.org/10.1007/978-3-642-16321-0_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16320-3

  • Online ISBN: 978-3-642-16321-0

  • eBook Packages: Computer Science, Computer Science (R0)
