Biological Sequences and the Exact String Matching Problem

Introduction to Computational Biology

3.15 Summary

In computational biology one often needs to find the occurrences of some pattern P in a text T. Since the texts of computational biology include genome sequences, which tend to be large, it is important to apply efficient methods of string matching. Traditional string matching methods are guaranteed to take time O(n), where n is the length of the text. By preprocessing a set of patterns into a keyword tree, this time bound can be extended to set matching. Instead of preprocessing one or more patterns, it is also possible to preprocess the text. A suffix tree is a data structure that can be constructed for a given text in time O(n). Once constructed, it can be used to search for any P in T in time O(m), where m is the length of the pattern. In addition to making string searching extremely efficient, a suffix tree reveals in one fell swoop the entire repeat structure of T without the need for carrying out any string comparisons. This has important biological applications, since unique and repeated sequences play a central role in many fundamental as well as biotechnological problems. Finally, suffix trees can also be used for rapid inexact string matching, where up to k mismatches between P and its occurrence in T are allowed.
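The idea of preprocessing the text rather than the pattern can be illustrated with a minimal Python sketch. Note the assumptions: this builds a naive suffix trie, which takes quadratic time and space, not the linear-time suffix tree described above, and the names Node, build_suffix_trie, find and longest_repeat are hypothetical helpers chosen for illustration. What the sketch does share with a suffix tree is that a lookup walks at most m edges, and that repeats can be read off the structure without comparing strings.

```python
class Node:
    """One trie node; `starts` lists the suffixes that pass through it."""
    def __init__(self):
        self.children = {}
        self.starts = []


def build_suffix_trie(text):
    """Insert every suffix of `text` (naive: O(n^2) time and space)."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(start)
    return root


def find(root, pattern):
    """Return the start positions of `pattern`, following at most m edges."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.starts


def longest_repeat(root):
    """Return a longest substring that occurs at least twice in the text."""
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node.children.items():
            # A node reached by two or more suffixes spells a repeat.
            if len(child.starts) >= 2:
                label = path + ch
                if len(label) > len(best):
                    best = label
                stack.append((child, label))
    return best


trie = build_suffix_trie("ACGTACGA")
print(find(trie, "ACG"))      # [0, 4]
print(find(trie, "TT"))       # []
print(longest_repeat(trie))   # ACG
```

Storing the start positions at every node keeps the lookup simple at the cost of extra memory; a real suffix tree compresses unary paths and keeps positions only at the leaves.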

3.16 Further Reading

  1. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, 1997.

© 2006 Birkhäuser Verlag

(2006). Biological Sequences and the Exact String Matching Problem. In: Introduction to Computational Biology. Birkhäuser Basel. https://doi.org/10.1007/3-7643-7387-3_3
