3.15 Summary
In computational biology one often needs to find the occurrences of a pattern P in a text T. Since the texts of computational biology include genome sequences, which tend to be large, it is important to apply efficient methods of string matching. Traditional string matching methods are guaranteed to take time O(n), where n is the length of the text. By preprocessing a set of patterns into a keyword tree, the same time bound can be extended to set matching. Instead of preprocessing one or more patterns, it is also possible to preprocess the text. A suffix tree is a data structure that can be constructed for a given text in time O(n). Once constructed, it can be used to search for any P in T in time O(m), where m is the length of the pattern. In addition to making string searching extremely efficient, a suffix tree reveals in one fell swoop the entire repeat structure of T, without the need for any further string comparisons. This has important biological applications, since unique and repeated sequences play a central role in many fundamental as well as biotechnological problems. Finally, suffix trees can also be used for rapid inexact string matching, where at most k mismatches between P and its occurrence in T are allowed.
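The three uses of a text index named above (exact search, repeat discovery, and matching with at most k mismatches) can be illustrated with a minimal sketch. It rests on a loud simplification: it builds an uncompressed suffix trie by inserting every suffix naively, which costs O(n²) time and space, rather than the linear-time suffix tree construction the summary refers to. All names (`Node`, `build_suffix_trie`, `find`, `longest_repeat`, `find_with_mismatches`) are hypothetical and chosen only for this illustration.

```python
class Node:
    """One trie node; `positions` lists the start of every suffix passing through it."""
    __slots__ = ("children", "positions")

    def __init__(self):
        self.children = {}
        self.positions = []


def build_suffix_trie(text):
    # Naive construction: insert each suffix character by character.
    # This is O(n^2), NOT the O(n) suffix tree construction.
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.positions.append(start)
    return root


def find(root, pattern):
    # Walk down one edge per pattern character: O(m) steps,
    # independent of the text length.
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.positions


def longest_repeat(root):
    # A substring is repeated exactly when its node is shared by
    # at least two suffixes, so no string comparisons are needed.
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        for ch, child in node.children.items():
            if len(child.positions) >= 2:
                candidate = label + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


def find_with_mismatches(root, pattern, k):
    # Explore every path of length m, spending one unit of the
    # mismatch budget whenever the edge label differs from the pattern.
    hits = []
    stack = [(root, 0, k)]
    while stack:
        node, depth, budget = stack.pop()
        if depth == len(pattern):
            hits.extend(node.positions)
            continue
        for ch, child in node.children.items():
            cost = 0 if ch == pattern[depth] else 1
            if cost <= budget:
                stack.append((child, depth + 1, budget - cost))
    return sorted(hits)


root = build_suffix_trie("ACGTACGA")
exact_hits = find(root, "ACG")
repeat = longest_repeat(root)
inexact_hits = find_with_mismatches(root, "ACGA", 1)
```

Storing the start positions at every node keeps the lookup simple at the price of quadratic memory. A real suffix tree compresses unary paths into single edges and reads the positions off its leaves instead.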
3.16 Further Reading
D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, 1997.
Copyright information
© 2006 Birkhäuser Verlag
About this chapter
Cite this chapter
(2006). Biological Sequences and the Exact String Matching Problem. In: Introduction to Computational Biology. Birkhäuser Basel. https://doi.org/10.1007/3-7643-7387-3_3
DOI: https://doi.org/10.1007/3-7643-7387-3_3
Publisher Name: Birkhäuser Basel
Print ISBN: 978-3-7643-6700-8
Online ISBN: 978-3-7643-7387-0
eBook Packages: Biomedical and Life Sciences (R0)