The alignment methods introduced in Chapter 3 are well suited to comparing two sequences accurately, but they are not adequate for homology search against a large biological database such as GenBank. As of February 2008, the traditional GenBank divisions contained approximately 85,759,586,764 bases in 82,853,685 sequence records. Searching databases of this size requires faster methods that can identify homology between the query sequence and the database sequences in a timely manner.
A common feature of homology search programs is the filtration idea: exact or approximate matches between the query sequence and a database sequence serve as the basis for judging whether the homology between the two sequences passes the desired threshold.
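To make the filtration idea concrete, here is a minimal sketch that uses exact word (k-mer) hits as the filter. It is an illustration under stated assumptions, not the method of any particular program: the function names, the word length `k`, and the hit threshold `min_hits` are all hypothetical choices. Database records that share too few exact words with the query are discarded, and only the survivors would go on to a slower alignment step.

```python
def word_set(seq: str, k: int) -> set[str]:
    """Return the set of all length-k words (k-mers) occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_word_hits(query_words: set[str], target: str, k: int) -> int:
    """Count positions in target whose k-mer also occurs in the query."""
    return sum(1 for j in range(len(target) - k + 1)
               if target[j:j + k] in query_words)


def filter_database(query: str, database: dict[str, str],
                    k: int = 4, min_hits: int = 2) -> list[str]:
    """Keep only records with at least min_hits exact word hits.

    k and min_hits are hypothetical parameters chosen for illustration;
    survivors would then be passed to a full alignment algorithm.
    """
    query_words = word_set(query, k)  # build the query word set once
    return [name for name, seq in database.items()
            if count_word_hits(query_words, seq, k) >= min_hits]


if __name__ == "__main__":
    db = {"s1": "TTACGTACGA", "s2": "GGGGCCCCGG"}
    print(filter_database("ACGTACGTTT", db, k=4, min_hits=2))
```

Running the example prints `['s1']`: `s1` shares five 4-mer hits with the query, while `s2` shares none and is filtered out without ever being aligned.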
This chapter is divided into six sections. Section 4.1 describes how to implement the filtration idea for finding exact word matches between two sequences using efficient data structures such as hash tables, suffix trees, and suffix arrays.
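As an illustration of one of the data structures named above, the following is a minimal suffix-array sketch for finding exact word matches. The names `build_suffix_array`, `find_occurrences`, and `exact_word_matches` are hypothetical, and the construction simply sorts all suffixes (roughly O(n² log n)), which is fine for a toy example but far slower than the specialized constructions a real tool would need. Because suffixes are stored in lexicographic order, all occurrences of a word occupy one contiguous block of the array, and two binary searches locate that block.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], word: str) -> list[int]:
    """Return sorted start positions of word in text via binary search on sa."""
    k = len(word)
    lo, hi = 0, len(sa)
    # Lower bound: first suffix whose length-k prefix is >= word.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < word:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    # Upper bound: first suffix whose length-k prefix is > word.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= word:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def exact_word_matches(query: str, text: str, k: int) -> list[tuple[int, int]]:
    """List (query_pos, text_pos) pairs where a length-k word matches exactly."""
    sa = build_suffix_array(text)
    matches = []
    for i in range(len(query) - k + 1):
        for j in find_occurrences(text, sa, query[i:i + k]):
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    print(exact_word_matches("ACGT", "TACGTACG", 3))
```

The example prints `[(0, 1), (0, 5), (1, 2)]`: the query word `ACG` occurs at text positions 1 and 5, and `CGT` occurs at position 2. A hash table of words would answer the same lookups in expected constant time per word; the suffix array instead supports queries for words of any length against a single index.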
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
(2009). Homology Search Tools. In: Sequence Comparison. Computational Biology, vol 7. Springer, London. https://doi.org/10.1007/978-1-84800-320-0_4
DOI: https://doi.org/10.1007/978-1-84800-320-0_4
Publisher Name: Springer, London
Print ISBN: 978-1-84800-319-4
Online ISBN: 978-1-84800-320-0
eBook Packages: Computer Science; Computer Science (R0)