BLAST is the most frequently used method for assessing which DNA or protein sequences in a large database have significant similarity to a given query sequence. Many of the results derived in previous chapters, especially those relating to the renewal theorem, random walks, and sequential analysis, were discussed because they are needed in the statistical theory associated with the BLAST procedure. In this chapter we describe how they are used for this purpose. For concreteness the discussion is in terms of protein (amino acid) sequences; the analysis for DNA sequences is similar to, but simpler than, that for protein sequences.
Keywords: Random Walk, Substitution Matrix, Amino Acid Frequency, Amino Acid Pair, Edge Correction