The evolution of methods that capture genetic sequence data has inspired a parallel evolution of computational tools for analyzing and comparing those data. Indeed, much of the progress in modern biological research has stemmed from the application of such technology. In this chapter we provide an overview of the main classes of tools currently used for sequence comparison. For each class we describe how the tools work, their history, and their current state. Hundreds of different tools have been produced to align, cluster, filter, or otherwise analyze sequence data, and it would be impossible to list all of them in this chapter, so we cover only the tools that most readers are likely to encounter. We apologize to researchers who feel that their particular piece of software should have been included here. The reader will notice considerable overlap in concept and application between tools, and in many cases one tool or algorithm forms part of another tool’s implementation. Most of the more popular sequence comparison tools are based on ideas and algorithms that can be traced back to the 1960s and 1970s, when the cost of computing power first became low enough to enable widespread development in this area. Where applicable, we describe the original algorithms and then list the iterations of the idea (often by different people in different labs), noting the important changes introduced at each stage. Finally, we describe the software packages currently used by bioinformaticians. A quick search will turn up many papers that formally compare different implementations of a particular algorithm, so while we may note that one algorithm is more efficient or accurate than another, we stress that we have not performed any formal benchmarking or comparison analysis here.
Keywords: Query Sequence; Alignment Algorithm; Pairwise Alignment; Progressive Method; Progressive Alignment