Recent advances in computational science have made it possible not only to acquire and store large amounts of sequence data but also to analyze many sequences simultaneously. Sequences diverge because variations accumulate over evolutionary timescales. How variable or conserved a region is across two or more sequences says much about the significance of that region for maintaining functional and structural integrity. Sequence alignment is also the first step in most bioinformatics analyses. Domains of high similarity may be a consequence of evolutionary relationships, i.e., shared ancestry, and can be uncovered by sequence alignment. In this chapter we discuss common terminology related to sequence alignment, how to choose an appropriate alignment strategy for a given problem, different alignment algorithms, and the most commonly available tools for pairwise, multiple, and whole-genome sequence alignment.
Keywords: Sequence identity; Sequence homology; Sequence similarity; Substitution matrices; Distance matrices; Pairwise sequence alignment; Multiple sequence alignment; BLAST; FASTA; Genome alignment
This work is supported by a grant from the NIH Research Project Grant Program (2R01GM079656). The authors are grateful to Dr. David C. Marciano, Dr. Angela Wilkins, and Dr. Rhonald C. Lua for their helpful comments.