Sequence Homology Handling

Saitou, Naruya

doi:10.1007/978-1-4471-5304-7_14

Naruya Saitou⁶

Part of the book series: Computational Biology ((COBO,volume 17))

3302 Accesses

Abstract

How to discover evolutionary homology of nucleotide and amino acid sequences and how to analyze these homologous sequences are discussed, including homology search, pairwise alignment, multiple alignment, and genome-wide sequence viewing.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.
Google Scholar
http://www.ncbi.nlm.nih.gov/books/NBK21097/
Karlin, S., & Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences, 87, 2264–2268.
Article MATH Google Scholar
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25, 3389–3402.
Article Google Scholar
Zhang, Z., Schwartz, S., Wagner, L., & Miller, W. (2000). A greedy algorithm for aligning DNA sequences. Journal of Computational Biology, 7, 203–214.
Article Google Scholar
Kitano, T., & Saitou, N. (2000). Evolutionary history of the Rh blood group-related genes in vertebrates. Immunogenetics, 51, 856–862.
Article Google Scholar
Lipman, D. J., & Pearson, W. R. (1985). Rapid and sensitive protein similarity searches. Science, 227, 1435–1441.
Article Google Scholar
Pearson, W. R., & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States, 85, 2444–2448.
Article Google Scholar
Kent, W. J. (2002). BLAT – the BLAST-like alignment tool. Genome Research, 12, 656–664.
MathSciNet Google Scholar
http://genome.ucsc.edu/FAQ/FAQblat.html
Ma, B., Tromp, J., & Li, M. (2002). PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18, 440–445.
Article Google Scholar
http://www.bioinformaticssolutions.com/all-products/ph
Eddy, S. R. (2009). A new generation of homology search tools based on probabilistic inference. Genome Informatics, 23, 205–211.
Article Google Scholar
Fin, R. D., Clements, J., & Eddy, S. R. (2011). HMMER web server: Interactive sequence similarity searching. Nucleic Acids Research, 39, W29–W37.
Article Google Scholar
Higgs, P. G., & Atwood, T. K. (2005). Bioinformatics and molecular evolution. Malden: Blackwell.
Google Scholar
Chao, K.-M., & Zhang, L. (2008). Sequence comparison: Theory and methods (Computational biology series). London: Springer.
Google Scholar
Saitou, N., & Ueda, S. (1994). Evolutionary rate of insertions and deletions in non-coding nucleotide sequences of primates. Molecular Biology and Evolution, 11, 504–512.
Google Scholar
Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48, 443–453.
Article Google Scholar
Sellers, P. H. (1974). On the theory and computation of evolutionary distances. SIAM Journal on Applied Mathematics, 26, 787–793.
Article MATH MathSciNet Google Scholar
Waterman, M. S., Smith, T. F., & Beyer, W. A. (1976). Some biological sequence metrics. Advances in Mathematics, 20, 367–387.
Article MATH MathSciNet Google Scholar
Gotoh, O. (1982). An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162, 705–708.
Article Google Scholar
Altschul, S. F., & Erickson, B. W. (1986). A nonlinear measure of subalignment similarity and its significance levels. Bulletin of Mathematical Biology, 48, 603–616.
MATH MathSciNet Google Scholar
Fitch, W. (1969). Locating gaps in amino acid sequences to optimize the homology between two proteins. Biochemical Genetics, 3, 99–108.
Article Google Scholar
Schulz, J., Florian Leese, F., & Held, C. (2011). Introduction to dot-plots. Web page available at http://www.code10.info/
Kuroki, Y., Toyoda, A., Noguchi, H., Taylor, T. D., Itoh, T., Kim, D. S., Kim, D. W., Choi, S. H., Kim, I. C., Choi, H. H., Kim, Y. S., Satta, Y., Saitou, N., Yamada, T., Morishita, S., Hattori, M., Sakaki, Y., Park, H. S., & Fujiyama, A. (2006). Comparative analysis of chimpanzee and human Y chromosomes unveils complex evolutionary pathway. Nature Genetics, 38, 158–167.
Article Google Scholar
Murata, M., Richardson, J. S., & Sussman, J. L. (1985). Simultaneous comparison of three protein sequences. Proceedings of National Academy of Sciences, USA, 82, 3073–3077.
Article Google Scholar
Feng, D.-F., & Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, 25, 351–360.
Article Google Scholar
Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22, 4673–4680.
Article Google Scholar
Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30, 3059–3066.
Article Google Scholar
Notredame, C. (2007). Recent evolutions of multiple sequence alignment algorithms. PLoS Computational Biology, 3, e123.
Article Google Scholar
Morgenstern, B., Dress, A., & Werner, T. (1996). Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proceedings of National Academy of Sciences, USA, 93, 12098–12103.
Article MATH Google Scholar
Brudno, M., Do, C., Cooper, G., Kim, M. F., Davydov, E., Green, E. D., Sidow, A., & Batzoglou, S. (2003). LAGAN and multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Research, 13, 721–731.
Article Google Scholar
Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–1797.
Article Google Scholar
Bray, N., & Pachter, L. (2004). MAVID: Constrained ancestral alignment of multiple sequences. Genome Research, 14, 693–699.
Article Google Scholar
Darling, A. C. E., Mau, B., Blatter, F. R., & Perna, N. T. (2004). Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Research, 14, 1394–1403.
Article Google Scholar
Darling, A. C. E., Mau, B., & Perna, N. T. (2010). progressiveMauve: Multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE, 5, e11147.
Article Google Scholar
Kryukov, K., & Saitou, N. (2010). MISHIMA – A new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data. BMC Bioinformatics, 11, 142.
Article Google Scholar
Popendorf, K., Tsuyoshi, H., Osana, Y., & Sakakibara, Y. (2010). Murasaki: A fast, parallelizable algorithm to find anchors from multiple genomes. PLoS ONE, 5, e12651.
Article Google Scholar
Higgins, D. G., & Sharp, P. (1988). CLUSTAL: A package for performing multiple sequence alignment on a microcomputer. Gene, 73, 237–244.
Article Google Scholar
Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.
Google Scholar
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
Article Google Scholar
Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge: Cambridge University Press.
Book Google Scholar
Higgins, D. G., Bleasby, A. J., & Fuchs, R. (1992). CLUSTAL V: Improved software for multiple sequence alignment. Computational Applied Biosciences, 8, 189–191.
Google Scholar
Wilbur, W. J., & Lipman, D. (1984). The context dependent comparison of biological sequences. SIAM Journal of Applied Mathematics, 44, 557–567.
Article MATH MathSciNet Google Scholar
Myers, E. W., & Miller, W. (1988). Optimal alignments in linear space. CABIOS, 4, 11–17.
Google Scholar
Larkin, M. A., Blackshields, G., Brown, N. P., et al. (13 co-authors) (2007) Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947–2948.
Google Scholar
Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J. D., & Higgins, D. G. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7, 539.
Article Google Scholar
Felsenstein, J., Sawyer, S., & Kochin, R. (1982). An efficient method for matching nucleotide acid sequences. Nucleic Acids Research, 10, 133–139.
Article Google Scholar
Notredame, C., Higgins, D. G., & Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302, 205–217.
Article Google Scholar
Galtier, N., Gouy, M., & Gautier, C. (1996). SEA VIEW and PHYLO_WIN: Two graphic tools for sequence alignment and molecular phylogeny. Computer Applications in the Biosciences, 12, 543–548.
Google Scholar
Lipman, D. J., Altschul, S. F., & Kececioglu, J. D. (1989). A tool for multiple sequence alignment. Proceedings of the National Academy of Sciences of the United States of America, 86, 4412–4415.
Article Google Scholar
Subramanian, A. R., Kaufmann, M., & Morgenstern, B. (2008). DIALIGN-TX: Greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms for Molecular Biology, 3, 6.
Article Google Scholar
Bradley, R. K., Roberts, A., Smoot, M., Juvekar, S., Do, J., Dewey, C., Holmes, I., & Pachter, L. (2009). Fast statistical alignment. PLoS Computational Biology, 5, e1000392.
Article MathSciNet Google Scholar
Bray, N., Dubchak, I., & Pachter, L. (2003). AVID: A global alignment program. Genome Research, 13, 97–102.
Article Google Scholar
Blanchette, M., Kent, W. J., Riemer, C., Elnitski, L., Smit, A. F. A., Roskin, K. M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E. D., Haussler, D., & Miller, W. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research, 14, 708–715.
Article Google Scholar
Brudno, M., Chapman, M., Gottgens, B., Batzoglou, S., & Morgenstern, B. (2003). Fast and sensitive multiple alignment of long genomic sequences. BMC Bioinformatics, 4, 66.
Article Google Scholar
Raphael, B., Zhi, D., Tang, H., & Pevzner, P. (2004). A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Research, 14, 2336–2346.
Article Google Scholar
Do, C. B., Mahabhashyam, M. S. P., Brudno, M., & Batzoglou, S. (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research, 15, 330–340.
Article Google Scholar
Lassmann, T., & Sonnhammer, E. L. L. (2005). Kalign—An accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics, 6, 298.
Article Google Scholar
Lotynoja, A., & Goldman, N. (2005). An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America, 102, 10557–10562.
Article Google Scholar
Sze, S.-H., Lu, Y., & Yang, Q. (2006). A polynomial time solvable formulation of multiple sequence alignment. Journal of Computational Biology, 13, 309–319.
Article MathSciNet Google Scholar
Liu, Y., Schmidt, B., & Maskell, D. L. (2010). MSAProbs: Multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics, 26, 1958–1964.
Article Google Scholar
Shih, A. C.-C., & Li, W.-H. (2003). GS-Aligner: A novel tool for aligning genomic sequences using bit-level operations. Molecular Biology and Evolution, 20, 1299–1309.
Article Google Scholar
Keightley, P. D., & Johnson, T. (2004). MCALIGN: Stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution. Genome Research, 14, 442–450.
Article Google Scholar
Kurtz, S., Phillippy, A., Delcher, A. L., Smoot, M., Shumway, M., Antonescu, C., & Salzberg, S. L. (2004). Versatile and open software for comparing large genomes. Genome Biology, 5, R12.
Article Google Scholar
Schwartz, S., Zhang, Z., Frazer, K. A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., & Miller, W. (2000). PipMaker–A web server for aligning two genomic DNA sequences. Genome Research, 10, 577–586.
Article Google Scholar
http://genome.lbl.gov/vista/index.shtml
Matsunami, M., Sumiyama, K., & Saitou, N. (2010). Evolution of conserved non-coding sequences within the vertebrate Hox clusters through the two-round whole genome duplications revealed by phylogenetic footprinting analysis. Journal of Molecular Evolution, 71, 427–436.
Article Google Scholar

Download references

Author information

Authors and Affiliations

Division of Population Genetics, National Institute of Genetics (NIG), Mishima, Shizuoka, Japan
Naruya Saitou

Authors

Naruya Saitou
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Saitou, N. (2013). Sequence Homology Handling. In: Introduction to Evolutionary Genomics. Computational Biology, vol 17. Springer, London. https://doi.org/10.1007/978-1-4471-5304-7_14

Download citation

DOI: https://doi.org/10.1007/978-1-4471-5304-7_14
Published: 21 November 2013
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5303-0
Online ISBN: 978-1-4471-5304-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics