Computer Analysis of Amino Acid Sequences

Koonin, Eugene V.; Mushegian, Arcady R.; Dolja, Valerian V.

doi:10.1385/0-89603-385-6:319


Part of the book series: Methods in Molecular Biology™ (MIMB, volume 81)


Abstract

Genome sequences are of minimal use without an adequate interpretation of the sequences of putative protein products, which is only possible on the basis of detailed computer analysis. The approaches to amino acid sequence analysis can be roughly divided into those that explore intrinsic properties of proteins, such as hydropathy, secondary structure, distribution of different types of amino acid sequences, and so on, and those that search for sequence similarity. Both approaches include numerous algorithms and computer programs. In this short chapter, we cannot describe all, or even the most widely used and valuable, of these methods. Instead, we present a minimal set of procedures that, in our experience, is useful for extracting a substantial amount of information from an amino acid sequence in a relatively short time. For detailed descriptions of various computer methods for sequence analysis, the reader is referred to recently published reviews and the Methods in Enzymology collections (1–3).
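
As an illustration of the first class of methods (intrinsic properties), a sliding-window hydropathy profile in the style of Kyte and Doolittle (1982) might be sketched in Python as follows. This is a minimal sketch, not the procedure described in the chapter: the function name `hydropathy_profile` and the default window of 9 residues are assumptions made for this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    `window` is a hypothetical default chosen for this sketch; the result
    has len(sequence) - window + 1 values, one per window start position.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    # A KeyError here signals a non-standard residue code (e.g. X or B).
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Toy sequence: a hydrophobic stretch followed by a charged stretch.
    for position, value in enumerate(hydropathy_profile("IIIIIRRRRR", window=5), 1):
        print(position, round(value, 2))
```

Windows with a strongly positive mean point to hydrophobic segments and strongly negative ones to hydrophilic segments. The choice of window length and any cutoff depends on the question being asked and is left open here.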


References

Doolittle, R. F., ed. (1990) Molecular evolution, Methods in Enzymology, 183.
Doolittle, R. F., ed. (1996) Molecular evolution, Methods in Enzymology, 226.
Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nature Genetics 6, 119–129.
Doolittle, R. F., Feng, D. F., Johnson, M. S., and McClure, M. A. (1989) Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64, 1–30.
Koonin, E. V. (1991) The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. J. Gen. Virol. 72, 2197–2206.
Koonin, E. V. and Dolja, V. V. (1993) Evolution and taxonomy of positive-strand RNA viruses: implications of comparative analysis of amino acid sequences. Crit. Rev. Biochem. Mol. Biol. 28, 375–430.
Karlin, S., Bucher, P., Brendel, V., and Altschul, S. F. (1991) Statistical methods and insights for protein and DNA sequences. Annu. Rev. Biophys. Biophys. Chem. 20, 175–203.
Karlin, S., Brendel, V., and Bucher, P. (1992) Significant similarity and dissimilarity in homologous proteins. Mol. Biol. Evol. 9, 152–167.
Brendel, V., Bucher, P., Nourbakhsh, I. R., Blaisdell, B. E., and Karlin, S. (1992) Methods and algorithms for statistical analysis of protein sequences. Proc. Natl. Acad. Sci. USA 89, 2002–2006.
Wootton, J. C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285.
Wootton, J. C. and Federhen, S. (1993) Statistics of local complexity in amino acid sequences and in sequence databases. Comput. Chem. 17, 149–163.
Wootton, J. C. (1994) Sequences with “unusual” amino acid compositions. Curr. Opin. Struct. Biol. 4, 413–421.
Chou, P. Y. and Fasman, G. D. (1978) Empirical predictions of protein conformation. Annu. Rev. Biochem. 47, 251–276.
Garnier, J., Osguthorpe, D. J., and Robson, B. (1978) Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97–120.
Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584–599.
Rost, B. and Sander, C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19, 55–72.
Rost, B., Sander, C., and Schneider, R. (1994) PHD—an automatic mail server for protein secondary structure prediction. Comput. Appl. Biosci. 10, 53–60.
Kyte, J. and Doolittle, R. F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
Hopp, T. P. and Woods, K. R. (1983) A computer program for predicting protein antigenic determinants. Mol. Immunol. 20, 483–489.
Benson, D. A., Boguski, M., Lipman, D. J., and Ostell, J. (1994) GenBank. Nucleic Acids Res. 22, 3441–3444.
Emmert, D. B., Stoehr, P. J., Stoesser, G., and Cameron, G. N. (1994) The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res. 22, 3445–3449.
Bairoch, A. and Boeckmann, B. (1994) The SWISS-PROT protein sequence data bank: current status. Nucleic Acids Res. 22, 3578–3580.
George, D. G., Barker, W. C., Mewes, H. W., Pfeiffer, F., and Tsugita, A. (1994) The PIR-International Protein Sequence Database. Nucleic Acids Res. 22, 3569–3573.
Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
Sturrock, S. S. and Collins, J. F. (1993) MPsrch version 1.3. Biocomputing Research Unit, University of Edinburgh, UK.
Waterman, M. S. and Vingron, M. (1994) Rapid and accurate estimates of statistical significance for sequence database searches. Proc. Natl. Acad. Sci. USA 91, 4625–4628.
Karlin, S. and Altschul, S. F. (1993) Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Natl. Acad. Sci. USA 90, 5873–5877.
Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.
Pearson, W. R. and Miller, W. (1992) Dynamic programming algorithms for biological sequence comparison. Methods Enzymol. 210, 575–601.
Koonin, E. V. and Tatusov, R. L. (1994) Computer analysis of bacterial dehalogenases defines a large superfamily of hydrolases with diverse specificity. J. Mol. Biol. 245, 125–132.
Tatusov, R. L. and Koonin, E. V. (1994) A simple tool to search for sequence motifs that are conserved in BLAST outputs. Comput. Appl. Biosci. 10, 457–459.
Bairoch, A. and Bucher, P. (1994) PROSITE: recent developments. Nucleic Acids Res. 22, 3583–3589.
Dolja, V. V., Boyko, V. P., Agranovsky, A. A., and Koonin, E. V. (1991) Phylogeny of capsid proteins of rod-shaped and filamentous plant RNA viruses: two families with distinct patterns of sequence and probably structure conservation. Virology 184, 79–86.
McClure, M. A., Vasi, T. K., and Fitch, W. M. (1994) Comparative analysis of multiple protein-sequence alignment methods. Mol. Biol. Evol. 11, 571–592.
Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991) A workbench for multiple alignment construction and analysis. Proteins 9, 180–190.
Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84, 4355–4358.
Gribskov, M., Luthy, R., and Eisenberg, D. (1990) Profile analysis. Methods Enzymol. 183, 146–159.
Gribskov, M. (1994) Profile analysis. Methods Mol. Biol. 25, 247–266.
Tatusov, R. L., Altschul, S. F., and Koonin, E. V. (1994) Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc. Natl. Acad. Sci. USA 91, 12,091–12,095.
Brown, M., Hughey, R., Krogh, A., Mian, I. S., Sjolander, K., and Haussler, D. (1993) Using Dirichlet mixture priors to derive hidden Markov models for protein families, in Proceedings of First International Conference on Intelligent Systems for Molecular Biology (Hunter, L., Searls, D., Shavlik, J., eds.), AAAI, Menlo Park, CA, pp. 47–55.
Dolja, V. V. and Koonin, E. V. (1991) Phylogeny of capsid proteins of small icosahedral RNA plant viruses. J. Gen. Virol. 72, 1481–1486.
Rossmann, M. G. and Johnson, J. E. (1989) Icosahedral RNA virus structure. Annu. Rev. Biochem. 58, 533–573.
Chelvanayagam, G., Heringa, J., and Argos, P. (1992) Anatomy and evolution of proteins displaying the viral capsid jellyroll topology. J. Mol. Biol. 228, 220–242.
Kikuno, R., Toh, H., Hayashida, H., and Miyata, T. (1984) Sequence similarity between putative gene products of geminiviral DNAs. Nature 308, 562.
Turpen, T. H., Reinl, S. J., Charoenvit, Y., Hoffmann, S. L., Fallarme, V., and Grill, L. K. (1995) Malarial epitopes expressed on the surface of recombinant tobacco mosaic virus. Bio/Technology 13, 53–58.
Porta, C., Spall, V. E., Loveland, J., Johnson, J. E., Barker, P. J., and Lomonossoff, G. P. (1994) Development of cowpea mosaic virus as a high-yielding system for the presentation of foreign peptides. Virology 202, 949–955.


Author information

Authors and Affiliations

Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
Eugene V. Koonin & Arcady R. Mushegian
Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR
Valerian V. Dolja


Editor information

Editors and Affiliations

Plant Virology Group, School of Biological Sciences, University of Bristol, UK
Gary D. Foster
Department of Botany, University of Leicester, UK
Gary D. Foster & Sally C. Taylor


Cite this protocol

Koonin, E.V., Mushegian, A.R., Dolja, V.V. (1998). Computer Analysis of Amino Acid Sequences. In: Foster, G.D., Taylor, S.C. (eds) Plant Virology Protocols. Methods in Molecular Biology™, vol 81. Humana Press. https://doi.org/10.1385/0-89603-385-6:319


DOI: https://doi.org/10.1385/0-89603-385-6:319
Publisher Name: Humana Press
Print ISBN: 978-0-89603-385-6
Online ISBN: 978-1-59259-566-2
eBook Packages: Springer Protocols
