Abstract
Genome sequences are of minimal use without an adequate interpretation of the sequences of putative protein products, which is only possible on the basis of detailed computer analysis. The approaches to amino acid sequence analysis can be roughly divided into those that explore intrinsic properties of proteins, such as hydropathy, secondary structure, distribution of different types of amino acid sequences, and so on, and those that search for sequence similarity. Both approaches include numerous algorithms and computer programs. In this short chapter, we cannot describe all, or even the most widely used and valuable, of these methods. Instead, we present a minimal set of procedures that, in our experience, is useful for extracting a substantial amount of information from an amino acid sequence in a relatively short time. For detailed descriptions of various computer methods for sequence analysis, the reader is referred to recently published reviews and Methods in Enzymology collections (1–3).
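As an illustration of the "intrinsic property" class of methods mentioned above, the following is a minimal sketch of a sliding-window hydropathy profile using the published Kyte and Doolittle (1982) scale. It is not the chapter's own procedure: the function name `hydropathy_profile` and the default window of 9 residues are assumptions chosen for the example, and the input is assumed to contain only the 20 standard one-letter residue codes.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (Kyte and Doolittle, 1982). Positive values are hydrophobic.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along seq.

    The name and the default window size are illustrative assumptions.
    The result has len(seq) - window + 1 values, one per window start.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # A KeyError here signals a non-standard residue code (e.g. X or B).
    values = [KD_SCALE[aa] for aa in seq]
    # Running sum: add the residue entering the window, drop the one leaving.
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


if __name__ == "__main__":
    # Hypothetical peptide: hydrophobic stretch followed by charged residues.
    for start, score in enumerate(hydropathy_profile("ILVLLAKDEKR", window=5)):
        print(start, round(score, 2))
```

Windows with a high mean value point to hydrophobic stretches, while strongly negative windows point to hydrophilic, probably exposed, regions. The appropriate window length depends on the feature being sought and should be treated as a tunable parameter.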
References
Doolittle, R. F., ed. (1990) Molecular evolution, Methods in Enzymology, 183.
Doolittle, R. F., ed. (1996) Molecular evolution, Methods in Enzymology, 226.
Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nature Genetics 6, 119–129.
Doolittle, R. F., Feng, D. F., Johnson, M. S., and McClure, M. A. (1989) Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64, 1–30.
Koonin, E. V. (1991) The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. J. Gen. Virol. 72, 2197–2206.
Koonin, E. V. and Dolja, V. V. (1993) Evolution and taxonomy of positive-strand RNA viruses: implications of comparative analysis of amino acid sequences. Crit. Rev. Biochem. Mol. Biol. 28, 375–430.
Karlin, S., Bucher, P., Brendel, V., and Altschul, S. F. (1991) Statistical methods and insights for protein and DNA sequences. Annu. Rev. Biophys. Biophys. Chem. 20, 175–203.
Karlin, S., Brendel, V., and Bucher, P. (1992) Significant similarity and dissimilarity in homologous proteins. Mol. Biol. Evol. 9, 152–167.
Brendel, V., Bucher, P., Nourbakhsh, I. R., Blaisdell, B. E., and Karlin, S. (1992) Methods and algorithms for statistical analysis of protein sequences. Proc. Natl. Acad. Sci. USA 89, 2002–2006.
Wootton, J. C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285.
Wootton, J. C. and Federhen, S. (1993) Statistics of local complexity in amino acid sequences and in sequence databases. Comput. Chem. 17, 149–163.
Wootton, J. C. (1994) Sequences with “unusual” amino acid compositions. Curr. Opin. Struct. Biol. 4, 413–421.
Chou, P. Y. and Fasman, G. D. (1978) Empirical predictions of protein conformation. Annu. Rev. Biochem. 47, 251–276.
Garnier, J., Osguthorpe, D. J., and Robson, B. (1978) Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97–120.
Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584–599.
Rost, B. and Sander, C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19, 55–72.
Rost, B., Sander, C., and Schneider, R. (1994) PHD—an automatic mail server for protein secondary structure prediction. Comput. Appl. Biosci. 10, 53–60.
Kyte, J. and Doolittle, R. F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
Hopp, T. P. and Woods, K. R. (1983) A computer program for predicting protein antigenic determinants. Mol. Immunol. 20, 483–489.
Benson, D. A., Boguski, M., Lipman, D. J., and Ostell, J. (1994) GenBank. Nucleic Acids Res. 22, 3441–3444.
Emmert, D. B., Stoehr, P. J., Stoesser, G., and Cameron, G. N. (1994) The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res. 22, 3445–3449.
Bairoch, A. and Boeckmann, B. (1994) The SWISS-PROT protein sequence data bank: current status. Nucleic Acids Res. 22, 3578–3580.
George, D. G., Barker, W. C., Mewes, H. W., Pfeiffer, F., and Tsugita, A. (1994) The PIR-International Protein Sequence Database. Nucleic Acids Res. 22, 3569–3573.
Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
Sturrock, S. S. and Collins, J. F. (1993) MPsrch version 1.3. Biocomputing Research Unit, University of Edinburgh, UK.
Waterman, M. S. and Vingron, M. (1994) Rapid and accurate estimates of statistical significance for sequence database searches. Proc. Natl. Acad. Sci. USA 91, 4625–4628.
Karlin, S. and Altschul, S. F. (1993) Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Natl. Acad. Sci. USA 90, 5873–5877.
Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.
Pearson, W. R. and Miller, W. (1992) Dynamic programming algorithms for biological sequence comparison. Methods Enzymol. 210, 575–601.
Koonin, E. V. and Tatusov, R. L. (1994) Computer analysis of bacterial dehalogenases defines a large superfamily of hydrolases with diverse specificity. J. Mol. Biol. 245, 125–132.
Tatusov, R. L. and Koonin, E. V. (1994) A simple tool to search for sequence motifs that are conserved in BLAST outputs. Comput. Appl. Biosci. 10, 457–459.
Bairoch, A. and Bucher, P. (1994) PROSITE: recent developments. Nucleic Acids Res. 22, 3583–3589.
Dolja, V. V., Boyko, V. P., Agranovsky, A. A., and Koonin, E. V. (1991) Phylogeny of capsid proteins of rod-shaped and filamentous plant RNA viruses: two families with distinct patterns of sequence and probably structure conservation. Virology 184, 79–86.
McClure, M. A., Vasi, T. K., and Fitch, W. M. (1994) Comparative analysis of multiple protein-sequence alignment methods. Mol. Biol. Evol. 11, 571–592.
Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991) A workbench for multiple alignment construction and analysis. Proteins 9, 180–190.
Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84, 4355–4358.
Gribskov, M., Luthy, R., and Eisenberg, D. (1990) Profile analysis. Methods Enzymol. 183, 146–159.
Gribskov, M. (1994) Profile analysis. Methods Mol. Biol. 25, 247–266.
Tatusov, R. L., Altschul, S. F., and Koonin, E. V. (1994) Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc. Natl. Acad. Sci. USA 91, 12,091–12,095.
Brown, M., Hughey, R., Krogh, A., Mian, I. S., Sjolander, K., and Haussler, D. (1993) Using Dirichlet mixture priors to derive hidden Markov models for protein families, in Proceedings of First International Conference on Intelligent Systems for Molecular Biology (Hunter, L., Searls, D., Shavlik, J., eds.), AAAI, Menlo Park, CA, pp. 47–55.
Dolja, V. V. and Koonin, E. V. (1991) Phylogeny of capsid proteins of small icosahedral RNA plant viruses. J. Gen. Virol. 72, 1481–1486.
Rossmann, M. G. and Johnson, J. E. (1989) Icosahedral RNA virus structure. Annu. Rev. Biochem. 58, 533–573.
Chelvanayagam, G., Heringa, J., and Argos, P. (1992) Anatomy and evolution of proteins displaying the viral capsid jellyroll topology. J. Mol. Biol. 228, 220–242.
Kikuno, R., Toh, H., Hayashida, H., and Miyata, T. (1984) Sequence similarity between putative gene products of geminiviral DNAs. Nature 308, 562.
Turpen, T. H., Reinl, S. J., Charoenvit, Y., Hoffmann, S. L., Fallarme, V., and Grill, L. K. (1995) Malarial epitopes expressed on the surface of recombinant tobacco mosaic virus. Bio/Technology 13, 53–58.
Porta, C., Spall, V. E., Loveland, J., Johnson, J. E., Barker, P. J., and Lomonossoff, G. P. (1994) Development of cowpea mosaic virus as a high-yielding system for the presentation of foreign peptides. Virology 202, 949–955.
Copyright information
© 1998 Humana Press Inc.
About this protocol
Cite this protocol
Koonin, E.V., Mushegian, A.R., Dolja, V.V. (1998). Computer Analysis of Amino Acid Sequences. In: Foster, G.D., Taylor, S.C. (eds) Plant Virology Protocols. Methods in Molecular Biology™, vol 81. Humana Press. https://doi.org/10.1385/0-89603-385-6:319
DOI: https://doi.org/10.1385/0-89603-385-6:319
Publisher Name: Humana Press
Print ISBN: 978-0-89603-385-6
Online ISBN: 978-1-59259-566-2
eBook Packages: Springer Protocols