Computer Analysis of Amino Acid Sequences

The Case of Plant Virus Capsid Proteins

Plant Virology Protocols

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 81)

Abstract

Genome sequences are of minimal use without an adequate interpretation of the sequences of putative protein products, which is only possible on the basis of detailed computer analysis. The approaches to amino acid sequence analysis can be roughly divided into those that explore intrinsic properties of proteins, such as hydropathy, secondary structure, and the distribution of different types of amino acid residues, and those that search for sequence similarity. Both approaches include numerous algorithms and computer programs. In this short chapter, we cannot describe all, or even the most widely used and valuable, of these methods. Instead, we present a minimal set of procedures that, in our experience, is useful for extracting a substantial amount of information from an amino acid sequence in a relatively short time. For detailed descriptions of various computer methods for sequence analysis, the reader is referred to recently published reviews and Methods in Enzymology collections (1–3).
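As an illustration of the first class of methods (intrinsic properties), the following is a minimal sketch of a sliding-window hydropathy profile using the Kyte and Doolittle scale (reference 18). It is not the procedure described in the chapter itself: the function name `hydropathy_profile`, the default window of 9 residues, and the example peptide are all assumptions made here for illustration.

```python
# Kyte-Doolittle hydropathy values (reference 18); positive = hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along seq.

    The window length is an illustrative assumption, not a value
    taken from the chapter. Non-standard residue codes raise KeyError.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[aa] for aa in seq]
    # Running sum: add the residue entering the window, drop the one leaving.
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call; not real data.
    peptide = "MKTLLVAGIVLSAAEDKRRSGT"
    for start, value in enumerate(hydropathy_profile(peptide), start=1):
        print(start, round(value, 2))
```

Each output value is the average over one window, so a sequence of length n yields n - window + 1 values. Sustained positive stretches suggest hydrophobic segments; the interpretation thresholds depend on the window length and are not fixed by this sketch.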

References

  1. Doolittle, R. F., ed. (1990) Molecular evolution, Methods in Enzymology, 183.

  2. Doolittle, R. F., ed. (1996) Molecular evolution, Methods in Enzymology, 226.

  3. Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nature Genetics 6, 119–129.

  4. Doolittle, R. F., Feng, D. F., Johnson, M. S., and McClure, M. A. (1989) Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64, 1–30.

  5. Koonin, E. V. (1991) The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. J. Gen. Virol. 72, 2197–2206.

  6. Koonin, E. V. and Dolja, V. V. (1993) Evolution and taxonomy of positive-strand RNA viruses: implications of comparative analysis of amino acid sequences. Crit. Rev. Biochem. Mol. Biol. 28, 375–430.

  7. Karlin, S., Bucher, P., Brendel, V., and Altschul, S. F. (1991) Statistical methods and insights for protein and DNA sequences. Annu. Rev. Biophys. Biophys. Chem. 20, 175–203.

  8. Karlin, S., Brendel, V., and Bucher, P. (1992) Significant similarity and dissimilarity in homologous proteins. Mol. Biol. Evol. 9, 152–167.

  9. Brendel, V., Bucher, P., Nourbakhsh, I. R., Blaisdell, B. E., and Karlin, S. (1992) Methods and algorithms for statistical analysis of protein sequences. Proc. Natl. Acad. Sci. USA 89, 2002–2006.

  10. Wootton, J. C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285.

  11. Wootton, J. C. and Federhen, S. (1993) Statistics of local complexity in amino acid sequences and in sequence databases. Comput. Chem. 17, 149–163.

  12. Wootton, J. C. (1994) Sequences with “unusual” amino acid compositions. Curr. Op. Struct. Biol. 4, 413–421.

  13. Chou, P. Y. and Fasman, G. D. (1978) Empirical predictions of protein conformation. Annu. Rev. Biochem. 47, 251–276.

  14. Garnier, J., Osguthorpe, D. J., and Robson, B. (1978) Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97–120.

  15. Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584–599.

  16. Rost, B. and Sander, C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19, 55–72.

  17. Rost, B., Sander, C., and Schneider, R. (1994) PHD—an automatic mail server for protein secondary structure prediction. Comput. Appl. Biosci. 10, 53–60.

  18. Kyte, J. and Doolittle, R. F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.

  19. Hopp, T. P. and Woods, K. R. (1983) A computer program for predicting protein antigenic determinants. Mol. Immunol. 20, 483–489.

  20. Benson, D. A., Boguski, M., Lipman, D. J., and Ostell, J. (1994) GenBank. Nucleic Acids Res. 22, 3441–3444.

  21. Emmert, D. B., Stoehr, P. J., Stoesser, G., and Cameron, G. N. (1994) The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res. 22, 3445–3449.

  22. Bairoch, A. and Boeckmann, B. (1994) The SWISS-PROT protein sequence data bank: current status. Nucleic Acids Res. 22, 3578–3580.

  23. George, D. G., Barker, W. C., Mewes, H. W., Pfeiffer, F., and Tsugita, A. (1994) The PIR-International Protein Sequence Database. Nucleic Acids Res. 22, 3569–3573.

  24. Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.

  25. Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.

  26. Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.

  27. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

  28. Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.

  29. Sturrock, S. S. and Collins, J. F. (1993) MPsrch version 1.3. Biocomputing Research Unit, University of Edinburgh, UK.

  30. Waterman, M. S. and Vingron, M. (1994) Rapid and accurate estimates of statistical significance for sequence database searches. Proc. Natl. Acad. Sci. USA 91, 4625–4628.

  31. Karlin, S. and Altschul, S. F. (1993) Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Natl. Acad. Sci. USA 90, 5873–5877.

  32. Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.

  33. Pearson, W. R. and Miller, W. (1992) Dynamic programming algorithms for biological sequence comparison. Methods Enzymol. 210, 575–601.

  34. Koonin, E. V. and Tatusov, R. L. (1994) Computer analysis of bacterial dehalogenases defines a large superfamily of hydrolases with diverse specificity. J. Mol. Biol. 245, 125–132.

  35. Tatusov, R. L. and Koonin, E. V. (1994) A simple tool to search for sequence motifs that are conserved in BLAST outputs. Comput. Appl. Biosci. 10, 457–459.

  36. Bairoch, A. and Bucher, P. (1994) PROSITE: recent developments. Nucleic Acids Res. 22, 3583–3589.

  37. Dolja, V. V., Boyko, V. P., Agranovsky, A. A., and Koonin, E. V. (1991) Phylogeny of capsid proteins of rod-shaped and filamentous plant RNA viruses: two families with distinct patterns of sequence and probably structure conservation. Virology 184, 79–86.

  38. McClure, M. A., Vasi, T. K., and Fitch, W. M. (1994) Comparative analysis of multiple protein-sequence alignment methods. Mol. Biol. Evol. 11, 571–592.

  39. Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991) A workbench for multiple alignment construction and analysis. Proteins 9, 180–190.

  40. Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84, 4355–4358.

  41. Gribskov, M., Luthy, R., and Eisenberg, D. (1990) Profile analysis. Methods Enzymol. 183, 146–159.

  42. Gribskov, M. (1994) Profile analysis. Methods Mol. Biol. 25, 247–266.

  43. Tatusov, R. L., Altschul, S. F., and Koonin, E. V. (1994) Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc. Natl. Acad. Sci. USA 91, 12,091–12,095.

  44. Brown, M., Hughey, R., Krogh, A., Mian, I. S., Sjolander, K., and Haussler, D. (1993) Using Dirichlet mixture priors to derive hidden Markov models for protein families, in Proceedings of First International Conference on Intelligent Systems for Molecular Biology (Hunter, L., Searls, D., Shavlik, J., eds.), AAAI, Menlo Park, CA, pp. 47–55.

  45. Dolja, V. V. and Koonin, E. V. (1991) Phylogeny of capsid proteins of small icosahedral RNA plant viruses. J. Gen. Virol. 72, 1481–1486.

  46. Rossmann, M. G. and Johnson, J. E. (1989) Icosahedral RNA virus structure. Annu. Rev. Biochem. 58, 533–573.

  47. Chelvanayagam, G., Heringa, J., and Argos, P. (1992) Anatomy and evolution of proteins displaying the viral capsid jellyroll topology. J. Mol. Biol. 228, 220–242.

  48. Kikuno, R., Toh, H., Hayashida, H., and Miyata, T. (1984) Sequence similarity between putative gene products of geminiviral DNAs. Nature 308, 562.

  49. Turpen, T. H., Reinl, S. J., Charoenvit, Y., Hoffmann, S. L., Fallarme, V., and Grill, L. K. (1995) Malarial epitopes expressed on the surface of recombinant tobacco mosaic virus. Bio/Technology 13, 53–58.

  50. Porta, C., Spall, V. E., Loveland, J., Johnson, J. E., Barker, P. J., and Lomonossoff, G. P. (1994) Development of cowpea mosaic virus as a high-yielding system for the presentation of foreign peptides. Virology 202, 949–955.

Copyright information

© 1998 Humana Press Inc.

About this protocol

Cite this protocol

Koonin, E.V., Mushegian, A.R., Dolja, V.V. (1998). Computer Analysis of Amino Acid Sequences. In: Foster, G.D., Taylor, S.C. (eds) Plant Virology Protocols. Methods in Molecular Biology™, vol 81. Humana Press. https://doi.org/10.1385/0-89603-385-6:319

  • DOI: https://doi.org/10.1385/0-89603-385-6:319

  • Publisher Name: Humana Press

  • Print ISBN: 978-0-89603-385-6

  • Online ISBN: 978-1-59259-566-2

  • eBook Packages: Springer Protocols
