Bioinformatics in protein analysis

  • Chapter
Proteomics in Functional Genomics

Part of the book series: EXS (EXS, volume 88)

Summary

The chapter gives an overview of bioinformatic techniques of importance in protein analysis. These include database searches, sequence comparisons and structural predictions. Links to useful World Wide Web (WWW) pages are given in relation to each topic.

Databases with biological information are reviewed with emphasis on databases for nucleotide sequences (EMBL, GenBank, DDBJ), genomes, amino acid sequences (Swissprot, PIR, TrEMBL, GenePept), and three-dimensional structures (PDB). Integrated user interfaces for databases (SRS and Entrez) are described. An introduction to databases of sequence patterns and protein families is also given (Prosite, Pfam, Blocks).

Furthermore, the chapter describes the widely used methods for sequence comparisons, FASTA and BLAST, and the corresponding WWW services. Techniques involving multiple sequence alignments are also reviewed: alignment creation with the Clustal programs, phylogenetic tree calculation with the Clustal or Phylip packages, and tree display using Drawtree, njplot or phylo_win.
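
As a rough illustration of the word-based idea behind fast database searches, the sketch below counts shared k-tuples (short words) per alignment diagonal between a query and a subject sequence. It is a simplification for orientation only and not the FASTA or BLAST implementation. The function name `best_diagonal`, the word length `k=2` and the example sequences are assumptions introduced here.

```python
from collections import defaultdict


def best_diagonal(query, subject, k=2):
    """Return (diagonal, hit_count) for the diagonal with the most shared k-tuples.

    The diagonal is the subject offset minus the query offset. Returns None
    when the two sequences share no k-tuple. Hypothetical helper.
    """
    # Index every k-tuple of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Scan the subject and count word hits on each diagonal.
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits[j - i] += 1

    if not hits:
        return None
    return max(hits.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    print(best_diagonal("MKVLA", "GGMKVLAGG"))
```

A diagonal with many word hits marks a region worth aligning in more detail. Real search programs add scoring matrices, extension of hits, gap handling and statistical evaluation, none of which is shown here.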

Finally, the chapter addresses structural prediction. Different methods for secondary structure prediction are described (Chou-Fasman, Garnier-Osguthorpe-Robson, Predator, PHD). Techniques for predicting membrane proteins, antigenic sites and posttranslational modifications are also reviewed.
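
One simple approach to spotting candidate membrane-spanning segments is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale. The summary does not say which method the chapter presents, so this choice is an assumption. The window length of 19 residues and the threshold of 1.6 are commonly quoted defaults, also assumed here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq.

    Raises KeyError for residues outside the 20 standard amino acids.
    Returns an empty list if the window does not fit in the sequence.
    """
    if window < 1 or window > len(seq):
        return []
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_membrane_windows(seq, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy reaches threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean >= threshold]


if __name__ == "__main__":
    # Toy sequence, made up: a hydrophobic stretch flanked by charged residues.
    toy = "DDDDDKKKKK" + "LIVLLAVIGLLFAVLIVLA" + "KKKKKDDDDD"
    print(candidate_membrane_windows(toy))
```

A profile like this only flags hydrophobic stretches. Deciding the orientation of segments in the membrane or using multiple alignments requires further information that this sketch does not model.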



© 2000 Springer Basel AG

About this chapter

Cite this chapter

Persson, B. (2000). Bioinformatics in protein analysis. In: Jollès, P., Jörnvall, H. (eds) Proteomics in Functional Genomics. EXS, vol 88. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8458-7_14

  • DOI: https://doi.org/10.1007/978-3-0348-8458-7_14

  • Publisher Name: Birkhäuser, Basel

  • Print ISBN: 978-3-0348-9576-7

  • Online ISBN: 978-3-0348-8458-7

  • eBook Packages: Springer Book Archive
