Protein Sequence Analysis and Domain Identification

  • Protocol
The Proteomics Protocols Handbook

Part of the book series: Springer Protocols Handbooks ((SPH))

Abstract

The fundamental unit of protein structure is the domain, defined as a region or regions of a polypeptide that fold independently and possess a hydrophobic core with a hydrophilic exterior (see Note 1). Domains, particularly those with enzymatic activities, may function regardless of whether they are present in isolation or are part of a larger multidomain protein. Other domains confer regulatory and specificity properties on multidomain proteins, usually by providing binding sites. Because the majority of eukaryotic proteins, and a large number of eubacterial and archaeal proteins, are multidomain in character, determining the structures and functions of these proteins requires detailed consideration of their domain architectures.

Copyright information

© 2005 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Ponting, C.P., Birney, E. (2005). Protein Sequence Analysis and Domain Identification. In: Walker, J.M. (ed.) The Proteomics Protocols Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1385/1-59259-890-0:527

  • DOI: https://doi.org/10.1385/1-59259-890-0:527

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-343-5

  • Online ISBN: 978-1-59259-890-8

  • eBook Packages: Springer Protocols
