Protein Sequence Analysis and Domain Identification

  • Chris P. Ponting
  • Ewan Birney
Part of the Springer Protocols Handbooks book series (SPH)


The fundamental unit of protein structure is the domain, defined as a region (or regions) of a polypeptide that folds independently and possesses a hydrophobic core with a hydrophilic exterior (see Note 1). Domains, particularly those with enzymatic activities, may possess functions that do not depend on whether they occur in isolation or as part of a larger multidomain protein. Other domains confer regulatory and specificity properties on multidomain proteins, usually by providing binding sites. Because the majority of eukaryotic proteins, and a large number of eubacterial and archaeal proteins, are multidomain in character, determining the structures and functions of these proteins requires detailed consideration of their domain architectures.





Copyright information

© Humana Press Inc., Totowa, NJ 2005

Authors and Affiliations

  • Chris P. Ponting (1)
  • Ewan Birney (2)
  1. MRC Functional Genetics Unit, University of Oxford, Department of Human Anatomy and Genetics, Oxford, UK
  2. European Bioinformatics Institute, Cambridge, UK
