Bioinformatic Tools for Gene and Protein Sequence Analysis

  • Bernd H. A. Rehm
  • Frank Reinecke
Part of the Springer Protocols Handbooks book series (SPH)


The rapid development of efficient, automated DNA-sequencing methods has strongly advanced the genome-sequencing era, culminating in the determination of the entire human genome in 2001 (1,2). An enormous amount of DNA sequence data is now available, and databases continue to grow exponentially (see Fig. 1). Analysis of this overwhelming amount of data, including hundreds of genomes from both prokaryotes and eukaryotes, has given rise to the field of bioinformatics. Bioinformatic tools have evolved rapidly to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the best-studied bacterium, Escherichia coli, more than 30% of the identified open reading frames (ORFs) represent hypothetical genes with no known function. Future challenges of genome-sequence analysis will include the understanding of diseases, gene regulation, and metabolic pathway reconstruction. In addition, a set of methods for protein analysis, summarized under the term proteomics, holds tremendous potential for biomedicine and biotechnology (141). The large number of bioinformatic tools made available to scientists during the last few years raises the question of which to use and how best to obtain scientifically valid answers (3). In this chapter, we provide a guide to the most efficient way to analyze a given sequence or to collect information regarding a gene, protein, structure, or interaction of interest by applying current publicly available software and databases that are mainly accessed through the World Wide Web. All links to services or download sites are given in the text or listed in Table 1; the succession of tools is briefly summarized in Fig. 2.
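To make the notion of an open reading frame concrete, the following is a minimal sketch of a forward-strand ORF scan. It is not one of the gene finders cited in this chapter; the function name `find_orfs`, the `min_codons` parameter, and the restriction to the start codon ATG and the three standard stop codons are assumptions made only for illustration. Real gene finders also consider the reverse strand, alternative start codons, and statistical models of coding sequence.

```python
# Illustrative sketch only: forward strand, ATG start, standard stop codons (assumptions).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return a list of (start, end, frame) tuples for ORFs on the forward strand.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and frame is 0, 1, or 2. min_codons counts the codons from
    the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGACC"))
```

Each reported ORF is only a candidate; whether it encodes a functional protein is the question that the similarity searches and gene-prediction tools discussed in the chapter are meant to address.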


Keywords: Alignment Score; Unrooted Tree; Remote Homolog; Content Sensor; PROSITE Pattern


  1. Venter, J. C., et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
  2. Lander, E. S., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
  3. Rehm, B. H. (2001) Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification. Appl. Microbiol. Biotechnol. 57, 579–592.
  4. Ewing, B., Hillier, L., Wendl, M. C., and Green, P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
  5. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194.
  6. Huang, X. and Madan, A. (1999) CAP3: A DNA sequence assembly program. Genome Res. 9, 868–877.
  7. Gordon, D., Abajian, C., and Green, P. (1998) Consed: a graphical tool for sequence finishing. Genome Res. 8, 195–202.
  8. Staden, R. (1996) The Staden Sequence Analysis Package. Mol. Biotech. 5, 233–241.
  9. Staden, R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res. 12, 505–519.
  10. Claverie, J.-M. (1997) Computational methods for the identification of genes in vertebrate genomic sequences. Hum. Mol. Genet. 6, 1735–1744.
  11. Guigo, R. (1997) Computational gene identification: an open problem. Comput. Chem. 21, 215–222.
  12. Krogh, A. (1998) In Computational Methods in Molecular Biology (Salzberg, S. L., Searls, D., and Kasif, S., eds.), Elsevier, Amsterdam.
  13. Krogh, A. (1998) In Guide to Human Genome Computing (Bishop, M. J., ed.), 2nd ed. Academic, New York, pp. 261–274.
  14. Delcher, A. L., Harmon, D., Kasif, S., White, O., and Salzberg, S. L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27, 4636–4641.
  15. Guigo, R., Agarwal, P., Abril, J. F., Burset, M., and Fickett, J. W. (2000) An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642.
  16. Krogh, A. (2000) Using database matches with HMMGene for automated gene detection in Drosophila. Genome Res. 10, 523–5
  17. Shibuya, T. and Rigoutsos, I. (2002) Dictionary-driven prokaryotic gene finding. Nucleic Acids Res. 30, 2710–2725.
  18. Pedersen, J. S. and Hein, J. (2003) Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19, 219–227.
  19. Guo, F. B., Ou, H. Y., and Zhang, C. T. (2003) ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res. 31, 1780–1789.
  20. Larsen, T. S. and Krogh, A. (2003) EasyGene—a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformat. 4, 21.
  21. Gelfand, M. S. (1995) Prediction of function in DNA sequence analysis. J. Comput. Biol. 2, 87–115.
  22. Sherriff, A. and Ott, J. (2001) Applications of neural networks for gene finding. Adv. Genet. 42, 287–297.
  23. Fickett, J. W. (1996) Finding genes by computer: the state of the art. Trends Genet. 12, 316–320.
  24. Zhang, C. T., Wang, J., and Zhang, R. (2002) Using a Euclid distance discriminant method to find protein coding genes in the yeast genome. Comput. Chem. 26, 195–206.
  25. Bajic, V. B. and Seah, S. H. (2003) Dragon gene start finder: an advanced system for finding approximate locations of the start of gene transcriptional units. Genome Res. 13, 1923–1929.
  26. Zhang, M. Q. (1998) Statistical features of human exons and their flanking regions. Hum. Mol. Genet. 7, 919–932.
  27. Searls, D. B. (1992) The linguistics of DNA. Am. Sci. 80, 579–591.
  28. Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic acids. Cambridge University Press, Cambridge.
  29. Krogh, A., Mian, I. S., and Haussler, D. (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 22, 4768–4778.
  30. Cole, S. T., Brosch, R., Parkhill, J., et al. (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544.
  31. Thomas, A. and Skolnick, M. (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J. Math. Appl. Med. Biol. 11, 149–160.
  32. Henderson, J., Salzberg, S., and Fasman, K. (1997) Finding genes in DNA with a hidden Markov model. J. Comput. Biol. 4, 127–141.
  33. Lukashin, A. V. and Borodovsky, M. (1998) GeneMark hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115.
  34. Salzberg, S. L., Pertea, M., Delcher, A. L., Gardner, M. J., and Tettelin, H. (1999) Interpolated Markov models for eukaryotic gene finding. Genomics 59, 24–31.
  35. Badger, J. H. and Olsen, G. J. (1999) CRITICA: coding region identification tool invoking comparative analysis. Mol. Biol. Evol. 16, 512–524.
  36. Bocs, S., Cruveiller, S., Vallenet, D., Nuel, G., and Medigue, C. (2003) AMIGene: annotation of microbial genes. Nucleic Acids Res. 31, 3723–6.
  37. Besemer, J., Lomsadze, A., and Borodovsky, M. (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 29, 2607–2618.
  38. Yeramian, E. and Jones, L. (2003) GeneFizz: a web tool to compare genetic (coding/non-coding) and physical (helix/coil) segmentations of DNA sequences. Gene discovery and evolutionary perspectives. Nucleic Acids Res. 31, 3843–3849.
  39. Kotlar, D. and Lavner, Y. (2003) Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions. Genome Res. 13, 1930–1937.
  40. Snyder, E. and Stormo, G. (1995) Identification of protein coding regions in genomic DNA. J. Mol. Biol. 248, 1–18.
  41. Reese, M. G., Eeckman, F. H., Kulp, D., and Haussler, D. (1997) Improved splice site detection in Genie. J. Comput. Biol. 4, 311–323.
  42. Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94.
  43. Xu, Y. and Überbacher, E. C. (1997) Automated gene identification in large-scale genomic sequences. J. Comput. Biol. 4, 325–338.
  44. Gelfand, M. S., Mironov, A. A., and Pevzner, P. A. (1996) Gene recognition via spliced sequence alignment. Proc. Natl. Acad. Sci. USA 93, 9061–9066.
  45. Foissac, S., Bardou, P., Moisan, A., Cros, M. J., and Schiex, T. (2003) EUGENE’HOM: a generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res. 31, 3742–3745.
  46. Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
  47. Yada, T., Takagi, T., Totoki, Y., Sakaki, Y., and Takaeda, Y. (2003) DIGIT: a novel gene finding program by combining gene-finders. Pac. Symp. Biocomput. 2003, 375–387.
  48. Quandt, K., Frech, K., Karas, H., Wingender, E., and Werner, T. (1995) MatInd and MatInspector-new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878–4884.
  49. Prestridge, D. S. (1991) SIGNAL SCAN: a computer program that scans DNA sequences for eukaryotic transcriptional elements. CABIOS 7, 203–206.
  50. Wingender, E., Chen, X., Hehl, R., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316–319.
  51. Prestridge, D. S. (1995) Predicting Pol II Promoter Sequences Using Transcription Factor Binding Sites. J. Mol. Biol. 249, 923–932.
  52. Eddy, S. R. (1996) Hidden Markov models. Curr. Opin. Struct. Biol. 6, 361–365.
  53. Eddy, S. R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763.
  54. Baldi, P. and Brunak, S. (1998) Bioinformatics: The Machine Learning Approach. MIT Press, Boston, MA.
  55. Korenberg, M. J., David, R., Hunter, I. W., and Solomon, J. E. (2000) Automatic classification of protein sequences into structure/function groups via parallel cascade identification: a feasibility study. Ann. Biomed. Eng. 28, 803–811.
  56. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
  57. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
  58. Nicholas, K. B., Nicholas, H. B., Jr., and Deerfield, D. W., II. (1997) GeneDoc: analysis and visualization of genetic variation. EMBNEW.NEWS 4, 14.
  59. Lake, J. A. (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc. Natl. Acad. Sci. USA 91, 1451–1459.
  60. Lockhart, P. J., Steel, M. A., Hendy, M. D., and Penny, D. (1994) Recovering evolutionary trees under a more realistic model of sequence. Mol. Biol. Evol. 11, 605–612.
  61. Brocchieri, L. (2001) Phylogenetic inferences from molecular sequences: review and critique. Theor. Popul. Biol. 59, 27–40.
  62. Stewart, C.-B. (1993) The powers and pitfalls of parsimony. Nature 361, 603–607.
  63. Attwood, T. K., Beck, M. E., Flower, D. R., Scordis, P., and Selley, J. N. (1998) The PRINTS protein fingerprint database in its fifth year. Nucleic Acids Res. 26, 304–308.
  64. Page, R. D. (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12, 357–358.
  65. Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R. D., and Bairoch, A. (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 31, 3784–3788.
  66. Rost, B. (1996) PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol. 266, 525–539.
  67. Eyrich, V. A. and Rost, B. (2003) META-PP: single interface to crucial prediction servers. Nucleic Acids Res. 31, 3308–3310.
  68. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6.
  69. Hansen, J. E., Lund, O., Tolstrup, N., Gooley, A. A., Williams, K. L., and Brunak, S. (1998) NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate J. 15, 115–130.
  70. Hansen, J. E., Lund, O., Rapacki, K., and Brunak, S. (1997) O-glycbase version 2.0-A revised database of O-glycosylated proteins. Nucleic Acids Res. 25, 278–282.
  71. Hansen, J. E., Lund, O., Rapacki, K., et al. (1995) Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc:-polypeptide N-acetylgalactosaminyltransferase. Biochem. J. 308, 801–813.
  72. Blom, N., Gammeltoft, S., and Brunak, S. (1999) Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 294, 1351–1362.
  73. Blom, N., Hansen, J., Blaas, D., and Brunak, S. (1996) Cleavage site analysis in picornaviral polyproteins: Discovering cellular targets by neural networks. Protein Sci. 5, 2203–2216.
  74. Emanuelsson, O., Nielsen, H., and von Heijne, G. (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8, 978–984.
  75. Cuff, J. A. and Barton, G. J. (1999) Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34, 508–519.
  76. Sonnhammer, E. L. L., von Heijne, G., and Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB98), pp. 175–182.
  77. von Heijne, G. (1992) Membrane protein structure prediction, hydrophobicity analysis and the positive-inside rule. J. Mol. Biol. 225, 487–494.
  78. Karplus, K., Barrett, C., and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.
  79. Cserzo, M., Wallin, E., Simon, I., von Heijne, G., and Elofsson, A. (1997) Prediction of transmembrane alpha-helices in procariotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673–676.
  80. Fischer, D. and Eisenberg, D. A. (1996) Fold recognition using sequence-derived properties. Protein Sci. 5, 947–955.
  81. Elofsson, A., Fischer, D., Rice, D. W., LeGrand, S., and Eisenberg, D. A. (1996) Study of combined structure-sequence profiles. Folding Design 1, 451–461.
  82. Karplus, K., Karchin, R., Draper, J., et al. (2003) Combining local-structure, fold-recognition, and new-fold methods for protein structure prediction. Proteins 53(Suppl 6), 491–496.
  83. Peitsch, M. C. (1995) Protein modelling by E-mail. BioTechnology 13, 658–660.
  84. Peitsch, M. C. (1996) ProMod and Swiss-Model: internet-based tools for automated comparative protein modelling. Biochem. Soc. Trans. 24, 274–279.
  85. Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis 18, 2714–2723.
  86. Lund, O., Frimand, K., Gorodkin, J., et al. (1997) Protein distance constraints predicted by neural networks and probability density functions. Protein Eng. 10, 1241–1248.
  87. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
  88. Altschul, S. F. (1991) Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219, 555–565.
  89. Altschul, S. F. and Gish, W. (1996) Local alignment statistics. Methods Enzymol. 266, 460–480.
  90. Rost, B., Schneider, R., and Sander, C. (1997) Protein fold recognition by prediction-based threading. J. Mol. Biol. 270, 471–480.
  91. Dayhoff, M. O., Barker, W. C., and Hunt, L. T. (1983) Establishing homologies in protein sequences. Methods Enzymol. 91, 524–545.
  92. Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10,915–10,919.
  93. Pearson, W. R. (1995) Comparison of methods for searching protein sequence databases. Protein Sci. 4, 1145–1160.
  94. Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.
  95. Wootton, J. C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285.
  96. Altschul, S. F., Madden, T. L., Schäffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
  97. Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
  98. Martin, A. C., Orengo, C. A., Hutchinson, E. G., et al. (1998) Protein folds and functions. Structure 6, 875–884.
  99. McGuffin, L. J., Bryson, K., and Jones, D. T. (2001) What are the baselines for protein fold recognition? Bioinformatics 17, 63–72.
  100. Bairoch, A. (1991) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 19, 2241–2245.
  101. Bairoch, A., Bucher, P., and Hofmann, K. (1997) The PROSITE database, its status in 1997. Nucleic Acids Res. 25, 217–221.
  102. Bucher, P., Karplus, K., Moeri, N., and Hofmann, K. (1996) A flexible motif search technique based on generalized profiles. Comput. Chem. 20, 3–23.
  103. Sonnhammer, E. L. and Kahn, D. (1994) Modular arrangement of proteins as inferred from analysis of homology. Protein Sci. 3, 482–492.
  104. Corpet, F., Gouzy, J., and Kahn, D. (1998) The ProDom database of protein domain families. Nucleic Acids Res. 26, 323–326.
  105. Sonnhammer, E. L., Eddy, S. R., and Durbin, R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405–420.
  106. Bateman, A., Birney, E., Cerruti, L., et al. (2002) The Pfam protein families database. Nucleic Acids Res. 30, 276–280.
  107. Apweiler, R., Attwood, T. K., Bairoch, A., et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29, 37–40.
  108. Mulder, N. J., Apweiler, R., Attwood, T. K., et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31, 315–8.
  109. Rawlings, N. D., O’Brien, E., and Barrett, A. J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343–346.
  110. Storm, C. E. and Sonnhammer, E. L. (2001) NIFAS: visual analysis of domain evolution in proteins. Bioinformatics 17, 343–348.
  111. Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl. Acad. Sci. USA 95, 5857–5864.
  112. Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P., and Bork, P. (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234.
  113. Letunic, I., Goodstadt, L., Dickens, N. J., et al. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 30, 242–244.
  114. Pietrokovski, S., Henikoff, J. G., and Henikoff, S. (1996) The Blocks database-a system for protein classification. Nucleic Acids Res. 24, 197–200.
  115. Attwood, T. K., Flower, D. R., Lewis, A. P., et al. (1999) PRINTS prepares for the new millennium. Nucleic Acids Res. 27, 220–225.
  116. Silverstein, K. A., Shoop, E., Johnson, J. E., and Retzel, E. F. (2001) MetaFam: a unified classification of protein families. I. Overview and statistics. Bioinformatics 17, 249–261.
  117. Yuan, Y. P., Eulenstein, O., Vingron, M., and Bork, P. (1998) Towards detection of orthologues in sequence databases. Bioinformatics 14, 285–289.
  118. Bernstein, F. C., Koetzle, T. F., Williams, G. J., et al. (1977) The Protein Data Bank. A computer-based archival file for macromolecular structures. Eur. J. Biochem. 80, 319–324.
  119. Berman, H. M., Westbrook, J., Feng, Z., et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
  120. Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
  121. Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B., and Thornton, J. M. (1997) CATH-a Hierarchic classification of protein domain structures. Structure 5, 1093–1108.
  122. Pearl, F. M. G., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P., Thornton, J. M., and Orengo, C. A. (2000) Assigning genomic sequences to CATH. Nucleic Acids Res. 28, 277–282.
  123. Peitsch, M. C. and Jongeneel, V. (1993) A 3-dimensional model for the CD40 ligand predicts that it is a compact trimer similar to the tumor necrosis factors. Int. Immunol. 5, 233–238.
  124. Schwede, T., Kopp, J., Guex, N., and Peitsch, M. C. (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 31, 3381–3385.
  125. Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 18, 2714–2723.
  126. Combet, C., Jambon, M., Deleage, G., and Geourjon, C. (2002) Geno3D: automatic comparative molecular modelling of protein. Bioinformatics 18, 213–214.
  127. Lambert, C., Leonard, N., De Bolle, X., and Depiereux, E. (2002) ESyPred3D: prediction of proteins 3D structures. Bioinformatics 18, 1250–1256.
  128. Bader, G. D., Betel, D., and Hogue, C. W. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 31, 248–250.
  129. Xenarios, I., Rice, D. W., Salwinski, L., Baron, M. K., Marcotte, E. M., and Eisenberg, D. (2000) DIP: The Database of Interacting Proteins. Nucleic Acids Res. 28, 289–291.
  130. Levinthal, C., Wodak, S. J., Kahn, P., and Dadivanian, A. K. (1975) Hemoglobin interaction in sickle cell fibers. I. Theoretical approaches to the molecular contacts. Proc. Natl. Acad. Sci. USA 72, 1330–1334.
  131. Wodak, S. J. and Janin, J. (1978) Computer analysis of protein-protein interaction. J. Mol. Biol. 124, 323–342.
  132. Janin, J., Henrick, K., Moult, J., et al. (2003) CAPRI: a Critical Assessment of PRedicted Interactions. Proteins 52, 2–9.
  133. Taylor, R. D., Jewsbury, P. J., and Essex, J. W. (2002) A review of protein-small molecule docking methods. J. Comput. Aided Mol. Des. 16, 151–166.
  134. Read, T. D., Peterson, S. N., Tourasse, N., et al. (2003) The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423, 81–86.
  135. Ivanova, N., Sorokin, A., Anderson, I., et al. (2003) Genome sequence of Bacillus cereus and comparative analysis with Bacillus anthracis. Nature 423, 87–91.
  136. Smith, D. R. (1996) Microbial pathogen genomes-new strategies for identifying therapeutics and vaccine targets. Trends Biotechnol. 14, 290–293.
  137. Tatusov, R. L., Koonin, E. V., and Lipman, D. J. (1997) A genomic perspective on protein families. Science 278, 631–637.
  138. Tatusov, R. L., Natale, D. A., Garkavtsev, I. V., et al. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28.
  139. Wheeler, D. L., Church, D. M., Federhen, S., et al. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res. 31, 28–33.
  140. Edgar, R., Domrachev, M., and Lash, A. E. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210.
  141. Rehm, B. H. A. and Reinecke, F. (2004) Evaluation of proteomic techniques: applications and potential. Curr. Proteomics 1, 103–111.

Copyright information

© Humana Press Inc., Totowa, NJ 2005

Authors and Affiliations

  • Bernd H. A. Rehm (1)
  • Frank Reinecke (2)
  1. Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand
  2. Institut für Molekulare Mikrobiologie und Biotechnologie, Westfälische Wilhelms-Universität Münster, Münster, Germany
