Skip to main content

In Silico Proteomics

Predicting Interactions from Sequence

  • Chapter
Handbook of Proteomic Methods
  • 244 Accesses

Abstract

This Handbook of Proteomic Methods largely comprises current experimental technologies to identify, quantify, and characterize expressed proteins and their interactions within cells, tissues, and body fluids. These techniques have evolved rapidly with an impetus from the industrial biotechnology sector. Nevertheless, experimental elucidation of all proteomic constituents within an organism and the documentation of their interactions remain formidable tasks. This is further complicated by the broad diversity in protein expression guaranteed by alternative splicing of pre-mRNA or post-translational modifications. In one dramatic example, more than 38,000 different isoforms of Down syndrome cell adhesion molecule (DSCAM) were observed in Drosophila melanogaster (1). Obviously, the combinatorics required for comprehensively explicating all protein—protein interactions, especially for higher eukaryotes, are prohibitive, even with the use of advanced high-throughput approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.00
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 219.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Schmucker, D., Clemens, J. C., Shu, H., et al. (2000) Drosophila DSCAM is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell 101, 671–684.

    CAS  Google Scholar 

  2. Fung, Y. C. (1993) Biomechanics: Mechanical Properties of Living Tissues, 2nd ed. Springer-Verlag, New York.

    Google Scholar 

  3. Spellman, P. T. and Rubin, G. M. (2002) Evidence for large domains of similarly expressed genes in the Drosophila genome. J. Biol. 1, 5.1–5. 8.

    Google Scholar 

  4. Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992) A training algorithm for optimal margin classifiers, in Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory ( Haussler, D., ed.), ACM Press, Pittsburgh, PA, pp. 144–152.

    Chapter  Google Scholar 

  5. Vapnik, V. N. (1995) The Nature of Statistical Learning Theory. Springer-Verlag, Heidelberg, Germany.

    Google Scholar 

  6. Bock, J. R. and Gough, D. A. (2001) Predicting protein-protein interactions from primary structure. Bioinformatics 17, 455–460.

    Article  PubMed  CAS  Google Scholar 

  7. Xenarios, I., Rice, D. W., Salwinski, L., Baron, M. K., Marcotte, E. M., and Eisenberg, D. (2000) DIP: The database of interacting proteins. Nucleic Acids Res. 28, 289–291.

    Article  PubMed  CAS  Google Scholar 

  8. Kandel, D., Mathias, Y., Unger, R., and Winkler, P. (1996) Shuffling biological sequences. Discrete Appl. Math. 71, 171–185.

    Article  Google Scholar 

  9. Eisenberg, D. (1984) Three-dimensional structure of membrane and surface proteins. Ann. Rev. Biochem. 53, 595–623.

    Article  PubMed  CAS  Google Scholar 

  10. Bull, H. B. and Breese, K. (1974) Surface tension of amino acid solutions: a hydrophobicity scale of the amino acid residues. Arch. Biochem. Biophys. 161, 665–670.

    Article  PubMed  CAS  Google Scholar 

  11. Provost, F., Fawcett, T., and Kohavi, R. (1998) The case against accuracy estimation for comparing induction algorithms, in Proceedings of the Fifteenth International Conference on Machine Learning (IMLC-98), Morgan Kaufmann, San Francisco, CA, pp. 445–453.

    Google Scholar 

  12. Weiss, G. M. and Provost, F. (2001) The effect of class distribution on classifier learning: an empirical study. Technical Report ML-TR-44, Department of Computer Science, Rutgers University.

    Google Scholar 

  13. Swingler, K. (1996) Applying Neural Networks: A Practical Guide. Academic, London, UK.

    Google Scholar 

  14. Kwok, J. T. (1999) Moderating the outputs of support vector machine classifiers. IEEE Trans. Neural Net. 10, 1018–1031.

    Article  CAS  Google Scholar 

  15. Platt, J. C. (1999) Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, pp. 185–208.

    Google Scholar 

  16. Witten, I. H. and Frank, E. (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA.

    Google Scholar 

  17. Elkan, C. (2001) The foundations of cost-sensitive learning, in Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, WA, pp. 973–978.

    Google Scholar 

  18. Bock, J. R. and Gough, D. A. (2003) Machine learning inference of protein-protein binding in Saccharomyces cerevisiae,in review.

    Google Scholar 

  19. Goffeau, A., Barrell, B. G., Bussey, H., et al. (1996) Life with 6000 genes. Science 274, 563–567.

    Article  Google Scholar 

  20. Chervitz, S. A., Aravind, L., Sherlock, G., Ball, C. A., Koonin, E. V., and Dwight, S. S. (1998) Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282, 2022–2028.

    Article  PubMed  CAS  Google Scholar 

  21. Mumberg, D., Muller, R., and Funk, M. (1995) Yeast vectors for the controlled expression of heterologous proteins in different genetic backgrounds. Gene 156, 119–122.

    Article  PubMed  CAS  Google Scholar 

  22. Munder, T. and Hinnen, A. (1999) Yeast cells as tools for target-oriented screening. Appl. Microbiol. Biotechnol. 52, 311–320.

    Article  PubMed  CAS  Google Scholar 

  23. Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. (2001) A comprehensive two-hydrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98, 4569–4574.

    Article  PubMed  CAS  Google Scholar 

  24. Bartel, P., Chien, C. T., Sternglanz, R., and Fields, S. (1993) Elimination of false positives that arise in using the two-hybrid system. Biotechniques 14, 920–924.

    PubMed  CAS  Google Scholar 

  25. Smith, T. F. and Waterman, W. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.

    Article  PubMed  CAS  Google Scholar 

  26. Altschul, S. F. and Gish, W. (1996) Local alignment statistics. Methods Enzymol. 266, 460–480.

    Article  PubMed  CAS  Google Scholar 

  27. Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10,915–10, 519.

    Google Scholar 

  28. Kohavi, R. and Provost, F. (1998) Glossary of terms. Machine Learning 30, 271–274.

    Article  Google Scholar 

  29. Peterson, W. W. and Birdsall, T. G. (1953) The theory of signal detectability. Technical Report TR-13, Communications and Signal Processing Laboratory, University of Michigan, Ann Arbor, MI.

    Google Scholar 

  30. Stone, M. (1974) Cross-validatory choices and assessment of statistical predictions. J. Roy. Stat. Soc. 36, 111–147.

    Google Scholar 

  31. Skolnik, M. I. (1980) Introduction to Radar Systems, 2nd ed. McGraw-Hill, New York.

    Google Scholar 

  32. Urick, R. J. (1983) Principles of Underwater Sound, 3rd ed. McGraw-Hill, New York.

    Google Scholar 

  33. Druker, B. J., Talpaz, M. T., Resta, D. J., et al. (2001) Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia and acute lymphoblastic leukemia. N. Engl. J. Med. 344, 1031–1037.

    Article  PubMed  CAS  Google Scholar 

  34. Black, D. L. (2000) Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology. Cell 103, 367–370.

    Article  PubMed  CAS  Google Scholar 

  35. Bock, J. R. and Gough, D. A. (2003) Whole-proteome interaction mining. Bioinformatics 19 125–135.

    Google Scholar 

  36. Bradley, P. S., Fayyad, U. M., and Mangasarian, O. L. (1998) Mathematical programming for data mining: formulations and challenges. Technical Report MSR-98–01, University of Wisconsin Data Mining Institute, Madison, WI.

    Google Scholar 

  37. Rain, J. C., Selig, L., De Reuse, H., et al. (2001) The protein-protein interaction map of Helicobacter pylori. Nature 409, 211–215.

    CAS  Google Scholar 

  38. Burges, C. (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Discovery 2, 121–167.

    Article  Google Scholar 

  39. Sankoff, D. Leduc, G., Paquin, B., Lang, B. F., and Cedergren, R. (1992) Gene order comparisons of phylogenetic inference: evolution of the mitochondrial genome. Proc. Natl. Acad. Sci. USA 89 6575–6579.

    Google Scholar 

  40. Tekaia, F., Lazcano, A., and Dujon, B. (l 999) The genomic tree as revealed from whole proteome comparisons. Genome Res. 9, 550–557.

    Google Scholar 

  41. Brown, J. R. Douady, C. J., Italia, M. J. Marshall, W. E., and Stanhope, M. H. (2001) Universal trees based on large combined protein sequence data sets. Nat. Genet. 28 281–285.

    Google Scholar 

  42. Efron, B. and Gong, G. (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat. 37, 36–48.

    Google Scholar 

  43. Eisen, J. A. (2000) Assessing evolutionary relationships among microbes from wholegenome analysis. Curr. Opin. Microbiol. 3, 475–480.

    Article  PubMed  CAS  Google Scholar 

  44. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002) Network motifs: simple building blocks of complex networks. Science 298, 824–827.

    Google Scholar 

  45. Klumpp, S. and Krieglstein, J. (2002) Phosphorylation and dephosphorylation of histidine residues in proteins. Eur. J. Biochem. 269, 1067–1071.

    Article  PubMed  CAS  Google Scholar 

  46. Alberts, B., Bray, D., Lewis, J. Raff, M., Roberts, K., and Watson, J. D. (1989) Molecular Biology of the Cell,2nd ed. New York.

    Google Scholar 

  47. Bairoch, A., Bucher, P., and Hofmann, K. (1997) The PROSITE database, its status in 1997. Nucleic Acids Res. 25, 217–221.

    Article  PubMed  CAS  Google Scholar 

  48. Matsushita, M. and Janda, K. D. (2002) Histidine kinases as targets for new antimicrobial agents. Bioorg. Med. Chem. 10, 855–867.

    Article  PubMed  CAS  Google Scholar 

  49. Andrews, S. C. (1998) Iron storage in bacteria. Adv. Microb. Physiol. 40, 281–351.

    Article  PubMed  CAS  Google Scholar 

  50. Jeong, H., Mason, S. P., Barabâsi, A.-L., and Oltvai, Z. N. (2001) Lethality and centrality in protein networks. Nature 411, 41–42.

    Google Scholar 

  51. Cunningham, M. J. (2000) Genomics and proteomics: the new millennium of drug discovery and development. J. Pharmacol. Toxicol. Methods 44, 291–300.

    Article  PubMed  CAS  Google Scholar 

  52. Bissantz, C., Folkers, G., and Rognan, D. (2000) Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations. J. Med. Chem. 43, 4759–4767.

    Article  PubMed  CAS  Google Scholar 

  53. Waszkowycz, B. (2002) Structure-based approaches to drug design and virtual screening. Curr. Opin. Drug Discovery Dev. 5, 407–413.

    CAS  Google Scholar 

  54. Langer, T. and Hoffmann, R. D. (2001) Virtual screening: an effective tool for lead structure discovery? Curr. Pharma. Design 7, 509–527.

    Article  CAS  Google Scholar 

  55. Gohlke, H. and Klebe, G. (2001) Statistical potentials and scoring functions applied to protein-ligand binding. Curr. Opin. Struct. Biol. 11, 231–235.

    Article  PubMed  CAS  Google Scholar 

  56. Böhm, H. J. (1998) Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs. J. Comput. Aided Mol. Design 12 309–323.

    Google Scholar 

  57. Moret, E. E., van Wijk, M. C., Kostense, A. S., and Gillies, M. B. (1999) Scoring peptide(mimetic)-protein interactions. Med. Chem. Res. 9, 604–620.

    CAS  Google Scholar 

  58. Bock, J. R. and Gough, D. A. (2002) A new method to estimate ligand-receptor energetics. Mol. Cell. Proteomics 1, 904–910.

    Google Scholar 

  59. Smola, A. J. and Schölkopf, B. (1998) A tutorial on support vector regression. Technical Report NC-TR-98–030, Royal Holloway College, University of London, London.

    Google Scholar 

  60. Ortiz, A. R., Pisabarro, M. T., Gago, F., and Wade, R. C. (1995) Prediction of drug binding affinities by comparative binding energy analysis. J. Med. Chem. 38, 2681–2691.

    Google Scholar 

  61. Chen, Y. Z. and Zhi, D. G. (2001) Ligand-protein inverse docking and its potential use in the computer search of protein targets of a small molecule. Proteins 43, 217–226.

    Article  PubMed  CAS  Google Scholar 

  62. Berman, H. M., Westbrook, J., Feng, Z., et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.

    Article  PubMed  CAS  Google Scholar 

  63. Weininger, D. (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inform. Comput. Sci. 28, 31–36.

    Article  CAS  Google Scholar 

  64. Wegner, J. and Zell, A. (2002) JOELib: a Java based computational chemistry package, in 6th Darmstädter Molecular-Modelling Workshop, Technische Universität, Darmstadt, Germany.

    Google Scholar 

  65. Burden, F. R. (1989) Molecular identification number for substructure searches. J. Chem. Inform. Comput. Sci. 29, 225–227.

    Article  CAS  Google Scholar 

  66. Boikess, R. S. and Edelson, E. (1981) Chemical Principles, 2nd ed. Harper & Row, New York.

    Google Scholar 

  67. Golub, G. H. and van Loan, C. F. (1989) Matrix Computations, 2nd ed. Johns Hopkins University Press, Baltimore, MD.

    Google Scholar 

  68. Gershenfeld, N. A. and Weigend, A. S. (1993) The Future of Time Series: Learning and Understanding, vol. XV of Sante Fe Institute Studies in the Sciences of Complexity. Addison- Wesley, Reading, MA, pp. 1–70.

    Google Scholar 

  69. Kendall, M. G. (1938) A new measure of rank correlation. Biometrika 30, 81–93.

    Google Scholar 

  70. Head, R. D., Smythe, M. L., Oprea, T. I., Waller, C. L., Green, S. M., and Marshall, G. R. (1996) VALIDATE: a new method for the receptor-based prediction of binding affinities of novel ligands. J. Amer. Chem. Soc. 118, 3959–3969.

    Article  CAS  Google Scholar 

  71. Wang, R., Liu, L., Lai, L., and Tang, Y. (1998) SCORE: a new empirical method for estimating the binding affinity of a protein-ligand complex. J. Mol. Modeling 4, 379–394.

    Article  CAS  Google Scholar 

  72. Schwikowski, B., Uetz, P., and Fields, S. (2000) A network of protein-protein interactions in yeast. Nat. Biotechnol. 18, 1257–1261.

    Article  PubMed  CAS  Google Scholar 

  73. Wojcik, J. and Schächter, V. (2001) Protein-protein interaction map inference using interacting domain profile pairs. Bioinformatics 17 (suppl. 1), S296 - S305.

    Article  PubMed  Google Scholar 

  74. Uetz, P., Goit, L., Cagney, G., et al. (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627.

    CAS  Google Scholar 

  75. Tucker, C. L., Gera, J. F., and Uetz, P. (2001) Towards an understanding of complex protein networks. Trends Cell Biol. 11, 102–106.

    Article  PubMed  CAS  Google Scholar 

  76. Walhout, A., Boulton, S., and Vidal, M. (2000) Yeast two-hybrid systems and protein interaction mapping projects for yeast and worm. Yeast 17, 88–94.

    Article  PubMed  CAS  Google Scholar 

  77. Wang, R., Lai, L., and Wang, S. (2002) Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J. Comput. Aided Mol. Design 16, 11–26.

    Article  CAS  Google Scholar 

  78. Rarey, M., Kramer, B., Bernd, C., and Lengauer, T. (1996) Time-efficient docking of similar flexible ligands, in Biocomputing: Proceedings of the 1996 Pacific Symposium, Hunter, L. and Klein, T., eds., January 3–6, World Scientific Publishing, Singapore.

    Google Scholar 

  79. Zhang, T. and Koshland, D. E. (1996) Computational method for relative binding energies of enzyme-substrate complexes. Protein Sci. 5, 348–356.

    Article  PubMed  CAS  Google Scholar 

  80. Schapira, M., Totrov, M., and Abagyan, R. (1999) Prediction of the binding energy for small molecules, peptides and proteins. J. Mol. Recog. 12, 177–190.

    Article  CAS  Google Scholar 

Download references

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Bock, J.R., Gough, D.A. (2003). In Silico Proteomics. In: Conn, P.M. (eds) Handbook of Proteomic Methods. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59259-414-6_13

Download citation

  • DOI: https://doi.org/10.1007/978-1-59259-414-6_13

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61737-504-0

  • Online ISBN: 978-1-59259-414-6

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics