Connecting Protein Interaction Data, Mutations, and Disease Using Bioinformatics

  • Jake Y. Chen
  • Eunseog Youn
  • Sean D. Mooney
Part of the Methods in Molecular Biology book series (MIMB, volume 541)


Understanding how mutations lead to changes in protein function and/or protein interaction is critical to understanding the molecular causes of clinical phenotypes. In this method, we present a path toward integrating protein interaction data with mutation data, and then demonstrate how to identify the subset of proteins and interactions that are important to a particular disease. We then build a statistical model of disease mutations in this disease-associated subset of proteins and visualize the results. Using Alzheimer’s disease (AD) as a case study, we identify a subset of proteins involved in AD and discriminate disease-associated mutations from SNPs in these proteins with 83% accuracy. As the molecular causes of disease become better understood, models such as these will be useful for identifying the candidate variants most likely to be causative.
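The integration step described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the protein identifiers (P1 to P5), the variant labels, the seed set, and the one-step neighbor expansion are all hypothetical choices made for illustration. The sketch builds an interaction graph, takes seed proteins assumed to be linked to a disease, adds their direct interaction partners, and collects the variants that fall within that subset.

```python
from collections import defaultdict

# Hypothetical toy data: identifiers and class labels are placeholders,
# not results or records from the chapter.
interactions = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P2", "P4"),
    ("P5", "P4"),
]
variants = [
    # (protein, variant label, class)
    ("P1", "var1", "disease"),
    ("P2", "var2", "disease"),
    ("P3", "var3", "snp"),
    ("P5", "var4", "disease"),
]
seeds = {"P1"}  # assumed disease-associated starting proteins


def build_adjacency(edges):
    """Turn an undirected edge list into a protein -> partners mapping."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def disease_subnetwork(edges, seed_proteins):
    """Return the seeds plus their direct partners, and the edges among them."""
    adjacency = build_adjacency(edges)
    proteins = set(seed_proteins)
    for seed in seed_proteins:
        proteins |= adjacency.get(seed, set())
    sub_edges = [(a, b) for a, b in edges if a in proteins and b in proteins]
    return proteins, sub_edges


def variants_in_subset(variant_rows, proteins):
    """Group the variants that fall in the selected proteins by protein."""
    by_protein = defaultdict(list)
    for protein, label, cls in variant_rows:
        if protein in proteins:
            by_protein[protein].append((label, cls))
    return dict(by_protein)


proteins, sub_edges = disease_subnetwork(interactions, seeds)
subset_variants = variants_in_subset(variants, proteins)
print(sorted(proteins))
print(subset_variants)
```

The grouped variants are the kind of labeled input a statistical model could then be trained on to separate disease-associated mutations from SNPs. The one-step expansion is only one possible way to define a disease-associated subset.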

Key words

Protein interaction, SNP, mutation, bioinformatics, data integration



Support for this work was provided by NIH grants K22LM009135 (PI: Mooney) and R01LM009722 (PI: Mooney).



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Jake Y. Chen (1)
  • Eunseog Youn (2)
  • Sean D. Mooney (3)
  1. Informatics and Technology Complex (IT), Indiana University School of Informatics, IUPUI, Indianapolis, USA
  2. Department of Computer Science, Texas Tech University, Lubbock, USA
  3. Department of Medical and Molecular Genetics, Center for Computational Biology and Bioinformatics, IUPUI, Indianapolis, USA
