Integration of Biomolecular Interaction Data in a Genomic and Proteomic Data Warehouse to Support Biomedical Knowledge Discovery

  • Arif Canakoglu
  • Giorgio Ghisalberti
  • Marco Masseroli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7548)


The growing amount of available genomic and proteomic information offers new opportunities for novel research approaches and biomedical discoveries through effective data management and analysis support. Integration and comprehensive evaluation of the available controlled data can highlight information patterns that lead to new biomedical knowledge. For this purpose, the Politecnico di Milano is developing a software framework to create and maintain a Genomic and Proteomic Data Warehouse (GPDW) that integrates information from many data sources on the basis of a conceptual data model that relates molecular entities and biomedical features.

Here we illustrate and discuss the extension of this framework to integrate biomolecular interaction data into the GPDW. Comprehensive evaluation and mining of reliable interaction data, together with the other biomolecular information in the GPDW, constitute powerful computational support for novel biomedical knowledge discoveries.


Keywords: Proteomic and genomic interaction data; Automatic data parsing and integration; Data warehousing





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Arif Canakoglu (1)
  • Giorgio Ghisalberti (1)
  • Marco Masseroli (1)
  1. Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
