Genomic Annotation Prediction Based on Integrated Information

  • Davide Chicco
  • Marco Tagliasacchi
  • Marco Masseroli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7548)


In recent years, an increasingly large amount of biomedical and biomolecular information and data has become available to researchers, allowing the scientific community to infer new knowledge and reach new objectives. As this information grows, so does the difficulty of managing it efficiently. In this paper, we present a short overview of our proposed solution to this problem: GPDW, a prototype multi-organism Genomic and Proteomic Data Warehouse based at Politecnico di Milano. We also present the computational methods we implemented to exploit it. Experimental studies on datasets demonstrated the effectiveness of our resource and methods.


Biomolecular databases; Bioinformatics data integration; Biomolecular annotation prediction; Information integration; Data warehouse; Software infrastructures





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Davide Chicco (1)
  • Marco Tagliasacchi (1)
  • Marco Masseroli (1)
  1. Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan, Italy
