Genomic Annotation Prediction Based on Integrated Information

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7548)

Abstract

In recent years, an increasingly large amount of biomedical and biomolecular information and data has become available to researchers, allowing the scientific community to infer new knowledge and reach new objectives. As this information grows, so does the difficulty of managing it efficiently. In this paper, we present a short overview of our proposed solution to this problem: a prototype multi-organism Genomic and Proteomic Data Warehouse, called GPDW, based at Politecnico di Milano. We also present the computational methods we implemented to exploit it. Experimental studies on datasets demonstrated the effectiveness of our resource and methods.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chicco, D., Tagliasacchi, M., Masseroli, M. (2012). Genomic Annotation Prediction Based on Integrated Information. In: Biganzoli, E., Vellido, A., Ambrogi, F., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2011. Lecture Notes in Computer Science, vol 7548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35686-5_20

  • DOI: https://doi.org/10.1007/978-3-642-35686-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35685-8

  • Online ISBN: 978-3-642-35686-5

  • eBook Packages: Computer Science, Computer Science (R0)
