Abstract
In recent years, an increasingly large amount of biomedical and biomolecular information and data has become available to researchers, allowing the scientific community to infer new knowledge and reach new objectives. As this information grows, so does the difficulty of managing it efficiently. In this paper, we present a short overview of our proposed solution to this problem: a prototype multi-organism Genomic and Proteomic Data Warehouse (GPDW), based at Politecnico di Milano. We also present the computational methods we implemented to exploit it. Experimental studies on datasets demonstrated the effectiveness of our resource and methods.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chicco, D., Tagliasacchi, M., Masseroli, M. (2012). Genomic Annotation Prediction Based on Integrated Information. In: Biganzoli, E., Vellido, A., Ambrogi, F., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2011. Lecture Notes in Computer Science, vol 7548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35686-5_20
DOI: https://doi.org/10.1007/978-3-642-35686-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35685-8
Online ISBN: 978-3-642-35686-5
eBook Packages: Computer Science (R0)