Abstract
The massive production and spread of biomedical data across the web introduces new challenges in identifying computational approaches that provide high-quality search and browsing of web resources. This paper presents BioCloud Search EnGene (BSE), a cloud application that facilitates searching and integrating the many layers of biological information offered by large-scale public genomic repositories. Grounded in the concept of dataspace, BSE is built on top of a cloud platform that substantially reduces issues associated with scalability and performance. Like popular online gene portals, BSE adopts a gene-centric approach: researchers can find the information they are interested in through a simple “Google-like” query interface that accepts standard gene identifiers as keywords. We present the architecture and functionality of BSE and discuss how our strategies contribute to successfully tackling big data problems in querying gene-based web resources. BSE is publicly available at: http://biocloud-unica.appspot.com/.
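The gene-centric query idea described above can be illustrated with a minimal sketch. It is not BSE's implementation: the function name `build_queries` and the choice of databases in `SOURCES` are assumptions made for illustration. The sketch only shows how a single gene keyword could be turned into one search request per repository, here using the public NCBI E-utilities ESearch endpoint.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public service documented by NCBI).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical selection of Entrez databases; not taken from BSE.
SOURCES = ("gene", "pubmed", "nuccore")


def build_queries(keyword: str, sources=SOURCES) -> dict:
    """Map each source database to an ESearch URL for one gene keyword."""
    term = keyword.strip()
    if not term:
        raise ValueError("empty gene keyword")
    return {
        db: ESEARCH + "?" + urlencode({"db": db, "term": term, "retmode": "json"})
        for db in sources
    }


if __name__ == "__main__":
    # Print one request URL per repository for a single gene symbol.
    for db, url in build_queries("BRCA1").items():
        print(db, url)
```

Keeping one keyword and fanning it out to several sources mirrors the gene-centric approach in spirit; a real system would additionally fetch, rank and merge the results, which this sketch leaves out.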
Acknowledgments
We thank Francesco Masulli for his appreciation and valuable suggestions, and the CIBB 2013 general chairs Enrico Formenti, Roberto Tagliaferri and Ernst Wit for hosting the presentation of BSE as a special event at CIBB 2013.
This research was supported by RAS, Regione Autonoma della Sardegna (Legge regionale 7 agosto 2007, n. 7), in the project “DENIS: Dataspaces Enhancing the Next Internet in Sardinia”.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Dessì, N., Pascariello, E., Milia, G., Pes, B. (2014). BioCloud Search EnGene: Surfing Biological Data on the Cloud. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_3
DOI: https://doi.org/10.1007/978-3-319-09042-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09041-2
Online ISBN: 978-3-319-09042-9
eBook Packages: Computer Science (R0)