BioCloud Search EnGene: Surfing Biological Data on the Cloud

  • Conference paper
  • In: Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2013)

Abstract

The massive production and spread of biomedical data across the web introduces new challenges in identifying computational approaches that provide quality search and browsing of web resources. This paper presents BioCloud Search EnGene (BSE), a cloud application that facilitates searching and integrating the many layers of biological information offered by large-scale public genomic repositories. Grounded in the concept of dataspace, BSE is built on top of a cloud platform that substantially reduces issues associated with scalability and performance. Like popular online gene portals, BSE adopts a gene-centric approach: researchers can find the information they need through a simple “Google-like” query interface that accepts standard gene identifiers as keywords. We present the architecture and functionality of BSE and discuss how our strategies help to tackle big data problems in querying gene-based web resources. BSE is publicly available at: http://biocloud-unica.appspot.com/.
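The gene-centric idea can be illustrated with a minimal sketch in which one gene keyword fans out into one search request per repository. This is not BSE's implementation, which the abstract does not describe. The sketch assumes the repositories are reached through the public NCBI E-utilities `esearch` endpoint; the function name `build_gene_queries` and the default database list are hypothetical choices made for illustration. It only builds the request URLs and does not contact the network.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint. That BSE queries it in this way
# is an assumption made for this sketch.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_queries(gene_symbol, databases=("gene", "pubmed", "protein")):
    """Return one esearch URL per Entrez database for a single gene keyword.

    The function name and the default database list are hypothetical.
    """
    keyword = gene_symbol.strip()
    if not keyword:
        raise ValueError("gene symbol must not be empty")
    return {
        db: ESEARCH_URL + "?" + urlencode({"db": db, "term": keyword, "retmode": "json"})
        for db in databases
    }


if __name__ == "__main__":
    # One keyword, several repositories: the "Google-like" entry point.
    for db, url in build_gene_queries("BRCA1").items():
        print(db, url)
```

Keeping the keyword as the only required input mirrors the single search box described in the abstract. Fetching and merging the responses is left out because the abstract gives no detail on that step.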

Acknowledgments

We thank Francesco Masulli for his appreciation and valuable suggestions, and the CIBB 2013 general chairs Enrico Formenti, Roberto Tagliaferri and Ernst Wit for hosting the presentation of BSE as a special event at CIBB 2013.

This research was supported by RAS, Regione Autonoma della Sardegna (Legge regionale 7 agosto 2007, n. 7), in the project “DENIS: Dataspaces Enhancing the Next Internet in Sardinia”.

Author information

Corresponding author

Correspondence to Nicoletta Dessì.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Dessì, N., Pascariello, E., Milia, G., Pes, B. (2014). BioCloud Search EnGene: Surfing Biological Data on the Cloud. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_3

  • DOI: https://doi.org/10.1007/978-3-319-09042-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09041-2

  • Online ISBN: 978-3-319-09042-9

  • eBook Packages: Computer Science, Computer Science (R0)
