Galaxy: A Gateway to Tools in e-Science

  • Enis Afgan
  • Jeremy Goecks
  • Dannon Baker
  • Nate Coraor
  • The Galaxy Team
  • Anton Nekrutenko
  • James Taylor
Part of the Computer Communications and Networks book series (CCN)


e-Science focuses on the use of computational tools and resources to analyze large scientific datasets. Performing these analyses often requires running a variety of computational tools specific to a given scientific domain. This places a significant burden on individual researchers, for whom simply running these tools may be prohibitively difficult, let alone combining tools into a complete analysis or acquiring data and appropriate computational resources. This limits the productivity of individual researchers and represents a significant barrier to scientific discovery. To relieve researchers of such unnecessary complexity and promote more robust science, we have developed a tool integration framework called Galaxy. Galaxy abstracts individual tools behind a consistent, easy-to-use web interface, enabling advanced data analysis that requires no informatics expertise. Galaxy also makes it easy to add newly developed tools, which supports tool developers, and it enables transparent and reproducible communication of computationally intensive analyses. Recently, we have made it trivial to deploy a complete Galaxy solution on aggregated infrastructures, including cloud computing providers.





Galaxy is developed by the Galaxy Team: Enis Afgan, Guruprasad Ananda, Dannon Baker, Dan Blankenberg, Ramkrishna Chakrabarty, Nate Coraor, Jeremy Goecks, Greg Von Kuster, Ross Lazarus, Kanwei Li, Anton Nekrutenko, James Taylor, and Kelly Vincent. We thank our many collaborators who support and maintain the data warehouses and browsers accessible through Galaxy. Development of the Galaxy framework is supported by NIH grants HG004909 (A.N. and J.T.), HG005133 (J.T. and A.N.), and HG005542 (J.T. and A.N.), by NSF grant DBI-0850103 (A.N. and J.T.), and by funds from the Huck Institutes of the Life Sciences and the Institute for CyberScience at Penn State. Additional funding is provided, in part, under a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations, or conclusions.


Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Enis Afgan (1)
  • Jeremy Goecks (1)
  • Dannon Baker (1)
  • Nate Coraor (2)
  • The Galaxy Team (3, 4)
  • Anton Nekrutenko (2)
  • James Taylor (1)

  1. Department of Biology and Department of Mathematics & Computer Science, Emory University, Druid Hills, USA
  2. Huck Institutes of the Life Sciences and Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, USA
  3. Pennsylvania State University, University Park, USA
  4. Emory University, Atlanta, USA
