An Open Framework for Extensible Multi-stage Bioinformatics Software

  • Gabriel Keeble-Gagnère
  • Johan Nyström-Persson
  • Matthew I. Bellgard
  • Kenji Mizuguchi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)

Abstract

In research labs, there is often a need to customise software at every step of a bioinformatics workflow, but it has traditionally been difficult to obtain both a high degree of customisability and good performance. Performance-sensitive tools are often highly monolithic, which can make them difficult to adapt for research. We present a novel set of software development principles and a bioinformatics framework, Friedrich, which is currently in early development. Friedrich applications support both early-stage experimentation and late-stage batch processing, since they combine good performance with a high degree of flexibility and customisability. These benefits are obtained in large part by basing Friedrich on the multiparadigm programming language Scala. We present a case study in the form of a basic genome assembler and its extension with new functionality. Our architecture has the potential to greatly increase the overall productivity of software developers and researchers in bioinformatics.

Keywords

Open Framework, Scala Code, Bioinformatics Application, Short Read Data, Early Stage Experimentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Gabriel Keeble-Gagnère (1)
  • Johan Nyström-Persson (2)
  • Matthew I. Bellgard (1)
  • Kenji Mizuguchi (2)

  1. Centre for Comparative Genomics, Murdoch University, Australia
  2. National Institute of Biomedical Innovation, Japan
