Abstract
Technological advances in biological and biomedical data acquisition are generating enormous volumes of data. Existing legacy applications cannot process these volumes without new strategies. However, some bioinformatics workloads are easily parallelized: the input data are split, the legacy application is run on each part in parallel, and the partial results are then joined into one final result. In this paper, we present Bio-Cirrus, a software package that facilitates this process. It consists of a user-friendly client (jORCA) for accessing Web Services and enacting workflows, and a module (Mr. Cirrus) for processing the data with a map/reduce-style approach. Bio-Cirrus binaries and documentation are freely available at http://www.bitlab-es.com/cloud under the Creative Commons Attribution-No Derivative Works 2.5 Spain License, and its source code is available on request under the GPL v3 license.
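The split, run-in-parallel, join pattern described in the abstract can be illustrated with a short sketch. It is not Bio-Cirrus or Mr. Cirrus code, and it makes several assumptions. The input is FASTA text split only at record boundaries. The legacy program is treated as a black box that reads one chunk on standard input and writes text to standard output. The reduce step simply concatenates the partial outputs in input order. A local thread pool stands in for cluster or cloud nodes. The "legacy" program itself (`LEGACY_STANDIN`, which reports each sequence's length) and the helper names `split_fasta`, `run_legacy` and `run_split_merge` are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

# Hypothetical stand-in for an unmodified legacy command-line tool: it reads
# FASTA on stdin and prints "<name> <sequence length>" for every record.
LEGACY_STANDIN = """
import sys
name, n = None, 0
for line in sys.stdin:
    line = line.strip()
    if line.startswith(">"):
        if name is not None:
            print(name, n)
        name, n = line[1:], 0
    elif line:
        n += len(line)
if name is not None:
    print(name, n)
"""


def split_fasta(text: str, n_chunks: int) -> List[str]:
    """Split FASTA text into at most n_chunks contiguous chunks of whole records."""
    records = [">" + r for r in text.split(">") if r.strip()]
    if not records:
        return []
    n_chunks = max(1, min(n_chunks, len(records)))
    size = -(-len(records) // n_chunks)  # ceiling division
    return ["".join(records[i:i + size]) for i in range(0, len(records), size)]


def run_legacy(cmd: Sequence[str], chunk: str) -> str:
    """Map step: run the unmodified program on one chunk via stdin/stdout."""
    result = subprocess.run(cmd, input=chunk, capture_output=True,
                            text=True, check=True)
    return result.stdout


def run_split_merge(text: str, cmd: Sequence[str], n_workers: int) -> str:
    """Split the input, run the program on every chunk in parallel, join outputs."""
    chunks = split_fasta(text, n_workers)
    # Threads suffice here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        partials = list(pool.map(lambda c: run_legacy(cmd, c), chunks))
    # Reduce step: pool.map preserves input order, so plain concatenation
    # yields the same result as one serial run over the whole input.
    return "".join(partials)


if __name__ == "__main__":
    fasta = ">a\nACGT\n>b\nGG\n>c\nT\n"
    cmd = [sys.executable, "-c", LEGACY_STANDIN]
    print(run_split_merge(fasta, cmd, n_workers=2), end="")
```

Running the script prints `a 4`, `b 2` and `c 1` on separate lines, the same output a single serial run produces. Concatenation only works as a reduce step when records are processed independently. Tools whose output depends on the whole input, such as search statistics computed over the full database size, would need a merge step that is aware of those statistics.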
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karlsson, T.J.M. et al. (2013). Bio-Cirrus: A Framework for Running Legacy Bioinformatics Applications with Cloud Computing Resources. In: Rojas, I., Joya, G., Cabestany, J. (eds) Advances in Computational Intelligence. IWANN 2013. Lecture Notes in Computer Science, vol 7903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38682-4_23
DOI: https://doi.org/10.1007/978-3-642-38682-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38681-7
Online ISBN: 978-3-642-38682-4
eBook Packages: Computer Science, Computer Science (R0)