Abstract
As convenient high-performance computing platforms, cloud systems are increasingly used in bioinformatics. We develop and examine several hybrid computation models for high-performance biological sequence alignment on a cloud system. We evaluate the Smith-Waterman and CloudBurst alignment algorithms on these computation models, which are built from combinations of current technologies including Hadoop, SHadoop, and Fair Scheduler, on the Amazon Elastic Compute Cloud (EC2). In our experiments with relatively small data sets, the computation model with SHadoop performed best for both the Smith-Waterman and CloudBurst algorithms, with speedups of 1.86× and 1.19×, respectively, over the baseline model (with Hadoop). For relatively large data sets, the computation model with SHadoop performed best for the Smith-Waterman algorithm (1.03×), and the computation model with SHadoop plus Fair Scheduler performed best for the CloudBurst algorithm (1.15×).
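For readers unfamiliar with the Smith-Waterman algorithm named in the abstract, the following is a minimal single-machine sketch of its local alignment scoring recurrence in Python. It is not the chapter's Hadoop-based implementation. The function name `smith_waterman_score` and the scoring parameters (match = 2, mismatch = -1, linear gap penalty = -1) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the chapter.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Cell scores never drop below zero, which makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The sketch keeps only two rows of the matrix, so it reports the best score but cannot trace back the aligned region. Each pair of sequences can be scored independently of every other pair, which is one plausible reason the workload fits a MapReduce-style system.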
Copyright information
© 2015 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Job, T., Park, J.H. (2015). Hybrid Computation Models for High Performance Biological Sequence Alignment on a Cloud System. In: Kim, H., Amouzegar, M., Ao, Sl. (eds) Transactions on Engineering Technologies. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7236-5_26
DOI: https://doi.org/10.1007/978-94-017-7236-5_26
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-7235-8
Online ISBN: 978-94-017-7236-5
eBook Packages: Engineering, Engineering (R0)