Hybrid Computation Models for High Performance Biological Sequence Alignment on a Cloud System

Job, Taylor; Park, Jin H.

doi:10.1007/978-94-017-7236-5_26


Abstract

Cloud systems are increasingly used in bioinformatics as a convenient platform for high performance computation. We develop and examine several hybrid computation models for high performance biological sequence alignment on a cloud system. We evaluate the Smith-Waterman and CloudBurst alignment algorithms under these computation models, which are built from combinations of current technologies, including Hadoop, SHadoop, and Fair Scheduler, on the Amazon Elastic Compute Cloud (EC2) system. In our experiments with relatively small data sets, the computation model with SHadoop showed the best performance for both algorithms, with speedups of 1.86× for Smith-Waterman and 1.19× for CloudBurst over the baseline model (with Hadoop). For relatively large data sets, the computation model with SHadoop showed the best performance for Smith-Waterman (1.03×), and the computation model with SHadoop plus Fair Scheduler showed the best performance for CloudBurst (1.15×).
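For readers unfamiliar with the Smith-Waterman algorithm named above, the following is a minimal single-machine sketch of its scoring recurrence for local alignment. It is not the authors' Hadoop or SHadoop implementation, which the abstract does not describe. The function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Illustrative sketch: scoring values and the linear gap penalty are
    assumptions, not parameters taken from the chapter.
    """
    # Only the previous row of the dynamic programming matrix is needed
    # to compute the score, so memory use is O(len(b)).
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may
            # restart at any position.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    print(smith_waterman_score("AAAA", "TTTT"))
```

Each cell depends only on its upper, left, and upper-left neighbours, so the work grows with the product of the two sequence lengths. That cost is one reason alignment workloads are commonly distributed across many nodes.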



Author information

Authors and Affiliations

Computer Science Department, California State University, Fresno, CA, 93740, USA
Taylor Job & Jin H. Park


Corresponding author

Correspondence to Jin H. Park.

Editor information

Editors and Affiliations

Computer & Communication, Catholic University of DaeGu Engineering College, DaeGu, Republic of Korea (South Korea)
Haeng Kon Kim
College of Engineering, California State Polytechnic University, Pomona, California, USA
Mahyar A. Amouzegar
IAENG Secretariat, International Association of Engineers, Hong Kong, Hong Kong SAR
Sio-long Ao


Cite this paper

Job, T., Park, J.H. (2015). Hybrid Computation Models for High Performance Biological Sequence Alignment on a Cloud System. In: Kim, H., Amouzegar, M., Ao, Sl. (eds) Transactions on Engineering Technologies. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7236-5_26


DOI: https://doi.org/10.1007/978-94-017-7236-5_26
Published: 08 July 2015
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-7235-8
Online ISBN: 978-94-017-7236-5
eBook Packages: Engineering, Engineering (R0)
