Hybrid Computation Models for High Performance Biological Sequence Alignment on a Cloud System

  • Conference paper
Transactions on Engineering Technologies

Abstract

Cloud systems offer convenient high-performance computing and are increasingly used in bioinformatics. We develop and examine several hybrid computation models for high-performance biological sequence alignment on a cloud system. We evaluate the Smith-Waterman and CloudBurst alignment algorithms with these computation models, which are built from combinations of current technologies, including Hadoop, SHadoop, and Fair Scheduler, on the Amazon Elastic Compute Cloud (EC2) system. In our experiments with relatively small data sets, the computation model with SHadoop performed best for both the Smith-Waterman and CloudBurst algorithms, with speedups of 1.86× and 1.19×, respectively, over the baseline model (with Hadoop). For relatively large data sets, the computation model with SHadoop performed best for the Smith-Waterman algorithm (1.03×), and the computation model with SHadoop plus Fair Scheduler performed best for the CloudBurst algorithm (1.15×).
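For readers unfamiliar with the Smith-Waterman algorithm named in the abstract, the following is a minimal single-machine sketch of its local alignment scoring recurrence. It is not the authors' Hadoop implementation: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev holds row i-1 of the scoring matrix; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so the sketch keeps two rows in memory rather than the full matrix. The speedups quoted in the abstract are ratios against the Hadoop baseline; presumably 1.86× means the baseline's running time was 1.86 times that of the SHadoop model.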

Author information

Corresponding author

Correspondence to Jin H. Park.

Copyright information

© 2015 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Job, T., Park, J.H. (2015). Hybrid Computation Models for High Performance Biological Sequence Alignment on a Cloud System. In: Kim, H., Amouzegar, M., Ao, Sl. (eds) Transactions on Engineering Technologies. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7236-5_26

  • DOI: https://doi.org/10.1007/978-94-017-7236-5_26

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-017-7235-8

  • Online ISBN: 978-94-017-7236-5

  • eBook Packages: Engineering, Engineering (R0)
