Accelerating Pairwise Sequence Alignment Algorithm by MapReduce Technique for Next-Generation Sequencing (NGS) Data Analysis

  • Conference paper
Emerging Technologies in Data Mining and Information Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 813)

Abstract

Next-generation sequencing (NGS) technologies and the variety of sequencing machines now in use have produced an enormous volume of omics data. In NGS data analysis, sequence alignment is an essential step for finding relationships between sequences. Pairwise sequence alignment is computationally challenging for reasonably large input sequences. Smith–Waterman (SW) is a popular centralized algorithm for sequence alignment. However, as data volumes grow rapidly, conventional centralized sequence alignment tools become inefficient in terms of computational time. In this paper, we propose a distributed pairwise sequence alignment technique, called MRaligner, that uses MapReduce implemented on the Apache Spark framework. We compared the results of MRaligner with JAligner, an open-source Java implementation of the Smith–Waterman algorithm for biological sequence alignment, and found a significant improvement in computational time.
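To make the idea concrete, the following is a minimal sketch of Smith–Waterman local-alignment scoring with a map step over sequence pairs and a reduce step that keeps the best hit per query. It is not the MRaligner implementation, which this page does not describe. The scoring values (match 2, mismatch -1, linear gap -1) and the names `sw_score`, `map_pair`, `best_hit`, and `align_all` are assumptions chosen for illustration, and plain Python stands in for Apache Spark so the example runs without a cluster.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of a and b (Smith-Waterman, linear gaps).

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def map_pair(pair):
    """Map step: score one (query, target) pair independently of all others."""
    (qid, q), (tid, t) = pair
    return qid, (tid, sw_score(q, t))


def best_hit(x, y):
    """Reduce step: keep the higher-scoring (target_id, score) hit."""
    return x if x[1] >= y[1] else y


def align_all(queries, targets):
    """Score every query against every target and keep the best hit per query."""
    pairs = [(q, t) for q in queries.items() for t in targets.items()]
    best = {}
    for qid, hit in map(map_pair, pairs):
        best[qid] = best_hit(best[qid], hit) if qid in best else hit
    return best
```

Because each pair is scored independently, the map step has no shared state and can be spread across workers. On Spark, one plausible arrangement would apply `map_pair` with `RDD.map` and combine results with `reduceByKey(best_hit)`, though whether MRaligner partitions the work this way is not stated here.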

Author information

Corresponding author

Correspondence to Sudip Mondal.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mondal, S., Khatua, S. (2019). Accelerating Pairwise Sequence Alignment Algorithm by MapReduce Technique for Next-Generation Sequencing (NGS) Data Analysis. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_19
