
An Efficient Filtration Method Based on Variable-Length Seeds for Sequence Alignment

  • Conference paper
Parallel Architecture, Algorithm and Programming (PAAP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)


Abstract

With the rapid development of next-generation sequencing (NGS) platforms, billions of reads can now be produced in a short time. Finding all mapping locations of these reads in the reference genome is not only a bioinformatics problem but also a large-scale computing problem. Existing all-mappers usually work in two steps: filtration and verification. The filtration step discards locations that cannot be valid mappings and generates candidate locations. In the verification step, each candidate is aligned against the reference sequence to determine whether it is a true mapping location. Statistics indicate that verification accounts for most of the total mapping time; that is, fewer candidates lead to shorter mapping time. Our strategies improve the filtration step to reduce the number of candidates.
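The two-step filtration/verification scheme can be illustrated with a minimal, hypothetical sketch. It is not Bitmapper's implementation. It assumes the pigeonhole principle: a read with at most e edits, split into e+1 non-overlapping seeds, contains at least one seed that matches the reference exactly. It also uses a naive `str.find` scan instead of a genome index and verifies each candidate with a plain semi-global edit-distance DP rather than a bit-vector algorithm. All function names are illustrative.

```python
def occurrences(ref, pattern):
    """All start positions of pattern in ref (naive scan; a real mapper uses an index)."""
    hits, start = [], ref.find(pattern)
    while start != -1:
        hits.append(start)
        start = ref.find(pattern, start + 1)
    return hits


def split_seeds(read, e):
    """Pigeonhole split: e+1 non-overlapping seeds (assumes len(read) >= e+1)."""
    n = e + 1
    size = len(read) // n
    bounds = [i * size for i in range(n)] + [len(read)]
    return [(bounds[i], read[bounds[i]:bounds[i + 1]]) for i in range(n)]


def filtration(ref, read, e):
    """Candidate read start positions implied by exact seed hits."""
    candidates = set()
    for offset, seed in split_seeds(read, e):
        for pos in occurrences(ref, seed):
            candidates.add(pos - offset)  # shift seed hit back to the read start
    return sorted(candidates)


def semi_global_distance(read, window):
    """Smallest edit distance between read and any substring of window."""
    col = list(range(len(read) + 1))  # D[i][0] = i
    best = col[-1]
    for ch in window:
        new = [0]  # D[0][j] = 0: the alignment may start anywhere in window
        for i in range(1, len(read) + 1):
            cost = 0 if read[i - 1] == ch else 1
            new.append(min(col[i - 1] + cost, col[i] + 1, new[i - 1] + 1))
        col = new
        best = min(best, col[-1])  # free end: any window position may close it
    return best


def verify(ref, read, cand, e):
    """Accept a candidate if the read aligns within e edits near it."""
    lo = max(0, cand - e)
    hi = min(len(ref), cand + len(read) + e)
    return semi_global_distance(read, ref[lo:hi]) <= e


def map_read(ref, read, e):
    """Filtration first, then verification of every surviving candidate."""
    return [c for c in filtration(ref, read, e) if verify(ref, read, c, e)]


print(map_read("TTTTGATTACAGGGG", "GATCACA", 1))  # one substitution -> [4]
```

Every candidate that survives filtration costs one DP verification. This is why shrinking the candidate set directly shortens the total mapping time.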

We propose a dynamic programming strategy and two heuristic strategies and integrate them into the filtration step. These strategies are applied in Bitmapper, a state-of-the-art all-mapper. Compared with other advanced all-mappers, our method shows a significant improvement in the experimental results.
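The abstract does not spell out the dynamic programming strategy. The sketch below is therefore only an assumed, generic formulation of variable-length seed selection, not the authors' algorithm. It cuts the read into e+1 contiguous seeds, each at least `min_len` long, so that the total number of exact occurrences in the reference (the number of candidates handed to verification) is as small as possible. `select_seeds`, `count_occurrences` and `min_len` are hypothetical names.

```python
def count_occurrences(ref, pattern):
    """Number of (possibly overlapping) exact occurrences of pattern in ref."""
    count, start = 0, ref.find(pattern)
    while start != -1:
        count += 1
        start = ref.find(pattern, start + 1)
    return count


def select_seeds(ref, read, e, min_len=1):
    """Hypothetical DP: split read into e+1 contiguous seeds minimising total hits."""
    m, k_total = len(read), e + 1
    INF = float("inf")
    freq = {}  # cache of occurrence counts for read[j:i]

    def hits(j, i):
        if (j, i) not in freq:
            freq[(j, i)] = count_occurrences(ref, read[j:i])
        return freq[(j, i)]

    # cost[k][i]: fewest total hits when read[:i] is cut into k seeds
    # back[k][i]: start of the last seed in that best split
    cost = [[INF] * (m + 1) for _ in range(k_total + 1)]
    back = [[-1] * (m + 1) for _ in range(k_total + 1)]
    cost[0][0] = 0
    for k in range(1, k_total + 1):
        for i in range(k * min_len, m + 1):
            for j in range((k - 1) * min_len, i - min_len + 1):
                if cost[k - 1][j] == INF:
                    continue
                c = cost[k - 1][j] + hits(j, i)
                if c < cost[k][i]:
                    cost[k][i], back[k][i] = c, j
    if cost[k_total][m] == INF:
        raise ValueError("read too short for e+1 seeds of length min_len")

    # Walk the back-pointers to recover (offset, seed) pairs
    seeds, i = [], m
    for k in range(k_total, 0, -1):
        j = back[k][i]
        seeds.append((j, read[j:i]))
        i = j
    seeds.reverse()
    return cost[k_total][m], seeds


print(select_seeds("AAAAAAAACGT", "AAAACGT", 1, min_len=2))
# -> (2, [(0, 'AAAAC'), (5, 'GT')])
```

For the toy reference `AAAAAAAACGT` and read `AAAACGT` with e = 1, an equal split into `AAA` and `ACGT` yields 7 occurrences. The DP instead chooses the variable-length seeds `AAAAC` and `GT`, with 2 occurrences in total. The pigeonhole guarantee still holds because any split into e+1 non-overlapping seeds keeps at least one seed error-free.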



Acknowledgment

This work was supported by the National Natural Science Foundation of China under grant No. 61672480 and by the Program for Excellent Graduate Students in Collaborative Innovation Center of High Performance Computing.

Author information

Corresponding author: Yun Xu.


Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Guo, R., Cheng, H., Xu, Y. (2017). An Efficient Filtration Method Based on Variable-Length Seeds for Sequence Alignment. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_19


  • DOI: https://doi.org/10.1007/978-981-10-6442-5_19


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)
