Efficient Approximate Subsequence Matching Using Hybrid Signatures

  • Tao Qiu
  • Xiaochun Yang
  • Bin Wang
  • Yutong Han
  • Siyao Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10827)

Abstract

In this paper, we study approximate subsequence matching, known in genomics as the read mapping problem: given a query (a DNA read) and a user-specified similarity threshold k, find all subsequences of a reference genome that are similar to the query. Here a subsequence means a substring, that is, a run of consecutive characters. Existing methods first extract subsequences from the query to generate signatures, then use these signatures to produce candidate positions, and finally verify the candidate positions to obtain the true mapping positions. These methods have two main problems: (1) they produce many candidate positions; and (2) they generate a large number of signatures, many of which are redundant. To address both problems, we propose a novel filtering technique, called hybrid signatures, that achieves a better balance between the filtering power of the signatures and the overhead of producing candidate positions. We also devise an adaptive algorithm that produces candidate positions using hybrid signatures. Experimental results on real-world genomic sequences show that our method outperforms state-of-the-art methods in query efficiency.
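The generic signature-based pipeline described in the abstract (generate signatures, produce candidate positions, verify) can be illustrated with a minimal sketch. It is not the hybrid-signature method of the paper. It assumes that similarity is measured by edit distance and uses the classic pigeonhole partition: the query is cut into k + 1 disjoint segments, so at least one segment must occur unchanged in any match with at most k edits. The function names `build_index`, `sellers_end_positions`, and `map_read` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_index(reference, q):
    """Map every length-q substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - q + 1):
        index[reference[i:i + q]].append(i)
    return index


def sellers_end_positions(query, text, k):
    """Return end offsets e (exclusive) such that some substring text[s:e]
    is within edit distance k of query (Sellers' dynamic program)."""
    m = len(query)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # row 0 stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost, prev[i] + 1, cur[i - 1] + 1)
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


def map_read(reference, query, k):
    """Filter-and-verify read mapping (illustrative baseline only).

    Returns the sorted end positions (exclusive) in the reference of
    substrings within edit distance k of the query."""
    m = len(query)
    q = m // (k + 1)  # segment length; assumes the query is longer than k
    if q == 0:
        raise ValueError("query too short for threshold k")
    index = build_index(reference, q)

    # Step 1 and 2: signatures are the k + 1 disjoint segments; each exact
    # occurrence of a segment yields one candidate window in the reference.
    windows = set()
    for s in range(k + 1):
        offset = s * q
        for p in index.get(query[offset:offset + q], ()):
            start = max(0, p - offset - k)
            end = min(len(reference), p - offset + m + k)
            windows.add((start, end))

    # Step 3: verify every candidate window with the dynamic program.
    hits = set()
    for start, end in windows:
        for e in sellers_end_positions(query, reference[start:end], k):
            hits.add(start + e)
    return sorted(hits)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    print(map_read(reference, "TAGCCGTTAGG", 1))
```

In this baseline every segment occurrence becomes a candidate, which is exactly the cost the abstract points to: short or frequent signatures produce many candidate windows, and each one triggers a verification.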

Keywords

  • Read mapping
  • Approximate subsequence matching
  • Hybrid signatures

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Tao Qiu (1)
  • Xiaochun Yang (1)
  • Bin Wang (1)
  • Yutong Han (1)
  • Siyao Wang (1)

  1. School of Computer Science and Engineering, Northeastern University, Shenyang, China
