Skip to main content

Read Mapping Algorithms for Single Molecule Sequencing Data

  • Conference paper
Algorithms in Bioinformatics (WABI 2008)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 5251))

Included in the following conference series:

Abstract

Single Molecule Sequencing technologies such as the Heliscope simplify the preparation of DNA for sequencing, while sampling millions of reads in a day. Simultaneously, the technology suffers from a significantly higher error rate, ameliorated by the ability to sample multiple reads from the same location. In this paper we develop novel rapid alignment algorithms for two-pass Single Molecule Sequencing methods. We combine the Weighted Sequence Graph (WSG) representation of all optimal and near optimal alignments between the two reads sampled from a piece of DNA with k-mer filtering methods and spaced seeds to quickly generate candidate locations for the reads on the reference genome. We also propose a fast implementation of the Smith-Waterman algorithm using vectorized instructions that significantly speeds up the matching process. Our method combines these approaches in order to build an algorithm that is both fast and accurate, since it is able to take complete advantage of both of the reads sampled during two pass sequencing.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Califano, A., Rigoutsos, I.: Flash: A fast look-up algorithm for string homology. In: Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology, pp. 56–64. AAAI Press, Menlo Park (1993)

    Google Scholar 

  2. Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T.J., Higgins, D.G., Thompson, J.D.: Multiple sequence alignment with the clustal series of programs. Nucleic Acids Res. 31(13), 3497–3500 (2003)

    Article  Google Scholar 

  3. Jettand, J.H., et al.: High-speed DNA sequencing: an approach based upon fluorescence detection of single molecules. Journal of biomolecular structure and dynamics 7(2), 301–309 (1989)

    Google Scholar 

  4. Harris, T.D., et al.: Single-molecule DNA sequencing of a viral genome. Science 320(5872), 106–109 (2008)

    Article  Google Scholar 

  5. Farrar, M.: Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23(2), 156–161 (2007)

    Article  Google Scholar 

  6. Hein, J.: A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given. Molecular Biology and Evolution 6(6), 649–668 (1989)

    Google Scholar 

  7. Naor, D., Brutlag, D.L.: On near-optimal alignments of biological sequences. Journal of Computational Biology 1(4), 349–366 (1994)

    Google Scholar 

  8. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequences of two proteins. JMB 48, 443–453 (1970)

    Article  Google Scholar 

  9. Rasmussen, K.R., Stoye, J., Myers, E.W.: Efficient q-gram filters for finding all epsilon-matches over a given length. Journal of Computational Biology 13(2), 296–308 (2006)

    Article  MathSciNet  Google Scholar 

  10. Rognes, T., Seeberg, E.: Six-fold speed-up of smith-waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16(8), 699–706 (2000)

    Article  Google Scholar 

  11. Schwikowski, B., Vingron, M.: Weighted sequence graphs: boosting iterated dynamic programming using locally suboptimal solutions. Discrete Appl. Math. 127(1), 95–117 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  12. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197 (1981)

    Article  Google Scholar 

  13. Wozniak, A.: Using video-oriented instructions to speed up sequence comparison. Computer Applications in the Biosciences 13(2), 145–150 (1997)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Keith A. Crandall Jens Lagergren

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yanovsky, V., Rumble, S.M., Brudno, M. (2008). Read Mapping Algorithms for Single Molecule Sequencing Data. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science(), vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-87361-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87360-0

  • Online ISBN: 978-3-540-87361-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics