Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA

  • Miklós Csűrös
  • Szilveszter Juhos
  • Attila Bérces
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6293)

Abstract

Applied Biosystems’ SOLiD system offers a low-cost alternative to the traditional Sanger method of DNA sequencing. We introduce two algorithms for mapping SOLiD color reads onto a reference genome. The first method performs mapping by adapting a greedy alignment framework: reads are mapped to approximate genome positions, allowing for a pre-specified bound on sequence difference that combines nucleotide mismatches, gaps, and sequencing errors. The second method, for precise alignment, relies on a pair hidden Markov model framework that combines a DNA sequence evolution model with sequencing errors (taken from read quality files).
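To make the "color read" terminology concrete, here is a minimal Python sketch of SOLiD's two-base (color-space) encoding. It assumes the standard nucleotide codes A=0, C=1, G=2, T=3, under which each color is the XOR of two adjacent codes. This is an illustration only, not the authors' mapping or alignment code, and the function names (`to_colors`, `decode`, `color_mismatches`) are hypothetical. It shows why nucleotide mismatches and sequencing errors need separate treatment: a single-nucleotide difference changes two adjacent colors, whereas a single color error, if decoded naively, corrupts every base that follows it.

```python
# Sketch of SOLiD two-base (color-space) encoding.
# Assumption: nucleotide codes A=0, C=1, G=2, T=3; color = XOR of adjacent codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def to_colors(seq: str) -> list[int]:
    """Encode a nucleotide string as colors (one color per adjacent base pair)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Translate colors back to bases, given the known first (primer-adjacent) base."""
    out = [first_base]
    for c in colors:
        # Each decoded base depends on the previous one, so an error propagates.
        out.append(BASE[CODE[out[-1]] ^ c])
    return "".join(out)


def color_mismatches(a: list[int], b: list[int]) -> list[int]:
    """Return the positions at which two color sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    ref = "ACGTTGCA"
    snp = "ACGATGCA"  # one substitution (T -> A) at position 3
    print(to_colors(ref))                                  # [1, 3, 1, 0, 1, 3, 1]
    print(color_mismatches(to_colors(ref), to_colors(snp)))  # [2, 3]: two adjacent colors

    read = to_colors(ref)
    read[2] = 2  # simulate a single color-call error
    print(decode("A", read))  # ACGAACGT: every base from position 3 onward is wrong
```

Because a true nucleotide difference appears as a pair of adjacent color changes while an isolated color change is more plausibly a sequencing error, aligning directly in color space (rather than decoding reads to bases first) lets a mapper count and bound the two kinds of difference separately.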

Keywords

Sequencing Error, Edit Distance, Statistical Alignment, Color Sequence, Precise Alignment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Miklós Csűrös (1)
  • Szilveszter Juhos (2)
  • Attila Bérces (2)
  1. Department of Computer Science and Operations Research, University of Montréal, Canada
  2. Omixon, Chemistry Logic Kft, Budapest, Hungary