Deep Sequencing of a Genetically Heterogeneous Sample: Local Haplotype Reconstruction and Read Error Correction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5541)

Abstract

We present a computational method for analyzing deep sequencing data obtained from a genetically diverse sample. The set of reads obtained from a deep sequencing experiment represents a statistical sample of the underlying population. We develop a generative probabilistic model for assigning observed reads to unobserved haplotypes in the presence of sequencing errors. This clustering problem is solved in a Bayesian fashion using the Dirichlet process mixture to define a prior distribution on the unknown number of haplotypes in the mixture. We devise a Gibbs sampler for sampling from the joint posterior distribution of haplotype sequences, assignment of reads to haplotypes, and error rate of the sequencing process to obtain estimates of the local haplotype structure of the population. The method is evaluated on simulated data and on experimental deep sequencing data obtained from HIV samples.
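
The sampling scheme described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal Gibbs sampler for a Dirichlet process mixture under simplifying assumptions that are not stated in the abstract: reads are already aligned and of equal length, sequencing errors are uniform substitutions with a single rate `eps`, the base measure over haplotypes is uniform, and `eps` has a flat Beta(1, 1) prior. The function names (`log_lik`, `sample_haplotype`, `gibbs`) and the parameter `alpha` (the concentration of the Dirichlet process) are hypothetical names chosen for illustration.

```python
import math
import random

BASES = "ACGT"

def log_lik(read, hap, eps):
    """Log-probability of a read given a haplotype (uniform substitution errors)."""
    matches = sum(r == h for r, h in zip(read, hap))
    return matches * math.log(1 - eps) + (len(read) - matches) * math.log(eps / 3)

def sample_haplotype(reads, eps, rng):
    """Draw a haplotype, position by position, from its posterior given its reads.
    Assumes a uniform prior over the four bases at every position."""
    hap = []
    for column in zip(*reads):
        logw = []
        for b in BASES:
            k = column.count(b)
            logw.append(k * math.log(1 - eps) + (len(column) - k) * math.log(eps / 3))
        m = max(logw)  # subtract the maximum to avoid numerical underflow
        hap.append(rng.choices(BASES, weights=[math.exp(w - m) for w in logw])[0])
    return "".join(hap)

def gibbs(reads, alpha=1.0, n_iter=100, seed=0):
    """Return the final sampled state (assignments, haplotypes, error rate)."""
    rng = random.Random(seed)
    length = len(reads[0])
    eps = 0.05  # arbitrary starting value (assumption)
    assign = list(range(len(reads)))  # start with every read in its own cluster
    haps = {i: sample_haplotype([r], eps, rng) for i, r in enumerate(reads)}
    next_id = len(reads)
    for _ in range(n_iter):
        # Step 1: reassign each read given all other assignments.
        for i, read in enumerate(reads):
            assign[i] = None
            counts = {}
            for a in assign:
                if a is not None:
                    counts[a] = counts.get(a, 0) + 1
            haps = {k: h for k, h in haps.items() if k in counts}  # drop empty clusters
            labels = list(counts)
            logw = [math.log(counts[k]) + log_lik(read, haps[k], eps) for k in labels]
            # New cluster: alpha times the marginal likelihood, which equals 4**-length
            # under the uniform base measure.
            logw.append(math.log(alpha) - length * math.log(4))
            m = max(logw)
            weights = [math.exp(w - m) for w in logw]
            choice = rng.choices(range(len(logw)), weights=weights)[0]
            if choice == len(labels):
                haps[next_id] = sample_haplotype([read], eps, rng)
                assign[i] = next_id
                next_id += 1
            else:
                assign[i] = labels[choice]
        # Step 2: resample each haplotype given the reads assigned to it.
        for k in haps:
            members = [r for r, a in zip(reads, assign) if a == k]
            haps[k] = sample_haplotype(members, eps, rng)
        # Step 3: resample the error rate from its Beta posterior.
        mism = sum(r != h for read, a in zip(reads, assign)
                   for r, h in zip(read, haps[a]))
        total = len(reads) * length
        eps = rng.betavariate(1 + mism, 1 + total - mism)
        eps = min(max(eps, 1e-9), 1 - 1e-9)  # numerical guard for the logarithms
    return assign, haps, eps

if __name__ == "__main__":
    # Toy input: two made-up haplotypes plus one read carrying a single error.
    toy_reads = ["ACGTACGTAC"] * 6 + ["TGCATGCATG"] * 6 + ["ACGTACGTAA"]
    assignments, haplotypes, error_rate = gibbs(toy_reads, n_iter=50, seed=1)
    print(len(haplotypes), "haplotypes in the final sample; eps =", round(error_rate, 4))
```

The sketch returns only the last sampled state. A practical analysis would discard burn-in iterations and summarize many samples, and would likely use a more informative base measure than the uniform one assumed here, since a uniform base measure makes opening a new cluster very unlikely for long reads.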




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zagordi, O., Geyrhofer, L., Roth, V., Beerenwinkel, N. (2009). Deep Sequencing of a Genetically Heterogeneous Sample: Local Haplotype Reconstruction and Read Error Correction. In: Batzoglou, S. (ed.) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_21


  • DOI: https://doi.org/10.1007/978-3-642-02008-7_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02007-0

  • Online ISBN: 978-3-642-02008-7

  • eBook Packages: Computer Science, Computer Science (R0)
