Probabilistic Inference of Viral Quasispecies Subject to Recombination

  • Osvaldo Zagordi
  • Armin Töpfer
  • Sandhya Prabhakaran
  • Volker Roth
  • Eran Halperin
  • Niko Beerenwinkel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7262)


RNA viruses are present in a single host as a population of different but related strains. This population, shaped by the combination of genetic change and selection, is called a quasispecies. Genetic change is due to both point mutations and recombination events. We present a jumping hidden Markov model that describes the generation of the viral quasispecies and a method to infer its parameters by analysing next-generation sequencing data. The model introduces position-specific probability tables over the sequence alphabet to explain the diversity that can be found in the population at each site. Recombination events are indicated by a change of state, allowing a single observed read to originate from multiple sequences. We present an implementation of the EM algorithm to find maximum likelihood estimates of the model parameters and a method to estimate the distribution of viral strains in the quasispecies. The model is validated on simulated data, showing the advantage of explicitly taking the recombination process into account, and applied to reads obtained from two experimental HIV samples.
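
To make the idea of a jumping hidden Markov model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small set of hypothetical "generator" sequences as hidden states, position-specific emission tables built from a single illustrative error rate `eps`, and a single jump probability `rho` spread uniformly over the other generators. All of these names and the uniform-jump parameterisation are assumptions made for illustration. The forward recursion computes the likelihood of one read, where a change of hidden state between adjacent positions plays the role of a recombination event.

```python
import numpy as np

ALPHABET = "ACGT"


def make_emission(generators, eps):
    """Build position-specific emission tables of shape (L, K, 4).

    emission[j, k, b] is the probability of observing base b at position j
    when the read is copied from generator k. The single error rate `eps`
    is an illustrative assumption.
    """
    length, n_gen = len(generators[0]), len(generators)
    emission = np.full((length, n_gen, len(ALPHABET)), eps / 3.0)
    for k, seq in enumerate(generators):
        for j, base in enumerate(seq):
            emission[j, k, ALPHABET.index(base)] = 1.0 - eps
    return emission


def read_likelihood(read, start, emission, pi, rho):
    """Forward algorithm for one read that begins at position `start`.

    pi  : initial distribution over generators, shape (K,)
    rho : probability of jumping to a different generator between two
          adjacent positions (assumed uniform over the other generators)
    """
    n_gen = emission.shape[1]
    if n_gen > 1:
        trans = np.full((n_gen, n_gen), rho / (n_gen - 1))
        np.fill_diagonal(trans, 1.0 - rho)
    else:
        trans = np.ones((1, 1))
    bases = [ALPHABET.index(b) for b in read]
    # alpha[k] = P(read so far, current generator = k)
    alpha = pi * emission[start, :, bases[0]]
    for offset in range(1, len(bases)):
        alpha = (alpha @ trans) * emission[start + offset, :, bases[offset]]
    return float(alpha.sum())


if __name__ == "__main__":
    emission = make_emission(["AAAA", "CCCC"], eps=0.01)
    pi = np.array([0.5, 0.5])
    # "AACC" is a mosaic of the two generators, so it is only well
    # explained when jumps (recombination) are allowed.
    print("no jumps:  ", read_likelihood("AACC", 0, emission, pi, rho=0.0))
    print("with jumps:", read_likelihood("AACC", 0, emission, pi, rho=0.1))
```

In this toy setting the mosaic read receives a much higher likelihood once `rho` is positive, which illustrates why modelling recombination explicitly can matter. Parameter estimation by EM, as described in the abstract, would build on forward and backward passes of this kind and is not shown here.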


Keywords: Molecular sequence analysis; Sequencing and genotyping technologies; Next-generation sequencing; Viral quasispecies; Hidden Markov model





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Osvaldo Zagordi (1, 2)
  • Armin Töpfer (1, 2)
  • Sandhya Prabhakaran (3)
  • Volker Roth (3)
  • Eran Halperin (4, 5)
  • Niko Beerenwinkel (1, 2)

  1. Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  2. SIB Swiss Institute of Bioinformatics, Switzerland
  3. Computer Science Department, University of Basel, Switzerland
  4. Department of Molecular Microbiology and Biotechnology, Tel-Aviv University, Israel
  5. International Computer Science Institute, Berkeley, USA
