Estimating Viral Haplotypes in a Population Using k-mer Counting
Viral haplotype estimation in a population is an important problem in virology. Viruses mutate and recombine at high rates during replication, which helps them survive in host cells, and they therefore exist as populations of closely related genetic variants. This makes estimating the number of haplotypes and their relative frequencies in the population a challenging task. Relying on a sequenced reference genome has limitations because of the high mutation rates in viruses. We propose a method for estimating viral haplotypes based only on the counts of k-mers present in the viral population, without using a reference genome. We identify k-mer pairs that are related to each other by one mutation, and from these pairs compute a minimal set of viral haplotypes that explains the whole population. We compared our method with the software ShoRAH (which uses a reference genome) on a simulated dataset and obtained comparable results, even without using a reference genome.
Keywords: viral haplotype estimation; structural variants detection; k-mer counting; variant detection; greedy generating set algorithm
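To make the first two steps of the abstract concrete, the following is a minimal Python sketch of reference-free k-mer counting followed by detection of k-mer pairs that differ by a single substitution. It is an illustration under stated assumptions, not the authors' implementation: the function names `count_kmers` and `hamming_one_pairs` are hypothetical, "one mutation" is assumed here to mean a single-base substitution (Hamming distance 1), and sequencing errors and the later greedy haplotype-selection step are not modeled.

```python
from collections import Counter
from itertools import combinations


def count_kmers(reads, k):
    """Count every length-k substring across all reads (no reference needed)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming_one_pairs(kmers):
    """Return the set of k-mer pairs that differ at exactly one position.

    Assumption: "related by one mutation" is taken to mean a single
    substitution. Each k-mer is bucketed under (position, k-mer with that
    position removed); two distinct k-mers share a bucket only if they
    differ at that position and nowhere else.
    """
    buckets = {}
    for kmer in kmers:
        for i in range(len(kmer)):
            key = (i, kmer[:i] + kmer[i + 1:])
            buckets.setdefault(key, []).append(kmer)

    pairs = set()
    for group in buckets.values():
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


# Toy population: two reads from one variant, one read from a variant
# that differs by a single G -> C substitution.
reads = ["ACGTAC", "ACGTAC", "ACCTAC"]
counts = count_kmers(reads, k=4)
pairs = hamming_one_pairs(counts.keys())
```

In this toy input, each k-mer covering the variable site appears twice for the majority variant and once for the minority variant, so the counts of the paired k-mers hint at the relative frequencies of the two variants. How the method actually turns such pairs into a minimal haplotype set is not specified in the abstract and is left out of the sketch.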