HCV Quasispecies Assembly Using Network Flows
Understanding how the genomes of viruses mutate and evolve within infected individuals is critically important in epidemiology. By exploiting knowledge of the forces that guide viral microevolution, researchers can design drugs and treatments that are effective against newly evolved strains. Therefore, it is critical to develop a method for typing the genomes of all of the variants of a virus (quasispecies) inside an infected individual.
In this paper, we focus on sequence assembly of Hepatitis C Virus (HCV) based on the 454 Life Sciences system, which produces around 250K reads, each 100-400 bases long. We introduce several formulations of the quasispecies assembly problem and a measure of assembly quality. We also propose a scalable assembly method for quasispecies based on a novel network flow formulation. Finally, we report the results of assembling 44 quasispecies from the 1700 bp long E1E2 region of HCV.
Keywords: Problem Instance, Directed Acyclic Graph, Network Flow, Switching Error, Consensus Genome