A new recombination lower bound and the minimum perfect phylogenetic forest problem
Understanding recombination is a central problem in population genetics. In this paper, we address an established computational problem in this area: computing lower bounds on the minimum number of historical recombinations needed to generate a set of sequences (Hudson and Kaplan in Genetics 111, 147–164, 1985; Myers and Griffiths in Genetics 163, 375–394, 2003; Gusfield et al. in Discrete Appl. Math. 155, 806–830, 2007; Bafna and Bansal in IEEE/ACM Trans. Comput. Biol. Bioinf. 1, 78–90, 2004 and in J. Comput. Biol. 13, 501–521, 2006; Song et al. in Bioinformatics 421, i413–i422, 2005). In particular, we propose a new recombination lower bound: the forest bound. We show that the forest bound can be formulated as the minimum perfect phylogenetic forest problem, a natural extension of the classic binary perfect phylogeny problem, which may be of interest in its own right. We then show that the forest bound is provably higher than the optimal haplotype bound (Myers and Griffiths in Genetics 163, 375–394, 2003), a very good lower bound in practice (Song et al. in Bioinformatics 421, i413–i422, 2005). We prove that, like several other lower bounds (Bafna and Bansal in J. Comput. Biol. 13, 501–521, 2006), the forest bound is NP-hard to compute. Finally, we describe an integer linear programming (ILP) formulation that computes the forest bound exactly for a certain range of data. Simulation results show that the forest bound may be useful for computing lower bounds on low-quality data.
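For readers unfamiliar with the baseline that the forest bound is compared against, the haplotype bound of Myers and Griffiths (2003) can be sketched briefly. The snippet below is a minimal illustration under stated assumptions, not the authors' code: sequences are taken to be equal-length binary strings, the bound is computed as the number of distinct rows minus the number of distinct non-constant columns minus one, and the "optimal" variant is found by brute-force enumeration of column subsets, which is exponential and only suitable for tiny inputs. The function names `haplotype_bound` and `optimal_haplotype_bound` are hypothetical.

```python
from itertools import combinations


def haplotype_bound(rows, cols=None):
    """Haplotype bound restricted to the chosen columns (illustrative sketch).

    Assumes `rows` is a list of equal-length binary strings.
    Returns (#distinct rows) - (#distinct non-constant columns) - 1, floored at 0.
    """
    cols = list(range(len(rows[0]))) if cols is None else list(cols)
    # Distinct haplotypes when only the selected sites are observed.
    distinct_rows = {tuple(r[c] for c in cols) for r in rows}
    # Distinct site patterns; constant (non-segregating) sites are ignored.
    distinct_cols = {tuple(r[c] for r in rows) for c in cols}
    distinct_cols = {col for col in distinct_cols if len(set(col)) > 1}
    return max(len(distinct_rows) - len(distinct_cols) - 1, 0)


def optimal_haplotype_bound(rows):
    """Maximum haplotype bound over all non-empty column subsets (brute force)."""
    m = len(rows[0])
    best = 0
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            best = max(best, haplotype_bound(rows, cols))
    return best


# Two sites showing all four gametes: 4 rows - 2 columns - 1 = 1.
four_gametes = ["00", "01", "10", "11"]
# Using all three sites gives 4 - 3 - 1 = 0, but a two-site subset gives 1.
subset_helps = ["000", "011", "101", "110"]
```

On `subset_helps`, the bound over all sites is 0 while the best subset of sites gives 1, which is why the optimal haplotype bound maximizes over subsets of sites.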
Keywords: Recombination; Lower bound on the minimum number of recombination; Ancestral recombination graph; Population genetics; Computational complexity
- Foulds LR, Graham RL (1982) The Steiner tree in phylogeny is NP-complete. Adv Appl Math 3
- Hudson R, Kaplan N (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147–164
- Myers S (2003) The detection of recombination events using DNA sequence data. PhD dissertation, Dept of Statistics, University of Oxford, Oxford, England
- Myers SR, Griffiths RC (2003) Bounds on the minimum number of recombination events in a sample history. Genetics 163:375–394
- Song YS, Ding Z, Gusfield D, Langley C, Wu Y (2006) Algorithms to distinguish the role of gene-conversion from single-crossover recombination in the derivations of SNP sequences in populations. In: Proceedings of RECOMB 2006. LNBI, vol 3909