Journal of Combinatorial Optimization, Volume 16, Issue 3, pp 229–247

A new recombination lower bound and the minimum perfect phylogenetic forest problem

  • Yufeng Wu
  • Dan Gusfield
Article

Abstract

Understanding recombination is a central problem in population genetics. In this paper, we address an established computational problem in this area: computing lower bounds on the minimum number of historical recombinations needed to generate a set of sequences (Hudson and Kaplan in Genetics 111, 147–164, 1985; Myers and Griffiths in Genetics 163, 375–394, 2003; Gusfield et al. in Discrete Appl. Math. 155, 806–830, 2007; Bafna and Bansal in IEEE/ACM Trans. Comput. Biol. Bioinf. 1, 78–90, 2004 and in J. Comput. Biol. 13, 501–521, 2006; Song et al. in Bioinformatics 421, i413–i422, 2005). In particular, we propose a new recombination lower bound: the forest bound. We show that the forest bound can be formulated as the minimum perfect phylogenetic forest problem, a natural extension of the classic binary perfect phylogeny problem, which may be of interest in its own right. We then show that the forest bound is provably higher than the optimal haplotype bound (Myers and Griffiths in Genetics 163, 375–394, 2003), a very good lower bound in practice (Song et al. in Bioinformatics 421, i413–i422, 2005). We prove that, like several other lower bounds (Bafna and Bansal in J. Comput. Biol. 13, 501–521, 2006), computing the forest bound is NP-hard. Finally, we describe an integer linear programming (ILP) formulation that computes the forest bound exactly for a certain range of data. Simulation results show that the forest bound may be useful in computing lower bounds for low-quality data.
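
For readers unfamiliar with the classic binary perfect phylogeny problem mentioned above, the following is a minimal sketch of the well-known four-gamete condition: a set of binary sequences fits a single (unrooted) perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10 and 11. This is background illustration only, not the forest bound or any algorithm from the paper; the function names `conflicting_site_pairs` and `admits_perfect_phylogeny` are hypothetical.

```python
from itertools import combinations


def conflicting_site_pairs(rows):
    """Return all site pairs (i, j) that contain all four gametes.

    `rows` is a list of equal-length strings over '0'/'1', one per
    sequence. This is an illustrative helper, not code from the paper.
    """
    if not rows:
        return []
    n_sites = len(rows[0])
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site patterns seen across all rows.
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:
            conflicts.append((i, j))
    return conflicts


def admits_perfect_phylogeny(rows):
    """Four-gamete test: True if no pair of sites is in conflict."""
    return not conflicting_site_pairs(rows)


if __name__ == "__main__":
    compatible = ["000", "110", "011"]
    incompatible = ["00", "01", "10", "11"]
    print(admits_perfect_phylogeny(compatible))
    print(conflicting_site_pairs(incompatible))
```

When the test fails, the data cannot be explained by a single perfect phylogeny, which is the situation in which recombination lower bounds such as those discussed in the abstract become relevant.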

Keywords

  • Recombination
  • Lower bound on the minimum number of recombination
  • Ancestral recombination graph
  • Population genetics
  • Computational complexity

References

  1. Bafna V, Bansal V (2004) The number of recombination events in a sample history: conflict graph and lower bounds. IEEE/ACM Trans Comput Biol Bioinf 1:78–90
  2. Bafna V, Bansal V (2006) Inference about recombination from haplotype data: lower bounds and recombination hotspots. J Comput Biol 13:501–521
  3. Bordewich M, Semple C (2004) On the computational complexity of the rooted subtree prune and regraft distance. Ann Comb 8:409–423
  4. Foulds LR, Graham RL (1982) The Steiner tree in phylogeny is NP-complete. Adv Appl Math 3
  5. Garey M, Johnson D (1979) Computers and intractability. Freeman, San Francisco
  6. Griffiths RC, Marjoram P (1996) Ancestral inference from samples of DNA sequences with recombination. J Comput Biol 3:479–502
  7. Gusfield D (1991) Efficient algorithms for inferring evolutionary history. Networks 21:19–28
  8. Gusfield D, Eddhu S, Langley C (2004) Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J Bioinf Comput Biol 2:173–213
  9. Gusfield D, Hickerson D, Eddhu S (2007) An efficiently-computed lower bound on the number of recombinations in phylogenetic networks: theory and empirical study. Discrete Appl Math 155:806–830
  10. Hudson R (2002) Generating samples under the Wright-Fisher neutral model of genetic variation. Bioinformatics 18(2):337–338
  11. Hudson R, Kaplan N (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147–164
  12. Myers S (2003) The detection of recombination events using DNA sequence data. PhD dissertation, Dept of Statistics, University of Oxford, Oxford, England
  13. Myers SR, Griffiths RC (2003) Bounds on the minimum number of recombination events in a sample history. Genetics 163:375–394
  14. Song YS, Ding Z, Gusfield D, Langley C, Wu Y (2006) Algorithms to distinguish the role of gene-conversion from single-crossover recombination in the derivations of SNP sequences in populations. In: Proceedings of RECOMB 2006. LNBI, vol 3909
  15. Song YS, Wu Y, Gusfield D (2005) Efficient computation of close lower and upper bounds on the minimum number of needed recombinations in the evolution of biological sequences. Bioinformatics 421:i413–i422. Proceedings of ISMB 2005
  16. Wang L, Zhang K, Zhang L (2001) Perfect phylogenetic networks with recombination. J Comput Biol 8:69–78

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Computer Science and Engineering Department, University of Connecticut, Storrs, USA
  2. Department of Computer Science, University of California, Davis, USA