Heuristics for the Gene-Duplication Problem: A Θ(n) Speed-Up for the Local Search

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Abstract

The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene duplications. This problem is NP-hard and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. We show how this local search problem can be solved efficiently by reusing previously computed information. This improves the running time of the current solution by a factor of n, where n is the number of species in the resulting supertree solution, and makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the exceptional performance of our solution in a comparison study using sets of large randomly generated gene trees. Furthermore, we demonstrate the utility of our solution by incorporating large genomic data sets from GenBank into a supertree analysis of plants.
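As an illustration of the quantity being optimized, the following is a minimal sketch that scores a candidate species tree against a set of gene trees. It assumes the commonly used LCA-mapping definition of the duplication cost, which the abstract itself does not spell out: a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children. The nested-tuple tree encoding and the names `leaf_set`, `lca_map`, `duplication_cost`, and `total_cost` are hypothetical choices made for this example. The sketch recomputes everything from scratch for each candidate and is not the sped-up local search described above.

```python
# Trees are nested tuples: a leaf is a species name (str), an internal
# node is a pair (left, right). Gene-tree leaves carry the name of the
# species the gene was sampled from. This encoding is an assumption.

def leaf_set(tree):
    """Return the set of species names at the leaves of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def lca_map(species_tree, species):
    """Return the cluster (leaf set) of the lowest species-tree node
    whose leaves include every name in `species`."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if species <= leaf_set(child):
                node = child  # descend: this child still covers all species
                break
        else:
            break  # neither child covers them, so `node` is the LCA
    return leaf_set(node)


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species-tree node
    as at least one of their children (assumed LCA-mapping cost)."""
    if isinstance(gene_tree, str):
        return 0
    here = lca_map(species_tree, leaf_set(gene_tree))
    cost = sum(duplication_cost(child, species_tree) for child in gene_tree)
    if any(lca_map(species_tree, leaf_set(child)) == here for child in gene_tree):
        cost += 1
    return cost


def total_cost(gene_trees, species_tree):
    """Sum the duplication cost over all gene trees."""
    return sum(duplication_cost(g, species_tree) for g in gene_trees)


gene_trees = [(("a", "b"), "c"), (("a", "a"), "b")]
candidates = [(("a", "b"), "c"), (("a", "c"), "b")]
best = min(candidates, key=lambda s: total_cost(gene_trees, s))
```

A stepwise heuristic of the kind the abstract describes would evaluate every tree in a neighborhood of the current candidate and move to the one with the lowest total cost. Here `candidates` is a hand-picked pair of trees standing in for such a neighborhood.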

During this research, O.E. and M.S.B. were supported in part by NSF grant no. 0334832 and J.G.B. by NESCent NSF grant no. EF-0423641.

Editor information

Terry Speed, Haiyan Huang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Bansal, M.S., Burleigh, J.G., Eulenstein, O., Wehe, A. (2007). Heuristics for the Gene-Duplication Problem: A Θ(n) Speed-Up for the Local Search. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_17

  • DOI: https://doi.org/10.1007/978-3-540-71681-5_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71680-8

  • Online ISBN: 978-3-540-71681-5

  • eBook Packages: Computer Science (R0)
