An Exact and Polynomial Distance-Based Algorithm to Reconstruct Single Copy Tandem Duplication Trees

  • Olivier Elemento
  • Olivier Gascuel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)


The problem of reconstructing the duplication tree of a set of tandemly repeated sequences, which are assumed to have arisen by unequal recombination, was first introduced by Fitch (1977) and has recently received a lot of attention. In this paper, we deal with the restricted problem of reconstructing single copy duplication trees. We describe an exact, polynomial, distance-based algorithm for solving this problem, the parsimony version of which has previously been shown to be NP-hard (like most evolutionary tree reconstruction problems). The algorithm is based on the minimum evolution principle, and thus selects the shortest tree as the correct duplication tree. After presenting the mathematical concepts underlying the minimum evolution principle and some of its benefits (such as consistency), we provide a new recurrence equation that estimates the tree length by ordinary least squares, given a matrix of pairwise distances between the copies. We then show how this equation naturally forms the dynamic programming framework on which our algorithm is based, and provide an implementation that runs in O(n^3) time and O(n^2) space, where n is the number of copies.
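The dynamic programming framework mentioned in the abstract can be illustrated with a minimal sketch. It assumes, as the keyword "Adjacent Interval" suggests, that every subtree of a single copy duplication tree covers a run of adjacent copies, so that candidate trees can be scored interval by interval. The function names `best_ordered_tree` and `join_cost` are hypothetical, and `join_cost` (the mean distance between the two merged groups) is only a placeholder: it is not the ordinary least-squares recurrence of the paper, which the abstract does not reproduce.

```python
from math import inf
from typing import List, Tuple


def best_ordered_tree(d: List[List[float]]) -> Tuple[float, str]:
    """Interval dynamic programming over copies 0..n-1 in genomic order.

    score[i][j] holds the smallest total join cost of any binary tree
    whose leaves are exactly the adjacent copies i..j.
    ASSUMPTION: join_cost is a stand-in criterion, not the paper's
    ordinary least-squares tree length.
    """
    n = len(d)

    # 2D prefix sums: the sum of d over any rectangular block in O(1).
    pre = [[0.0] * (n + 1) for _ in range(n + 1)]
    for a in range(n):
        for b in range(n):
            pre[a + 1][b + 1] = (d[a][b] + pre[a][b + 1]
                                 + pre[a + 1][b] - pre[a][b])

    def block_sum(r0: int, r1: int, c0: int, c1: int) -> float:
        # Sum of d[r][c] for r0 <= r <= r1 and c0 <= c <= c1.
        return (pre[r1 + 1][c1 + 1] - pre[r0][c1 + 1]
                - pre[r1 + 1][c0] + pre[r0][c0])

    def join_cost(i: int, k: int, j: int) -> float:
        # Placeholder: mean distance between copies i..k and k+1..j.
        return block_sum(i, k, k + 1, j) / ((k - i + 1) * (j - k))

    score = [[0.0] * n for _ in range(n)]   # O(n^2) space
    split = [[-1] * n for _ in range(n)]

    # O(n^2) intervals, at most n split points each: O(n^3) time.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best, best_k = inf, -1
            for k in range(i, j):
                c = score[i][k] + score[k + 1][j] + join_cost(i, k, j)
                if c < best:
                    best, best_k = c, k
            score[i][j], split[i][j] = best, best_k

    def newick(i: int, j: int) -> str:
        # Rebuild the chosen tree from the stored split points.
        if i == j:
            return str(i)
        k = split[i][j]
        return f"({newick(i, k)},{newick(k + 1, j)})"

    return score[0][n - 1], newick(0, n - 1)


if __name__ == "__main__":
    # Toy distance matrix: copies 0,1 are close, and so are copies 2,3.
    toy = [[0, 2, 6, 6],
           [2, 0, 6, 6],
           [6, 6, 0, 2],
           [6, 6, 2, 0]]
    print(best_ordered_tree(toy))
```

The loop structure is what carries over to the setting of the abstract: there are O(n^2) adjacent intervals, each examined with at most n split points, which gives the O(n^3) time and O(n^2) space bounds once the cost of a split can be evaluated in constant time.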


Keywords: Branch Length; Recurrence Equation; Dynamic Programming Algorithm; Adjacent Interval; Total Time Complexity
These keywords were added by machine and not by the authors.




  1. Ohno, S.: Evolution by gene duplication. Springer Verlag, New York (1970)
  2. Smith, G.: Evolution of repeated DNA sequences by unequal crossover. Science 191 (1976) 528–535
  3. Fitch, W.: Phylogenies constrained by cross-over process as illustrated by human hemoglobins in a thirteen-cycle, eleven amino-acid repeat in human apolipoprotein A-I. Genetics 86 (1977) 623–644
  4. Jeffreys, A., Harris, S.: Processes of gene duplication. Nature 296 (1981) 9–10
  5. Elemento, O., Gascuel, O., Lefranc, M.P.: Reconstruction de l’histoire de duplication de gènes répétés en tandem. In: Actes des Journées Ouvertes Biologie Informatique Mathématiques. (2001) 9–11
  6. Elemento, O., Gascuel, O., Lefranc, M.P.: Reconstructing the duplication history of tandemly repeated genes. Molecular Biology and Evolution 19 (2002) 278–288
  7. Benson, G., Dong, L.: Reconstructing the duplication history of a tandem repeat. In Lengauer, T., Schneider, R., Bork, P., Brutlag, D., Glasgow, J., Mewes, H.W., Zimmer, R., eds.: Proceedings of Intelligent Systems in Molecular Biology ISMB’99. (1999) 44–53
  8. Tang, M., Waterman, M., Yooseph, S.: Zinc finger gene clusters and tandem gene duplication. In El-Mabrouk, N., Lengauer, T., Sankoff, D., eds.: Proceedings of RECOMB 2001. (2001) 297–304
  9. Tang, M., Waterman, M., Yooseph, S.: Zinc finger gene clusters and tandem gene duplication. Journal of Computational Biology 9 (2002) 429–446
  10. Jaitly, D., Kearney, P., Lin, G., Ma, B.: Methods for reconstructing the history of tandem repeats and their application to the human genome. Journal of Computer and System Sciences 65 (2002) 494–507
  11. Zhang, J., Nei, M.: Evolution of antennapedia-class homeobox genes. Genetics 142 (1996) 295–303
  12. Wang, L., Gusfield, D.: Improved approximation algorithms for tree alignment. Journal of Algorithms 25 (1997) 255–273
  13. Kidd, K., Sgaramella-Zonta, L.: Phylogenetic analysis: concepts and methods. American Journal of Human Genetics 23 (1971) 235–252
  14. Rzhetsky, A., Nei, M.: Theoretical foundation of the minimum-evolution method of phylogenetic inference. Molecular Biology and Evolution 10 (1993) 1073–1095
  15. Denis, F., Gascuel, O.: On the consistency of the minimum evolution principle of phylogenetic inference. Computational Molecular Biology Series, Issue IV. Discrete Applied Mathematics 127 (2003) 63–77
  16. Felsenstein, J.: Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27 (1978) 401–410
  17. Vardi, I.: Computational Recreations in Mathematica. Addison-Wesley (1991)
  18. Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4 (1987) 406–425
  19. Vach, W.: Least-squares approximation of additive trees. In Opitz, O., ed.: Conceptual and Numerical Analysis of Data, Heidelberg, Springer (1989) 230–238
  20. Gascuel, O.: Concerning the NJ algorithm and its unweighted version, UNJ. In Mirkin, B., McMorris, F., Roberts, F., Rzhetsky, A., eds.: Mathematical Hierarchies and Biology. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Amer. Math. Society, Providence (1997) 149–170
  21. Barthelemy, J., Guénoche, A.: Trees and proximity representations. Wiley and Sons (1991)
  22. Elemento, O., Gascuel, O.: A fast and accurate distance-based algorithm to reconstruct tandem duplication trees. Bioinformatics 18 (2002) S92–S99. Proceedings of European Conference on Computational Biology (ECCB2002)
  23. Fitch, W., Margoliash, E.: Construction of phylogenetic trees. Science 155 (1967) 279–284
  24. Felsenstein, J.: An alternating least squares approach to inferring phylogenies from pairwise distances. Systematic Biology 46 (1997) 101–111

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Olivier Elemento (1, 2, 3)
  • Olivier Gascuel (1)
  1. Département d’Informatique Fondamentale et Applications, LIRMM, Montpellier, France
  2. IMGT, the international ImMunoGeneTics database, France
  3. Laboratoire d’Immunogénétique Moléculaire, LIGM, Université Montpellier II, UPR CNRS 1142, IGH, Montpellier Cedex 5, France
