Gene Family Evolution—An Algorithmic Framework

  • Nadia El-Mabrouk
  • Emmanuel Noutahi
Part of the Computational Biology book series (COBO, volume 29)


Most biological discoveries can only be made in light of evolution. In particular, the functional annotation of genes is usually deduced from the orthology, paralogy, or xenology relations between genes, which are inferred by comparing a gene tree with a species tree. Because sequence-only gene tree reconstruction methods often cannot confidently discriminate between candidate trees, recent “integrative methods” also include information from the species tree. The idea is to consider, in addition to a value measuring the fit of a tree to a sequence alignment, a measure reflecting the evolution of a whole gene family through gene gain and loss. One such measure is the “reconciliation” cost, i.e., the cost of a gain and loss scenario explaining the incongruence between the gene tree and the species tree. This chapter begins with a review of deterministic algorithms for computing reconciliation distances under various models of gene family evolution. We then review integrative methods for correcting a gene tree, based on various strategies for exploring its neighborhood. The algorithms considered are those based on polytomy resolution, tree amalgamation, and supertree reconstruction. The goal is to provide a comprehensive overview of existing methods, with algorithms presented in concise form. The reader is referred to the original papers for more details and proofs of complexity.
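
To make the notion of a reconciliation cost concrete, the following is a minimal sketch of the classical lowest-common-ancestor (LCA) mapping under a duplication-loss model. It is an illustration under stated assumptions, not the chapter's own presentation: trees are encoded as nested Python tuples, each gene leaf is labelled with the name of the species it belongs to, the gene tree is binary, and the function names (leaf_set, species_clusters, lca, reconcile) are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf labels below a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clusters(species_tree, depth=0, out=None):
    """Map every species-tree node (identified by its leaf set) to its depth."""
    if out is None:
        out = {}
    out[leaf_set(species_tree)] = depth
    if not isinstance(species_tree, str):
        for child in species_tree:
            species_clusters(child, depth + 1, out)
    return out


def lca(clusters, species):
    """Deepest species-tree node whose leaf set contains all of `species`."""
    return max((c for c in clusters if species <= c), key=lambda c: clusters[c])


def reconcile(gene_tree, species_tree):
    """Return (duplications, losses) for the LCA reconciliation.

    Assumption: gene leaves carry species names and the gene tree is binary.
    """
    clusters = species_clusters(species_tree)
    dups = losses = 0

    def visit(node):
        nonlocal dups, losses
        mapped = lca(clusters, leaf_set(node))
        if isinstance(node, str):
            return mapped
        child_maps = [visit(child) for child in node]
        # A node is a duplication when it maps to the same species node
        # as at least one of its children; otherwise it is a speciation.
        is_dup = mapped in child_maps
        dups += is_dup
        for child_map in child_maps:
            gap = clusters[child_map] - clusters[mapped]
            # Each species-tree node skipped on the way down implies a loss.
            losses += gap if is_dup else gap - 1
        return mapped

    visit(gene_tree)
    return dups, losses


species = (("A", "B"), "C")
print(reconcile((("A", "C"), "B"), species))  # incongruent gene tree
print(reconcile((("A", "B"), "C"), species))  # congruent gene tree
```

With unit costs for each event, the reconciliation cost in this sketch is simply the sum of the two returned counts; weighted costs, transfers, and incomplete lineage sorting would require the richer models the chapter reviews.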


Keywords: Phylogeny, Gene tree, Duplication, Loss, Horizontal gene transfer, Incomplete lineage sorting, Reconciliation



The authors acknowledge the support of the Fonds de Recherche du Québec Nature et Technologie (FRQNT) and of the Natural Sciences and Engineering Research Council (NSERC) (Discovery Grant RGPIN-249834).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Département d’Informatique et de Recherche Opérationnelle (DIRO), Université de Montréal, Montreal, Canada
