Error Detection and Correction of Gene Trees

  • Manuel Lafond
  • Krister M. Swenson
  • Nadia El-Mabrouk
Part of the Computational Biology book series (COBO, volume 19)


Reconstructing the phylogeny of a gene family and reconciling the resulting gene tree with the species tree reveals the history of duplications, losses, and other events that have shaped the gene family, with important implications for the functional specificity of genes. However, evolutionary histories inferred by reconciliation depend strongly on the accuracy of the trees, and even a few misplaced leaves can lead to a completely different history. Furthermore, sequence data alone often lack the information needed to confidently support a gene tree topology. We outline a number of criteria that can be used to detect erroneous gene trees. Analysing Ensembl gene trees for the fish genomes Stickleback, Medaka, Tetraodon, and Zebrafish reveals a significant number of erroneous gene trees. Finally, some potential directions for error correction of gene trees are explored.


Keywords (added by machine, not by the authors): Gene Tree, Random Tree, Lower Common Ancestor, Winner Node, Ensembl Tree


Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Manuel Lafond (1)
  • Krister M. Swenson (1, 2)
  • Nadia El-Mabrouk (1)
  1. Département d’Informatique et de Recherche Opérationnelle (DIRO), Université de Montréal, Montréal, Canada
  2. McGill University, Montreal, Canada
