
Sources of Error and Incongruence in Phylogenomic Analyses

Phylogenomics

Abstract

Phylogenomic analyses can be performed either by analysing gene trees separately and then using coalescent or supertree methods to retrieve a species tree, or by using the supermatrix approach, in which all gene partitions are concatenated into a single dataset before phylogenetic analysis. Even though massive amounts of data help to reduce sampling error, several sources of error may still bias phylogenomic studies. Especially problematic is systematic error, which arises when the assumptions of the substitution model are violated, for example by compositional heterogeneity, among-lineage rate variation or heterotachy. Several methods to detect and deal with these systematic errors have been and are being developed. Large-scale phylogenomic studies also sometimes contain large amounts of missing data, which, as real-data and simulation studies have shown, are generally less problematic. Taxon sampling is another critical issue for phylogenomics, as sparsely sampled analyses might be affected by long-branch attraction artefacts. The data and taxa included should be carefully selected: highly saturated genes should be avoided, as should phylogenetically unstable (rogue) taxa. Several methods are available to estimate and visualize the information content of genes, as well as the phylogenetic stability of the taxa selected for the analysis. Finally, discordance between gene trees and species trees is not rare, and potential causes are incomplete lineage sorting, hybridization or horizontal gene transfer. Coalescent-based methods for species tree inference based on separate or binned gene tree analyses are able to deal with incomplete lineage sorting, whereas network analyses can be used to visualize conflict between gene trees in general.
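
As an illustration of the supermatrix approach mentioned in the abstract, the following is a minimal Python sketch of concatenating per-gene alignments into a single dataset. It is not taken from the chapter: the function names `concatenate` and `missing_fraction`, the representation of an alignment as a taxon-to-sequence dictionary, and the use of `?` for missing data are all assumptions made for this example. Taxa absent from a gene are padded with the missing-data symbol, which is how missing data typically enter a supermatrix.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one gene alignment maps taxon name -> aligned sequence.
Alignment = Dict[str, str]


def concatenate(
    genes: List[Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate gene alignments into a supermatrix.

    Returns the supermatrix (taxon -> concatenated sequence) and a list of
    1-based, inclusive (start, end) column ranges, one per gene partition.
    """
    # Union of all taxa across genes, in a reproducible order.
    taxa = sorted({taxon for gene in genes for taxon in gene})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    offset = 0
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene alignment must have equal length")
        n = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this gene with the missing-data symbol.
            pieces[taxon].append(gene.get(taxon, missing * n))
        partitions.append((offset + 1, offset + n))
        offset += n
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


def missing_fraction(matrix: Dict[str, str], missing: str = "?") -> float:
    """Fraction of supermatrix cells that hold the missing-data symbol."""
    total = sum(len(seq) for seq in matrix.values())
    return sum(seq.count(missing) for seq in matrix.values()) / total


if __name__ == "__main__":
    # Toy input: taxon C lacks gene 1, taxon B lacks gene 2.
    toy_genes = [
        {"A": "ACGT", "B": "ACGA"},
        {"A": "GG", "C": "GC"},
    ]
    supermatrix, parts = concatenate(toy_genes)
    for taxon, seq in supermatrix.items():
        print(taxon, seq)
    print(parts, missing_fraction(supermatrix))
```

The partition ranges are returned alongside the matrix because partitioned analyses need to know where each gene begins and ends; the missing-data fraction gives a quick way to judge how sparse the resulting supermatrix is.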




Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Bleidorn, C. (2017). Sources of Error and Incongruence in Phylogenomic Analyses. In: Phylogenomics. Springer, Cham. https://doi.org/10.1007/978-3-319-54064-1_9
