Biologia Plantarum, Volume 60, Issue 4, pp 619–627

A plant biologists’ guide to phylogenetic analysis of biological macromolecule sequences



Phylogenetic analysis has become a common step in the characterization of gene and protein sequences. However, despite the availability of numerous affordable and more-or-less intuitive software tools, constructing biologically relevant, informative phylogenetic trees still involves several critical steps that are inherently non-algorithmic, i.e., dependent on decisions made by the user. These steps include, but are not limited to, setting the aims of the phylogenetic study, choosing the sequences to be analyzed, and selecting the methods used to construct the sequence alignment, as well as the algorithms and parameters used to construct the actual phylogenetic tree. This review aims to provide guidance for these decisions and to illustrate common pitfalls and problems that occur during phylogenetic analysis of plant gene sequences.

Additional key words

bioinformatics, evolution, phylogenetic tree, protein domain identification, sequence alignment, sequence database searching



Abbreviations

BLAST - basic local alignment search tool
CD-Search - conserved domain search
COBALT - constraint-based multiple protein alignment tool
DDBJ - DNA data bank of Japan
ENA - European nucleotide archive
INSDC - international nucleotide sequence database collaboration
MACAW - multiple alignment construction and analysis workbench
MAFFT - multiple alignment using fast Fourier transform
MEGA - molecular evolutionary genetics analysis
ML - maximum likelihood
MUSCLE - multiple sequence comparison by log-expectation
NCBI - National Center for Biotechnology Information
PAUP - phylogenetic analysis using parsimony
PHYLIP - phylogeny inference package
SMART - simple modular architecture research tool
T-REX - tree and reticulogram reconstruction



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. Department of Experimental Plant Biology, Faculty of Sciences, Charles University, Prague, Czech Republic
