
A plant biologists’ guide to phylogenetic analysis of biological macromolecule sequences

  • Review
Biologia Plantarum

Abstract

Phylogenetic analysis has become a common step in the characterization of gene and protein sequences. However, despite the availability of numerous affordable and more or less intuitive software tools, constructing biologically relevant, informative phylogenetic trees remains a process involving several critical steps that are inherently non-algorithmic, i.e., dependent on decisions made by the user. These steps include, but are not limited to, setting the aims of the phylogenetic study, choosing the sequences to be analyzed, selecting the methods used to construct the sequence alignment, and choosing the algorithms and parameters used to construct the actual phylogenetic tree. This review aims to provide guidance for these decisions and to illustrate common pitfalls and problems that occur during phylogenetic analysis of plant gene sequences.
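To make the last of these decisions more concrete, the sketch below shows one tree-building algorithm listed among the abbreviations, neighbor-joining (NJ), applied to a distance matrix computed from sequences that are already aligned. It is a minimal teaching example under stated assumptions, not the procedure used in this review and not a replacement for established packages such as MEGA or PHYLIP. Distances are uncorrected p-distances (the proportion of differing sites), gap columns are skipped pairwise, no substitution model or bootstrap support is applied, and the function names (`p_distance`, `distance_matrix`, `neighbor_joining`) and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # pairwise gap deletion
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return diffs / compared


def distance_matrix(seqs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Symmetric all-against-all p-distance matrix stored as a dict of dicts."""
    dm = {name: {} for name in seqs}
    for x, y in combinations(seqs, 2):
        dm[x][y] = dm[y][x] = p_distance(seqs[x], seqs[y])
    return dm


def neighbor_joining(dist: dict[str, dict[str, float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it in Newick format.

    Each newly joined node is labelled with its own Newick subtree, so taxon
    names must not contain Newick special characters such as ( ) , : ;
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    while len(d) > 2:
        n = len(d)
        # r(i): sum of distances from node i to all other current nodes
        r = {i: sum(d[i][j] for j in d if j != i) for i in d}
        # join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        a, b = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # branch lengths from a and b to their new parent node
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.4g},{b}:{lb:.4g})"
        d[new] = {}
        for c in list(d):
            if c in (a, b, new):
                continue
            # distance from the new node to every remaining node
            d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        for x in (a, b):  # drop the two joined nodes
            del d[x]
            for row in d.values():
                row.pop(x, None)
    x, y = d  # the last two nodes are connected by a single branch
    return f"({x},{y}:{d[x][y]:.4g});"


if __name__ == "__main__":
    # hypothetical toy protein alignment (names and sequences are made up)
    aln = {"At1": "MKV-LLA", "At2": "MKVALLS", "Os1": "MRV-ILS"}
    print(neighbor_joining(distance_matrix(aln)))
```

With only three taxa there is a single possible unrooted topology, so the toy call merely exercises the code. The Newick output is unrooted; placing a root (for example with an outgroup) is a separate decision left to the user.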


Abbreviations

BLAST: basic local alignment search tool
CD-search: conserved domain search
COBALT: constraint-based multiple protein alignment tool
DDBJ: DNA data bank of Japan
ENA: European nucleotide archive
INSDC: international nucleotide sequence database collaboration
MACAW: multiple alignment construction and analysis workbench
MAFFT: multiple alignment using fast Fourier transform
MEGA: molecular evolutionary genetics analysis
ML: maximum likelihood
MUSCLE: multiple sequence comparison by log-expectation
NCBI: National Center for Biotechnology Information
NJ: neighbor-joining
PAUP: phylogenetic analysis using parsimony
PHYLIP: phylogeny inference package
SMART: simple modular architecture research tool
T-REX: tree and reticulogram reconstruction


Author information

Correspondence to F. Cvrčková.

Additional information

Acknowledgments: I thank the many generations of students of my Introduction to Bioinformatics undergraduate course for providing continuous feedback that helped to shape the ideas presented here, Anton Markoš, Vojtěch Žárský and Shigehiro Kuraku for critical reading of this manuscript, and the Ministry of Education of the Czech Republic for financial support from the NPUI LO1417 project.

About this article

Cite this article

Cvrčková, F. A plant biologists’ guide to phylogenetic analysis of biological macromolecule sequences. Biol Plant 60, 619–627 (2016). https://doi.org/10.1007/s10535-016-0649-8


  • DOI: https://doi.org/10.1007/s10535-016-0649-8
