A plant biologists’ guide to phylogenetic analysis of biological macromolecule sequences

Cvrčková, F.

doi:10.1007/s10535-016-0649-8

Review
Published: 29 September 2016

Volume 60, pages 619–627 (2016)
Biologia Plantarum

Abstract

Phylogenetic analysis has become a common step in the characterization of gene and protein sequences. However, despite the availability of numerous affordable and more or less intuitive software tools, constructing biologically relevant, informative phylogenetic trees remains a process involving several critical steps that are inherently non-algorithmic, i.e., dependent on decisions made by the user. These steps include, but are not limited to, setting the aims of the phylogenetic study, choosing the sequences to be analyzed, selecting the method used to construct the sequence alignment, and choosing the algorithm and parameters used to build the actual phylogenetic tree. This review aims to provide guidance for these decisions and to illustrate common pitfalls and problems encountered during phylogenetic analysis of plant gene sequences.
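The abstract names the tree-building algorithm as one of the user-dependent choices. As an illustration only, the Python sketch below shows one such algorithm, neighbor-joining (NJ; Saitou and Nei 1987), applied to uncorrected p-distances computed from a toy alignment. It is not taken from the review: the function names, the gap-handling rule (columns with a gap in either sequence are skipped) and the example sequences are assumptions made for this sketch. Real analyses would normally rely on established programs such as MEGA or PHYLIP, use model-corrected distances or maximum likelihood, and assess support, for example by bootstrapping.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of differing sites.

    Assumption for this sketch: columns with a gap ('-') in either
    sequence are ignored (pairwise deletion).
    """
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("sequences share no ungapped columns")
    return sum(x != y for x, y in sites) / len(sites)


def neighbor_joining(names, dist):
    """Build an unrooted tree (Newick string) by neighbor-joining.

    names: list of at least three taxon labels.
    dist:  dict mapping (a, b) name pairs to pairwise distances.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {frozenset(pair): value for pair, value in dist.items()}

    def D(x, y):
        return d[frozenset((x, y))]

    nodes = list(names)
    newick = {n: n for n in nodes}  # subtree string for each active node
    joined = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other active nodes
        r = {i: sum(D(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q-criterion
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from the two joined nodes to their new parent
        li = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        joined += 1
        u = f"_node{joined}"  # hypothetical internal-node label
        newick[u] = f"({newick[i]}:{li:.3f},{newick[j]}:{lj:.3f})"
        # Distances from the new node to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: attach them to one central node
    a, b, c = nodes
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    lb = (D(a, b) + D(b, c) - D(a, c)) / 2
    lc = (D(a, c) + D(b, c) - D(a, b)) / 2
    return f"({newick[a]}:{la:.3f},{newick[b]}:{lb:.3f},{newick[c]}:{lc:.3f});"


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only
    aln = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTTC",
           "seqC": "ACGAACCTTC", "seqD": "TCGAACCTTG"}
    dist = {(x, y): p_distance(aln[x], aln[y])
            for x, y in combinations(aln, 2)}
    print(neighbor_joining(list(aln), dist))
```

Given exactly additive distances from a known four-taxon tree, this sketch recovers that tree and its branch lengths. With real data, NJ can return negative branch lengths, which the sketch does not correct, and the tree is only as reliable as the alignment and distance measure it is built from.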

Abbreviations

BLAST: basic local alignment search tool
CD-Search: conserved domain search
COBALT: constraint-based multiple protein alignment tool
DDBJ: DNA Data Bank of Japan
ENA: European Nucleotide Archive
INSDC: International Nucleotide Sequence Database Collaboration
MACAW: multiple alignment construction and analysis workbench
MAFFT: multiple alignment using fast Fourier transform
MEGA: molecular evolutionary genetics analysis
ML: maximum likelihood
MUSCLE: multiple sequence comparison by log-expectation
NCBI: National Center for Biotechnology Information
NJ: neighbor-joining
PAUP: phylogenetic analysis using parsimony
PHYLIP: phylogeny inference package
SMART: simple modular architecture research tool
T-REX: tree and reticulogram reconstruction

Author information

Authors and Affiliations

Department of Experimental Plant Biology, Faculty of Sciences, Charles University, Viničná 5, CZ 128 43, Prague, Czech Republic
F. Cvrčková

Corresponding author

Correspondence to F. Cvrčková.

Additional information

Acknowledgments: I thank the many generations of students of my Introduction to Bioinformatics undergraduate course for continuous feedback that helped shape the ideas presented here; Anton Markoš, Vojtěch Žárský and Shigehiro Kuraku for critical reading of the manuscript; and the Ministry of Education of the Czech Republic for financial support from the NPUI LO1417 project.

About this article

Cite this article

Cvrčková, F. A plant biologists’ guide to phylogenetic analysis of biological macromolecule sequences. Biol Plant 60, 619–627 (2016). https://doi.org/10.1007/s10535-016-0649-8

Received: 09 December 2015
Revised: 10 March 2016
Accepted: 12 April 2016
Published: 29 September 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10535-016-0649-8
