Abstract
Maximum likelihood (ML) methods remain the gold standard in molecular phylogenetics. The calculation of likelihood, given a topology and a substitution model, is illustrated with both a brute-force approach and the pruning algorithm, a dynamic programming algorithm that is the most fundamental algorithm in likelihood calculation. The likelihood calculation is presented separately without and with a molecular clock. While ML is the most robust of all methods in molecular phylogenetics, it may suffer from bias when handling missing data coupled with rate heterogeneity over sites.
Postscript
We have covered the maximum likelihood framework in molecular phylogenetics in depth, but this book does not cover the Bayesian approach, which extends the likelihood framework to incorporate prior knowledge. The Bayesian framework can not only help us with molecular phylogenetics but also reduce our tendency to develop prejudice and social bias.
Suppose we live in a multiracial society and need to decide whom our family should interact with. We would implicitly want to estimate the proportion of good people (Pgood) in a race (or an ethnic group), with “good people” defined as those with whom we have had pleasant interactions. Naturally, one wants to interact with people in a race whose Pgood is high and avoid people in a race whose Pgood is low.
Now suppose we have interacted with a small number of people, say three, in one race and our experiences are all bad. The maximum likelihood estimate of Pgood is then 0/3 = 0, because it is based on the data alone. If we take this estimated Pgood seriously in spite of the small sample size of three, then we become racists.
With the Bayesian approach, we would first conceive a prior for Pgood before any interaction with people of different races. If we are fair-minded, our prior of Pgood will be the same for all races to start with. If we are unfortunate enough to have a bad experience with a member of one race, we would reduce Pgood for that race a bit. If our second encounter with people of this race is also bad, then we reduce Pgood still further for that race. Eventually these different Pgood values for different races constitute our private model of racial differences, and the model, right or wrong, will affect our behavior.
The model of racial differences thus developed in our minds may be quite different from the models in other people’s minds, because different people often interact with different samples from different races. Because few of us could claim to have a representative sample of people to interact with, Pgood is almost always biased. However, it may not be as biased as the estimate one gets from a likelihood framework.
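The arithmetic behind this contrast can be sketched in a few lines. The text does not specify the form of the prior, so the sketch below assumes a Beta(a, b) prior on a binomial proportion, with a = b = 1 (a uniform prior) as the default; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def ml_estimate(good, total):
    """Maximum likelihood estimate of a binomial proportion: good / total."""
    return Fraction(good, total)


def posterior_mean(good, total, a=1, b=1):
    """Posterior mean of the proportion under an assumed Beta(a, b) prior.

    The Beta prior is an assumption of this sketch, not something the
    text specifies. With a = b = 1 the prior is uniform on [0, 1].
    """
    return Fraction(good + a, total + a + b)


# Three interactions, none of them pleasant, as in the example above.
print("ML estimate:", ml_estimate(0, 3))  # 0

# Sequential updating: the estimate falls after each bad experience
# but never reaches zero.
for n_bad in range(4):
    print(n_bad, "bad experiences ->", posterior_mean(0, n_bad))
# 0 bad experiences -> 1/2
# 1 bad experiences -> 1/3
# 2 bad experiences -> 1/4
# 3 bad experiences -> 1/5
```

Under the assumed uniform prior, three bad experiences move the estimate from 1/2 to 1/5 rather than to 0, which is the sense in which the Bayesian estimate is less extreme than the likelihood estimate when the sample is small. A different prior would give different numbers.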
In this context of unrepresentative samples from different races, racism, as well as other kinds of prejudice, is almost inevitable. What is important to keep in mind is that much of the difference in Pgood among races or ethnic groups is due to historical differences in racial environment. If a little boy is driven by poverty to steal a loaf of bread for his sick and hungry mother, then it is the ruler of the society, not the boy, who is bad. May the joint effort of mankind lead to a monotonic increase in Pgood in all races.
© 2018 Springer Science+Business Media LLC
Xia, X. (2018). Maximum Likelihood in Molecular Phylogenetics. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_16
DOI: https://doi.org/10.1007/978-3-319-90684-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90682-9
Online ISBN: 978-3-319-90684-3
eBook Packages: Biomedical and Life Sciences (R0)