Maximum Likelihood in Molecular Phylogenetics

Xia, Xuhua

doi:10.1007/978-3-319-90684-3_16

Abstract

Maximum likelihood (ML) methods remain the gold standard in molecular phylogenetics. The calculation of likelihood, given a topology and a substitution model, is illustrated with both a brute-force approach and the pruning algorithm, a dynamic programming algorithm that is the most fundamental algorithm in likelihood calculation. The likelihood calculation is presented separately without and with a molecular clock. While ML is the most robust of all methods in molecular phylogenetics, it may suffer from bias when handling missing data coupled with rate heterogeneity over sites.
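The contrast between the brute-force approach and the pruning algorithm can be sketched for a single site. The example below is a minimal illustration and not the chapter's own code: the JC69 substitution model, the four-taxon rooted tree, the branch lengths in `T`, and all function names are assumptions chosen for brevity. The brute-force version sums over every combination of states at the three internal nodes, while the pruning version computes conditional likelihood vectors from the leaves toward the root.

```python
import math
from itertools import product

BASES = "ACGT"

# Hypothetical branch lengths (expected substitutions per site), keyed by the
# node at the lower end of each branch. n1 and n2 are the two internal nodes.
T = {"s1": 0.1, "s2": 0.2, "s3": 0.1, "s4": 0.3, "n1": 0.05, "n2": 0.05}

# A leaf is a sequence name; an internal node is a pair of (child, branch_length).
N1 = (("s1", T["s1"]), ("s2", T["s2"]))
N2 = (("s3", T["s3"]), ("s4", T["s4"]))
TREE = ((N1, T["n1"]), (N2, T["n2"]))

def jc69(i, j, t):
    """JC69 probability of observing base j after time t, starting from base i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, site):
    """Conditional likelihood vector of the data below `node`, one entry per base."""
    if isinstance(node, str):  # leaf: 1 for the observed base, 0 otherwise
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    vec = [1.0] * 4
    for child, t in node:
        child_vec = partial(child, site)
        for a, parent_base in enumerate(BASES):
            vec[a] *= sum(jc69(parent_base, child_base, t) * child_vec[c]
                          for c, child_base in enumerate(BASES))
    return vec

def pruning_likelihood(tree, site):
    """Site likelihood: root vector weighted by equal base frequencies (JC69)."""
    return sum(0.25 * x for x in partial(tree, site))

def brute_force_likelihood(site):
    """Sum over all 4^3 state assignments at the root and two internal nodes."""
    total = 0.0
    for r, x, y in product(BASES, repeat=3):
        total += (0.25
                  * jc69(r, x, T["n1"])
                  * jc69(x, site["s1"], T["s1"]) * jc69(x, site["s2"], T["s2"])
                  * jc69(r, y, T["n2"])
                  * jc69(y, site["s3"], T["s3"]) * jc69(y, site["s4"], T["s4"]))
    return total

SITE = {"s1": "A", "s2": "A", "s3": "G", "s4": "C"}  # hypothetical site pattern
print(pruning_likelihood(TREE, SITE), brute_force_likelihood(SITE))
```

Both functions should return the same value for any site pattern. The brute-force cost grows exponentially with the number of internal nodes, whereas the pruning recursion visits each node once, which is why it is the practical choice for larger trees.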

Author information

Authors and Affiliations

University of Ottawa CAREG and Biology Department, Ottawa, ON, Canada
Xuhua Xia

Postscript

We have covered the maximum likelihood framework in molecular phylogenetics in depth, but this book does not cover the Bayesian approach, which extends the likelihood framework to incorporate prior knowledge. The Bayesian framework can not only help us with molecular phylogenetics but also reduce our tendency to develop prejudice and social bias.

Suppose we live in a multiracial society and need to decide whom our family should interact with. We would implicitly want to estimate the proportion of good people (P_good) in a race (or an ethnic group), with “good people” defined as those with whom we have pleasant experiences interacting. Naturally one wants to interact with people in a race whose P_good is high and avoid people in a race whose P_good is low.

Now suppose we have interacted with a small number of people, say three, in one race and our experiences are all bad. The maximum likelihood estimate of P_good is then 0/3 = 0, because it is based on the data only. If we take this estimated P_good seriously in spite of the small sample size of three, then we become racist.

With the Bayesian approach, we would first conceive a prior for P_good before any interaction with people of different races. If we are fair-minded, our prior for P_good will be the same for all races to start with. If we are unfortunate enough to have a bad experience with a member of one race, we would reduce P_good for that race a bit. If our second encounter with people of this race is also bad, then we reduce P_good still further for that race. Eventually these different P_good values for different races constitute our private model of racial differences, and the model, correct or wrong, will affect our behavior.
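The difference between the two estimates can be made concrete with a small numerical sketch. The postscript does not specify a prior, so the uniform Beta(1, 1) prior below, the binomial sampling model, and the function names are assumptions made for illustration only. With a Beta(a, b) prior and k good outcomes in n encounters, the posterior mean of a proportion is (a + k) / (a + b + n), whereas the maximum likelihood estimate is k / n.

```python
def ml_estimate(good, total):
    """Maximum likelihood estimate of a proportion: the observed fraction."""
    return good / total

def posterior_mean(good, total, a=1.0, b=1.0):
    """Posterior mean of a proportion under an assumed Beta(a, b) prior."""
    return (a + good) / (a + b + total)

# Three encounters, none of them good.
ml = ml_estimate(0, 3)

# Posterior mean after 0, 1, 2 and 3 bad encounters, starting from Beta(1, 1).
means = [posterior_mean(0, n) for n in range(4)]

print(ml)
print(means)
```

Under these assumptions the posterior mean falls a little with each bad encounter (from 1/2 toward 1/5 after three) but never reaches the value of 0 that the likelihood estimate gives from the same three observations.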

The model of racial differences thus developed in our mind may be quite different from the models in other people’s minds, because different people often interact with different samples from different races. Because few of us could claim to have a representative sample of people to interact with, our estimate of P_good is almost always biased. However, it may not be as biased as what one gets from a likelihood framework.

In this context of unrepresentative samples from different races, racism, as well as other kinds of prejudice, is almost inevitable. What is important to keep in mind is that much of the difference in P_good among races or ethnic groups is due to historical differences in their environments. If a little boy is driven by poverty to steal a loaf of bread for his sick and hungry mother, then it is the ruler of the society, not the boy, who is bad. May the joint effort of mankind lead to a monotonic increase in P_good in all races.

About this chapter

Cite this chapter

Xia, X. (2018). Maximum Likelihood in Molecular Phylogenetics. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_16

DOI: https://doi.org/10.1007/978-3-319-90684-3_16
Published: 06 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90682-9
Online ISBN: 978-3-319-90684-3
eBook Packages: Biomedical and Life Sciences (R0)
