# Estimating Phylogenies from Molecular Data

- 3 Citations
- 565 Downloads

## Abstract

Phylogenetic estimation from aligned DNA, RNA or amino acid sequences has attracted more and more attention in recent years due to its importance in analysis of many fine-scale genetic data. Nowadays, its application fields range from medical research to drug discovery, to epidemiology, to systematics and population dynamics. Estimating phylogenies involves solving an optimization problem, called the phylogenetic estimation problem (PEP), whose versions depend on the criterion used to select a phylogeny among plausible alternatives. This chapter offers an overview of PEP and discuss the most important versions that occur in the literature.

## Keywords

Edge Weight Evolutionary Distance Internal Vertex Molecular Sequence Maximal Lyapunov Exponent## 8.1 Introduction

*taxa*, by means of molecular data such as DNA, RNA, amino acid or codon sequences. These relationships are usually described through a weighted tree, called a

*phylogeny*(Fig. 8.1), whose

*leaves*represent the observed taxa,

*internal vertices*represent the intermediate ancestors,

*edges*represent the estimated evolutionary relationships and

*edge weights*represent measures of the similarity between pairs of taxa.

Phylogenies provide a fundamental information in analysis of many fine-scale genetic data; for this reason, the use of molecular phylogenetics has become more and more frequent (and sometimes indispensable) in several research fields such as systematics, medical research, drug discovery, epidemiology and population dynamics [56]. For example, the use of molecular phylogenetics was of considerable assistance to predict the evolution of human influenza A [8], to understand the relationships between the virulence and the genetic evolution of HIV [55, 66], to identify emerging viruses as SARS [51], to recreate and investigate ancestral proteins [17], to design neuropeptides causing smooth muscle contraction [2] and to relate geographic patterns to macroevolutionary processes [36].

Since no one could observe evolution over thousands or millions of years, a part from known phylogenies ([57]), there is no general way to validate empirically a candidate phylogeny for a set of molecular sequences extracted from taxa. For this reason, the literature proposes a number of criteria for selecting one phylogeny from among plausible alternatives. Each criterion adopts its own set of evolutionary hypotheses, whose ability to describe evolution of taxa determines the gap between the *real*and the *true phylogeny*, i.e., the gap between the real evolutionary process of taxa and the phylogeny that one would obtain under the same set of hypotheses if all molecular data from taxa were available [9].

The criteria of phylogenetic estimation can usually be quantified and expressed in terms of objective functions, giving rise to families of optimization problems whose general paradigm can be stated as follows:

### Problem 8.1.

where *Γ*is the set of molecular sequences from *n*taxa, *T*a phylogeny of *Γ*, \(\mathcal{T}\)the set of (2*n*− 5)! ! = 1 ×3 ×5 ×7⋯ ×2*n*− 5 phylogenies of *Γ*, \(f : \mathcal{T} \rightarrow \mathbb{R}\)a function modeling the selected criterion of phylogenetic estimation, and \(g : \Gamma \times \mathcal{T} \rightarrow \mathbb{R}\)a function correlating the set *Γ*to a phylogeny *T*.

A specific optimization problem, or *phylogenetic estimation paradigm*, is completely characterized by defining the functions *f*and *g*. The phylogeny *T* ^{∗}that optimizes *f*and satisfies *g*is referred to as *optimal*, and if *T* ^{∗}approaches the true phylogeny as the amount of molecular data from taxa increases, the corresponding criterion is said to be *statistically consistent*[32]. The statistical consistency is a desirable property in molecular phylogenetics because it measures the ability of a criterion to recover the true (and hopefully the real) phylogeny of the given molecular data. Later in this chapter, we will show that the consistency property changes from criterion to criterion and in some cases may be even absent.

Here, we provide a review of the main estimation criteria that occur in the literature on molecular phylogenetics. Particular emphasis is given to the comparative description of the hypotheses at the core of each criterion and to the optimization aspects related to the phylogenetic estimation paradigms. In Sect. 8.2, we discuss the problem of measuring the similarity among molecular sequences. In Sect. 8.3, we discuss the fundamental least-squares paradigm and formalize the concept of phylogeny. In Sect. 8.4, we present the minimum evolution paradigm by evidencing the recent perspectives and computational advances. Finally, in Sect. 8.5we present the likelihood and the bayesian paradigms by exposing briefly their benefits and drawbacks.

## 8.2 Measuring Molecular Similarity

The degree of similarity between pairwise molecular sequences reflects the amount of mutation events that occurred since they split from their common ancestor. Quantifying such similarity constitutes the first step in the phylogenetic estimation process [11]. The task involves the investigation and the modeling of the *mutation process*over time, i.e., the process by which errors occur in molecular data and are inherited between generations.

Different types of mutation may occur in the genome structure, most of which are point mutations, i.e., changes that involve the replacement, or *substitution*, of one nucleotide for another in the DNA sequence. Point mutations can be classified in two categories: the transitions and the transversions. The transitions occur when a purine nucleotide (adenine or guanine) is substituted for another purine, or when a pyrimidine (cytosine or thymine) is substituted for another pyrimidine. The transversions occur when a pyrimidine is substituted for a purine, or vice versa.

A second class of point mutations are those that lead to *insertions*and *deletions*of nucleotides in the genome. This phenomenon mainly occurs in non-coding regions of DNA, but may interest also coding regions of the genome and be the cause of deleterious effects [57].

Finally, a third class of mutations are those that involve entire chromosome regions of the genome. Specifically, we may have: (1) a *duplication*, when a chromosome region is duplicated; (2) a *translocation*, when a chromosome region is transferred into another chromosome; (3) an *inversion*, when a chromosome region is broken off, turned upside down and reconnected; (4) a *deletion*, when a chromosome region is missing or deleted; (5) and a *loss of heterozygosity*, e.g., when two instances of the same chromosome break and then reconnect but to the different end pieces [57].

Modeling the second and the third classes of mutations is generally non-trivial and requires advanced mathematical background. We refer the interested reader to Felsenstein [29] for an introduction and to Park and Deem [58] for recent advances in the modeling of such classes. Here we shall focus on the first class of mutations and present a fundamental model of molecular evolution which is at the core of the most currently used criteria of phylogenetic estimation. Unless otherwise stated, throughout the chapter we will always assume that the molecular sequences under study have been previously subjected to an *alignment process*, i.e., a process through which the evolutionary relationships between nucleotides of molecular data are evidenced (see [60] for details).

### 8.2.1 The Time Homogeneous Markov Model of Molecular Evolution

*S*be a DNA sequence, i.e., a string of fixed length over an alphabet

*Υ*= {

*A*,

*C*,

*G*,

*T*}, where “A” codes for adenine, “C” for cytosine, “G” for guanine, and “T” for thymine. Let

*r*

_{ ij }≥ 0,

*i*≠

*j*, be the constant rate of substitution from nucleotide

*i*to nucleotide

*j*. Assume that each character (site) of

*S*evolves independently over time and that, instant per instant, the

*Markov conservative hypothesis*[39] holds, i.e.,

*p*

_{ ij }(

*t*) be the probability that nucleotide

*i*undergoes to a substitution to nucleotide

*j*at finite time

*t*. Then, if the superposition principle holds, at

*t*+ d

*t*such probability can be written as:

*p*

_{ ij }(

*t*) in both sides of (8.2) and dividing for d

*t*we obtain:

### 8.2.2 Estimating Evolutionary Distances from Molecular Data

Two molecular sequences *S* _{1}and *S* _{2}, evolving at time *t* _{0}from a common ancestor, could be characterized at time *t*by different amounts of substitution events, some of which not directly observable. Hence, if we would sample the sequences at time *t*and measure their similarity, or *evolutionary distance*, in terms of number of observed differences, we could underestimate the overall substitution events that occurred since *S* _{1}and *S* _{2}split from their common ancestor. A number of authors suggested that the use of the time homogeneous Markov models could overcome the underestimation problem in all those cases in which the hypotheses at the core of the model would properly describe the real evolutionary process of the analyzed sequences [29]. Moreover, in order to compare the evolutionary distances of different pairs of molecular sequences, the authors also proposed to express the evolutionary distances in terms of expected number of substitution events per site rather than the time necessary to transform a sequence into another [29]. In this section, we will present the most general formula currently known in the literature to compute the evolutionary distance from pairwise molecular sequences. To this aim, we shall investigate now the dynamics of the THM model.

*Ω*is the eigenvector matrix of

**R**, and

*Λ*is the diagonal matrix of the eigenvalues of

**R**. This fact suggests that the spectrum of

**P**(

*t*) is the exponential spectrum of

**R**, i.e., the dynamics of

**P**(

*t*) is univocally determined from the knowledge of the spectrum of

**R**[84].

**R**is equal to zero, i.e., at least one of its eigenvalues is identically zero. Moreover, since any

*k*-leading principal sub-matrix of

**R**,

*k*< 4, has negative determinant, for one of the Sylvester corollaries (see [6, p. 409]) all the remaining eigenvalues are negative. Thus, as the spectrum of

**P**(

*t*) is the exponential spectrum of

**R**, matrix

**P**(

*t*) has at least one eigenvalue equal to 1, called

*the maximal Lyapunov exponent*, and three eigenvalues lying in the interval [0, 1]. The maximal Lyapunov exponent prevents the presence of chaotic attractors and guarantees that, as

*t*goes to infinity, the generic entry

*p*

_{ ij }(

*t*) is non-zero and independent on the starting state

*i*∈

*Υ*. In other words, the maximal Lyapunov exponent guarantees the existence of four positive values π

_{ A }, π

_{ C }, π

_{ G }, and π

_{ T }, called

*equilibrium frequencies*, such that

_{ j }constitute a

*stationary distribution*and turn out to be useful to measure the evolutionary distance between

*S*

_{1}and

*S*

_{2}. In fact, denote

**O**(

*t*) as a matrix whose generic entry

*o*

_{ ij }(

*t*),

*i*,

*j*∈

*Υ*, represents the probability that at a given site and time

*t*,

*S*

_{1}is characterized by nucleotide

*i*and

*S*

_{2}by nucleotide

*j*. Assume that

**O**(0) =

*Π*, where

*Π*denotes a diagonal matrix whose

*j*th diagonal entry is π

_{ j }. Then it holds that:

**P**

*′*(

*t*) denotes the transpose of

**P**(

*t*). Premultiplying for

*Π*

^{− 1}both sides of (8.6) we have

*f*(

**ABA**

^{− 1}) =

**A**

*f*(

**B**)

**A**

^{− 1}, we have

*time-reversibility*holds, i.e.:

*Π*

^{− 1}

**R**

*′tΠ*and

**R**

*t*are commutative, and (8.7) becomes:

*Π*, we obtain

*tΠ*

**R**represents the expected number of substitution events per site between

*S*

_{1}and

*S*

_{2}, at time

*t*the evolutionary distance \({d}_{{S}_{1},{S}_{2}}\)between

*S*

_{1}and

*S*

_{2}can be computed as:

*i*undergoes a substitution to nucleotide

*j*would be equal to the probability that a nucleotide

*j*undergoes a substitution to nucleotide

*i*. Thus, given a present-day molecular sequence and its ancestral sequence, it would be impossible to determine which sequence is the present and which is the ancestral one. Hence, the hypothesis of time-reversibility removes the temporality from the evolutionary process. We shall show in the next sections how the paradigms of phylogenetic estimation take advantage of this fact. Below, we provide an example from [79] showing a possible application of (8.9).

#### 8.2.2.1 Estimating Evolutionary Distances from Molecular Data: A Practical Example

**O**(

*t*) and

*Π*are respectively

*Π*

^{− 1}

**O**(

*t*) is:

*Π*

^{− 1}

**O**(

*t*)) is:

*Π*log(

*Π*

^{− 1}

**O**(

*t*)) is:

*d*= −

*tr*[

*Π*log(

*Π*

^{− 1}

**O**(

*t*))] = 0. 09152.

The reader interested in more sophisticated applications of the GTR distance will find useful examples in Lanave et al. [48], Rodriguez et al. [63], and Cantanzaro et al. [10, 11].

## 8.3 The Least-Squares Paradigm of Phylogenetic Estimation

A paradigm of phylogenetic estimation is a quantitative criterion used to discern a phylogeny from among plausible alternatives. One of the earliest paradigms was introduced by Cavalli-Sforza and Edwards [15] and is known as *the additive model*or the *the least-squares model*of phylogenetic estimation [9].

In general, we do not have a sampling of such a process over time but only the knowledge of the living taxa. Hence, in absence of further information, one may be able only to reconstruct the projection of the process onto the “now” plane rather than the process itself. Note that altough the evolutionary process over time is “directed,” its projection is not (Fig. 8.2). Thus, when the projection is considered, the direction of evolution is definitely missed. Nevertheless, the projection of the evolutionary process constitutes still an important piece of information for the analyzed taxa; for this reason, Cavalli-Sforza and Edwards proposed a possible paradigm to recover it.

*n*taxa, the degree constraint implies that the number of edges in a phylogeny

*T*is (2

*n*− 3) and the number of internal vertices is (

*n*− 2). To prove the claim note that as

*T*is a tree, it holds that:

*ℰ*

_{ e }(

*T*) and

*ℰ*

_{ i }(

*T*) are the set of external and internal edges of

*T*, respectively. Moreover, since internal vertices have degree three, the following property holds:

*V*

_{ i }| = (

*n*− 2) and |

*ℰ*

_{ i }| = (

*n*− 3). Thus, a phylogeny \(T \in \mathcal{T}\)can be seen as an unrooted binary tree in which the

*n*taxa are the

*n*leaves of

*T*and the common ancestors are internal vertices of degree three. It is worth noting that dealing with unrooted binary trees does not introduces oversimplifications since it is easy to see that any

*m*-ary tree can be transformed into a phylogeny by adding “dummy” vertices and edges (e.g., see Fig. 8.3).

*Edge–Path incidence matrix of a Tree*(EPT) (see [53, p. 550]) i.e., a network matrix

**X**having a row for each path between two leaves and a column for each edge. The generic entry

*x*

_{ rs,e }of matrix

**X**is equal to 1 if edge

*e*belongs to the path

*p*

_{ rs }from leaf

*r*to leaf

*s*and 0 otherwise. As an example, Fig. 8.4b shows the EPT matrix corresponding to the phylogeny shown in Fig. 8.4a. Hence, the authors proposed a model in which each evolutionary distance

*d*

_{ rs },

*r*,

*s*∈

*Γ*, among pairwise molecular data could be thought of as the resulting sum of mutation events accumulated on edges belonging to the path

*p*

_{ rs }linking taxa

*r*and

*s*on

**X**. In other words, fixed a phylogeny

**X**and defined

*w*

_{ e }as the amount of mutation events on edge

*e*, Cavalli-Sforza and Edwards asserted that:

**w**= {

*w*

_{ e }} is the edge weight vector associated with

**X**, and

**D**

^{△}is a

*n*(

*n*− 1) ∕ 2 vector whose components are obtained by taking row by row the entries of the strictly upper triangular matrix

**D**= {

*d*

_{ rs }}. In general, for a fixed matrix

**X**, (8.12) may not admit solutions; for this reason, the authors proposed the use of the ordinary least-squares (OLS) to find the entries of vector

**w**. Specifically, the authors suggested that the values \({\rho }_{\mathit{rs}} ={ \sum \nolimits }_{e\in {p}_{\mathit{rs}}}{x}_{\mathit{rs,e}}{w}_{e}\)should minimize the function,

**X**

^{†}is the Moore–Penrose pseudo-inverse matrix of

**X**. Thus, Cavalli-Sforza and Edwards’ paradigm of phylogenetic estimation may be stated in terms of the following NP-hard convex optimization problem [22]:

### Problem 8.2.

### 8.3.1 Modified Least-Squares Paradigms of Phylogenetic Estimation

A number of authors proposed some modifications to Cavalli-Sforza and Edwards’ model. Specifically, Fitch and Margoliash [31] observed that OLSP implicitly considers the evolutionary distances *d* _{ rs }among pairwise molecular data as uniformly distributed independent random variables, a hypothesis that cannot be considered generally true due to the common evolutionary history of the analyzed taxa and the presence of sampling errors in molecular data. Hence, Fitch and Margoliash proposed to modify Cavalli-Sforza and Edwards’ paradigm by introducing the quantities ω_{ rs }representing the variances of *d* _{ rs }. They set ω_{ rs }= 1 ∕ *d* _{ rs } ^{2}, *r*, *s*∈ *Γ*, and stated the following paradigm:

### Problem 8.3.

Later, Chakraborty [16] and Hasegawa et al. [38] proposed a very similar paradigm, called the generalized least-squares problem (GLSP), in which the variances ω_{ rs }are replaced by the covariances of *d* _{ rs }. Nowadays, GLSP has fallen into disuse due to its statistical inconsistency problems [9].

### 8.3.2 Drawbacks of the Least-Squares Paradigms of Phylogenetic Estimation

*tree metric*, i.e., a phylogeny whose edge weights are non-negative [73, 80], whenever the distance matrix

**D**satisfies the

*ultrametric property*

*additive property*

**D**is ultrametric or additive, the solution of Problem 8.2is unique and obtainable in polynomial time through the UPGMA greedy algorithm [74] or the sequential algorithm [80], respectively.

Unfortunately, when **D**is generic (e.g., when it is obtained by means of the THM model, see Sect. 8.2), the least-squares paradigm may lead to the occurrence of negative entries in the vector **w**, i.e., to a phylogeny that is not a tree metric [32, 47]. Negative edge weights are infeasible both from a conceptual point of view (a distance, being an expected number of mutation events over time, cannot be negative [45]) and from a biological point of view (evolution cannot proceed backwards [57, 77]). For the latter reason at least, non-tree metric phylogenies are generally not accepted in molecular phylogenetics [35].

In response, some authors investigated the consequences of adding or guaranteeing the positivity constraint of edge weights in the least-squares paradigm.

Gascuel and Levy [33] observed that the presence of the positivity constraint transforms any least-square model into a non-negative linear regression problem which involves projecting the distance matrix **D**onto the positive cone defined by the set of tree metrics (see also [5, p. 187]). Thus, the authors designed an iterative polynomial time algorithm able to generate a sequence of least-squares projections of **D**onto such a set until an additive distance matrix (and the corresponding phylogeny) is obtained.

Farach et al. [26] proposed an alternative approach to impose the positivity constraint. Specifically, the authors proposed to find the minimal perturbation of the distance matrix **D**that guarantees the satisfaction of the additive or the ultrametric property. Farach et al. [26] proposed the *ℒ* _{ ∞ }-norm and *ℒ* _{1}-norm to constraint the entries of **D**to satisfy the additive (ultrametric) property, and proved that such a problem can be solved in polynomial time when **D**is required to be ultrametric under the *ℒ* _{ ∞ }-norm. By contrast, the authors proved that their approaches become hard when an ultrametric or an additive distance matrix is required under the *ℒ* _{1}-norm.

Finally, Barthélemy and Guénoche [3] and Makarenkov and Leclerc [50] proposed a Lagrangian relaxation of the positivity constraint to guarantee metric trees. Both algorithms are iterative and apply to the OLSP and the WLSP, respectively. Specifically, starting from a leaf, the algorithms generate a phylogeny with a growing number of leaves by solving an optimization problem in which the best non-negative edge weights that minimize the OLSP (respectively the WLSP) are found. Both algorithms are polynomial time and characterized by a computational complexity of *O*(*n* ^{4}) and *O*(*n* ^{5}), respectively. FITCH, was also proposed by Felsenstein [27].

A second and possibly more serious drawback of the least-squares is the statistical inconsistency of some paradigms. Specifically, a part from the OLSP which proves to be statistically consistent [23, 68], the only case in which the WLSP is known to be consistent, is when the variances ω_{ rs }are set to the inverse of the product of two strictly positive constants α_{ i }and α_{ j }. By contrast the GLSP is generally inconsistent [35].

## 8.4 The Minimum Evolution Paradigm of Phylogenetic Estimation

Kidd and Sgaramella-Zonta [45] and Beyer et al. [4] independently proposed an alternative paradigm known as *the minimum evolution problem*or *the minimum evolution paradigm*of phylogenetic estimation [9].

The minimum evolution paradigm arises from Cavalli-Sforza and Edwards’ model but mainly differs for the way in which a phylogeny is chosen from among possible alternatives. In fact, the minimum evolution criterion states that if the evolutionary distances *d* _{ rs }were unbiased estimates of the *true evolutionary distances*(i.e., the distances that one would obtain if all the molecular data from the analyzed taxa were available), then the true phylogeny would have an expected length shorter than any other possible phylogeny compatible with **D**. Hence, the minimum evolution paradigm aims at finding the phylogeny whose sum of edge weights, estimated from the corresponding evolutionary distances, is minimum [9].

It is worth noting that the minimum evolution criterion does not asses that molecular evolution follows minimum paths, but states, according to classical evolutionary theory, that a minimum length phylogeny may properly approximate the real phylogeny of well-conserved molecular data, i.e., data whose basic biochemical function has undergone small change throughout the evolution of the observed taxa [4]. That evolution proceeds by small rather than smallest changes is due to the fact that the neighborhood of possible alleles that are selected at each instant of the life of a taxon is finite, and perhaps more important, the selective forces acting on the taxon may not be constant throughout its evolution [4, 80]. Over the long term (periods of environmental change, including the intracellular environment), small changes** will not generally provide the smallest change. Thus, a minimum length phylogeny provides a lower bound on the total number of mutation events that could have occurred along evolution of the observed taxa.

Different versions of the minimum evolution paradigm are discussed in the literature on phylogenetics, and each one is characterized by its own edge weight estimation model [9]. Specifically, we can distinguish between the least-squares edge weight estimation model [24, 68, 69] and the linear programming edge weight estimation model [4, 14, 80]. In the next sections, we shall analyze both families in detail.

### 8.4.1 The Minimum Evolution Paradigm Under the Least-Squares Edge Weight Estimation Model

### Problem 8.4.

**X**, some edges can be weighted more than others. Since there is no biological justification for that, Pauplin proposed a new paradigm in which all edges of a phylogeny were weighted in the same way. The resulting objective function does not depend explicitly on edge weights and can be stated as follows:

_{ rs }is called

*the topological distance*and denotes the number of edges belonging to the path between taxa

*r*and

*s*in a phylogeny

*T*[9]. Hence, BME can be stated in terms of the following optimization problem:

### Problem 8.5.

### 8.4.2 The Minimum Evolution Paradigm Under the Linear Programming Edge Weight Estimation Model

An alternative model to estimate edge weights in the minimum evolution paradigm is provided by linear programming. The model was introduced by Beyer et al. [4] and is based on the following motivation: if the evolutionary distances between pairs of molecular data have to reflect the number of mutation events required to convert one molecular sequence into another over time, then they must satisfy the triangle inequality. Moreover, since any edge weight of a phylogeny is de facto an evolutionary distance, also the entries of vector **w**must satisfy the triangle inequality. This last observation imposes that for each path *p* _{ rs }from taxa *r*and *s*in **X**, the constraint \({\sum \nolimits }_{e\in {p}_{\mathit{rs}}}{w}_{e}{x}_{\mathit{rs,e}} \geq {d}_{\mathit{rs}}\)is satisfied. Hence, Beyer et al. [4] proposed a possible paradigm of phylogenetic estimation consisting of solving the following mixed integer programming model:

### Problem 8.6.

MELP is a well-known APX-hard problem [26] for which the current exact algorithms described in the literature provide solutions to instances containing not more than a dozen taxa [14]. To the best of our knowledge, nothing is known about the statistical consistency of MELP.

### 8.4.3 Drawbacks of the Minimum Evolution Paradigm of Phylogenetic Estimation

There are mainly two drawbacks that affect the minimum evolution paradigm of phylogenetic estimation: the “rigidity” of its criterion and the hardness of its paradigms.

**I**denotes the identity matrix. The use of the integral (8.15) could prove unpractical for an empirical use. However, note that (8.15) can be approximated through the Peano–Baker sequence

**P**(0,

*t*) when

*k*→

*∞*[18]. Hence, under a non-homogeneous Markov model, the substitution probability matrix could be easily computed by means of iterative procedures that appropriately approximate (8.15).

Concerning the second drawback, it is easy to realize that the NP-hardness of the minimum evolution paradigms constitutes a big handicap for the development of exact solution approaches of practical use. Exact approaches are necessary to guarantee the optimality of a given solution and fundamental to investigate whether the hypotheses at the core of a criterion are well suited to describe the evolutionary process of the observed taxa. At present, most molecular datasets involve hundreds of taxa, whereas the current exact solution approaches have difficulty to tackle instances containing more than two dozen taxa (even smaller for the linear programming paradigm). Increasing the size of the datasets analyzable to the optimum is possibly one of the most challenging problems in molecular phylogenetics and warrants for sure further research efforts.

## 8.5 The Likelihood Paradigm of Phylogenetic Estimation

One of the most used criteria of phylogenetic estimation is the *likelihood criterion*. First formalized by Felsenstein [28], the likelihood criterion states that under many plausible explanations of an observed phenomenon, the one having the highest probability of occurring should be preferred to the others. When the likelihood criterion is applied to phylogenetic estimation, a phylogeny is defined to be optimal (or the most likely) if it has the highest probability of explaining the observed taxa. Thus, the likelihood paradigm consists of finding the phylogeny that maximizes a stochastic function, called *the likelihood function*, modeling a set of evolutionary hypotheses of the observed taxa.

The fundamental difference that distinguishes the likelihood paradigm from the least-squares and the minimum evolution paradigms is the nature of the information that it aims at finding. Specifically, if the least-squares and the minimum evolution paradigms aim at finding the best possible approximation of the projection of the evolutionary process of the observed taxa, the likelihood paradigm aims at reconstructing the most likely evolutionary process that originated the observed taxa. Hence, if the phylogeny of the least-squares and the minimum evolution paradigms is an unrooted binary tree, the phylogeny of the likelihood paradigm is a rooted phylogeny, i.e., full binary tree having (2*n*− 1) vertices.

*T*, a model of molecular evolution

*M*and an

*observed data matrix*

**S**= {

*s*

_{ rc }}, i.e., a matrix whose

*r*th row represents the molecular sequence of the

*r*-th taxon. Defined the quantity

*r*of

*T*, each column

*c*of

**S**and each

*i*∈

*Υ*, and the quantity

*v*of

*T*having

*v*

_{1}and

*v*

_{2}as children, the likelihood function

*L*(

*T*,

**S**,

*M*) of

*T*can be defined as

*T*. In the context of the likelihood paradigm, the expected numbers of substitutions per site

*t*

_{ v }

_{ h },

*v*

_{ k }assume the analogous meaning of edge weights in the least-squares and minimum evolution paradigms. Hence, when a given model of molecular evolution is assumed to hold (e.g., the THM model), finding the most likely phylogeny for a set of molecular sequences means maximizing the nonlinear (usually) non-convex stochastic function

*L*(

*T*,

**S**,

*M*) over all the possible rooted phylogenies, and for each rooted phylogeny, over all the possible associated edge weights

*t*

_{ v }

_{ h },

*v*

_{ k }and substitution probabilities

*p*

_{ ij }(

*t*

_{ v }

_{ h },

*v*

_{ k }).

The NP-hardness of the likelihood paradigm [62] justified the development of a number of approximate solution approaches typically based on hill climbing strategies. Specifically, the strategies consist of a first phase in which the structure of a best-so-far phylogeny is modified and a second phase in which the nonlinear optimization of edge weights and the substitution probabilities is performed. The two phases are consecutively iterated until a stopping criterion is satisfied (e.g., the number of iterations performed or the elapsed time) [7, 28, 64]. A systematic review of the hill climbing strategies for the likelihood paradigm is out of the scope of the present chapter and can be found in Bryant et al. [7].

Recent mathematical advances on the likelihood paradigm led to overcome several limitations of the initial Felsenstein’s model, such as the absence of a rate variation among sites [81] and the absence of correlated evolution among sites [61]. Moreover, several progresses have been done concerning the analysis of its statistical consistency and its *idenfiability*, i.e., the study of the conditions under which the likelihood function is at least injective, an aspect markably related to its consistency [7]. The reader may find useful to refer to Gascuel [32] and Gascuel and Steel [34] for an overview of these aspects.

### 8.5.1 The Bayesian Paradigm of Phylogenetic Estimation

Given a dataset of molecular sequences, suppose we have sufficient empirical evidence to assert that the evolution of the observed taxa followed a specific stochastic process. Then, we could try to combine this a priori information with the likelihood function in order to bias the search of the most probable phylogeny through those solutions that fit the known evolutionary process. This idea is at the core of the most recent likelihood-derived paradigm of phylogenetic estimation, called the *bayesian paradigm*, and will be briefly described in this section.

*prior distributions*, which mainly concern three parameters, namely: the

*topology*, i.e., the structure of the phylogeny, edge weights and the substitution probabilities. Defined

*T*), γ(

*t*), and γ(

*R*), to model the a priori information on \(\mathcal{T}\),

*Θ*, and

*ℛ*, respectively. Selected an appropriate model of molecular evolution

*M*, the prior distributions are then combined with the likelihood function to provide a

*posterior density functionB*(

*T*,

**S**,

*M*) that represents the probability distribution of phylogenies conditional on the observed data matrix

**S**, the model

*M*and the priors distributions γ(

*T*), γ(

*t*), and γ(

*R*). Maximizing

*B*(

*T*,

**S**,

*M*) is the goal of the bayesian paradigm.

*T*

_{ i }and denoted

*t*

_{ i }and

*R*

_{ i }the corresponding subspaces of edge weights and substitution probabilities, the mathematical expression of the posterior probability

*B*(

*T*

_{ i },

**S**,

*M*) of

*T*

_{ i }can be written as:

*T*

_{ i }) denotes the prior probability of

*T*

_{ i }, and

*L*

_{∫}(

*T*

_{ i },

**S**,

*M*) denotes the integral of the likelihood function

*L*(

*T*

_{ i },

**S**,

*M*) over all possible edge weights and substitution probabilities [41], i.e.,

*T*

_{ i }, the associated edge weights and the substitution probabilities that globally maximize the posterior probability distribution of phylogenies

*B*(

*T*,

**S**,

*M*). Since finding the maximum a posteriori phylogeny implicitly implies being able to solve the likelihood paradigm, solving the bayesian paradigm is NP-hard [29].

The recursive nature of the likelihood function and the intractability of computing the denominator of Bayes’ theorem prevent an analytical approach to solution of the bayesian paradigm. Hence, the maximum a posteriori phylogeny is usually computed by means of a Markov chain Monte Carlo (MCMC) algorithm [30], i.e., an algorithm that samples *B*(*T*, **S**, *M*) through a stochastic generation of phylogenies in \(\mathcal{T}\)([49, 52, 83]). Sampling *B*(*T*, **S**, *M*) is extremely time consuming; therefore, the bayesian estimations may take even weeks [42]. However, as observed by Yang [82] and Huelsenbeck et al. [41, 43], the sampling process has also the indisputable benefit of providing a measure of the reliability of the best-so-far solution found. In fact, by sampling stochastically around the (best local) maximum a posteriori phylogeny *T* ^{∗}, the bayesian paradigm could determine support values for the subtrees of *T* ^{∗}, i.e., measures of the posterior probability that the subtrees are true.

The bayesian paradigm is possibly the most complex among the phylogenetic estimation paradigms currently available in the literature on molecular phylogenetics. The recent computational advances obtained by Ronquist and Huelsenbeck [65] speeded up the execution of the MCMC algorithm and widened the use of the bayesian paradigm. However, the lack of a systematic investigation of its** statistical consistency and the unclear dependence of the posterior density function on the a priori information [82] possibly make the bayesian paradigm still unripe for phylogenetic estimation [1].

### 8.5.2 Drawbacks of the Likelihood and the Bayesian Paradigms of Phylogenetic Estimation

The higher the complexity of a paradigm, the higher the number of draw-backs that could arise, and the likelihood and the bayesian paradigms do not escape the rule. Specifically, a number of computational and theoretical drawbacks affect the two paradigms. The computational drawbacks mainly involve (i) the optimization aspects of the likelihood function and (ii) the sampling process in the bayesian paradigm. The theoretical drawbacks concern the evolutionary hypotheses at the core of the likelihood and bayesian criteria.

As regards to the computational drawbacks, in Sect. 8.5we have seen that finding the most likely phylogeny for a set of taxa involves maximizing a nonlinear and generally non-convex stochastic function over all the possible phylogenies in \(\mathcal{T}\), and for each phylogeny, over all the possible edge weights and substitution probabilities. Notoriously, this task can be only performed in an approximate way, due to a lack of general mathematical conditions that guarantee the global optimality of a solution in nonlinear non-convex programming [21, 54]. Hence, although it is possible (at least for small datasets) to enumerate all the possible phylogenies in \(\mathcal{T}\), it is not possible to optimize globally edge weights and the substitution probabilities of a fixed phylogeny *T*. This fact may affect negatively the statistical consistency of the likelihood and the bayesian paradigms. In fact, the local optima of the likelihood function grows up exponentially in function of the number of taxa considered [7, 19, 20]. Thus, fixed a phylogeny *T*, the global optimum of the likelihood function is generally approximated by means of hill-climbing techniques that jump from local optimum to another one until a stopping criterion is satisfied (e.g., the number of iterations performed or the elapsed time) [7, 28, 64]. Assume that two phylogenies *T* _{1}and *T* _{2}are given, and let μ_{1}and μ_{2}be two vectors whose entries are edge weights and the substitution probabilities associated to *T* _{1}and *T* _{2}, respectively. Let *z* _{1}and *z* _{2}, the likelihood values of *T* _{1}and *T* _{2}for μ_{1}and μ_{2}, respectively, and assume, without loss of generality, that *z* _{1}> *z* _{2}. Due to the local nature of the optima μ_{1}and μ_{2}, there could exists another local optimum, say \(\hat{{\mu }}_{2}\), such that \(\hat{{z}}_{2} >{z}_{1} >{z}_{2}\). If the hill-climbing algorithm finds \(\hat{{\mu }}_{2}\)before μ_{2}, then we will consider *T* _{2}as a better phylogeny than *T* _{1}, otherwise we will discard *T* _{2}in favor of *T* _{1}. Hence, it is easy to realize that if one of the two phylogenies is the true phylogeny, its acceptance is subordinated to the goodness of the hill-climbing algorithm used to optimize the likelihood function, and as a result the statistical consistency of the likelihood and bayesian paradigms may be seriously compromised.

Some authors argued that multiple local optima should arise infrequently in real datasets [64], but this conjecture was proved false by Bryant et at. [7] and Catanzaro et al. [12]. Specifically, Bryant et al. [7] observed that changing the model of molecular evolution influences the presence of multiple optima in the likelihood function, and Catanzaro et al. [12] showed a number of real datasets affected by strong multimodality of the likelihood function. Despite the importance of the topic, to the best of our knowledge nobody was able to propose a plausible solution to this critical aspect.

A second computational drawback concerns the sampling process of the bayesian paradigm. In fact, as shown in Sect. 8.5.1, the approximation of the posterior density function is generally performed by means of a MCMC algorithm (e.g., the Metropolis or the Gibbs sampling algorithm [30]) that performs random walks in \(\mathcal{T}\). The random walk should be sufficiently diversified to sample potentially the whole \(\mathcal{T}\)and avoid double backs (i.e., to sample phylogenies already visited). Unfortunately, despite the recent computational advances in the bayesian paradigm [65], no technique may guarantee a sufficient diversification of the sampling process. Hence, the convergence to the maximum a posteriori phylogeny in practice becomes the convergence to the best-so-far a posteriori phylogeny that can be arbitrarily distinct from the true phylogeny (see [29, p. 296]).

As regards to the theoretical drawbacks, it is worth noting that the evolutionary hypotheses at the core of the likelihood and bayesian criteria of phylogenetic estimation are at the same time their strength and their weakness. For example, if a proposed model of molecular evolution matches (at least roughly) the real evolutionary process of a set of molecular data, then the likelihood and the bayesian paradigms could succeed in recovering the real phylogeny of the corresponding set of taxa (provided a solution to their computational drawbacks). However, if it is not the case, the paradigms will just provide a (sub)optimal solution for that model that may completely mismatch the real phylogeny. This aspect becomes evident e.g., in Rydin and Källersjö [67]’s article where, for a same dataset, two different Markov model of molecular evolution are used and two different maximum posterior phylogenies are obtained both having the 100% posterior probability of supporting the true phylogeny. concerns in general all the paradigms discussed in this chapter and possibly there is no easy solution for it.

Finally, a second theoretical drawback concerns the prior distributions of the bayesian paradigm. In fact, it is worth noting that if on one hand a strength of the bayesian paradigm is the ability to incorporate the a priori information, on the other hand this information is rarely available, hence in practical applications the prior distributions are generally modeled as uniform distributions, frustrating the potential strengths of the paradigm [1]. Moreover, it is unclear what type of information is well suited for a prior distribution; how possible conflicts among different sets of a priori information can be resolved; and if the inclusion of prior distributions strongly bias the estimation process. Huelsenbeck et al. [43] vaguely claimed “in a typical Bayesian analysis of phylogeny, the results are likely to be rather insensitive to the prior,” but this results was not confirmed by Yang [82] who observed that “[...] the posterior probabilities of trees vary widely over simulated datasets [...] and can be unduly influenced by the prior [...].” Possibly, further research efforts are needed to provide answers to these practical concerns.

## 8.6 Conclusion

The success of a criterion of phylogenetic estimation is undoubtedly influenced by the quality of the evolutionary hypotheses at its core. If the hypotheses match (at least roughly) the real evolutionary process of a set of taxa, then the criterion will hopefully succeed in recovering the real phylogeny. Otherwise, the criterion will miserably fail, by suggesting an optimal phylogeny that mismatch partially or totally the correct result. Since we are far away from a complete understanding of the complex facets of evolution, it is not generally possible to assess the superiority of a criterion over others. Hence, families of estimation criteria cohabit in the literature of molecular phylogenetics, by providing different perspectives about the evolutionary process of the involved taxa.

In this chapter, we have presented a general introduction of the existing literature about molecular phylogenetics. Our purpose has been to introduce a classification scheme in order to provide a general framework for papers appearing in this area. In particular, three main criteria of phylogenetic estimation have been outlined, the first based on the least-squares paradigm, first proposed by Cavalli-Sforza and Edwards [15], the second based on the minimum evolution paradigm, independently proposed by Kidd and Sgaramella-Zonta [45] and Beyer et al. [4], and the third based on the likelihood paradigm, first proposed by Felsenstein [28]. This division has been further disaggregated into different, approximately homogeneous sub-areas, and the basic aspects of each have been pointed out. For each, also, the most relevant issues affecting their use in tackling real-world sized problems have been outlined, as have the most interesting refinements deserving further research effort.

## Notes

### Acknowledgment

Daniele Catanzaro acknowledges support from the Belgian National Fund for Scientific Research (F.N.R.S.) of which he is “Chargé de Recherches.” Raffaele Pesenti and the anonymous reviewers for their valuable comments on previous versions of the manuscript. Finally, thanks to Prof. Mike Steel and Dr. Rosa Maria Lo Presti for helpful and exciting discussions.

## References

- 1.J. K. Archibald, M. E. Mort, and D. J. Crawford. Bayesian inference of phylogeny: A non-technical primer.
*Taxon*, 52:187–191, 2003CrossRefGoogle Scholar - 2.D. A. Bader, B. M. E. Moret, and L. Vawter. Industrial applications of high-performance computing for phylogeny reconstruction. In
*SPIE ITCom: Commercial application for high-performance computing*, pages 159–168. SPIE, WA, 2001Google Scholar - 3.J. P. Barthélemy and A. Guénoche.
*Trees and proximity representations*. Wiley, NY, 1991Google Scholar - 4.W. A. Beyer, M. Stein, T. Smith, and S. Ulam. A molecular sequence metric and evolutionary trees.
*Mathematical Biosciences*, 19:9–25, 1974CrossRefGoogle Scholar - 5.Å. Björck.
*Numerical methods for least-squares problems*. SIAM, PA, 1996CrossRefGoogle Scholar - 6.J. Brinkhuis and V. Tikhomirov.
*Optimization: Insights and applications*. Princeton University Press, NJ, 2005Google Scholar - 7.D. Bryant, N. Galtier, and M. A. Poursat. Likelihood calculation in molecular phylogenetics. In O. Gascuel, editor,
*Mathematics of evolution and phylogeny*. Oxford University Press, NY, 2005Google Scholar - 8.R. M. Bush, C. A. Bender, K. Subbarao, N. J. Cox, and W. M. Fitch. Predicting the evolution of human influenza A.
*Science*, 286(5446):1921–1925, 1999PubMedCrossRefGoogle Scholar - 9.D. Catanzaro. The minimum evolution problem: Overview and classification.
*Networks*, 53(2): 112–125, 2009CrossRefGoogle Scholar - 10.D. Catanzaro, L. Gatto, and M. Milinkovitch. Assessing the applicability of the GTR nucleotide substitution model through simulations.
*Evolutionary Bioinformatics*, 2:145–155, 2006Google Scholar - 11.D. Catanzaro, R. Pesenti, and M. Milinkovitch. A non-linear optimization procedure to estimate distances and instantaneous substitution rate matrices under the GTR model.
*Bioinformatics*, 22(6):708–715, 2006PubMedCrossRefGoogle Scholar - 12.D. Catanzaro, R. Pesenti, and M. C. Milinkovitch. A very large-scale neighborhood search to estimate phylogenies under the maximum likelihood criterion. Technical report, G.O.M. – Computer Science Department – Université Libre de Bruxelles (U.L.B.), 2007Google Scholar
- 13.D. Catanzaro, M. Labbé, R. Pesenti, and J. J. Salazar-Gonzalez. The balanced minimum evolution problem. Technical report, G.O.M. – Computer Science Department – Université Libre de Bruxelles (U.L.B.), 2009Google Scholar
- 14.D. Catanzaro, M. Labbé, R. Pesenti, and J. J. Salazar-Gonzalez. Mathematical models to reconstruct phylogenetic trees under the minimum evolution criterion.
*Networks*, 53(2):126–140, 2009CrossRefGoogle Scholar - 15.L. L. Cavalli-Sforza and A. W. F. Edwards. Phylogenetic analysis: Models and estimation procedures.
*American Journal of Human Genetics*, 19:233–257, 1967PubMedGoogle Scholar - 16.R. Chakraborty. Estimation of time of divergence from phylogenetic studies.
*Canadian Journal of Genetics and Cytology*, 19:217–223, 1977Google Scholar - 17.B. S. W. Chang and M. J. Donoghue. Recreating ancestral proteins.
*Trends in Ecology and Evolution*, 15(3):109–114, 2000PubMedCrossRefGoogle Scholar - 18.L. Chisci.
*Sistemi Dinamici – Parte I*. Pitagora, Italy, 2001Google Scholar - 19.B. Chor, M. D. Hendy, B. R. Holland, and D. Penny. Multiple maxima of likelihood in phylogenetic trees: An analytic approach.
*Molecular Biology and Evolution*, 17(10):1529–1541, 2000PubMedCrossRefGoogle Scholar - 20.B. Chor, M. D. Hendy, and S. Snir. Maximum likelihood jukes-cantor triplets: Analytic solutions.
*Molecular Biology and Evolution*, 23(3):626–632, 2005PubMedCrossRefGoogle Scholar - 21.A. R. Conn, N. I. M. Gould, and P. L. Toint.
*Trust-region methods*. SIAM, PA, 2000CrossRefGoogle Scholar - 22.W. H. E. Day. Computational complexity of inferring phylogenies from dissimilarity matrices.
*Bulletin of Mathematical Biology*, 49:461–467, 1987PubMedGoogle Scholar - 23.F. Denis and O. Gascuel. On the consistency of the minimum evolution principle of phylogenetic inference.
*Discrete Applied Mathematics*, 127:66–77, 2003CrossRefGoogle Scholar - 24.R. Desper and O. Gascuel. Fast and accurate phylogeny reconstruction algorithms based on the minimum evolution principle.
*Journal of Computational Biology*, 9(5):687–705, 2002PubMedCrossRefGoogle Scholar - 25.R. Desper and O. Gascuel. Theoretical foundations of the balanced minimum evolution method of phylogenetic inference and its relationship to the weighted least-squares tree fitting.
*Molecular Biology and Evolution*, 21(3):587–598, 2004PubMedCrossRefGoogle Scholar - 26.M. Farach, S. Kannan, and T. Warnow. A robust model for finding optimal evolutionary trees.
*Algorithmica*, 13:155–179, 1995CrossRefGoogle Scholar - 27.J. Felsenstein. An alternating least-squares approach to inferring phylogenies from pairwise distances.
*Systematic Biology*, 46:101–111, 1997PubMedCrossRefGoogle Scholar - 28.J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach.
*Journal of Molecular Evolution*, 17:368–376, 1981PubMedCrossRefGoogle Scholar - 29.J. Felsenstein.
*Inferring phylogenies*. Sinauer Associates, MA, 2004Google Scholar - 30.G. S. Fishman.
*Monte Carlo: Concepts, algorithms, and applications*. Springer, NY, 1996Google Scholar - 31.W. M. Fitch and E. Margoliash. Construction of phylogenetic trees.
*Science*, 155:279–284, 1967PubMedCrossRefGoogle Scholar - 32.O. Gascuel.
*Mathematics of evolution and phylogeny*. Oxford University Press, NY, 2005Google Scholar - 33.O. Gascuel and D. Levy. A reduction algorithm for approximating a (non-metric) dissimilarity by a tree distance.
*Journal of Classification*, 13:129–155, 1996CrossRefGoogle Scholar - 34.O. Gascuel and M. A. Steel.
*Reconstructing evolution*. Oxford University Press, NY, 2007Google Scholar - 35.O. Gascuel, D. Bryant, and F. Denis. Strengths and limitations of the minimum evolution principle.
*Systematic Biology*, 50:621–627, 2001PubMedCrossRefGoogle Scholar - 36.P. H. Harvey, A. J. L. Brown, J. M. Smith, and S. Nee.
*New uses for new phylogenies*. Oxford University Press, Oxford, 1996Google Scholar - 37.M. Hasegawa, H. Kishino, and T. Yano. Evolutionary trees from DNA sequences: A maximum likelihood approach.
*Journal of Molecular Evolution*, 17:368–376, 1981CrossRefGoogle Scholar - 38.M. Hasegawa, H. Kishino, and T. Yano. Dating the human-ape splitting by a molecular clock of mitochondrial DNA.
*Journal of Molecular Evolution*, 22:160–174, 1985PubMedCrossRefGoogle Scholar - 39.D. P. Heyman and M. J. Sobel, editors. Stochastic models, volume 2 of
*Handbooks in operations research and management science*. North-Holland, Amsterdam, 1990Google Scholar - 40.S. Horai, Y. Sattah, K. Hayasaka, R. Kondo, T. Inoue, T. Ishida, S. Hayashi, and N. Takahata. Man’s place in the hominoidea revealed by mitochondrial DNA genealogy.
*Journal of Molecular Evolution*, 35:32–43, 1992PubMedCrossRefGoogle Scholar - 41.J. P. Huelsenbeck, B. Larget, P. van der Mark, and F. Ronquist. MrBayes: Bayesian inference of phylogenetic trees.
*Bioinformatics*, 17(8):754–755, 2001PubMedCrossRefGoogle Scholar - 42.J. P. Huelsenbeck, F. Ronquist, R. Nielsen, and J. P. Bollback. Bayesian inference of phylogeny and its impact on evolutionary biology.
*Science*, 294:2310–2314, 2001PubMedCrossRefGoogle Scholar - 43.J. P. Huelsenbeck, B. Larget, R. E. Miller, and F. Ronquist. Potential applications and pitfalls of bayesian inference of phylogeny.
*Systematic Biology*, 51:673–688, 2002PubMedCrossRefGoogle Scholar - 44.T. H. Jukes and C.R. Cantor. Evolution of protein molecules. In H. N. Munro, editor,
*Mammalian protein metabolism*, pages 21–123. Academic Press, NY, 1969Google Scholar - 45.K. K. Kidd and L. A. Sgaramella-Zonta. Phylogenetic analysis: Concepts and methods.
*American Journal of Human Genetics*, 23:235–252, 1971PubMedGoogle Scholar - 46.M. Kimura. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences.
*Journal of Molecular Evolution*, 16:111–120, 1980PubMedCrossRefGoogle Scholar - 47.M. K. Kuhner and J. Felsenstein. A simulation comparison of phylogeny algorithms under equal and unequal rates.
*Molecular Biology and Evolution*, 11(3):584–593, 1994Google Scholar - 48.C. Lanave, G. Preparata, C. Saccone, and G. Serio. A new method for calculating evolutionary substitution rates.
*Journal of Molecular Evolution*, 20:86–93, 1984PubMedCrossRefGoogle Scholar - 49.S. Li, D. Pearl, and H. Doss. Phylogenetic tree construction using Markov chain Monte Carlo.
*Journal of the American Statistical Association*, 95:493–508, 2000CrossRefGoogle Scholar - 50.V. Makarenkov and B. Leclerc. An algorithm for the fitting of a tree metric according to a weighted least-squares criterion.
*Journal of Classification*, 16:3–26, 1999CrossRefGoogle Scholar - 51.M. A. Marra, S. J. Jones, C. R. Astell, R. A. Holt, A. Brooks-Wilson, Y. S. Butterfield, J. Khattra, J. K. Asano, S. A. Barber, S. Y. Chan, A. Cloutier, S. M. Coughlin, D. Freeman, N. Girn, O. L. Griffith, S. R. Leach, M. Mayo, H. McDonald, S. B. Montgomery, P. K. Pandoh, A. S. Petrescu, A. G. Robertson, J. E. Schein, A. Siddiqui, D. E. Smailus, J. M. Stott, G. S. Yang, F. Plummer, A. Andonov, H. Artsob, N. Bastien, K. Bernard, T. F. Booth, D. Bowness, M. Czub, M. Drebot, L. Fernando, R. Flick, M. Garbutt, M. Gray, A. Grolla, S. Jones, H. Feldmann, A. Meyers, A. Kabani, Y. Li, S. Normand, U. Stroher, G. A. Tipples, S. Tyler, R. Vogrig, D. Ward, B. Watson, R. C. Brunham, M. Krajden, M. Petric, D. M. Skowronski, C. Upton, and R. L. Roper. The genome sequence of the SARS-associated coronavirus.
*Science*, 300(5624):1399–1404, 2003PubMedCrossRefGoogle Scholar - 52.B. Mau and M. A. Newton. Phylogenetic inference for binary data on dendograms using Markov chain Monte Carlo.
*Journal of Computational and Graphical Statistics*, 6:122–131, 1997Google Scholar - 53.G. L. Nemhauser and L. A. Wolsey.
*Integer and combinatorial optimization*. Wiley-Interscience, NY, 1999Google Scholar - 54.G. L. Nemhauser, A. H. G. Rinnooy Kan, and M. J. Tod, editors. Optimization, volume 1 of
*Handbooks in operations research and management science*. North-Holland, Amsterdam, 1989Google Scholar - 55.C. Y. Ou, C. A. Ciesielski, G. Myers, C. I. Bandea, C. C. Luo, B. T. M. Korber, J. I. Mullins, G. Schochetman, R. L. Berkelman, A. N. Economou, J. J. Witte, L. J. Furman, G. A. Satten, K. A. Maclnnes, J. W. Curran, and H. W. Jaffe. Molecular epidemiology of HIV transmission in a dental practice.
*Science*, 256(5060):1165–1171, 1992PubMedCrossRefGoogle Scholar - 56.L. Pachter and B. Sturmfels. The mathematics of phylogenomics.
*SIAM Review*, 49(1):3–31, 2007CrossRefGoogle Scholar - 57.R. D. M. Page and E. C. Holmes.
*Molecular evolution: A phylogenetic approach*. Blackwell Science, Oxford, 1998Google Scholar - 58.J. M. Park and M. W. Deem. Phase diagrams of quasispecies theory with recombination and horizontal gene transfer.
*Physical Review Letters*, 98:058101–058104, 2007PubMedCrossRefGoogle Scholar - 59.Y. Pauplin. Direct calculation of a tree length using a distance matrix.
*Journal of Molecular Evolution*, 51:41–47, 2000PubMedGoogle Scholar - 60.P. A. Pevzner.
*Computational molecular biology*. MIT, MA, 2000Google Scholar - 61.D. D. Pollock, W. R. Taylor, and N. Goldman. Coevolving protein residues: Maximum likelihood identification and relationship to structure.
*Journal of Molecular Biology*, 287(1): 187–198, 1999PubMedCrossRefGoogle Scholar - 62.S. Roch. A short proof that phylogenetic tree reconstruction by maximum likelihood is hard.
*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, 3(1):92–94, 2006PubMedCrossRefGoogle Scholar - 63.F. Rodriguez, J. L. Oliver, A. Marin, and J. R. Medina. The general stochastic model of nucleotide substitution.
*Journal of Theoretical Biology*, 142:485–501, 1990PubMedCrossRefGoogle Scholar - 64.J. S. Rogers and D. Swofford. Multiple local maxima for likelihoods of phylogenetic trees from nucleotide sequences.
*Molecular Biology and Evolution*, 16:1079–1085, 1999PubMedCrossRefGoogle Scholar - 65.F. Ronquist and J. P. Huelsenbeck. MrBayes 3: Bayesian phylogenetic inference under mixed models.
*Bioinformatics*, 19(12):1572–1574, 2003PubMedCrossRefGoogle Scholar - 66.H. A. Ross and A. G. Rodrigo. Immune-mediated positive selection drives human immunodeficency virus type 1 molecular variation and predicts disease duration.
*Journal of Virology*, 76(22):11715–11720, 2002PubMedCrossRefGoogle Scholar - 67.C. Rydin and M. Källersjö. Taxon sampling and seed plant phylogeny.
*Cladistics*, 18:485–513, 2002Google Scholar - 68.A. Rzhetsky and M. Nei. Theoretical foundations of the minimum evolution method of phylogenetic inference.
*Molecular Biology and Evolution*, 10:1073–1095, 1993PubMedGoogle Scholar - 69.A. Rzhetsky and M. Nei. Statistical properties of the ordinary least-squares generalized least-squares and minimum evolution methods of phylogenetic inference.
*Journal of Molecular Evolution*, 35:367–375, 1992PubMedCrossRefGoogle Scholar - 70.N. Saitou and M. Nei. The neighbor-joining method: A new method for reconstructing phylogenetic trees.
*Molecular Biology and Evolution*, 4:406–425, 1987PubMedGoogle Scholar - 71.E. Schadt and K. Lange. Codon and rate variation models in molecular phylogeny.
*Molecular Biology and Evolution*, 19(9):1534–1549, 2002PubMedCrossRefGoogle Scholar - 72.E. Schadt and K. Lange. Applications of codon and rate variation models in molecular phylogeny.
*Molecular Biology and Evolution*, 19(9):1550–1562, 2002PubMedCrossRefGoogle Scholar - 73.C. Semple and M. A. Steel.
*Phylogenetics*. Oxford University Press, NY, 2003Google Scholar - 74.P. H. A. Sneath and R. R. Sokal.
*Numerical taxonomy*. W. K. Freeman and Company, CA, 1963Google Scholar - 75.J. A. Studier and K. J. Keppler. A note on the neighbor-joining algorithm of Saitou and Nei.
*Molecular Biology and Evolution*, 5:729–731, 1988PubMedGoogle Scholar - 76.D. L. Swofford.
*PAUP* version 4.0*. Sinauer Associates, MA, 1997Google Scholar - 77.D. L. Swofford, G. J. Olsen, P. J. Waddell, and D. M. Hillis. Phylogenetic inference. In D. M. Hillis, C. Moritz, and B. K. Mable, editors,
*Molecular systematics*, pages 407–514. Sinauer Associates, MA, 1996Google Scholar - 78.K. Tamura and M. Nei. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.
*Molecular Biology and Evolution*, 10(3):512–526, 1993PubMedGoogle Scholar - 79.P. J. Waddell and M. A. Steel. General time-reversible distances with unequal rates across sites: Mixing gamma and inverse gaussian distributions with invariant sites.
*Molecular Phylogenetics and Evolution*, 8:398–414, 1997PubMedCrossRefGoogle Scholar - 80.M. S. Waterman, T. F. Smith, M. Singh, and W. A. Beyer. Additive evolutionary trees.
*Journal of Theoretical Biology*, 64:199–213, 1977PubMedCrossRefGoogle Scholar - 81.Z. Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods.
*Journal of Molecular Evolution*, 39:306–314, 1994PubMedCrossRefGoogle Scholar - 82.Z. Yang. Bayesian inference in molecular phylogenetics. In O. Gascuel, editor,
*Mathematics of evolution and phylogeny*. Oxford University Press, NY, 2005Google Scholar - 83.Z. Yang and B. Rannala. Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method.
*Molecular Biology and Evolution*, 14:717–724, 1997PubMedCrossRefGoogle Scholar - 84.L. A. Zadeh and C. A. Desoer.
*Linear system theory*. McGraw-Hill, NY, 1963Google Scholar