Bayesian Phylogenetic Inference under a Statistical Insertion-Deletion Model

  • Gerton Lunter
  • István Miklós
  • Alexei Drummond
  • Jens Ledet Jensen
  • Jotun Hein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


A central problem in computational biology is the inference of phylogeny given a set of DNA or protein sequences. Currently, this problem is tackled stepwise, with phylogenetic reconstruction dependent on an initial multiple sequence alignment step. However, these two steps are fundamentally interdependent. Whether the main interest is in sequence alignment or in phylogeny, a major goal of computational biology is the co-estimation of both. Here we present a first step towards this goal by developing an extension of the Felsenstein peeling algorithm. Given an alignment, our extension analytically integrates out both substitution and insertion-deletion events within a proper statistical model. This new algorithm provides a solution to two important problems in computational biology: first, indel events become informative for phylogenetic reconstruction; second, phylogenetic uncertainty can be included in the estimation of insertion-deletion parameters. We illustrate the practicality of this algorithm within a Bayesian Markov chain Monte Carlo framework by demonstrating it on a non-trivial analysis of a multiple alignment of ten globin protein sequences.
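
For orientation, the classical Felsenstein peeling (pruning) recursion that the abstract builds on can be sketched as follows. This is only the standard substitution-only recursion for a single gap-free alignment column, not the indel-aware extension developed in the paper. The Jukes-Cantor substitution model, the uniform root distribution, the nested-list tree encoding, and all function names are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """Jukes-Cantor probability of base i becoming base j over branch length t
    (t in expected substitutions per site). Assumed model, for illustration."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial_likelihoods(node, column):
    """Return [P(leaves below node | node has base x) for x in BASES].

    Hypothetical tree encoding: a leaf is a taxon name (str); an internal
    node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        observed = column[node]  # gap characters are NOT handled in this sketch
        return [1.0 if b == observed else 0.0 for b in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:
        child_l = partial_likelihoods(child, column)
        for xi, x in enumerate(BASES):
            # Sum over the unobserved base y at the child end of the branch.
            result[xi] *= sum(
                jc69_prob(x, y, t) * child_l[yi] for yi, y in enumerate(BASES)
            )
    return result

def column_likelihood(tree, column):
    """Likelihood of one alignment column, with a uniform prior at the root."""
    return sum(0.25 * l for l in partial_likelihoods(tree, column))

if __name__ == "__main__":
    tree = [("a", 0.1), ([("b", 0.2), ("c", 0.3)], 0.05)]
    print(column_likelihood(tree, {"a": "A", "b": "A", "c": "G"}))
```

The recursion visits each node once, so the cost per column is linear in the number of taxa. Under a full alignment the column likelihoods would be multiplied together; how gaps enter the computation is exactly what the paper's extension addresses and is left out here.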


Homology Structure; Homology Class; Data Augmentation; Statistical Alignment; Indel Event
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Gerton Lunter (1)
  • István Miklós (1)
  • Alexei Drummond (1)
  • Jens Ledet Jensen (2)
  • Jotun Hein (1)

  1. Department of Statistics, University of Oxford, Oxford, United Kingdom
  2. Department of Mathematical Sciences, University of Aarhus, Aarhus C, Denmark
