Distinguishing Regional from Within-Codon Rate Heterogeneity in DNA Sequence Alignments

  • Alexander V. Mantzaris
  • Dirk Husmeier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)


We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The focus of the present work is on improving the modelling of the latter aspect. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments.
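The distinction between the two kinds of rate heterogeneity can be pictured with a small simulation. The following is only an illustrative sketch, not the authors' FHMM: it assumes that the rate at each site is the product of a regional rate (governed by a hidden state that changes rarely along the alignment) and a codon-position rate (a fixed pattern repeating with period 3). The function name `simulate_site_rates` and all parameter names and values are hypothetical.

```python
import random


def simulate_site_rates(n_sites, regional_rates, codon_rates, switch_prob, seed=0):
    """Toy per-site rates: regional rate times codon-position rate.

    Assumption (not from the paper): the two effects combine multiplicatively.
    """
    rng = random.Random(seed)
    state = 0  # index of the current regional hidden state
    rates, states = [], []
    for t in range(n_sites):
        # Regional effect: with small probability, redraw the hidden state
        # (the redraw may pick the same state again).
        if t > 0 and rng.random() < switch_prob:
            state = rng.randrange(len(regional_rates))
        # Within-codon effect: deterministic pattern over codon positions 1-3.
        rates.append(regional_rates[state] * codon_rates[t % 3])
        states.append(state)
    return rates, states


if __name__ == "__main__":
    rates, states = simulate_site_rates(
        n_sites=12,
        regional_rates=[1.0, 3.0],    # hypothetical slow and fast regions
        codon_rates=[0.5, 0.2, 2.0],  # hypothetical rates for codon positions 1, 2, 3
        switch_prob=0.1,
    )
    for t, (r, s) in enumerate(zip(rates, states)):
        print(t, s, r)
```

In this toy setup, a model with a single set of rate states would have to use them to track the period-3 pattern as well as the regional changes, which is the confounding the abstract describes.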


Keywords: Posterior Distribution · Markov Chain Monte Carlo · Branch Length · Codon Position · Hide State



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Alexander V. Mantzaris (1)
  • Dirk Husmeier (1)
  1. Biomathematics and Statistics Scotland, JCMB, Edinburgh, UK
