A Maximum Likelihood Method for Reconstruction of the Evolution of Eukaryotic Gene Structure

  • Liran Carmel
  • Igor B. Rogozin
  • Yuri I. Wolf
  • Eugene V. Koonin
Part of the Methods in Molecular Biology book series (MIMB, volume 541)


Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, large-scale studies disagree about even the most basic features of their evolution. To obtain a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. The model is rich in parameters, and estimating them accurately by straightforward likelihood maximization is infeasible. We therefore developed an expectation-maximization algorithm that makes the maximization efficient. Here, we outline the model and describe the expectation-maximization algorithm in detail. Because the method works with intron presence–absence maps, it is expected to be useful for analyzing the evolution of other binary characters as well.
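To make the notion of a likelihood over presence–absence maps concrete, the following minimal sketch computes the probability of one presence–absence pattern on a small tree with Felsenstein's pruning recursion. It is not the model of this chapter, which is far richer in parameters. The function names, the three-species tree, and the single gain and loss probability shared by all branches are assumptions made purely for illustration.

```python
def transition(gain, loss):
    """Branch transition matrix P[parent][child]; 0 = intron absent, 1 = present.

    Assumption: every branch shares the same gain and loss probabilities.
    """
    return [[1.0 - gain, gain], [loss, 1.0 - loss]]


def conditional(node, pattern, P):
    """Return [P(leaves below | node absent), P(leaves below | node present)]."""
    if isinstance(node, str):
        # Leaf: the observed state has probability 1, the other state 0.
        return [1.0, 0.0] if pattern[node] == 0 else [0.0, 1.0]
    result = [1.0, 1.0]
    for child in node:
        below = conditional(child, pattern, P)
        for state in (0, 1):
            # Sum over the unobserved state of the child.
            result[state] *= P[state][0] * below[0] + P[state][1] * below[1]
    return result


def pattern_likelihood(tree, pattern, gain, loss, root_present):
    """Likelihood of one presence-absence pattern, summed over the root state."""
    absent, present = conditional(tree, pattern, transition(gain, loss))
    return (1.0 - root_present) * absent + root_present * present


# Hypothetical tree ((A, B), C) and one observed pattern for a single site.
tree = (("A", "B"), "C")
pattern = {"A": 1, "B": 1, "C": 0}
likelihood = pattern_likelihood(tree, pattern, gain=0.1, loss=0.2, root_present=0.5)
```

In a setting like this, the states of the internal nodes are the hidden data. An expectation-maximization scheme would alternate between computing their expected values under the current parameters and re-estimating the parameters from those expectations, instead of maximizing this likelihood directly.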

Key words

Maximum likelihood, expectation-maximization, intron evolution, ancestral reconstruction, eukaryotic gene structure



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Liran Carmel (1)
  • Igor B. Rogozin (1)
  • Yuri I. Wolf (1)
  • Eugene V. Koonin (1)
  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
