Co-evolutionary Models for Reconstructing Ancestral Genomic Sequences: Computational Issues and Biological Examples

  • Tamir Tuller
  • Hadas Birin
  • Martin Kupiec
  • Eytan Ruppin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5817)


The inference of ancestral genomes is a fundamental problem in molecular evolution. Because the problem is statistical in nature, the most likely or the most parsimonious ancestral genomes usually contain a considerable number of errors. In general, these errors cannot be eliminated by using more exhaustive computational approaches, longer genomic sequences, or more taxa. In recent studies we showed that co-evolution is an important force that can be used to significantly improve the inference of ancestral genome content.

In this work we formally define a computational problem for the inference of ancestral genome content by co-evolution. We show that this problem is NP-hard and present both a Fixed Parameter Tractable (FPT) algorithm and heuristic approximation algorithms for solving it. On simulated inputs with hundreds of protein families and hundreds of co-evolutionary relations, these algorithms ran quickly (up to four minutes) and achieved an approximation ratio below 1.3.

We use our approach to study the ancestral genome content of the Fungi. To this end, we apply it to a dataset of 33,931 protein families and 20,317 co-evolutionary relations. Our algorithm added and removed hundreds of proteins from the ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) while only slightly affecting the likelihood/parsimony score of the results. A biological analysis revealed various pieces of evidence that support the biological plausibility of the new solutions.
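For readers unfamiliar with the MP baseline mentioned above, the following is a minimal sketch of Fitch's small-parsimony algorithm applied to a single protein family coded as present (1) or absent (0). It is not the co-evolutionary algorithm of this paper; the tree, the node names, the function name `fitch`, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
def fitch(tree, leaf_states, root):
    """Fitch small parsimony for one binary character (1 = present, 0 = absent).

    tree: dict mapping each internal node to a (left, right) tuple of children.
    leaf_states: dict mapping each leaf name to 0 or 1.
    Returns (assignment, cost): one most-parsimonious labelling of all nodes
    and the minimum number of gain/loss events on the tree.
    """
    sets = {}   # candidate state set for every node
    cost = 0    # number of changes forced by the bottom-up pass

    def up(node):
        nonlocal cost
        if node not in tree:                      # leaf: state is observed
            sets[node] = {leaf_states[node]}
            return
        left, right = tree[node]
        up(left)
        up(right)
        common = sets[left] & sets[right]
        if common:                                # children agree: no change
            sets[node] = common
        else:                                     # children disagree: one change
            sets[node] = sets[left] | sets[right]
            cost += 1

    def down(node, parent_state, assignment):
        # Keep the parent's state when allowed; otherwise pick a state from
        # the node's set (assumed tie-break: prefer "absent").
        if parent_state in sets[node]:
            state = parent_state
        else:
            state = min(sets[node])
        assignment[node] = state
        for child in tree.get(node, ()):
            down(child, state, assignment)

    up(root)
    assignment = {}
    down(root, None, assignment)
    return assignment, cost


# Hypothetical toy phylogeny with four extant species s1..s4.
toy_tree = {"root": ("A", "B"), "A": ("s1", "s2"), "B": ("s3", "s4")}
toy_leaves = {"s1": 1, "s2": 1, "s3": 0, "s4": 1}
toy_assignment, toy_cost = fitch(toy_tree, toy_leaves, "root")
```

On this toy input the ancestor `B` has the candidate set {0, 1} after the bottom-up pass, so parsimony alone does not fix its state before the top-down pass; ambiguities of this kind are where additional information such as co-evolutionary relations could, in principle, influence the reconstruction.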


Keywords: Co-evolution; reconstruction of ancestral genomes; maximum parsimony; maximum likelihood



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Tamir Tuller (1, 2, 3, 4)
  • Hadas Birin (1)
  • Martin Kupiec (2)
  • Eytan Ruppin (1, 3)

  1. School of Computer Science, Israel
  2. Department of Molecular Microbiology and Biotechnology, Israel
  3. School of Medicine, Tel Aviv University, Israel
  4. Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Israel
