Computational Optimization and Applications, Volume 38, Issue 2, pp 281–298

Modeling sequence evolution with kernel methods

  • Margherita Bresco
  • Marco Turchi
  • Tijl De Bie
  • Nello Cristianini


We model the evolution of biological and linguistic sequences by comparing their statistical properties. This comparison is performed by means of efficiently computable kernel functions that take two sequences as input and return a measure of statistical similarity between them. We show how such kernels allow us to reconstruct the phylogenetic tree of primates from the mitochondrial DNA (mtDNA) of existing animals, and the phylogenetic tree of Indo-European and other languages from sample documents in existing languages.

Kernel methods provide a convenient framework for many pattern analysis tasks, and recent advances have focused on efficient methods for sequence comparison and analysis. While a large toolbox of algorithms has been developed for analyzing data with kernels, in this paper we demonstrate their use in combination with standard phylogenetic reconstruction algorithms and visualization methods.
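As a rough illustration of what a kernel function that takes two sequences and returns a similarity might look like, the following sketch uses a k-mer (p-spectrum) kernel. This abstract does not say which kernel the paper uses, so the choice of kernel, the value of `k`, the function names, and the toy sequences are all assumptions made for the example. The sketch also converts the normalized kernel into a feature-space distance, which is the kind of pairwise distance matrix a standard phylogenetic reconstruction algorithm could take as input.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, k=3):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())


def normalized_kernel(a, b, k=3):
    """Cosine-normalized kernel; lies in [0, 1] for count features."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    # A sequence shorter than k has no k-mers; treat it as dissimilar.
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def kernel_distance(a, b, k=3):
    """Feature-space distance induced by the normalized kernel."""
    # After normalization, ||phi(a) - phi(b)||^2 = 2 - 2 * k(a, b).
    return sqrt(max(0.0, 2.0 - 2.0 * normalized_kernel(a, b, k)))


def distance_matrix(seqs, k=3):
    """Pairwise distances, suitable as input to a tree-building method."""
    return [[kernel_distance(a, b, k) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    toy = ["ACGTACGTAC", "ACGTACGTTC", "TTGACCATGA"]
    for row in distance_matrix(toy, k=3):
        print(["%.3f" % d for d in row])
```

The same functions work unchanged on text, since a Python string of characters is treated exactly like a string of nucleotides; only a sensible value of `k` is likely to differ between the two settings.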


Keywords: Feature Space; Leaf Node; Kernel Method; Kernel Matrix; Gorilla Gorilla





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Margherita Bresco (1)
  • Marco Turchi (2)
  • Tijl De Bie (3)
  • Nello Cristianini (4)

  1. Department of Mathematics and Informatics, University of Salerno, Salerno, Italy
  2. Department of Information Engineering, University of Siena, Siena, Italy
  3. ECS, ISIS Research Group, University of Southampton, Southampton, UK
  4. Department of Statistics, University of California, Davis, USA
