Viral Gene Compression: Complexity and Verification

  • Mark Daley
  • Ian McQuillan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3317)


The smallest known biological organisms are, by far, the viruses. One of the unique adaptations that many viruses have acquired is the compression of the genes in their genomes. In this paper we study a formalized model of gene compression in viruses. Specifically, we define a set of constraints that describe viral gene compression strategies and investigate the properties of these constraints from the point of view of genomes as languages. We pay special attention to the finite case (representing real viral genomes) and describe a metric for measuring the level of compression in a real viral genome. An efficient algorithm for computing this metric is given, along with applications to real genomes, including automated classification of viruses and prediction of horizontal gene transfer between host and virus.


Keywords (machine-generated, not supplied by the authors): Viral Genome, Horizontal Gene Transfer, Formal Language, Large Integer, Regular Language



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Mark Daley (1, 2)
  • Ian McQuillan (2)

  1. University of Saskatchewan, Saskatoon, Canada
  2. University of Western Ontario, London, Canada
