Two Proteins for the Price of One: The Design of Maximally Compressed Coding Sequences

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3892)

Abstract

The emerging field of synthetic biology moves beyond conventional genetic manipulation to construct novel life forms that do not originate in nature. We explore the problem of designing the provably shortest genomic sequence that encodes a given set of genes by exploiting alternate reading frames. We present an algorithm for designing the shortest DNA sequence that simultaneously encodes two given amino acid sequences. We show that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression. We also investigate the impact of alternate coding matrices on overlapping sequence design. Finally, we discuss an application of overlapping gene design: interleaving an antibiotic resistance gene into a target gene that is inserted into a virus or plasmid for amplification.

This research was partially supported by NSF grants EIA-0325123 and DBI-0444815.
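
To make the idea of one DNA string encoding two proteins concrete, the following is a minimal Python sketch. It is not the algorithm presented in the paper: it is a brute-force illustration that assumes the standard genetic code, two peptides of equal length, and a full overlap in the +1 reading frame on the same strand. The names `translate`, `codons_for`, and `overlap_encode` are hypothetical helpers introduced only for this example.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate dna from offset `frame`, ignoring any trailing partial codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def codons_for(amino_acid):
    """Return all codons that encode the given amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def overlap_encode(protein_a, protein_b):
    """Search for a DNA string of length 3n + 1 that encodes protein_a in
    frame 0 and protein_b in frame +1. Returns None if no such string exists.

    Assumption: exhaustive search over synonymous codons, so this is only
    practical for very short peptides.
    """
    if len(protein_a) != len(protein_b):
        raise ValueError("this sketch only handles peptides of equal length")
    for codons in product(*(codons_for(aa) for aa in protein_a)):
        for extra in BASES:  # one extra base completes the last +1 codon
            dna = "".join(codons) + extra
            if translate(dna, 1) == protein_b:
                return dna
    return None


if __name__ == "__main__":
    dna = overlap_encode("MA", "WH")
    print(dna, translate(dna, 0), translate(dna, 1))
```

In this toy case, two dipeptides that would need 12 nucleotides as separate coding sequences fit into 7 nucleotides. Many peptide pairs admit no fully overlapping encoding at all, in which case the function returns `None`; a practical design method would also have to consider partial overlaps and other frames.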

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, B., Papamichail, D., Mueller, S., Skiena, S. (2006). Two Proteins for the Price of One: The Design of Maximally Compressed Coding Sequences. In: Carbone, A., Pierce, N.A. (eds) DNA Computing. DNA 2005. Lecture Notes in Computer Science, vol 3892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11753681_31

  • DOI: https://doi.org/10.1007/11753681_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34161-1

  • Online ISBN: 978-3-540-34165-9

  • eBook Packages: Computer Science (R0)
