No Molecule Is an Island: Molecular Evolution and the Study of Sequence Space

Part of the book series: Natural Computing Series ((NCS))

Abstract

Our knowledge of nucleic acid and protein structure comes almost exclusively from biological sequences isolated from nature. The ability to synthesize arbitrary sequences of DNA, RNA, and protein in vitro gives us experimental access to the much larger space of sequence possibilities that have not been instantiated in the course of evolution. In principle, this technology promises to both broaden and deepen our understanding of macromolecules, their evolution, and our ability to engineer new and complex functionality. Yet it has long been assumed that the large number of sequence possibilities and the complexity of the sequence-to-structure relationship preempt any systematic analysis of sequence space. Here, we review recent efforts demonstrating that, with judicious use of both formal and empirical constraints, it is possible to exploit intrinsic symmetries and correlations in sequence space, enabling coordination, projection, and navigation of the sea of sequence possibilities. These constraints not only make it possible to map the distributions of evolved sequences in the context of sequence space, but also permit properties intrinsic to sequence space to be mapped by sampling tractable numbers of randomly generated sequences. Such maps suggest entirely new ways of looking at evolution, define new classes of experiments using randomly generated sequences, and hold deep implications for the origin and evolution of macromolecular systems. We call this promising new direction sequenomics: the systematic study of sequence space.
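To give a rough sense of the scale the abstract refers to, the following is a minimal Python sketch, not the chapter's own method. It counts the sequences of a fixed length over a four-letter RNA alphabet or a twenty-letter protein alphabet, and draws a small random sample from RNA sequence space. The function names, the uniform sampling scheme, and the use of Hamming distance as the metric are all assumptions made for illustration.

```python
import random

RNA_ALPHABET = "ACGU"  # four nucleotides; proteins would use 20 amino acids


def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Count the distinct sequences of a fixed length over an alphabet."""
    return alphabet_size ** length


def random_sequence(length: int, rng: random.Random,
                    alphabet: str = RNA_ALPHABET) -> str:
    """Draw one sequence uniformly at random (illustrative sampler)."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    length = 100
    print(f"RNA sequences of length {length}: "
          f"{sequence_space_size(4, length):.3e}")
    print(f"Protein sequences of length {length}: "
          f"{sequence_space_size(20, length):.3e}")

    # A tractable sample: 1000 sequences out of the whole space.
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    sample = [random_sequence(length, rng) for _ in range(1000)]
    distances = [hamming_distance(sample[0], s) for s in sample[1:]]
    print(f"Mean Hamming distance to first sample: "
          f"{sum(distances) / len(distances):.1f}")
```

For length 100 the counts are about 1.6 × 10^60 for RNA and about 1.3 × 10^130 for protein, which is why exhaustive enumeration is out of reach. Under uniform sampling, two random RNA sequences are expected to differ at three quarters of their positions, so even a sample of 1000 sequences gives a stable estimate of that property.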

Author information

Corresponding author

Correspondence to Erik A. Schultes.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schultes, E.A., Hraber, P.T., LaBean, T.H. (2009). No Molecule Is an Island: Molecular Evolution and the Study of Sequence Space. In: Condon, A., Harel, D., Kok, J., Salomaa, A., Winfree, E. (eds) Algorithmic Bioprocesses. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88869-7_34

  • DOI: https://doi.org/10.1007/978-3-540-88869-7_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88868-0

  • Online ISBN: 978-3-540-88869-7

  • eBook Packages: Computer Science, Computer Science (R0)
