No Molecule Is an Island: Molecular Evolution and the Study of Sequence Space

  • Erik A. Schultes
  • Peter T. Hraber
  • Thomas H. LaBean
Part of the Natural Computing Series book series (NCS)


Our knowledge of nucleic acid and protein structure comes almost exclusively from biological sequences isolated from nature. The ability to synthesize arbitrary sequences of DNA, RNA, and protein in vitro gives us experimental access to the much larger space of sequence possibilities that have not been instantiated in the course of evolution. In principle, this technology promises to both broaden and deepen our understanding of macromolecules, their evolution, and our ability to engineer new and complex functionality. Yet it has long been assumed that the large number of sequence possibilities and the complexity of the sequence-to-structure relationship preclude any systematic analysis of sequence space. Here, we review recent efforts demonstrating that, with judicious employment of both formal and empirical constraints, it is possible to exploit intrinsic symmetries and correlations in sequence space, enabling coordination, projection, and navigation of the sea of sequence possibilities. These constraints not only make it possible to map the distributions of evolved sequences in the context of sequence space, but also permit properties intrinsic to sequence space to be mapped by sampling tractable numbers of randomly generated sequences. Such maps suggest entirely new ways of looking at evolution, define new classes of experiments using randomly generated sequences, and hold deep implications for the origin and evolution of macromolecular systems. We call this promising new direction sequenomics: the systematic study of sequence space.
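
To make the scale argument concrete, the following is a minimal illustrative sketch and not the authors' method. It assumes a four-letter RNA alphabet, uniform random sampling, and base composition as one possible low-dimensional projection of sequence space; the function names (`sequence_space_size`, `random_sequence`, `base_composition`, `hamming_distance`) and the sample sizes are hypothetical choices made for illustration. For sequences of length 100 the space contains 4^100 (about 1.6 × 10^60) members, so exhaustive enumeration is out of reach, while a sample of a thousand random sequences is trivial to generate and summarize.

```python
import random
from collections import Counter

ALPHABET = "ACGU"  # assumed RNA alphabet, for illustration only


def sequence_space_size(length, alphabet_size=len(ALPHABET)):
    """Count the distinct sequences of a given length."""
    return alphabet_size ** length


def random_sequence(length, rng):
    """Draw one sequence uniformly at random from sequence space."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def base_composition(seq):
    """Project a sequence onto its (A, C, G, U) frequency vector."""
    counts = Counter(seq)
    return tuple(counts[base] / len(seq) for base in ALPHABET)


def hamming_distance(a, b):
    """Number of point substitutions separating two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Sample a tractable number of random sequences and summarize the projection.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [random_sequence(100, rng) for _ in range(1000)]
compositions = [base_composition(s) for s in sample]
mean_gc = sum(c[1] + c[2] for c in compositions) / len(compositions)

print(f"size of sequence space for length 100: {sequence_space_size(100):.3e}")
print(f"mean G+C fraction in the random sample: {mean_gc:.3f}")
print(f"distance between first two samples: {hamming_distance(sample[0], sample[1])}")
```

Base composition is used here only because it is a simple projection that collapses many sequences onto one point; any other coordinate system could be substituted without changing the sampling logic.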


Keywords: Molecular Evolution, Sequence Space, Regular Graph, Hepatitis Delta Virus, Neutral Network
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Erik A. Schultes (1, corresponding author)
  • Peter T. Hraber (2)
  • Thomas H. LaBean (1)
  1. Department of Computer Science, Duke University, Durham, USA
  2. Theoretical Biology & Biophysics Group, Los Alamos National Laboratory, Los Alamos, USA
