No Molecule Is an Island: Molecular Evolution and the Study of Sequence Space

Schultes, Erik A.; Hraber, Peter T.; LaBean, Thomas H.

doi:10.1007/978-3-540-88869-7_34

Part of the book series: Natural Computing Series (NCS)

Abstract

Our knowledge of nucleic acid and protein structure comes almost exclusively from biological sequences isolated from nature. The ability to synthesize arbitrary sequences of DNA, RNA, and protein in vitro gives us experimental access to the much larger space of sequence possibilities that have not been instantiated in the course of evolution. In principle, this technology promises to both broaden and deepen our understanding of macromolecules, their evolution, and our ability to engineer new and complex functionality. Yet, it has long been assumed that the large number of sequence possibilities and the complexity of the sequence-to-structure relationship preempt any systematic analysis of sequence space. Here, we review recent efforts demonstrating that, with judicious employment of both formal and empirical constraints, it is possible to exploit intrinsic symmetries and correlations in sequence space, enabling coordination, projection, and navigation of the sea of sequence possibilities. These constraints not only make it possible to map the distributions of evolved sequences in the context of sequence space, but they also permit properties intrinsic to sequence space to be mapped by sampling tractable numbers of randomly generated sequences. Such maps suggest entirely new ways of looking at evolution, define new classes of experiments using randomly generated sequences, and hold deep implications for the origin and evolution of macromolecular systems. We call this promising new direction sequenomics: the systematic study of sequence space.
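
To make the scale argument in the abstract concrete, the following Python sketch counts the sequences of a given length and then estimates an average property of sequence space from a tractable random sample. It is an illustration only, not the chapter's method: the function names, the uniform sampling scheme, and the use of GC fraction as the sampled property are all assumptions chosen for simplicity.

```python
import random
import statistics

RNA_ALPHABET = "ACGU"  # assumption: four-letter RNA alphabet, uniform base frequencies


def sequence_space_size(length, alphabet_size):
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


def random_sequence(length, rng, alphabet=RNA_ALPHABET):
    """Draw one sequence uniformly at random from sequence space."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def gc_fraction(seq):
    """Stand-in property (hypothetical choice): fraction of G or C residues."""
    return sum(base in "GC" for base in seq) / len(seq)


def estimate_property(prop, length, n_samples, seed=0):
    """Estimate the mean of `prop` over sequence space by random sampling.

    Returns the sample mean and its standard error.
    """
    rng = random.Random(seed)
    values = [prop(random_sequence(length, rng)) for _ in range(n_samples)]
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / n_samples ** 0.5
    return mean, sem


if __name__ == "__main__":
    # Exhaustive enumeration is hopeless: 4**100 has 61 decimal digits.
    print("RNA sequences of length 100:", sequence_space_size(100, 4))
    mean, sem = estimate_property(gc_fraction, length=100, n_samples=1000)
    print(f"estimated mean GC fraction: {mean:.3f} +/- {sem:.3f}")
```

The point of the sketch is that the standard error depends on the number of samples, not on the size of the space, so a few thousand random sequences can constrain an average over an astronomically large set. Any property that can be computed or measured per sequence could be passed in place of `gc_fraction`.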

Author information

Authors and Affiliations

Department of Computer Science, Duke University, Durham, NC, 27708, USA
Erik A. Schultes & Thomas H. LaBean
Theoretical Biology & Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
Peter T. Hraber

Corresponding author

Correspondence to Erik A. Schultes.

Editor information

Editors and Affiliations

Dept. Computer Science, University of British Columbia, Main Mall 201-2366, Vancouver, V6T 1Z4, Canada
Anne Condon
Dept. Applied Mathematics, Weizmann Institute of Science, Rehovot, 76100, Israel
David Harel
Leiden Inst. Advanced Computer Science, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, Netherlands
Joost N. Kok
Turku Centre for Computer Science, Lemminkaisenkatu 14 A, Turku, 20520, Finland
Arto Salomaa
Computer Science, Computation, California Inst. of Technology, Pasadena, 91125, U.S.A.
Erik Winfree

About this chapter

Cite this chapter

Schultes, E.A., Hraber, P.T., LaBean, T.H. (2009). No Molecule Is an Island: Molecular Evolution and the Study of Sequence Space. In: Condon, A., Harel, D., Kok, J., Salomaa, A., Winfree, E. (eds) Algorithmic Bioprocesses. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88869-7_34

DOI: https://doi.org/10.1007/978-3-540-88869-7_34
Published: 13 August 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88868-0
Online ISBN: 978-3-540-88869-7
eBook Packages: Computer Science, Computer Science (R0)