The Protein Universes

Rackovsky, S.

doi:10.1007/0-387-33916-7_11

Part of the book series: Molecular Biology Intelligence Unit (MBIU)

Abstract

We discuss some informatic problems in protein classification. We first address a neglected problem in sequence classification: the information loss that results from alphabet contraction. Because reduced alphabets are a standard bioinformatic tool, this is a significant issue. We review recent work showing that information-theoretic methods can be used to quantify the amount of structural information carried by a specified sequence representation. These tools are then used to construct reduced alphabets of specified size that retain the maximum possible amount of structural information. We then turn to structure classification. After briefly reviewing previous work in this field, we discuss the fact that sequence and structure classification give different pictures of the protein space. We outline ongoing research that seeks new parameters which explicitly encode architecture choice by protein sequences.
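
The idea of information loss under alphabet contraction can be illustrated with a minimal sketch. It is not the method of the chapter or of the work it reviews; it only assumes that "structural information" is measured as the mutual information between a residue symbol and a structural state. The four-letter alphabet, the two states, the count table `toy_counts`, and the groupings `good` and `bad` are all synthetic and hypothetical, chosen only to show that merging letters can never increase this quantity and that some groupings retain far more of it than others.

```python
import math
from collections import defaultdict


def mutual_information(joint_counts):
    """Mutual information in bits from a dict {(symbol, state): count}."""
    total = sum(joint_counts.values())
    p_sym = defaultdict(float)
    p_state = defaultdict(float)
    for (sym, state), n in joint_counts.items():
        p_sym[sym] += n / total
        p_state[state] += n / total
    mi = 0.0
    for (sym, state), n in joint_counts.items():
        if n == 0:
            continue  # zero-probability cells contribute nothing
        p = n / total
        mi += p * math.log2(p / (p_sym[sym] * p_state[state]))
    return mi


def contract(joint_counts, grouping):
    """Merge symbols according to `grouping` (original letter -> reduced letter)."""
    merged = defaultdict(int)
    for (sym, state), n in joint_counts.items():
        merged[(grouping[sym], state)] += n
    return dict(merged)


# Synthetic toy counts, invented for illustration only (NOT real protein data).
toy_counts = {
    ("A", "helix"): 40, ("A", "strand"): 10,
    ("L", "helix"): 35, ("L", "strand"): 15,
    ("V", "helix"): 10, ("V", "strand"): 40,
    ("I", "helix"): 12, ("I", "strand"): 38,
}

# Two hypothetical two-letter reduced alphabets over the same four letters.
good = {"A": "x", "L": "x", "V": "y", "I": "y"}  # merges letters with similar preferences
bad = {"A": "x", "V": "x", "L": "y", "I": "y"}   # merges letters with opposite preferences

full_mi = mutual_information(toy_counts)
good_mi = mutual_information(contract(toy_counts, good))
bad_mi = mutual_information(contract(toy_counts, bad))

print(f"full alphabet: {full_mi:.4f} bits")
print(f"good grouping: {good_mi:.4f} bits")
print(f"bad grouping:  {bad_mi:.4f} bits")
```

In this toy setting, choosing a reduced alphabet of a given size amounts to searching over groupings for the one that keeps the mutual information highest; a real application would need observed sequence-structure statistics and a search strategy, neither of which is specified here.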

Author information

Authors and Affiliations

Department of Pharmacology and Biological Chemistry, and Center for Biomathematics, Mount Sinai School of Medicine of New York University, One Gustave L. Levy Place, New York, New York, 10029, USA
S. Rackovsky

About this chapter

Cite this chapter

Rackovsky, S. (2006). The Protein Universes. In: Power Laws, Scale-Free Networks and Genome Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-33916-7_11

DOI: https://doi.org/10.1007/0-387-33916-7_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-25883-6
Online ISBN: 978-0-387-33916-0
eBook Packages: Physics and Astronomy (R0)
