JEvTrace: refinement and variations of the evolutionary trace in JAVA

Joachimiak, Marcin P; Cohen, Fred E

doi:10.1186/gb-2002-3-12-research0077

Research
Published: 26 November 2002

Genome Biology, volume 3, article number research0077.1 (2002)

Marcin P Joachimiak^1,2 & Fred E Cohen^1,2

Abstract

Background

Details of functional speciation within gene families can be difficult to identify using standard multiple sequence alignment (MSA) methods. The evolutionary trace (ET) was developed as a visualization tool to combine MSA, phylogenetic and structural data for identification of functional sites in proteins. The method has been successful in extracting evolutionary details of functional surfaces in a number of biological systems and modifications of the method are useful in creating hypotheses about the function of previously unannotated genes. We wish to facilitate the graphical interpretation of disparate data types through the creation of flexible software implementations.

Results

We have implemented the ET method in a JAVA graphical interface, JEvTrace. Users can analyze and visualize ET input and output with respect to protein phylogeny, sequence and structure. Function discovery with JEvTrace is demonstrated on two proteins with recently determined crystal structures: YlxR from Streptococcus pneumoniae, which has a predicted RNA-binding function, and YbaK, a Haemophilus influenzae protein of unknown function. To facilitate analysis and storage of results we propose an MSA coloring data structure. This sequence coloring format readily captures evolutionary, biological, functional and structural features of MSAs.
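
As an illustration of the core idea behind the ET method, the following minimal JAVA sketch labels each alignment column once the sequences have been partitioned into subfamilies, for example by cutting a phylogenetic tree at a chosen level. It is not the JEvTrace source code. The class name TraceSketch, the Label enum and the classify method are hypothetical names introduced here, and treating any gapped column as neutral is an assumption made for simplicity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from the JEvTrace package.
public class TraceSketch {

    /** Label assigned to one alignment column. */
    public enum Label { CONSERVED, CLASS_SPECIFIC, NEUTRAL }

    /**
     * Labels every column of an alignment.
     * Each inner list holds the aligned sequences (all of equal length)
     * of one subfamily. The partition itself is assumed to be given.
     */
    public static List<Label> classify(List<List<String>> groups) {
        int length = groups.get(0).get(0).length();
        List<Label> labels = new ArrayList<>();
        for (int col = 0; col < length; col++) {
            // Collects the single residue of each group that is invariant.
            Set<Character> consensus = new HashSet<>();
            boolean invariantWithinGroups = true;
            for (List<String> group : groups) {
                Set<Character> seen = new HashSet<>();
                for (String seq : group) {
                    seen.add(seq.charAt(col));
                }
                // Assumption: a gap ('-') or any variation inside a group
                // makes the column neutral.
                if (seen.size() != 1 || seen.contains('-')) {
                    invariantWithinGroups = false;
                    break;
                }
                consensus.addAll(seen);
            }
            if (!invariantWithinGroups) {
                labels.add(Label.NEUTRAL);
            } else if (consensus.size() == 1) {
                labels.add(Label.CONSERVED);      // same residue in all groups
            } else {
                labels.add(Label.CLASS_SPECIFIC); // invariant per group, differs between groups
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two toy subfamilies of two sequences each (invented data).
        List<List<String>> groups = List.of(
            List.of("GKSA", "GKSV"),
            List.of("GRSL", "GRST"));
        System.out.println(classify(groups));
    }
}
```

For the toy input, column 1 (G everywhere) is conserved, column 2 (K in one subfamily, R in the other) is class-specific, and column 4 varies within subfamilies and is neutral. Repeating the classification at successive tree cuts would give the per-level trace that a viewer can then map onto sequence and structure.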

Conclusions

Protein families and phylogeny represent complex data with statistical outliers and special cases. The JEvTrace implementation of the ET method allows detailed mining and graphical visualization of evolutionary sequence relationships.

Acknowledgements

We are deeply grateful for the help of Dietlind Gerloff, Dirk Walther, Jonathan Blake, John-Marc Chandonia, Wally Novak, Anthony Lau and Chern-Sing Goh during the development of the application. Anthony Lau and Elaine Meng provided invaluable comments on the manuscript.

Author information

Authors and Affiliations

Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA, 94143-0450, USA
Marcin P Joachimiak & Fred E Cohen
Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, 94143-0450, USA
Marcin P Joachimiak & Fred E Cohen

Corresponding author

Correspondence to Fred E Cohen.

About this article

Cite this article

Joachimiak, M.P., Cohen, F.E. JEvTrace: refinement and variations of the evolutionary trace in JAVA. Genome Biol 3, research0077.1 (2002). https://doi.org/10.1186/gb-2002-3-12-research0077

Received: 24 April 2002
Revised: 11 July 2002
Accepted: 21 October 2002
Published: 26 November 2002
DOI: https://doi.org/10.1186/gb-2002-3-12-research0077
