JEvTrace: refinement and variations of the evolutionary trace in JAVA

  • Research
  • Genome Biology

Abstract

Background

Details of functional speciation within gene families can be difficult to identify using standard multiple sequence alignment (MSA) methods. The evolutionary trace (ET) was developed as a visualization tool to combine MSA, phylogenetic and structural data for identification of functional sites in proteins. The method has been successful in extracting evolutionary details of functional surfaces in a number of biological systems, and modifications of the method are useful in creating hypotheses about the function of previously unannotated genes. We wish to facilitate the graphical interpretation of disparate data types through the creation of flexible software implementations.
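To make the core idea of the trace concrete, the following is a minimal Python sketch, not JEvTrace code. It assumes the sequence groups have already been obtained by cutting a phylogenetic tree, and it labels each alignment column as conserved in every group, invariant within groups but different between them (class-specific), or neutral. The function name `trace_columns`, the toy sequences, the group assignment and the treatment of gapped columns as neutral are all illustrative assumptions.

```python
from typing import Dict, List


def trace_columns(alignment: Dict[str, str], groups: List[List[str]]) -> List[str]:
    """Label each alignment column as 'conserved', 'class-specific' or 'neutral'."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()

    labels = []
    for col in range(n_cols):
        # Residues seen in this column, collected separately for each group.
        per_group = [{alignment[name][col] for name in group} for group in groups]
        if any(len(residues) > 1 or "-" in residues for residues in per_group):
            # Varies inside at least one group, or contains a gap (assumption).
            labels.append("neutral")
        elif len(set.union(*per_group)) == 1:
            # The same single residue occurs in every group.
            labels.append("conserved")
        else:
            # Invariant within each group but different between groups.
            labels.append("class-specific")
    return labels


# Toy alignment; sequence names and group membership are made up.
toy_alignment = {
    "seqA1": "MKLDG",
    "seqA2": "MKLEG",
    "seqB1": "MRLDG",
    "seqB2": "MRLSG",
}
toy_groups = [["seqA1", "seqA2"], ["seqB1", "seqB2"]]
labels = trace_columns(toy_alignment, toy_groups)
```

In this toy case the second column (K in one group, R in the other) is the only class-specific position. In practice the grouping would come from the phylogeny at a chosen partition level, and the labelled positions would then be mapped onto the structure.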

Results

We have implemented the ET method in a JAVA graphical interface, JEvTrace. Users can analyze and visualize ET input and output with respect to protein phylogeny, sequence and structure. Function discovery with JEvTrace is demonstrated on two proteins with recently determined crystal structures: YlxR from Streptococcus pneumoniae, with a predicted RNA-binding function, and YbaK, a Haemophilus influenzae protein of unknown function. To facilitate analysis and storage of results, we propose an MSA coloring data structure. The sequence coloring format readily captures evolutionary, biological, functional and structural features of MSAs.

Conclusions

Protein families and phylogeny represent complex data with statistical outliers and special cases. The JEvTrace implementation of the ET method allows detailed mining and graphical visualization of evolutionary sequence relationships.


Acknowledgements

We are deeply grateful for the help of Dietlind Gerloff, Dirk Walther, Jonathan Blake, John-Marc Chandonia, Wally Novak, Anthony Lau and Chern-Sing Goh during the development of the application. Anthony Lau and Elaine Meng provided invaluable comments on the manuscript.

Author information

Corresponding author

Correspondence to Fred E Cohen.

About this article

Cite this article

Joachimiak, M.P., Cohen, F.E. JEvTrace: refinement and variations of the evolutionary trace in JAVA. Genome Biol 3, research0077.1 (2002). https://doi.org/10.1186/gb-2002-3-12-research0077

  • DOI: https://doi.org/10.1186/gb-2002-3-12-research0077
