Abstract
Post-translational modification of protein lysines was recently shown to be a common feature of eukaryotic organisms. The ubiquitin modification is regarded as a versatile regulatory mechanism with many important cellular roles. Large-scale datasets are becoming available for H. sapiens ubiquitination. However, with current experimental techniques the vast majority of ubiquitination sites remain unidentified, and in silico tools may offer an alternative. Here, we introduce Rapid UBIquitination (RUBI), a sequence-based ubiquitination predictor designed for rapid application on a genome scale. RUBI was constructed using an iterative approach. At each iteration, important factors that influenced its performance and usability were investigated. The final RUBI model has an AUC of 0.868 on a large cross-validation set and is shown to outperform other available methods on independent sets. Predicted intrinsic disorder is shown to be weakly anti-correlated with ubiquitination for the H. sapiens dataset, and including it improves performance slightly. RUBI predicts the number of ubiquitination sites correctly within three sites for ca. 80 % of the tested proteins. The average potentially ubiquitinated proteome fraction is predicted to be at least 25 % across a variety of model organisms, including several thousand possible H. sapiens proteins awaiting experimental characterization. RUBI can accurately predict ubiquitination on unseen examples and shows a predictive signal across different eukaryotic organisms. The factors that influenced the construction of RUBI could also be tested in other post-translational modification predictors. One of the more interesting factors is the influence of intrinsic protein disorder on ubiquitinated lysines, where residues with low disorder probability are preferred.
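The two evaluation figures quoted in the abstract (an AUC, and the share of proteins whose predicted site count falls within three sites of the observed count) can be illustrated with a minimal Python sketch. This is not RUBI's code: the function names `auc` and `fraction_within`, the toy scores, and the reading of "within three sites" as an absolute difference of at most three are all assumptions made for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney U statistic.

    `labels` uses 1 for a modified lysine and 0 for an unmodified one
    (hypothetical encoding, chosen for this sketch).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fraction_within(predicted: Sequence[int],
                    observed: Sequence[int],
                    tolerance: int = 3) -> float:
    """Fraction of proteins whose predicted site count differs from the
    observed count by at most `tolerance` (assumed meaning of
    "within three sites")."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(predicted)


if __name__ == "__main__":
    # Toy per-lysine scores and labels, invented for the example.
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
    # Toy per-protein site counts, invented for the example.
    print(fraction_within([2, 5, 10], [3, 9, 10]))
```

The pairwise AUC is quadratic in the number of lysines, which is fine for a small illustration; a rank-based implementation would be preferable for proteome-scale data.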
Acknowledgments
The authors are grateful to members of the Biocomputing UP Lab for insightful discussions.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
The RUBI web server and stand-alone version for Linux are available from the URL: http://protein.bio.unipd.it/rubi/.
Cite this article
Walsh, I., Di Domenico, T. & Tosatto, S.C.E. RUBI: rapid proteomic-scale prediction of lysine ubiquitination and factors influencing predictor performance. Amino Acids 46, 853–862 (2014). https://doi.org/10.1007/s00726-013-1645-3
DOI: https://doi.org/10.1007/s00726-013-1645-3