PupDB: a database of pupylated proteins
Prokaryotic ubiquitin-like protein (Pup), the first post-translational protein modifier identified in prokaryotes, is an important signal for the selective degradation of proteins. Recently, large-scale proteomics technology has been applied to identify a large number of pupylated proteins. A database for managing pupylated proteins and pupylation sites is important for further analyses.
A database named PupDB was constructed by collecting experimentally identified pupylated proteins and pupylation sites from published studies and integrating this information with the corresponding protein structures and functional annotations. PupDB is a web-based database with tools for browsing and searching pupylated proteins and for interactively displaying protein structures and pupylation sites.
The structured and searchable database PupDB is expected to provide a useful resource for further analyzing the substrate specificity, identifying pupylated proteins in other organisms and developing computational tools for predicting pupylation sites. PupDB is freely available at http://cwtung.kmu.edu.tw/pupdb.
Keywords: Gene Ontology; Mycobacterium smegmatis; String Kernel; Interactive Display; Specific Lysine Residue
Protein-to-protein modifications are essential for regulating protein functions. In eukaryotes, ubiquitylation is particularly important, being involved in numerous regulatory functions such as protein degradation, DNA repair, transcription and signal transduction [1]. Recently, pupylation has been identified as the first post-translational protein-to-protein modification in prokaryotes [2, 3]. Similar to ubiquitin, prokaryotic ubiquitin-like protein (Pup) attaches to specific lysine residues of substrate proteins through isopeptide bonds, targeting these proteins for proteasomal degradation [2, 3].
Although ubiquitylation and pupylation are functional analogues, their enzymology differs. In contrast to the three-step reaction of ubiquitylation, pupylation requires only two steps, each carried out by a single enzyme. First, the C-terminal glutamine of Pup is deamidated to glutamate by deamidase of Pup (Dop). Subsequently, proteasome accessory factor A (PafA) attaches the deamidated Pup to specific lysine residues of substrate proteins.
The identification of pupylated proteins and pupylation sites can provide insights into the substrate specificity and functions of pupylation. Recently, large-scale proteomics technology has been applied to identify pupylated proteins and pupylation sites [6, 7, 8, 9]. As the number of identified pupylated proteins and sites grows, a structured and searchable database is needed to support further analyses of the substrate specificity and functions of pupylated proteins and the development of methods for predicting pupylation sites. For this purpose, PupDB was constructed: a freely accessible database that integrates pupylated proteins and pupylation sites with protein structures and functional annotations, and provides tools for browsing, searching and interactively displaying protein structures and pupylation sites.
Construction and content
The PupDB database is implemented using MySQL Server Edition 5.1, and the PupDB website is publicly available at http://cwtung.kmu.edu.tw/pupdb. The web interface and all functions are implemented in PHP and Perl. Google Chart Tools [10] is used to render sortable tables.
PupDB includes two kinds of proteins, pupylated proteins and candidate pupylated proteins, all collected from four large-scale proteomics studies [6, 7, 8, 9]. Proteins with experimentally identified pupylation sites are annotated as pupylated proteins. Candidate pupylated proteins are proteins experimentally identified in these studies whose pupylation sites are still unknown.
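The distinction between pupylated and candidate pupylated proteins can be expressed as a simple relational rule: a protein is "pupylated" if at least one site is recorded for it, and a "candidate" otherwise. The sketch below illustrates this with Python's built-in `sqlite3` module. The table and column names (`protein`, `pupylation_site`, `classify_proteins`) are hypothetical; PupDB's actual MySQL schema is not described in the text.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; not PupDB's real schema.
SCHEMA = """
CREATE TABLE protein (
    uniprot_ac  TEXT PRIMARY KEY,
    description TEXT,
    gene_name   TEXT,
    organism    TEXT,
    seq_length  INTEGER
);
CREATE TABLE pupylation_site (
    uniprot_ac TEXT REFERENCES protein(uniprot_ac),
    position   INTEGER,  -- 1-based lysine position in the UniProt sequence
    pubmed_id  TEXT,
    PRIMARY KEY (uniprot_ac, position)
);
"""

# LEFT JOIN keeps proteins without sites; COUNT ignores the NULLs they produce,
# so proteins with no recorded site get a count of 0 and become 'candidate'.
CLASSIFY = """
SELECT p.uniprot_ac,
       CASE WHEN COUNT(s.position) > 0 THEN 'pupylated' ELSE 'candidate' END
FROM protein AS p
LEFT JOIN pupylation_site AS s ON s.uniprot_ac = p.uniprot_ac
GROUP BY p.uniprot_ac
ORDER BY p.uniprot_ac
"""


def classify_proteins(conn):
    """Map each UniProt AC to 'pupylated' or 'candidate'."""
    return dict(conn.execute(CLASSIFY).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up placeholder records, not real PupDB entries.
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?, ?)", [
        ("EX0001", "example protein A", "geneA", "Example organism", 300),
        ("EX0002", "example protein B", "geneB", "Example organism", 150),
    ])
    conn.execute("INSERT INTO pupylation_site VALUES (?, ?, ?)", ("EX0001", 42, "0"))
    print(classify_proteins(conn))
```

Keeping sites in their own table, rather than as a column on `protein`, lets a protein move from "candidate" to "pupylated" simply by inserting site rows as new studies are added.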
As shown in Figure 1a, the first part of each entry, basic information, includes the UniProt AC, description, gene name, organism and sequence length. For further protein annotations, PupDB provides links to the corresponding entries of the UniProt database. The second part (Figure 1b) provides structure information, including PDB (Protein Data Bank) IDs with hyperlinks to the PDB database. Because visualizing pupylation sites in a protein structure can aid analysis, the 3D structure of a protein and its associated pupylation sites can be viewed in PupDB by clicking the '3D visualization' link. The Java applet-based program Jmol [15] is used for interactive display of protein structures (Figure 1f). UniProt accession numbers and PDB IDs are obtained using the ID mapping function of UniProt. Currently, 766 PDB structures are associated with 294 PupDB entries.
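Summary figures such as the number of PDB structures associated with PupDB entries can be derived from the (UniProt AC, PDB ID) pairs returned by an ID-mapping step. The sketch below is a hypothetical helper, `summarize_structures`, that assumes such pairs have already been parsed; it does not call UniProt. It counts distinct PDB IDs, which is an assumption: a single PDB entry can contain chains from several proteins, so counting (protein, structure) pairs instead would give a larger number, and the text does not say which convention the reported totals use.

```python
from collections import defaultdict


def summarize_structures(mapping_pairs):
    """Summarize (UniProt AC, PDB ID) pairs from an ID-mapping result.

    Returns (per_entry, n_distinct_structures, n_entries_with_structure).
    PDB IDs are upper-cased so '1abc' and '1ABC' count as one structure.
    """
    by_entry = defaultdict(set)
    for uniprot_ac, pdb_id in mapping_pairs:
        by_entry[uniprot_ac].add(pdb_id.upper())
    # A structure shared by two entries is counted once here.
    distinct_structures = set().union(*by_entry.values())
    per_entry = {ac: sorted(ids) for ac, ids in by_entry.items()}
    return per_entry, len(distinct_structures), len(by_entry)


if __name__ == "__main__":
    # Placeholder accessions and PDB IDs for illustration only.
    pairs = [("EX0001", "1abc"), ("EX0001", "2XYZ"), ("EX0002", "2xyz")]
    print(summarize_structures(pairs))
```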
Gene Ontology (GO) annotations provide useful information on molecular function, cellular component and biological process. For a given protein, the corresponding GO annotations are extracted using its UniProt accession number. Figure 1c shows the third part, GO annotations, for protein P69440. Further GO information can be accessed through the 'Detailed GO annotation' hyperlink, which links to the corresponding entry of QuickGO.
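Extracting GO annotations by UniProt accession and grouping them into the three GO aspects can be sketched against the tab-separated GO Annotation File (GAF 2.x) format, in which column 2 holds the object ID, column 5 the GO ID, column 4 the qualifier and column 9 the aspect (F, C or P). The function name `go_terms_for` is hypothetical, and the text does not say that PupDB reads GAF files; the annotation lines in the usage example are made up.

```python
from collections import defaultdict

# GAF column 9 aspect codes mapped to the three GO namespaces.
ASPECTS = {"F": "molecular_function", "C": "cellular_component", "P": "biological_process"}


def go_terms_for(accession, gaf_lines):
    """Group the GO IDs annotated to `accession` by GO aspect.

    Expects tab-separated GAF 2.x lines; header lines start with '!'.
    Annotations whose qualifier contains NOT are skipped.
    """
    grouped = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[1] != accession:
            continue
        if "NOT" in cols[3].split("|"):
            continue  # negated annotation: the protein is NOT associated with the term
        grouped[ASPECTS.get(cols[8], cols[8])].add(cols[4])
    return {aspect: sorted(ids) for aspect, ids in grouped.items()}


if __name__ == "__main__":
    def gaf_line(acc, go_id, aspect, qualifier=""):
        # 17 GAF columns; the values here are illustrative, not real annotations.
        return "\t".join(["UniProtKB", acc, "geneA", qualifier, go_id,
                          "PMID:0", "IDA", "", aspect] + [""] * 8)

    lines = ["!gaf-version: 2.2", gaf_line("EX0001", "GO:0000002", "F")]
    print(go_terms_for("EX0001", lines))
```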
The fourth part lists the pupylation sites and the corresponding references (Figure 1d). References are given as PubMed IDs with hyperlinks to the PubMed database. For a candidate pupylated protein, whose pupylation sites are still unknown, only the references are shown; for a pupylated protein, PupDB additionally highlights the pupylation sites in both the sequence and the structure, as shown in Figures 1e and 1f, respectively.
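Highlighting pupylation sites in a sequence amounts to marking the annotated 1-based positions and checking that each one is a lysine. The sketch below uses square brackets as a plain-text stand-in for the colour highlighting on the web page; the function name `highlight_sites` and the bracket convention are assumptions for illustration.

```python
def highlight_sites(sequence, sites):
    """Return `sequence` with each pupylation site (1-based) in brackets.

    Raises ValueError if a site is outside the sequence or is not a lysine (K),
    which catches annotations that were mapped to the wrong isoform or numbering.
    """
    site_set = set(sites)
    for pos in site_set:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"site {pos} outside sequence of length {len(sequence)}")
        if sequence[pos - 1] != "K":
            raise ValueError(f"residue {pos} is {sequence[pos - 1]!r}, not lysine")
    return "".join(f"[{aa}]" if i in site_set else aa
                   for i, aa in enumerate(sequence, start=1))


if __name__ == "__main__":
    print(highlight_sites("MAKLKE", [3, 5]))  # toy sequence, not a real protein
```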
Utility and discussion
Hyperlinks to major protein, structure and annotation databases provide access to related information. Four tools are integrated into PupDB to support browsing, keyword searches, sequence similarity searches and interactive display of protein structures. These tools are described in the following sections.
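A keyword search over entries of this kind can be as simple as a case-insensitive match of every query term against the text fields of each entry. The sketch below is a hypothetical illustration (`Entry`, `keyword_search`); PupDB's own search is implemented in PHP and Perl and its matching rules are not described, and the sequence similarity (BLAST) search is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    uniprot_ac: str
    description: str
    gene_name: str
    organism: str


def keyword_search(entries, query):
    """Return entries whose accession, description, gene name or organism
    contain every whitespace-separated query term (case-insensitive)."""
    terms = query.lower().split()
    if not terms:
        return []  # an empty query matches nothing rather than everything
    hits = []
    for e in entries:
        text = " ".join([e.uniprot_ac, e.description, e.gene_name, e.organism]).lower()
        if all(t in text for t in terms):
            hits.append(e)
    return hits


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    entries = [Entry("EX0001", "Example kinase", "geneA", "Mycobacterium smegmatis"),
               Entry("EX0002", "Example protease", "geneB", "Mycobacterium tuberculosis")]
    print([e.uniprot_ac for e in keyword_search(entries, "mycobacterium kinase")])
```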
Search and BLAST tools
Interactive tool for protein structure
PupDB incorporates the Jmol applet (version 12.2, the latest at the time of writing) for interactive display of protein structures. By default, PupDB renders protein structures in grey and pupylation sites in yellow. Users can manipulate the structures through either the graphical user interface or the scripting console.
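The grey/yellow default view can be reproduced in the Jmol scripting console with a few commands. The sketch below builds such a script string from a list of residue numbers; the function name `jmol_highlight_script` is hypothetical and the exact script PupDB sends to Jmol is not given in the text. It assumes the residue numbers are already in the PDB file's numbering, which can differ from UniProt sequence positions, so a residue-level mapping (for example from SIFTS) may be needed first.

```python
def jmol_highlight_script(site_residues, chain=None):
    """Build a Jmol script: whole structure gray, given residues yellow spacefill.

    `site_residues` are PDB residue numbers; `chain` is an optional chain ID,
    producing atom expressions such as '45:A'.
    """
    chain_suffix = f":{chain}" if chain else ""
    selection = " or ".join(f"{n}{chain_suffix}" for n in sorted(set(site_residues)))
    commands = ["select all", "color gray"]
    if selection:
        commands += [f"select {selection}", "color yellow", "spacefill on"]
    return "; ".join(commands)


if __name__ == "__main__":
    # Hypothetical residue numbers; paste the output into the Jmol console.
    print(jmol_highlight_script([120, 45], chain="A"))
```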
PupDB is a comprehensive repository of pupylated proteins and pupylation sites with a web-based user interface. Its built-in tools for browsing, searching and interactively displaying protein structures and pupylation sites make PupDB a useful resource for further analyzing substrate specificity, identifying pupylated proteins in other organisms and developing computational tools for predicting pupylation sites. In addition to graphical analysis using two-sample logos, advanced machine learning methods such as string kernels can be used to further analyze the specificity of pupylation. The dataset of pupylated proteins can be exported and downloaded from PupDB.
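As a concrete example of a string kernel, the k-spectrum kernel compares two sequences by the dot product of their k-mer count vectors, so windows around lysines that share many short motifs score as similar. It is one of the simplest string kernels and is shown here only as an illustration; the string kernel referred to in the text may be a different, more elaborate one. The function names are hypothetical.

```python
import math
from collections import Counter


def spectrum_features(seq, k=2):
    """Count all contiguous k-mers (the k-spectrum) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2, normalize=True):
    """k-spectrum kernel: dot product of the k-mer count vectors of a and b.

    With normalize=True the value is divided by sqrt(K(a,a) * K(b,b)),
    giving a cosine similarity in [0, 1].
    """
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    value = sum(count * fb[kmer] for kmer, count in fa.items())  # missing k-mers count 0
    if normalize:
        norm = math.sqrt(spectrum_kernel(a, a, k, False) * spectrum_kernel(b, b, k, False))
        return value / norm if norm else 0.0
    return value


if __name__ == "__main__":
    # Toy peptide windows, not real pupylation-site data.
    print(spectrum_kernel("EAKDL", "QAKDV", k=2))
```

A kernel matrix built this way over positive and negative lysine windows can be passed to any kernel method, such as a support vector machine that accepts precomputed kernels.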
Post-translational modification databases serve as good data sources for developing prediction tools. For example, UbiPred, a predictor of ubiquitylation sites, was built on the dataset of UbiProt. Although the predictor GPS-PUP is available for predicting pupylation sites, it was trained on only 127 pupylation sites; the 215 pupylation sites in PupDB could be used to further improve it. Future work is twofold. First, developing prediction tools based on the PupDB dataset and integrating them into PupDB would be useful for analyzing and predicting pupylation sites. Second, incorporating orthology relationships and the locations of functional domains would substantially improve PupDB.
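Turning site annotations into training data for a predictor typically means extracting a fixed-length window around every lysine and labelling it. The sketch below, with the hypothetical function `lysine_windows`, treats annotated lysines as positives and all other lysines in the same protein as negatives. That negative-set convention is a common but imperfect assumption (unobserved sites are not necessarily unmodified), and it is not stated to be how GPS-PUP or UbiPred were built.

```python
def lysine_windows(sequence, sites, flank=10, pad="X"):
    """Return (position, window, label) for every lysine in `sequence`.

    Each window is centred on a K with `flank` residues on each side,
    padded with `pad` beyond the termini, so every window has length
    2 * flank + 1. Lysines whose 1-based position is in `sites` get label 1;
    all other lysines get label 0.
    """
    site_set = set(sites)
    padded = pad * flank + sequence + pad * flank
    samples = []
    for i, aa in enumerate(sequence):
        if aa != "K":
            continue
        # sequence[i] sits at padded[i + flank], so this slice is centred on it.
        window = padded[i:i + 2 * flank + 1]
        samples.append((i + 1, window, 1 if (i + 1) in site_set else 0))
    return samples


if __name__ == "__main__":
    # Toy sequence with a hypothetical site at position 2.
    for sample in lysine_windows("MKAKE", [2], flank=2):
        print(sample)
```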
Availability and requirements
PupDB is freely available at http://cwtung.kmu.edu.tw/pupdb. The website has been tested with Safari, Opera, Internet Explorer 7 or later, Firefox and Google Chrome. The Java Runtime Environment (JRE) is required for interactive display of protein 3D structures with Jmol.
CWT would like to thank the National Science Council (NSC 101-2311-B-037-001-MY2) of Taiwan and Kaohsiung Medical University Research Foundation (KMU-Q110015 and KMU-ER013) for financially supporting this research. CWT thanks the anonymous reviewers for their valuable comments and suggestions to improve this work.
10. Google Chart Tools. [http://code.google.com/intl/zh-TW/apis/chart/index.html]
15. Jmol: an open-source Java viewer for chemical structures in 3D. [http://www.jmol.org/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.