LabCaS for Ranking Potential Calpain Substrate Cleavage Sites from Amino Acid Sequence
Calpains are a family of Ca²⁺-dependent cysteine proteases involved in many important biological processes, where they selectively cleave substrates at specific cleavage sites to regulate the function of the substrate proteins. Our knowledge of calpain function and of the mechanism of substrate cleavage is still limited, because the experimental determination and validation of calpain binding are usually laborious and expensive. This chapter describes LabCaS, an algorithm designed to predict calpain substrate cleavage sites from amino acid sequences. LabCaS is built on a conditional random field (CRF) statistical model, which is trained for cleavage site prediction on multiple features: amino acid residue preference, solvent accessibility information, pairwise alignment similarity score, secondary structure propensity, and physicochemical properties. Large-scale benchmark tests have shown that LabCaS achieves reliable recognition of the cleavage sites for most calpain proteins, with an average AUC score of 0.862. Because it is fast and convenient to use, the protocol should be useful for large-scale calpain-based functional annotation of newly sequenced proteins. The online web server of LabCaS is freely available at http://www.csbio.sjtu.edu.cn/bioinf/LabCaS.
Key words: Protease substrate recognition, Cleavage site prediction, Sequence labeling, Ensemble learning, Calpain, Conditional random fields
We are grateful to Mr. Wallace Chan and Dr. S M Golam Mortuza for proofreading the manuscript. This work was supported in part by the National Natural Science Foundation of China (No. 61462018, 61762026, 61671288, 91530321, 61725302, and 61603161), Guangxi Natural Science Foundation (No. 2017GXNSFAA198278), Guangxi Key Laboratory of Trusted Software (No. kx201403), Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Images and Graphics (No. GIIP201502), Science and Technology Commission of Shanghai Municipality (No. 16JC1404300, 17JC1403500), and the National Science Foundation (ABI 1564756).