Abstract
Self-interacting proteins (SIPs) play significant roles in most life activities. Although numerous computational methods have been developed to predict SIPs, efficient and accurate techniques are still needed to improve prediction performance. In this paper, we propose a machine learning scheme named LGCM for accurate SIP prediction based on protein sequence information. More specifically, a novel feature descriptor combining the bi-gram and Chebyshev moments algorithms was developed to extract discriminative sequence information. The integrated protein features were then fed into a LightGBM classifier to train the LGCM model. Rigorous cross-validation showed that LGCM outperformed previous methods for SIP prediction, with accuracies of 96.90% and 98.29% on the yeast and human datasets, respectively. The experimental results demonstrate the effectiveness and reliability of LGCM, which may serve as a useful guide for future bioinformatics research.
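To make the bi-gram step concrete, the following is a minimal sketch, not the authors' implementation. It assumes the bi-gram is computed directly on the raw amino acid sequence as normalized counts of adjacent residue pairs; the descriptor in the paper may instead be derived from a position-specific scoring matrix. The names `AMINO_ACIDS` and `bigram_features` are hypothetical, and the Chebyshev moments and LightGBM stages are deliberately left out.

```python
from itertools import product

# The 20 standard amino acids, in a fixed (assumed) alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Map every ordered residue pair ("AA", "AC", ..., "YY") to a vector index.
PAIR_INDEX = {
    a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def bigram_features(sequence):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Pairs containing a non-standard residue (for example 'X') are skipped.
    This is an illustrative simplification, not the LGCM descriptor itself.
    """
    counts = [0.0] * len(PAIR_INDEX)
    seq = sequence.upper()
    total = 0
    for left, right in zip(seq, seq[1:]):
        pos = PAIR_INDEX.get(left + right)
        if pos is None:
            continue
        counts[pos] += 1.0
        total += 1
    if total == 0:
        # Too short, or no valid pairs: return the all-zero vector.
        return counts
    return [c / total for c in counts]


if __name__ == "__main__":
    features = bigram_features("MKTAYIAKQR")
    print(len(features), sum(features))
```

Every protein, whatever its length, is mapped to a fixed-length vector, which is the property a downstream classifier such as LightGBM needs from a sequence descriptor.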
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhan, ZH., You, ZH., Zhou, Y., Zheng, K., Li, ZW. (2019). An Efficient LightGBM Model to Predict Protein Self-interacting Using Chebyshev Moments and Bi-gram. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_43
DOI: https://doi.org/10.1007/978-3-030-26969-2_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26968-5
Online ISBN: 978-3-030-26969-2
eBook Packages: Computer Science (R0)