An Efficient LightGBM Model to Predict Protein Self-interacting Using Chebyshev Moments and Bi-gram

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11644)

Abstract

Self-interacting proteins (SIPs) play significant roles in most life activities. Although numerous computational methods have been developed to predict SIPs, efficient and accurate techniques are still needed to improve prediction performance. In this paper, we propose a machine learning scheme named LGCM for accurate SIP prediction based on protein sequence information. Specifically, we developed a novel feature descriptor that combines the bi-gram and Chebyshev moments algorithms to extract discriminative sequence information. We then fed the integrated protein features into a LightGBM classifier to train the LGCM model. Rigorous cross-validation showed that LGCM outperformed previous SIP prediction methods, with accuracies of 96.90% and 98.29% on the yeast and human datasets, respectively. The experimental results indicate the effectiveness and reliability of LGCM, which may help guide future bioinformatics research.
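
The abstract does not spell out how the bi-gram step is computed. As a rough illustration only, the sketch below uses one common bi-gram formulation over a per-residue profile matrix P (one row per residue, one column per amino-acid type), where entry (m, n) is the sum over consecutive positions i of P[i][m] * P[i+1][n]. The function name `bigram_features`, the toy 3-column profile, and the choice of this formulation are assumptions for illustration, not the authors' confirmed implementation; the Chebyshev moments and LightGBM stages are not shown.

```python
def bigram_features(profile):
    """Flattened bi-gram matrix of a profile (one row per residue).

    Entry (m, n) accumulates profile[i][m] * profile[i + 1][n] over all
    consecutive row pairs. This is an assumed formulation, not taken
    from the paper itself.
    """
    if len(profile) < 2:
        raise ValueError("profile needs at least two rows")
    n_cols = len(profile[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over consecutive residue positions (i, i + 1).
    for prev_row, next_row in zip(profile, profile[1:]):
        for m, p in enumerate(prev_row):
            for n, q in enumerate(next_row):
                bigram[m][n] += p * q
    # Flatten row by row so the result has a fixed length of n_cols ** 2.
    return [value for row in bigram for value in row]


# Hypothetical toy profile with 3 columns; a real profile would have 20.
toy_profile = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [3.0, 0.0, 0.0],
]
features = bigram_features(toy_profile)
```

The output length depends only on the number of columns, not on the sequence length, which is what allows proteins of different lengths to share one fixed-size feature vector for a downstream classifier.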

Author information

Corresponding author

Correspondence to Zhu-Hong You.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhan, ZH., You, ZH., Zhou, Y., Zheng, K., Li, ZW. (2019). An Efficient LightGBM Model to Predict Protein Self-interacting Using Chebyshev Moments and Bi-gram. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_43

  • DOI: https://doi.org/10.1007/978-3-030-26969-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26968-5

  • Online ISBN: 978-3-030-26969-2

  • eBook Packages: Computer Science, Computer Science (R0)
