Efficient Framework for Predicting ncRNA-Protein Interactions Based on Sequence Information by Deep Learning

  • Zhao-Hui Zhan
  • Zhu-Hong You
  • Yong Zhou
  • Li-Ping Li
  • Zheng-Wei Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10955)


Interactions between proteins and RNA (RPIs) play a crucial role in many cellular processes, such as RNA stability and translation. Although many high-throughput experiments have recently been developed to detect RPIs, they remain time-consuming and labor-intensive. There is therefore a pressing need for efficient computational methods to predict RPIs. In this study, we propose a novel approach for predicting protein-ncRNA interactions based on sequence information only. Representative features of proteins and ncRNAs were extracted with the bi-gram probability feature extraction method and the k-mer algorithm, respectively. To evaluate the performance of the proposed model, a random forest classifier was trained on two widely used datasets, RPI1807 and RPI2241, and assessed with five-fold cross-validation. The model achieved AUCs of 0.992 and 0.947 on RPI1807 and RPI2241, respectively, which indicates that the proposed approach is effective for predicting RPIs and can serve as a useful reference for future research in the biological field.
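The abstract names a bi-gram probability feature for proteins, and the keywords mention PSSM. The sketch below assumes the common bi-gram PSSM formulation, in which an L x 20 profile P is summarised as B(m, n) = sum over k = 1..L-1 of P(k, m) * P(k+1, n) for amino-acid columns m and n, giving a 400-dimensional vector. The logistic normalisation of raw log-odds scores and the helper names are illustrative assumptions, not the authors' code; producing the PSSM itself (for example with PSI-BLAST) is out of scope here.

```python
import math
from typing import List, Sequence

N_AA = 20  # columns of a standard PSSM, one per amino acid


def sigmoid_normalize(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map raw PSSM log-odds scores into (0, 1) with the logistic function (assumed preprocessing)."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def bigram_pssm_features(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Return the 400-dimensional bi-gram vector of an L x 20 profile.

    B[m][n] = sum over k of P[k][m] * P[k+1][n] (adjacent residue pairs),
    flattened row-major so index m * 20 + n holds B[m][n].
    """
    if any(len(row) != N_AA for row in pssm):
        raise ValueError("each PSSM row must have 20 columns")
    b = [[0.0] * N_AA for _ in range(N_AA)]
    for k in range(len(pssm) - 1):
        cur, nxt = pssm[k], pssm[k + 1]
        for m in range(N_AA):
            for n in range(N_AA):
                b[m][n] += cur[m] * nxt[n]
    return [value for row in b for value in row]


if __name__ == "__main__":
    toy = [[0.0] * N_AA for _ in range(3)]  # hypothetical 3-residue profile, all log-odds 0
    feats = bigram_pssm_features(sigmoid_normalize(toy))
    print(len(feats), feats[0])  # 400 0.5
```

Because the sum runs over adjacent residue pairs, the output length is fixed at 400 regardless of protein length, which is what lets proteins of different sizes feed the same classifier.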

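For the ncRNA side, the abstract says only that a k-mer algorithm was used. The following sketch assumes normalised k-mer frequencies over {A, C, G, U} with k = 3 (64 features); the actual k is not stated here, and the function names are hypothetical. A protein-ncRNA pair could then be represented by concatenating the protein vector with the RNA vector, which is likewise an assumption rather than a detail given in the abstract.

```python
from itertools import product
from typing import Dict, List

RNA_ALPHABET = "ACGU"


def kmer_index(k: int) -> Dict[str, int]:
    """Assign each of the 4**k k-mers over A, C, G, U a fixed lexicographic position."""
    return {"".join(p): i for i, p in enumerate(product(RNA_ALPHABET, repeat=k))}


def kmer_frequencies(seq: str, k: int = 3) -> List[float]:
    """Return the normalized k-mer frequency vector (length 4**k) of an RNA sequence.

    T is read as U; windows containing any other symbol (e.g. N) are skipped.
    """
    index = kmer_index(k)
    counts = [0] * len(index)
    s = seq.upper().replace("T", "U")
    total = 0
    for i in range(len(s) - k + 1):
        j = index.get(s[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(index)


if __name__ == "__main__":
    v = kmer_frequencies("ACGUACGU", k=3)
    print(len(v), round(v[kmer_index(3)["ACG"]], 3))  # 64 0.333
```

Normalising by the number of valid windows keeps long and short transcripts on the same scale.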

Protein-ncRNA interaction, Bi-gram, Deep learning, Stacked autoencoder, PSSM
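The evaluation protocol in the abstract (a random forest scored by AUC under five-fold cross-validation) can be sketched without committing to a particular library. In this hypothetical sketch, `fit_predict` is any function that trains on the training folds and returns one score per held-out sample; in practice it might wrap scikit-learn's `RandomForestClassifier` (`fit`, then `predict_proba(X_test)[:, 1]`), but that wrapper, the stratified fold assignment, and the 0/1 label convention are assumptions rather than details given in the paper. AUC is computed as the Mann-Whitney rank statistic, with tied scores counted as one half.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical scorer signature: (X_train, y_train, X_test) -> scores for X_test.
Scorer = Callable[[List[List[float]], List[int], List[List[float]]], List[float]]


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class round-robin so class ratios are preserved."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score_pos > score_neg), computed from average ranks (ties count 1/2)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of this tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both classes")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def cross_validated_auc(X, y, fit_predict: Scorer, k: int = 5, seed: int = 0) -> List[float]:
    """Return one test-fold AUC per fold; fit_predict trains on the other k-1 folds."""
    aucs = []
    for test_idx in stratified_kfold(y, k, seed):
        held = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held]
        scores = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return aucs


if __name__ == "__main__":
    def first_feature_scorer(X_train, y_train, X_test):
        return [x[0] for x in X_test]  # placeholder model: ignores training data

    X = [[float(i)] for i in range(20)]
    y = [0] * 10 + [1] * 10
    print(cross_validated_auc(X, y, first_feature_scorer))  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Reporting the mean of the five fold-wise AUCs is one common convention; pooling all out-of-fold scores into a single ROC curve is another, and the abstract does not say which was used.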


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Zhao-Hui Zhan (1)
  • Zhu-Hong You (2), corresponding author
  • Yong Zhou (1)
  • Li-Ping Li (2)
  • Zheng-Wei Li (1)
  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  2. The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
