AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA

Abstract

Long non-coding RNA (lncRNA) is a class of non-coding RNA longer than 200 nucleotides that has no protein-coding function. LncRNA plays a key role in many biological processes. Studying the RNA-binding protein (RBP) binding sites on the lncRNA chain helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and to reduce experimental time and cost, many computational methods that use domain knowledge to predict RBP binding sites have emerged. However, these prediction methods treat each nucleotide independently and do not take nucleotide statistics into account. In this paper, we encode lncRNA sequences with a high-order statistics-based scheme and feed the encoded sequences into a hybrid deep learning architecture named AC-Caps, which consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated on 31 independent experimental data sets from 12 lncRNA-binding proteins. In these experiments, our method achieves an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%, which are higher than those of HOCCNNLB by 0.014 and 2.3%, of iDeepS by 0.261 and 28.9%, and of DeepBind by 0.189 and 21.8%, respectively. The results show that AC-Caps can reliably process large-scale RBP binding site data on the lncRNA chain and that its prediction performance is better than that of existing deep-learning models. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps.
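The abstract does not spell out the encoding, so the following is only a minimal sketch of one common reading of "high-order" encoding: each position is represented by a one-hot vector over the k-mer that starts there, so that neighbouring nucleotides are encoded jointly rather than independently. The function names, the choice k=2, and the handling of unknown symbols are assumptions made for illustration and are not taken from the AC-Caps source code.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; assumption: any T is mapped to U first


def kmer_index(k):
    """Map every possible k-mer over ALPHABET to a column index."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def high_order_one_hot(seq, k=2):
    """Encode seq as a (len(seq) - k + 1) x 4**k one-hot matrix.

    Row j marks which k-mer starts at position j. A k-mer containing a
    symbol outside ALPHABET (for example N) is left as an all-zero row;
    this is an illustrative choice, not the authors' documented behaviour.
    """
    index = kmer_index(k)
    seq = seq.upper().replace("T", "U")
    rows = []
    for j in range(len(seq) - k + 1):
        row = [0] * (4 ** k)
        col = index.get(seq[j:j + k])
        if col is not None:
            row[col] = 1
        rows.append(row)
    return rows


encoded = high_order_one_hot("AUGCA", k=2)
print(len(encoded), len(encoded[0]))  # 4 16
```

With k=1 this reduces to ordinary per-nucleotide one-hot encoding; larger k widens each row to 4**k columns, which is the price paid for capturing local nucleotide dependencies.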



Acknowledgements

This paper was supported by the National Natural Science Foundation of China (No. 61701073).

Author information

Corresponding authors

Correspondence to Shengwei Tian or Yan Xing.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflicts of interest related to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 25 KB)

About this article

Cite this article

Song, J., Tian, S., Yu, L. et al. AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA. Interdiscip Sci Comput Life Sci (2020). https://doi.org/10.1007/s12539-020-00379-3

Keywords

  • Attention mechanism
  • Capsule network
  • Convolutional neural network
  • lncRNA-binding protein