Journal of Zhejiang University-SCIENCE B

Volume 20, Issue 6, pp 476–487

Genomic data mining for functional annotation of human long noncoding RNAs

  • Brian L. Gudenas
  • Jun Wang
  • Shu-zhen Kuang
  • An-qi Wei
  • Steven B. Cogill
  • Liang-jiang Wang (corresponding author)


Life may have begun in an RNA world, a hypothesis supported by growing evidence of the vital roles that RNAs play in biological systems. In the human genome, most genes do not encode proteins; they are noncoding RNA genes. The largest class of noncoding genes is the long noncoding RNAs (lncRNAs), transcripts longer than 200 nucleotides that have no protein-coding capacity. While some lncRNAs have been shown to be key regulators of gene expression and 3D genome organization, most lncRNAs remain uncharacterized. We therefore propose several data mining and machine learning approaches for the functional annotation of human lncRNAs that leverage the vast amount of data from genetic and genomic studies. Recent results from our studies and those of other groups indicate that genomic data mining can provide insights into lncRNA functions and valuable information for experimental studies of candidate lncRNAs associated with human disease.
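One widely used genomic data-mining strategy for annotating an uncharacterized lncRNA is "guilt by association": protein-coding genes whose expression profiles correlate strongly with the lncRNA are assumed to share its biological role, and their known annotations are pooled as candidate functions. The Python sketch below illustrates that idea under stated assumptions; it is not presented as the authors' pipeline. The gene names, annotation terms, expression values, and the 0.8 Pearson correlation cutoff are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def annotate_by_coexpression(lnc_profile, coding_profiles, annotations, threshold=0.8):
    """Guilt by association: pool the annotation terms of coding genes whose
    expression correlates with the lncRNA at or above `threshold`
    (a hypothetical cutoff), ranked by how many co-expressed genes carry them."""
    votes = Counter()
    for gene, profile in coding_profiles.items():
        if pearson(lnc_profile, profile) >= threshold:
            votes.update(annotations.get(gene, ()))
    return votes.most_common()


# Hypothetical toy data: expression of one lncRNA and three coding genes
# across five samples, plus made-up functional annotations.
lnc = [1.0, 2.0, 3.0, 4.0, 5.0]
coding = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # positively co-expressed
    "GENE_B": [1.1, 2.1, 2.9, 4.2, 5.0],   # positively co-expressed
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],   # anti-correlated, excluded
}
annotations = {
    "GENE_A": ["synaptic signaling"],
    "GENE_B": ["synaptic signaling", "axon guidance"],
    "GENE_C": ["cell cycle"],
}

print(annotate_by_coexpression(lnc, coding, annotations))
# [('synaptic signaling', 2), ('axon guidance', 1)]
```

Counting terms only ranks candidate functions. A real analysis would typically use many more samples, correct for multiple testing, and test each term for statistical enrichment against a genome-wide background rather than relying on raw counts.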

Key words

Long noncoding RNA · Functional annotation · Genomic data mining · Machine learning

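Functional annotation of human long noncoding RNAs using genomic data mining (Chinese abstract, translated)

Increasing evidence shows that RNA plays an important role in biological systems, and these findings support the hypothesis that life originated from RNA. In the human genome, most genes do not encode proteins and are called noncoding RNA genes. Long noncoding RNAs (lncRNAs) are the largest class among them, with transcripts longer than 200 nucleotides. Although some lncRNAs have been shown to be important elements regulating gene expression and 3D genome structure, most lncRNAs have not yet been studied or annotated. Using large amounts of genomic data, our group proposes several methods based on data mining and machine learning for the functional annotation of human lncRNAs. Recent results from our group and other groups in the field show that genomic data mining can deepen the understanding of lncRNA functions and provide important information for experimental studies of disease-associated lncRNAs.

Key words

Long noncoding RNA (lncRNA) · Functional annotation · Genomic data mining · Machine learning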


Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Genetics and Biochemistry, Clemson University, Clemson, USA
