Prediction of LncRNA by Using Multiple Feature Information Fusion and Feature Selection Technique

  • Jun Meng
  • Dingling Jiang
  • Zheng Chang
  • Yushi Luan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10955)


Recent genomic studies suggest that long non-coding RNAs (lncRNAs) play an important role in regulating plant growth. It is therefore important to identify more plant lncRNAs and to predict their functions. This paper presents an improved maximum correlation minimum redundancy method for lncRNA recognition. Sequence features, secondary-structure features, and functional features, such as pseudo-nucleotide features based on the physicochemical properties of the dinucleotides in the RNA sequences, are extracted. The maximum correlation minimum redundancy method is then used to integrate several feature selection methods, namely the Pearson correlation coefficient, information gain, the Relief algorithm, and random forest. A support vector machine (SVM) classification model is built on the selected optimal feature subset. Experimental results on an Arabidopsis sequence dataset show that pseudo-nucleotide features capture information about different RNA sequences, and that the classification model constructed with the proposed method can identify plant lncRNAs more accurately than other methods.
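The abstract names sequence and secondary-structure features without listing them. As a hedged illustration only, the sketch below computes a few descriptors commonly used to separate lncRNAs from protein-coding transcripts (length, GC content, longest open reading frame) plus simple statistics of a dot-bracket structure and its minimum free energy. The feature choices and the function names `longest_orf`, `sequence_features` and `structure_features` are hypothetical, not the paper's feature set; the structure and energy would come from an external folding tool (for example RNAfold from the ViennaRNA package), which is not called here.

```python
def longest_orf(seq):
    """Length (nt) of the longest ATG-initiated open reading frame on the given strand."""
    s = seq.upper().replace("U", "T")
    stops = {"TAA", "TAG", "TGA"}
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                best = max(best, i + 3 - start)  # ORF length includes the stop codon
                start = None
    return best


def sequence_features(seq):
    """Hypothetical sequence descriptors: length, GC content and ORF statistics."""
    s = seq.upper().replace("U", "T")
    orf = longest_orf(s)
    return {
        "length": len(s),
        "gc_content": sum(b in "GC" for b in s) / len(s),
        "orf_length": orf,
        "orf_coverage": orf / len(s),
    }


def structure_features(dot_bracket, mfe):
    """Hypothetical descriptors of a dot-bracket structure and its minimum free energy (kcal/mol)."""
    n = len(dot_bracket)
    paired = sum(c in "()" for c in dot_bracket)
    # count maximal runs of "(" as a rough proxy for the number of stems
    stems = sum(
        1 for i, c in enumerate(dot_bracket)
        if c == "(" and (i == 0 or dot_bracket[i - 1] != "(")
    )
    return {"paired_fraction": paired / n, "stem_count": stems, "mfe_per_nt": mfe / n}
```

For example, `sequence_features("CCAUGAAAUAGCC")` reports a 9-nt longest ORF (ATG...TAG) and a GC content of 6/13.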
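Pseudo-nucleotide features of this kind are commonly built as a pseudo dinucleotide composition (PseDNC): the 16 dinucleotide frequencies are extended with `lam` correlation factors that measure how much the physicochemical properties of dinucleotides `j` positions apart differ, scaled by a weight `w`. The Python sketch below assumes that formulation. The paper's actual property indices, `lam` and `w` are not given in the abstract, so `gc_count_property` is a hypothetical placeholder for real physicochemical tables and the default parameter values are arbitrary.

```python
from itertools import product
from statistics import mean, pstdev

BASES = "ACGU"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., UU (16 dinucleotides)


def gc_count_property():
    """Hypothetical stand-in for a physicochemical index: the GC count of each
    dinucleotide, standardised to zero mean and unit variance over all 16."""
    raw = {d: sum(b in "GC" for b in d) for d in DINUCS}
    vals = list(raw.values())
    mu, sd = mean(vals), pstdev(vals)
    return {d: (v - mu) / sd for d, v in raw.items()}


def psednc(seq, properties, lam=3, w=0.5):
    """Pseudo dinucleotide composition: 16 dinucleotide frequencies followed by
    `lam` sequence-order correlation factors; the vector sums to 1."""
    s = seq.upper().replace("T", "U")
    if set(s) - set(BASES):
        raise ValueError("sequence contains non-ACGU characters")
    if len(s) < lam + 2:
        raise ValueError("sequence too short for the chosen lam")
    dinucs = [s[i:i + 2] for i in range(len(s) - 1)]
    freqs = [dinucs.count(d) / len(dinucs) for d in DINUCS]

    def theta(a, b):
        # mean squared difference of the property values of two dinucleotides
        return mean((p[a] - p[b]) ** 2 for p in properties)

    # theta_j averages theta over all dinucleotide pairs that are j positions apart
    thetas = [
        mean(theta(dinucs[i], dinucs[i + j]) for i in range(len(dinucs) - j))
        for j in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Because the frequencies and the weighted correlation factors share one denominator, the resulting 16 + `lam` vector always sums to 1; a homopolymer such as `"AAAAAAAA"` puts all of its weight on `AA` and has zero correlation factors.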
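The abstract does not say exactly how the individual rankers are combined. The sketch below shows one plausible reading, which is an assumption rather than the authors' algorithm: each ranker's scores are min-max normalised and averaged into a relevance score, and features are then chosen greedily by relevance minus the mean absolute Pearson correlation with the features already selected (the difference form of max-relevance min-redundancy). Random forest importance is left out to keep the example within the standard library; with scikit-learn it could be added as a fourth ranking from `RandomForestClassifier(...).fit(X, y).feature_importances_`. All function names here are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))


def info_gain(x, y, bins=4):
    """Information gain of class labels y given feature x, discretised into equal-width bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # constant feature: everything falls into one bin
    binned = [min(int((v - lo) / width), bins - 1) for v in x]
    cond = 0.0
    for b in set(binned):
        sub = [label for bb, label in zip(binned, y) if bb == b]
        cond += len(sub) / len(y) * entropy(sub)
    return entropy(y) - cond


def relief(X, y):
    """Basic two-class Relief, visiting every sample instead of a random subset."""
    n, d = len(X), len(X[0])
    span = [(max(c) - min(c)) or 1.0 for c in zip(*X)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest same-class sample
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class sample
        for f in range(d):
            w[f] += (diff(X[i], X[m], f) - diff(X[i], X[h], f)) / n
    return w


def minmax(scores):
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def ensemble_mrmr(X, y, k):
    """Greedily pick k feature indices maximising ensemble relevance minus redundancy."""
    cols = [list(c) for c in zip(*X)]
    rankings = [
        [abs(pearson(c, y)) for c in cols],
        [info_gain(c, y) for c in cols],
        relief(X, y),
    ]
    # relevance = mean of the min-max normalised scores from each ranker
    relevance = [mean(s) for s in zip(*(minmax(r) for r in rankings))]
    selected, remaining = [], list(range(len(cols)))
    while remaining and len(selected) < k:
        def score(f):
            redundancy = mean(abs(pearson(cols[f], cols[s])) for s in selected) if selected else 0.0
            return relevance[f] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what distinguishes this from simply taking the top-k ranked features: an exact duplicate of an already selected feature scores at most zero, so a less correlated but still relevant feature is preferred.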
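The final step trains an SVM on the selected feature subset. The abstract does not give the kernel or hyperparameters, so the following is only a sketch under stated assumptions: a linear SVM trained with the Pegasos stochastic sub-gradient method in pure Python, with labels encoded as +1 (lncRNA) and -1 (other transcript). `train_linear_svm`, `predict` and the defaults `lam=0.01`, `epochs=200` are hypothetical; in practice a library SVM such as LIBSVM or scikit-learn's `SVC` would normally be used.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.
    Labels must be +1/-1; a constant 1.0 is appended to each sample so the last
    weight acts as a (regularised) bias term."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink from the regulariser
            if margin < 1:  # hinge loss is active: step towards the sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Usage: `w = train_linear_svm(X_selected, labels)` followed by `predict(w, x)` for each new feature vector, where `X_selected` holds only the columns kept by the feature selection step.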


Keywords: Ensemble feature selection · Maximum correlation minimum redundancy · Pseudo-nucleotide features · Classification · LncRNA



The current study was supported by the National Natural Science Foundation of China (Nos. 61472061 and 31471880), and the Graduate Educational Reform Fund of Dalian University of Technology (Jg2017015).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jun Meng (1)
  • Dingling Jiang (1)
  • Zheng Chang (1)
  • Yushi Luan (2, corresponding author)
  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, China
  2. School of Life Science and Biotechnology, Dalian University of Technology, Dalian, China
