An SVM-Based Approach to Discover MicroRNA Precursors in Plant Genomes

  • Yi Wang
  • Cheqing Jin
  • Minqi Zhou
  • Aoying Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7104)


MicroRNAs (miRNAs) are noncoding RNAs of ~22 nucleotides that play versatile regulatory roles in multicellular organisms. Because cloning methods for miRNA identification are biased towards abundant miRNAs, computational approaches provide a useful complement for identifying miRNAs whose expression is restricted to specific tissues or time points. In this paper, we propose a novel Support Vector Machine (SVM) based detector, named MiR-PD, to identify pre-miRNAs in plants. The classifier is built on twelve features of pre-miRNAs, comprising five global features and seven sub-structure features. Trained on 790 plant pre-miRNAs and 7,900 pseudo pre-miRNAs, MiR-PD achieves 96.43% five-fold cross-validation accuracy. Tested on 441 newly identified plant pre-miRNAs and 62,883 pseudo pre-miRNAs, MiR-PD reports an accuracy of 99.71% with 77.55% sensitivity and 99.87% specificity. These results suggest that the detector can feasibly be applied genome-wide to identify novel miRNAs in plants, especially species-specific ones, without relying on phylogenetic conservation.
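The three test-set figures can be cross-checked against each other. The sketch below is only an illustration: the confusion-matrix counts are not reported in the abstract and are inferred here by rounding the stated sensitivity and specificity against the stated test-set sizes, and the function names are hypothetical.

```python
def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Infer (TP, FN, TN, FP) from class sizes and reported rates.

    Assumption: the counts are the nearest integers to n * rate;
    the abstract itself reports only the percentages.
    """
    tp = round(n_pos * sensitivity)   # real pre-miRNAs correctly detected
    tn = round(n_neg * specificity)   # pseudo pre-miRNAs correctly rejected
    return tp, n_pos - tp, tn, n_neg - tn


def accuracy(tp, fn, tn, fp):
    """Fraction of all test sequences classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Test-set sizes and rates as stated in the abstract.
tp, fn, tn, fp = confusion_from_rates(441, 62883, 0.7755, 0.9987)
acc = accuracy(tp, fn, tn, fp)
print(f"TP={tp} FN={fn} TN={tn} FP={fp} accuracy={acc:.2%}")
# prints: TP=342 FN=99 TN=62801 FP=82 accuracy=99.71%
```

Under these inferred counts the reported accuracy follows from the other two figures. Because the negative class is roughly 140 times larger than the positive class, the overall accuracy is dominated by specificity, which is why it sits well above the 77.55% sensitivity.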


Keywords: MicroRNAs, plant, support vector machine, MiR-PD





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yi Wang (1)
  • Cheqing Jin (1)
  • Minqi Zhou (1)
  • Aoying Zhou (1)
  1. Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, China
