Analysis of Next-Generation Sequencing Data of miRNA for the Prediction of Breast Cancer

  • Indrajit Saha
  • Shib Sankar Bhowmick
  • Filippo Geraci
  • Marco Pellegrini
  • Debotosh Bhattacharjee
  • Ujjwal Maulik
  • Dariusz Plewczynski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9873)


Recently, Next-Generation Sequencing (NGS) has emerged as a revolutionary technique in the fields of ‘-omics’ research. The Cancer Genome Atlas (TCGA) is a prominent example, providing a massive amount of sequencing data for miRNA and mRNA. Analysing these data could yield new biological insight, and a prognostic system built on this newly available sequencing data would be of considerable help in cancer diagnosis. Hence, in this article, we analyse miRNA sequencing data for the accurate prediction of breast cancer. miRNAs are small non-coding RNAs that have been shown to participate in several carcinogenic processes, acting either as tumor suppressors or as oncogenes. For this reason, the clinical treatment of breast cancer patients has changed in recent years, and it is of interest to understand the role of miRNAs in the prediction of breast cancer. In this regard, we have developed a technique using the Gravitational Search Algorithm, which optimizes the underlying classification performance of a Support Vector Machine. The proposed technique selects the potential features, in this case miRNAs, in order to achieve better prediction accuracy. In this study, we achieved a classification accuracy of up to 95.29% by automatically selecting approximately 1.5% of the miRNAs in the whole dataset. The selected miRNAs are then ranked to produce a list. Of the top 15 miRNAs in this list, 6 are associated with breast cancer, 5 are associated with other cancer types, and 4 are unknown miRNAs. The performance of the proposed technique is compared with seven other state-of-the-art techniques. Finally, the results are justified by means of a statistical test along with a biological significance analysis of the selected miRNAs.
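The wrapper idea summarised above, a population-based search over feature subsets scored by a classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the binary encoding, the mass and velocity updates, the constants (`g0`, `n_agents`, `n_iter`, the size penalty) and all function names are assumptions made for illustration. To keep the sketch free of dependencies, a leave-one-out nearest-centroid classifier stands in for the Support Vector Machine, and the data are synthetic toy values, not TCGA data.

```python
import math
import random


def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on masked features.
    Assumption: this simple classifier stands in for the SVM used in the paper."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            s = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                s[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1

        def dist(label):
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def binary_gsa(fitness, n_features, n_agents=12, n_iter=30, g0=5.0, seed=0):
    """Hypothetical binary gravitational search that maximises `fitness(mask)`."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_agents)]
    vel = [[0.0] * n_features for _ in range(n_agents)]
    best_mask, best_fit = None, -math.inf
    for t in range(n_iter + 1):
        fits = [fitness(p) for p in pos]
        for p, f in zip(pos, fits):
            if f > best_fit:
                best_mask, best_fit = p[:], f
        if t == n_iter:
            break  # final positions evaluated, no further update
        lo, hi = min(fits), max(fits)
        raw = [(f - lo) / (hi - lo) if hi > lo else 1.0 for f in fits]
        total = sum(raw)
        mass = [m / total for m in raw]           # fitter agents are heavier
        g = g0 * (1.0 - t / n_iter)               # gravitational constant decays
        k = max(1, round(n_agents * (1.0 - t / n_iter)))
        kbest = sorted(range(n_agents), key=lambda a: mass[a], reverse=True)[:k]
        new_pos = []
        for i in range(n_agents):
            acc = [0.0] * n_features
            for j in kbest:
                if j == i:
                    continue
                r = sum(a != b for a, b in zip(pos[i], pos[j]))  # Hamming distance
                for d in range(n_features):
                    acc[d] += (rng.random() * g * mass[j]
                               * (pos[j][d] - pos[i][d]) / (r + 1e-9))
            bits = pos[i][:]
            for d in range(n_features):
                vel[i][d] = rng.random() * vel[i][d] + acc[d]
                if rng.random() < abs(math.tanh(vel[i][d])):
                    bits[d] = 1 - bits[d]         # flip bit with velocity-based probability
            new_pos.append(bits)
        pos = new_pos
    return best_mask, best_fit


# Synthetic toy data (not TCGA): only feature 0 separates the two classes.
X = [[5.1, 0.2, 1.0, 3.3, 0.9], [4.8, 0.9, 1.2, 3.1, 0.1],
     [5.3, 0.4, 0.8, 3.6, 0.5], [4.9, 0.7, 1.1, 3.0, 0.7],
     [1.0, 0.3, 1.0, 3.4, 0.8], [1.2, 0.8, 0.9, 3.2, 0.2],
     [0.9, 0.5, 1.3, 3.5, 0.4], [1.1, 0.6, 1.0, 3.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def fitness(mask):
    # Accuracy with a small, arbitrary penalty on the number of selected features.
    return nearest_centroid_accuracy(X, y, mask) - 0.01 * sum(mask)


best_mask, best_fit = binary_gsa(fitness, n_features=5)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
```

In a setting closer to the one described here, the fitness function would presumably be a cross-validated SVM accuracy on miRNA expression profiles, and the returned mask would identify the selected miRNAs; the size penalty is one simple way to favour small subsets.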


Breast cancer · Gravitational search algorithm · MicroRNA · Support vector machine · The Cancer Genome Atlas



This work was carried out during the tenure of an ERCIM ‘Alain Bensoussan’ Fellowship Programme and was partially supported by the Polish National Science Centre (grant numbers UMO-2013/09/B/NZ2/00121 and 2014/15/B/ST6/05082) and by the COST BM1405 and BM1408 EU actions.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Indrajit Saha (1, 2, 3), corresponding author
  • Shib Sankar Bhowmick (4)
  • Filippo Geraci (1)
  • Marco Pellegrini (1)
  • Debotosh Bhattacharjee (4)
  • Ujjwal Maulik (4)
  • Dariusz Plewczynski (3)

  1. Institute of Informatics and Telematics, National Research Council, Pisa, Italy
  2. National Institute of Technical Teachers’ Training and Research, Kolkata, India
  3. Centre of New Technologies, University of Warsaw, Warsaw, Poland
  4. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
