Feature Selection from Image Descriptors Data for Breast Cancer Diagnosis Based on CAD

  • Laura A. Zanella-Calzada
  • Carlos E. Galván-Tejada
  • Jorge I. Galván-Tejada
  • José M. Celaya-Padilla
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10633)


Breast cancer is a major public health problem among women worldwide. Early detection generally increases patient survival, yet it remains one of the biggest shortcomings of current practice. The purpose of this paper is to obtain a model capable of classifying benign and malignant breast tumors, using a public dataset of features extracted from mammography images, obtained from the Breast Cancer Digital Repository initiative. Multivariate and univariate models were constructed by applying the Random Forest machine learning algorithm, in a CAD setting, to the image features. The two models were compared statistically to identify the better one according to fitness. Results suggest that the multivariate model has a better prediction capability than the univariate model, with an AUC between 0.910 and 0.991; however, five specific descriptive features were found that can classify tumors with a fitness similar to that of the multivariate model, with AUCs between 0.897 and 0.958.
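The AUC comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the paper fits Random Forest models, whereas this sketch swaps in the raw value of a single descriptor as the classification score, and the descriptor names and values are made up for illustration (they are not taken from the Breast Cancer Digital Repository). It only shows how an AUC is obtained per feature and how features could then be ranked by it.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = malignant (positive class), 0 = benign (negative class).
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for two hypothetical descriptors (assumption, not real data).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "descriptor_a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "descriptor_b": [0.1, 0.5, 0.3, 0.7, 0.2, 0.6, 0.4, 0.8],
}

# Rank descriptors by how well each one alone separates the two classes.
ranking = sorted(
    ((name, auc(values, labels)) for name, values in features.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, value in ranking:
    print(f"{name}: AUC = {value:.3f}")
```

In a setup closer to the paper, the score passed to `auc` would presumably be the predicted class probability from a model trained on one feature (univariate) or on all features (multivariate), evaluated on held-out data.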


Breast cancer diagnosis · Tumor classification · CAD · Machine learning · Random forest



This work was partially supported by the Laboratorio de Software Libre (Labsol) of the Consejo Zacatecano de Ciencia, Tecnología e Innovación (COZCyT). The authors also thank the Universidad Autónoma de Zacatecas (UAZ) for partially supporting this research.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Laura A. Zanella-Calzada (1)
  • Carlos E. Galván-Tejada (1)
  • Jorge I. Galván-Tejada (1)
  • José M. Celaya-Padilla (1)

  1. Universidad Autónoma de Zacatecas, Unidad Académica de Ingeniería Eléctrica, Zacatecas, Mexico
