Feature Selection from Image Descriptors Data for Breast Cancer Diagnosis Based on CAD
Breast cancer is an important public health problem among women worldwide. Early detection generally increases the survival rate of patients; however, it remains one of the greatest shortcomings today. The purpose of this paper is to obtain a model capable of classifying benign and malignant breast tumors, using a public dataset composed of features extracted from mammography images and obtained from the Breast Cancer Digital Repository initiative. Multivariate and univariate models were constructed by applying Random Forest, a machine learning algorithm, to the image features within a CAD approach. The two kinds of model were then statistically compared to identify the one with the better fit. Results suggest that the multivariate model has a better prediction capability than the univariate model, with an AUC between 0.910 and 0.991; however, five specific descriptive features were found that can classify tumors with a fitness similar to that of the multivariate model, with AUCs between 0.897 and 0.958.
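The comparison above rests on the area under the ROC curve (AUC). As an illustration only, the sketch below computes AUC through its rank interpretation (the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one, counting ties as one half) and uses it to rank single descriptors in a univariate screen. It is not the paper's method: the paper fits Random Forest models, whereas this sketch scores each raw feature value directly, and the feature names and values (`area`, `contrast`) are hypothetical toy data, not BCDR measurements. A formal statistical comparison of two AUCs (for example, DeLong's test) is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney U statistic.

    labels: 1 = malignant (positive class), 0 = benign.
    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise; AUC is the average over all pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(
    features: Dict[str, List[float]], labels: List[int]
) -> List[Tuple[str, float]]:
    """Univariate screen: score each descriptor on its own, best first.

    A feature with AUC below 0.5 is inversely related to malignancy, so
    max(auc, 1 - auc) is used as a direction-free discrimination score.
    """
    ranked = []
    for name, values in features.items():
        auc = roc_auc(values, labels)
        ranked.append((name, max(auc, 1.0 - auc)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: three benign (0) and three malignant (1) cases.
    labels = [0, 0, 0, 1, 1, 1]
    features = {
        "area": [10, 12, 11, 20, 25, 22],    # separates the classes perfectly
        "contrast": [5, 3, 4, 4, 6, 2],      # no better than chance
    }
    print(rank_features(features, labels))
    # [('area', 1.0), ('contrast', 0.5)]
```

The quadratic pairwise loop keeps the definition explicit; for large datasets a sort-based rank computation gives the same value in O(n log n).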
Keywords: Breast cancer diagnosis · Tumor classification · CAD · Machine learning · Random forest
This work was partially supported by the Laboratorio de Software Libre (Labsol) of the Consejo Zacatecano de Ciencia Tecnología e Innovación (COZCyT). This work group also thanks the Universidad Autónoma de Zacatecas (UAZ) for partially supporting this research.