Abstract
This chapter introduces a new method for knowledge extraction from databases. Its purpose is to find a set of features that discriminates between classes and is also robust for within-class classification. The method is generic; we introduce it here in the context of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686–690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified; then, a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of powerful reduction techniques, which do not lose the optimal solution, is applied; second, a metaheuristic search identifies which of the remaining features are to be considered or disregarded. Two algorithms were tested on a public-domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.
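To make the notion of an (α, β)-k-feature set more concrete, the following is a minimal sketch of a validity check. It assumes binary features and the usual reading of the definition: every pair of samples from different classes must differ in at least α of the selected features, and every pair from the same class must agree in at least β of them. The function name `is_alpha_beta_feature_set` and the toy data are hypothetical illustrations, not the authors' implementation or the mammography dataset.

```python
from itertools import combinations


def is_alpha_beta_feature_set(samples, labels, selected, alpha, beta):
    """Check the assumed (alpha, beta) conditions for the selected feature indices.

    samples  : list of equal-length lists of 0/1 feature values
    labels   : class label of each sample
    selected : indices of the candidate features (k = len(selected))
    """
    for i, j in combinations(range(len(samples)), 2):
        # Count the selected features on which the two samples differ.
        differing = sum(samples[i][f] != samples[j][f] for f in selected)
        agreeing = len(selected) - differing
        if labels[i] != labels[j]:
            # Between-class pair: needs at least alpha discriminating features.
            if differing < alpha:
                return False
        elif agreeing < beta:
            # Within-class pair: needs at least beta features with equal values.
            return False
    return True


# Hypothetical toy data: four samples, four binary features.
samples = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
labels = ["malignant", "malignant", "benign", "benign"]

print(is_alpha_beta_feature_set(samples, labels, [0, 2], alpha=2, beta=2))
print(is_alpha_beta_feature_set(samples, labels, [1, 3], alpha=1, beta=1))
```

In this toy example, features 0 and 2 separate the two classes in every between-class pair and agree in every within-class pair, whereas features 1 and 3 fail to distinguish the second and third samples. The optimization problem described above looks for a set of minimum cardinality that passes such a check.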
References
Bird R, Wallace T, Yankaskas B (1992) Analysis of cancer missed at screening mammography. Radiology 184:613–617
Hall F, Storella J, Silverstone D, Wyshak G (1988) Nonpalpable breast lesions: recommendations for biopsy based on suspicion of carcinoma at mammography. Radiology 167:353–358
Cotta C, Sloper C, Moscato P (2004) Evolutionary search of thresholds for robust feature set selection: application to the analysis of microarray data. In: Proceedings of EvoBio2004—2nd European workshop on evolutionary computation and bioinformatics, Coimbra, Portugal, 5–7 April 2004, pp 21–30
Kovalerchuk B, Triantaphyllou E, Ruiz J, Torvik V, Vityaev E (2000) The reliability issue of computer-aided breast cancer diagnosis. Comput Biomed Res 33:296–313
Davies S, Russell S (1994) NP-completeness of searches for smallest possible feature sets. In: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) fall symposium on relevance, pp 41–43
Goldberg D, Sastry K (2010) Genetic algorithms: the design of innovation, 2nd edn. Springer, New York
Moscato P, Cotta C, Mendes A (2004) Memetic algorithms. In: Onwubolu G, Babu B (eds) New optimization techniques in engineering. Springer, New York, pp 53–86
Cotta C, Moscato P (2003) The k-Feature Set problem is W[2]-complete. J Comput Syst Sci 67(4):686–690
Kovalerchuk B, Vityaev E, Ruiz J (2000) Consistent knowledge discovery in medical diagnosis. IEEE Eng Med Biol 19:26–37
Weihe K (1998) Covering trains by stations or the power of data reduction. In: Proceedings of ALEX'98—1st workshop on algorithms and experiments, Trento, Italy, 9–11 February 1998, pp 1–8
Berretta R, Mendes A, Moscato P (2007) Selection of discriminative genes in microarray experiments using mathematical programming. J Res Pract Inform Technol 39(4):287–299
Moscato P, Cotta C (2003) A gentle introduction to memetic algorithms. In: Glover F, Kochenberger G (eds) Handbook of metaheuristics. Springer, New York, pp 105–144
Neri F, Cotta C, Moscato P (2011) Handbook of memetic algorithms. Springer, New York
Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, USA
Yunus M, Ahmed N, Masroor I, Yaqoob J (2004) Mammographic criteria for determining the diagnostic value of microcalcifications in the detection of early breast cancer. J Pak Med Assoc 54:24–29
Cotta C, Mendes A, Garcia V, Franca P, Moscato P (2003) Applying memetic algorithms to the analysis of microarray data. In: Cagnoni S et al. (eds) Proceedings of EvoBIO2003—1st European workshop on evolutionary bioinformatics, Essex, UK, 14–16 April 2003. Lecture Notes in Computer Science, vol 2611. Springer, Heidelberg, pp 22–32
Moscato P, Mendes A, Berretta R (2007) Benchmarking a memetic algorithm for ordering microarray data. Biosystems 88(1–2):56–75
Johnstone D, Milward EA, Berretta R, Moscato P (2012) Multivariate protein signatures of pre-clinical Alzheimer’s disease in the Alzheimer’s disease neuroimaging initiative (ADNI) plasma proteome dataset. PLoS One 7(4):e34341
de Paula MR, Ravetti MG, Berretta R, Moscato P (2011) Differences in abundances of cell-signalling proteins in blood reveal novel biomarkers for early detection of clinical Alzheimer’s disease. PLoS One 6(3):e17481
Ravetti MG, Moscato P (2008) Identification of a 5-protein biomarker molecular signature for predicting Alzheimer’s disease. PLoS One 3(9):e3111
Johnstone D, Graham RM, Trinder D, Delima RD, Riveros C, Olynyk JK et al (2012) Brain transcriptome perturbations in the Hfe(−/−) mouse model of genetic iron loading. Brain Res 1448:144–152
Johnstone DM, Graham RM, Trinder D, Riveros C, Olynyk JK, Scott RJ et al (2012) Changes in brain transcripts related to Alzheimer’s disease in a model of HFE hemochromatosis are not consistent with increased Alzheimer’s disease risk. J Alzheimers Dis 30(4):791–803
Ravetti MG, Rosso OA, Berretta R, Moscato P (2010) Uncovering molecular biomarkers that correlate cognitive decline with the changes of hippocampus’ gene expression profiles in Alzheimer’s disease. PLoS One 5(4):e10153
Riveros C, Mellor D, Gandhi KS, McKay FC, Cox MB, Berretta R et al (2010) A transcription factor map as revealed by a genome-wide gene expression analysis of whole-blood mRNA transcriptome in multiple sclerosis. PLoS One 5(12):e14176
Rosso OA, Mendes A, Berretta R, Rostas JA, Hunter M, Moscato P (2009) Distinguishing childhood absence epilepsy patients from controls by the analysis of their background brain electrical activity (II): a combinatorial optimization approach for electrode selection. J Neurosci Methods 181(2):257–267
Mendes A, Scott RJ, Moscato P (2008) Microarrays—identifying molecular portraits for prostate tumors with different Gleason patterns. Methods Mol Med 141:131–151
Berretta R, Costa W, Moscato P (2008) Combinatorial optimization models for finding genetic signatures from gene expression datasets. Methods Mol Biol 453:363–377
Milward EA, Moscato P, Riveros C, Johnstone DM (2014) Beyond statistics: a new combinatorial approach to identifying biomarker panels for the early detection and diagnosis of Alzheimer’s disease. J Alzheimers Dis 39(1):211–217
Pastore G, Costantini M, Valentini V, Romani M, Terribile D, Belli P (2002) Clinically nonpalpable breast tumors: global critical review and second look on microcalcifications. Rays 27(4):233–239
Bocchi L, Nori J (2007) Shape analysis of microcalcifications using Radon transform. Med Eng Phys 29(6):691–698
Resende LM, Matias MA, Oliveira GM, Salles MA, Melo FH, Gobbi H (2008) Evaluation of breast microcalcifications according to Breast Imaging Reporting and Data System (BI-RADS) and Le Gal’s classifications. Rev Bras Ginecol Obstet 30(2):75–79
Wilson GH 3rd, Gore JC, Yankeelov TE, Barnes S, Peterson TE, True JM et al (2014) An approach to breast cancer diagnosis via PET imaging of microcalcifications using 18F-NaF. J Nucl Med 55(7):1138–1143
Boisserie-Lacroix M, Bullier B, Hurtevent-Labrot G, Ferron S, Lippa N, Mac Grogan G (2014) Correlation between imaging and prognostic factors: molecular classification of breast cancers. Diagn Intervent Imaging 95(2):227–233
Scimeca M, Giannini E, Antonacci C, Pistolese CA, Spagnoli LG, Bonanno E (2014) Microcalcifications in breast cancer: an active phenomenon mediated by epithelial cells with mesenchymal characteristics. BMC Cancer 14:286
Cox RF, Morgan MP (2013) Microcalcifications in breast cancer: lessons from physiological mineralization. Bone 53(2):437–450
Jing H, Yang Y, Nishikawa RM (2012) Retrieval boosted computer-aided diagnosis of clustered microcalcifications for breast cancer. Med Phys 39(2):676–685
Baker R, Rogers KD, Shepherd N, Stone N (2010) New relationships between breast microcalcifications and cancer. Br J Cancer 103(7):1034–1039
Uematsu T, Kasami M, Yuen S (2009) A cluster of microcalcifications: women with high risk for breast cancer versus other women. Breast Cancer 16(4):307–314
Karahaliou A, Skiadopoulos S, Boniatis I, Sakellaropoulos P, Likaki E, Panayiotakis G et al (2007) Texture analysis of tissue surrounding microcalcifications on mammograms for breast cancer diagnosis. Br J Radiol 80(956):648–656
Kamitani T, Yabuuchi H, Soeda H, Matsuo Y, Okafuji T, Sakai S et al (2007) Detection of masses and microcalcifications of breast cancer on digital mammograms: comparison among hard-copy film, 3-megapixel liquid crystal display (LCD) monitors and 5-megapixel LCD monitors: an observer performance study. Eur Radiol 17(5):1365–1371
Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK (2006) Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology 240(3):666–673
Jing H, Yang Y, Nishikawa RM (2012) Regularization in retrieval-driven classification of clustered microcalcifications for breast cancer. Int J Biomed Imaging 2012, id463408
Farshid G, Sullivan T, Downey P, Gill PG, Pieterse S (2011) Independent predictors of breast malignancy in screen-detected microcalcifications: biopsy results in 2545 cases. Br J Cancer 105(11):1669–1675
Hsieh SL, Hsieh SH, Cheng PH, Chen CH, Hsu KP, Lee IS et al (2012) Design ensemble machine learning model for breast cancer diagnosis. J Med Syst 36(5):2841–2847
Djebbari A, Liu Z, Phan S, Famili F (2008) An ensemble machine learning approach to predict survival in breast cancer. Int J Comput Biol Drug Des 1(3):275–294
Choi JY, Kim DH, Plataniotis KN, Ro YM (2014) Computer-aided detection (CAD) of breast masses in mammography: combined detection and ensemble classification. Phys Med Biol 59(14):3697–3719
Ali S, Majid A, Khan A (2014) IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids. Amino Acids 46(4):977–993
Krawczyk B, Schaefer G (2013) A pruned ensemble classifier for effective breast thermogram analysis. In: Annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp 7120–7123
Luo ST, Cheng BW (2012) Diagnosing breast masses in digital mammography using feature selection and ensemble methods. J Med Syst 36(2):569–577
Takemura A, Shimizu A, Hamamoto K (2010) Discrimination of breast tumors in ultrasonic images using an ensemble classifier based on the AdaBoost algorithm with feature selection. IEEE Trans Med Imaging 29(3):598–609
Vimieiro R, Moscato P (2014) Disclosed: an efficient depth-first, top-down algorithm for mining disjunctive closed itemsets in high-dimensional data. Inform Sci 280:171–187
Vimieiro R, Moscato P (2014) A new method for mining disjunctive emerging patterns in high-dimensional datasets using hypergraphs. Inform Syst 40:1–10
Copyright information
© 2017 Springer Science+Business Media New York
About this protocol
Cite this protocol
Mathieson, L., Mendes, A., Marsden, J., Pond, J., Moscato, P. (2017). Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_17
DOI: https://doi.org/10.1007/978-1-4939-6613-4_17
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6611-0
Online ISBN: 978-1-4939-6613-4
eBook Packages: Springer Protocols