Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1526)

Abstract

This chapter introduces a new method for knowledge extraction from databases that finds a set of features that is both discriminative between classes and robust for within-class classification. The method is generic; we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686–690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified; then, a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is employed; second, a metaheuristic search identifies which of the remaining features are to be considered or disregarded. Two algorithms were tested with a public domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.
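
To make the (α, β) condition concrete, the following is a minimal Python sketch under stated assumptions. It assumes binary features and the usual reading of the problem from the cited literature: every pair of samples from different classes must differ on at least α selected features, and every pair from the same class must agree on at least β selected features. The function names and the toy data are hypothetical. The greedy selection is only an illustrative stand-in; it is not the chapter's reduction rules or metaheuristic, and it does not guarantee a minimum-cardinality set.

```python
from itertools import combinations


def is_alpha_beta_feature_set(samples, labels, selected, alpha, beta):
    """Check the assumed (alpha, beta) condition for the selected feature indices."""
    for i, j in combinations(range(len(samples)), 2):
        differing = sum(samples[i][f] != samples[j][f] for f in selected)
        agreeing = len(selected) - differing
        if labels[i] != labels[j]:
            if differing < alpha:  # different classes: not separated enough
                return False
        elif agreeing < beta:      # same class: not consistent enough
            return False
    return True


def greedy_feature_set(samples, labels, alpha, beta):
    """Greedy multicover heuristic (illustrative only, no optimality guarantee).

    Returns a list of feature indices, or None if even the full
    feature set cannot satisfy the condition.
    """
    pairs = list(combinations(range(len(samples)), 2))

    def helps(f, i, j):
        differs = samples[i][f] != samples[j][f]
        return differs if labels[i] != labels[j] else not differs

    # Remaining demand per pair: alpha for cross-class, beta for same-class.
    need = {(i, j): (alpha if labels[i] != labels[j] else beta) for i, j in pairs}
    remaining = set(range(len(samples[0])))
    selected = []
    while any(v > 0 for v in need.values()):
        best, best_gain = None, 0
        for f in sorted(remaining):
            gain = sum(1 for i, j in pairs if need[(i, j)] > 0 and helps(f, i, j))
            if gain > best_gain:
                best, best_gain = f, gain
        if best is None:
            return None  # no remaining feature reduces any unmet demand
        selected.append(best)
        remaining.discard(best)
        for i, j in pairs:
            if need[(i, j)] > 0 and helps(best, i, j):
                need[(i, j)] -= 1
    return selected


# Hypothetical toy data: 4 samples, 4 binary features, label 1 = malignant.
toy_samples = [(1, 0, 1, 0), (1, 1, 1, 0), (0, 0, 0, 1), (0, 1, 0, 1)]
toy_labels = [1, 1, 0, 0]

chosen = greedy_feature_set(toy_samples, toy_labels, alpha=2, beta=2)
print(chosen, is_alpha_beta_feature_set(toy_samples, toy_labels, chosen, 2, 2))
```

Larger α and β ask for redundancy: a single feature that separates the classes is not enough when α = 2, which is why the check rejects a one-feature selection on the toy data.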

Notes

  1. http://www.csc.lsu.edu/trianta/ResearchAreas/DigitalMammography/index.html

References

  1. Bird R, Wallace T, Yankaskas B (1992) Analysis of cancer missed at screening mammography. Radiology 184:613–617

  2. Hall F, Storella J, Silverstone D, Wyshak G (1988) Nonpalpable breast lesions: recommendations for biopsy based on suspicion of carcinoma at mammography. Radiology 167:353–358

  3. Cotta C, Sloper C, Moscato P (2004) Evolutionary search of thresholds for robust feature set selection: application to the analysis of microarray data. In: Proceedings of EvoBio2004—2nd European workshop on evolutionary computation and bioinformatics, Coimbra, Portugal, 5–7 April 2004, pp 21–30

  4. Kovalerchuk B, Triantaphyllou E, Ruiz J, Torvik V, Vityaev E (2000) The reliability issue of computer-aided breast cancer diagnosis. Comput Biomed Res 33:296–313

  5. Davies S, Russell S (1994) NP-completeness of searches for smallest possible feature sets. In: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) fall symposium on relevance, pp 41–43

  6. Goldberg D, Sastry K (2010) Genetic algorithms: the design of innovation, 2nd edn. Springer, New York

  7. Moscato P, Cotta C, Mendes A (2004) Memetic algorithms. In: Onwubolu G, Babu B (eds) New optimization techniques in engineering. Springer, New York, pp 53–86

  8. Cotta C, Moscato P (2003) The k-Feature Set problem is W[2]-complete. J Comput Syst Sci 67(4):686–690

  9. Kovalerchuk B, Vityaev E, Ruiz J (2000) Consistent knowledge discovery in medical diagnosis. IEEE Eng Med Biol 19:26–37

  10. Weihe K (1998) Covering trains by stations or the power of data reduction. In: Proceedings of ALEX'98—1st workshop on algorithms and experiments, Trento, Italy, 9–11 February 1998, pp 1–8

  11. Berretta R, Mendes A, Moscato P (2007) Selection of discriminative genes in microarray experiments using mathematical programming. J Res Pract Inform Technol 39(4):287–299

  12. Moscato P, Cotta C (2003) A gentle introduction to memetic algorithms. In: Glover F, Kochenberger G (eds) Handbook of metaheuristics. Springer, New York, pp 105–144

  13. Neri F, Cotta C, Moscato P (2011) Handbook of memetic algorithms. Springer, New York

  14. Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, USA

  15. Yunus M, Ahmed N, Masroor I, Yaqoob J (2004) Mammographic criteria for determining the diagnostic value of microcalcifications in the detection of early breast cancer. J Pak Med Assoc 54:24–29

  16. Cotta C, Mendes A, Garcia V, Franca P, Moscato P (2003) Applying memetic algorithms to the analysis of microarray data. In: Cagnoni S et al. (eds) Proceedings of EvoBIO2003—1st European workshop on evolutionary bioinformatics, Essex, UK, 14–16 April 2003. Lecture Notes in Computer Science, vol 2611. Springer, Heidelberg, pp 22–32

  17. Moscato P, Mendes A, Berretta R (2007) Benchmarking a memetic algorithm for ordering microarray data. Biosystems 88(1–2):56–75

  18. Johnstone D, Milward EA, Berretta R, Moscato P (2012) Multivariate protein signatures of pre-clinical Alzheimer’s disease in the Alzheimer’s disease neuroimaging initiative (ADNI) plasma proteome dataset. PLoS One 7(4):e34341

  19. de Paula MR, Ravetti MG, Berretta R, Moscato P (2011) Differences in abundances of cell-signalling proteins in blood reveal novel biomarkers for early detection of clinical Alzheimer’s disease. PLoS One 6(3):e17481

  20. Ravetti MG, Moscato P (2008) Identification of a 5-protein biomarker molecular signature for predicting Alzheimer’s disease. PLoS One 3(9):e3111

  21. Johnstone D, Graham RM, Trinder D, Delima RD, Riveros C, Olynyk JK et al (2012) Brain transcriptome perturbations in the Hfe(−/−) mouse model of genetic iron loading. Brain Res 1448:144–152

  22. Johnstone DM, Graham RM, Trinder D, Riveros C, Olynyk JK, Scott RJ et al (2012) Changes in brain transcripts related to Alzheimer’s disease in a model of HFE hemochromatosis are not consistent with increased Alzheimer’s disease risk. J Alzheimers Dis 30(4):791–803

  23. Ravetti MG, Rosso OA, Berretta R, Moscato P (2010) Uncovering molecular biomarkers that correlate cognitive decline with the changes of hippocampus’ gene expression profiles in Alzheimer’s disease. PLoS One 5(4):e10153

  24. Riveros C, Mellor D, Gandhi KS, McKay FC, Cox MB, Berretta R et al (2010) A transcription factor map as revealed by a genome-wide gene expression analysis of whole-blood mRNA transcriptome in multiple sclerosis. PLoS One 5(12):e14176

  25. Rosso OA, Mendes A, Berretta R, Rostas JA, Hunter M, Moscato P (2009) Distinguishing childhood absence epilepsy patients from controls by the analysis of their background brain electrical activity (II): a combinatorial optimization approach for electrode selection. J Neurosci Methods 181(2):257–267

  26. Mendes A, Scott RJ, Moscato P (2008) Microarrays—identifying molecular portraits for prostate tumors with different Gleason patterns. Methods Mol Med 141:131–151

  27. Berretta R, Costa W, Moscato P (2008) Combinatorial optimization models for finding genetic signatures from gene expression datasets. Methods Mol Biol 453:363–377

  28. Milward EA, Moscato P, Riveros C, Johnstone DM (2014) Beyond statistics: a new combinatorial approach to identifying biomarker panels for the early detection and diagnosis of Alzheimer’s disease. J Alzheimers Dis 39(1):211–217

  29. Pastore G, Costantini M, Valentini V, Romani M, Terribile D, Belli P (2002) Clinically nonpalpable breast tumors: global critical review and second look on microcalcifications. Rays 27(4):233–239

  30. Bocchi L, Nori J (2007) Shape analysis of microcalcifications using Radon transform. Med Eng Phys 29(6):691–698

  31. Resende LM, Matias MA, Oliveira GM, Salles MA, Melo FH, Gobbi H (2008) Evaluation of breast microcalcifications according to Breast Imaging Reporting and Data System (BI-RADS) and Le Gal’s classifications. Rev Bras Ginecol Obstet 30(2):75–79

  32. Wilson GH 3rd, Gore JC, Yankeelov TE, Barnes S, Peterson TE, True JM et al (2014) An approach to breast cancer diagnosis via PET imaging of microcalcifications using 18F-NaF. J Nucl Med 55(7):1138–1143

  33. Boisserie-Lacroix M, Bullier B, Hurtevent-Labrot G, Ferron S, Lippa N, Mac Grogan G (2014) Correlation between imaging and prognostic factors: molecular classification of breast cancers. Diagn Intervent Imaging 95(2):227–233

  34. Scimeca M, Giannini E, Antonacci C, Pistolese CA, Spagnoli LG, Bonanno E (2014) Microcalcifications in breast cancer: an active phenomenon mediated by epithelial cells with mesenchymal characteristics. BMC Cancer 14:286

  35. Cox RF, Morgan MP (2013) Microcalcifications in breast cancer: lessons from physiological mineralization. Bone 53(2):437–450

  36. Jing H, Yang Y, Nishikawa RM (2012) Retrieval boosted computer-aided diagnosis of clustered microcalcifications for breast cancer. Med Phys 39(2):676–685

  37. Baker R, Rogers KD, Shepherd N, Stone N (2010) New relationships between breast microcalcifications and cancer. Br J Cancer 103(7):1034–1039

  38. Uematsu T, Kasami M, Yuen S (2009) A cluster of microcalcifications: women with high risk for breast cancer versus other women. Breast Cancer 16(4):307–314

  39. Karahaliou A, Skiadopoulos S, Boniatis I, Sakellaropoulos P, Likaki E, Panayiotakis G et al (2007) Texture analysis of tissue surrounding microcalcifications on mammograms for breast cancer diagnosis. Br J Radiol 80(956):648–656

  40. Kamitani T, Yabuuchi H, Soeda H, Matsuo Y, Okafuji T, Sakai S et al (2007) Detection of masses and microcalcifications of breast cancer on digital mammograms: comparison among hard-copy film, 3-megapixel liquid crystal display (LCD) monitors and 5-megapixel LCD monitors: an observer performance study. Eur Radiol 17(5):1365–1371

  41. Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK (2006) Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology 240(3):666–673

  42. Jing H, Yang Y, Nishikawa RM (2012) Regularization in retrieval-driven classification of clustered microcalcifications for breast cancer. Int J Biomed Imaging 2012, id463408

  43. Farshid G, Sullivan T, Downey P, Gill PG, Pieterse S (2011) Independent predictors of breast malignancy in screen-detected microcalcifications: biopsy results in 2545 cases. Br J Cancer 105(11):1669–1675

  44. Hsieh SL, Hsieh SH, Cheng PH, Chen CH, Hsu KP, Lee IS et al (2012) Design ensemble machine learning model for breast cancer diagnosis. J Med Syst 36(5):2841–2847

  45. Djebbari A, Liu Z, Phan S, Famili F (2008) An ensemble machine learning approach to predict survival in breast cancer. Int J Comput Biol Drug Des 1(3):275–294

  46. Choi JY, Kim DH, Plataniotis KN, Ro YM (2014) Computer-aided detection (CAD) of breast masses in mammography: combined detection and ensemble classification. Phys Med Biol 59(14):3697–3719

  47. Ali S, Majid A, Khan A (2014) IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids. Amino Acids 46(4):977–993

  48. Krawczyk B, Schaefer G (2013) A pruned ensemble classifier for effective breast thermogram analysis. In: Annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp 7120–7123

  49. Luo ST, Cheng BW (2012) Diagnosing breast masses in digital mammography using feature selection and ensemble methods. J Med Syst 36(2):569–577

  50. Takemura A, Shimizu A, Hamamoto K (2010) Discrimination of breast tumors in ultrasonic images using an ensemble classifier based on the AdaBoost algorithm with feature selection. IEEE Trans Med Imaging 29(3):598–609

  51. Vimieiro R, Moscato P (2014) Disclosed: an efficient depth-first, top-down algorithm for mining disjunctive closed itemsets in high-dimensional data. Inform Sci 280:171–187

  52. Vimieiro R, Moscato P (2014) A new method for mining disjunctive emerging patterns in high-dimensional datasets using hypergraphs. Inform Syst 40:1–10

Author information

Corresponding author

Correspondence to Pablo Moscato.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Mathieson, L., Mendes, A., Marsden, J., Pond, J., Moscato, P. (2017). Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques. In: Keith, J. (ed.) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_17

  • DOI: https://doi.org/10.1007/978-1-4939-6613-4_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6611-0

  • Online ISBN: 978-1-4939-6613-4

  • eBook Packages: Springer Protocols
