A Model Based on Genetic Algorithm for Colorectal Cancer Diagnosis

  • Daniela F. Taino
  • Matheus G. Ribeiro
  • Guilherme Freire Roberto
  • Geraldo F. D. Zafalon
  • Marcelo Zanchetta do Nascimento
  • Thaína A. Tosta
  • Alessandro S. Martins
  • Leandro A. Neves
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11896)


In this paper, we present a method based on a genetic algorithm that is capable of exploring combinations of a large number of features (obtained from fractal techniques, Haralick texture features, and curvelet coefficients), several feature selection methods, and several classifiers for the study and pattern recognition of colorectal cancer. Each individual was defined by a chromosome of four genes. The evaluation, selection, crossover, and mutation steps were designed to favor solutions that distinguish the colorectal cancer groups with the highest accuracy rate and the smallest number of features. The tests were performed with features extracted from H&E-stained histological images, with different population sizes and numbers of iterations, and with k-fold cross-validation. The best result was obtained with a population of 500 individuals and 50 iterations, using the Relief selection method, the random forest classifier, and 29 features (obtained mainly from the combination of percolation measures and curvelet subimages). This solution distinguished the groups with an accuracy rate of 90.82% and an AUC of 0.967.
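The abstract does not specify what each of the four genes encodes, how fitness combines accuracy and feature count, or which selection, crossover, and mutation operators were used. The Python sketch below shows one way a four-gene genetic algorithm of this kind could be organized. It assumes, hypothetically, that the genes encode a feature set, a selection method, a classifier, and the number of retained features; the option names, tournament selection, uniform crossover, elitism, and the lexicographic fitness (accuracy first, then fewer features) are all illustrative choices, not the authors' implementation. In a real experiment, `evaluate` would return k-fold cross-validated accuracy; here the hypothetical `toy_accuracy` function stands in with synthetic numbers.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical gene alphabets. The abstract only states that a chromosome has
# four genes; these options are illustrative, not the authors' encoding.
FEATURE_SETS = ["fractal", "haralick", "curvelet", "percolation+curvelet"]
SELECTORS = ["relief", "gain_ratio", "chi_squared"]
CLASSIFIERS = ["random_forest", "svm", "knn", "mlp"]
MAX_FEATURES = 100

# (feature-set index, selector index, classifier index, number of features)
Individual = Tuple[int, int, int, int]
Evaluator = Callable[[Individual], float]


def random_individual(rng: random.Random) -> Individual:
    """Draw each of the four genes uniformly from its alphabet."""
    return (rng.randrange(len(FEATURE_SETS)),
            rng.randrange(len(SELECTORS)),
            rng.randrange(len(CLASSIFIERS)),
            rng.randint(1, MAX_FEATURES))


def fitness(ind: Individual, evaluate: Evaluator) -> Tuple[float, int]:
    """Lexicographic fitness: higher accuracy first, then fewer features."""
    return (evaluate(ind), -ind[3])


def tournament(pop: List[Individual], scores: List[Tuple[float, int]],
               rng: random.Random, k: int = 3) -> Individual:
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Uniform crossover: each gene is copied from one parent at random."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def mutate(ind: Individual, rng: random.Random, rate: float = 0.1) -> Individual:
    """Replace each gene, with probability `rate`, by a freshly drawn value."""
    fresh = random_individual(rng)
    return tuple(f if rng.random() < rate else g for g, f in zip(ind, fresh))


def run_ga(evaluate: Evaluator, pop_size: int = 50, iterations: int = 20,
           seed: int = 0) -> Tuple[Individual, Tuple[float, int]]:
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best: Optional[Individual] = None
    best_score: Optional[Tuple[float, int]] = None
    for _ in range(iterations):
        scores = [fitness(ind, evaluate) for ind in pop]
        # Track the best individual evaluated so far.
        for ind, score in zip(pop, scores):
            if best_score is None or score > best_score:
                best, best_score = ind, score
        new_pop = [best]  # elitism: keep the best individual found so far
        while len(new_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            new_pop.append(mutate(child, rng))
        pop = new_pop
    return best, best_score


def toy_accuracy(ind: Individual) -> float:
    """Hypothetical stand-in for k-fold cross-validated accuracy (synthetic)."""
    feature_set, selector, classifier, n_features = ind
    acc = 0.70 + 0.05 * (feature_set == 3) + 0.05 * (selector == 0)
    acc += 0.05 * (classifier == 0)
    return acc + min(n_features, 30) / 1000.0


if __name__ == "__main__":
    best, score = run_ga(toy_accuracy, pop_size=30, iterations=15, seed=1)
    print("best genes:", best, "accuracy:", round(score[0], 3))
```

Replacing `toy_accuracy` with a function that applies the chosen selection method, trains the chosen classifier on the retained features, and returns mean k-fold accuracy would turn this sketch into a real experiment. The tuple-valued fitness is one simple way to encode "highest accuracy, then fewest features" without tuning a penalty weight; a weighted sum is a common alternative.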


Keywords: Genetic algorithm · Colorectal cancer · Feature selection · Feature classification



The authors gratefully acknowledge the financial support of the National Council for Scientific and Technological Development (CNPq) (Grants #427114/2016-0, #304848/2018-2, #430965/2018-4 and #313365/2018-0) and the State of Minas Gerais Research Foundation (FAPEMIG) (Grant #APQ-00578-18).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Daniela F. Taino (1; corresponding author)
  • Matheus G. Ribeiro (1)
  • Guilherme Freire Roberto (2)
  • Geraldo F. D. Zafalon (1)
  • Marcelo Zanchetta do Nascimento (2)
  • Thaína A. Tosta (3)
  • Alessandro S. Martins (4)
  • Leandro A. Neves (1)

  1. Department of Computer Science and Statistics, São Paulo State University (UNESP), São José do Rio Preto, Brazil
  2. Faculty of Computation (FACOM), Federal University of Uberlândia (UFU), Uberlândia, Brazil
  3. Center of Mathematics, Computing and Cognition, Federal University of ABC (UFABC), Santo André, Brazil
  4. Federal Institute of Triângulo Mineiro (IFTM), Ituiutaba, Brazil
