
An instance and variable selection approach in pixel-based classification for automatic white blood cells segmentation



Instance and variable selection consists of identifying a subset of instances and variables so that the learning process uses only this subset, with better performance and lower cost. Given the huge amount of data available in many fields, data reduction is considered an NP-hard problem. In this paper, we present a simultaneous instance and variable selection approach based on the Random Forest-RI ensemble method, with the aim of discarding noisy and useless information from the original data set. We propose a selection principle based on two concepts: the ensemble margin and the variable importance measure of Random Forest-RI. Experiments were conducted on cytological images for the automatic segmentation and recognition of white blood cells (WBC), covering both nucleus and cytoplasm. Moreover, to explore the performance of the proposed approach, experiments were carried out on standard datasets from the UCI and ASU repositories; the results obtained for instance and variable selection with the Random Forest classifier are very encouraging.
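The abstract names the two selection concepts but does not give the exact selection rule, so the following is only a minimal sketch under stated assumptions. It uses a common supervised definition of the ensemble margin (votes for the true class minus the largest vote count for any other class, divided by the number of trees), which may differ from the definition used in the paper. The function names, the margin band `low`/`high`, the cut-off `k`, and the toy votes and importance scores are all hypothetical; in practice the votes and importances would come from a trained Random Forest.

```python
from collections import Counter


def ensemble_margin(votes, true_label):
    """Supervised margin of one instance, in [-1, 1].

    (votes for the true class - most votes for any other class) / number of trees.
    Assumed definition; the paper may use a different variant.
    """
    counts = Counter(votes)
    v_true = counts.get(true_label, 0)
    v_other = max((n for c, n in counts.items() if c != true_label), default=0)
    return (v_true - v_other) / len(votes)


def select_instances(all_votes, labels, low=0.0, high=0.8):
    """Return indices of instances whose margin lies in [low, high).

    Hypothetical rule: margins below `low` are treated as likely noise,
    margins at or above `high` as redundant (easy) instances.
    """
    keep = []
    for i, (votes, y) in enumerate(zip(all_votes, labels)):
        if low <= ensemble_margin(votes, y) < high:
            keep.append(i)
    return keep


def select_variables(importances, k):
    """Return the indices of the k variables with the largest importance scores."""
    ranked = sorted(range(len(importances)),
                    key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: per-instance votes of a 5-tree forest over three pixel classes
# ("n" = nucleus, "c" = cytoplasm, "b" = background). Values are made up.
all_votes = [
    ["n", "n", "n", "n", "n"],  # unanimous and correct
    ["n", "n", "n", "c", "b"],  # correct, but the trees disagree
    ["c", "c", "c", "n", "n"],  # majority disagrees with the label
    ["c", "c", "n", "b", "c"],  # correct, but the trees disagree
]
labels = ["n", "n", "n", "c"]

margins = [ensemble_margin(v, y) for v, y in zip(all_votes, labels)]
kept_instances = select_instances(all_votes, labels)

# Made-up importance scores for four colour variables.
importances = [0.05, 0.40, 0.10, 0.45]
kept_variables = select_variables(importances, k=2)
```

In this toy run the unanimous instance and the instance with a negative margin are dropped, and the two highest-scoring variables are retained. The band limits and `k` would need tuning on real data.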




Author information

Correspondence to Nesma Settouti.


About this article


Cite this article

Settouti, N., Saidi, M., Bechar, M.E.A. et al. An instance and variable selection approach in pixel-based classification for automatic white blood cells segmentation. Pattern Anal Applic (2020).



  • Instance and variable selection
  • Random Forest
  • Data reduction
  • Small target detection
  • Automatic segmentation
  • Pixel-based classification
  • White blood cells