Abstract
Instance and variable selection consist of identifying a subset of instances and variables so that the learning process uses only this subset, with better performance and lower cost. Given the huge amount of data available in many fields, data reduction is considered an NP-hard problem. In this paper, we present a simultaneous instance and variable selection approach based on the Random Forest-RI ensemble method, with the aim of discarding noisy and useless information from the original data set. We propose a selection principle based on two concepts: the ensemble margin and the variable importance measure of Random Forest-RI. Experiments were conducted on cytological images for the automatic segmentation and recognition of white blood cells (WBC), covering both nucleus and cytoplasm. Moreover, to explore the performance of the proposed approach, experiments were carried out on standardized datasets from the UCI and ASU repositories. The results obtained for instance and variable selection with the Random Forest classifier are very encouraging.
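To make the two concepts named in the abstract more concrete, the following is a minimal sketch, not the authors' algorithm. It uses the classical supervised ensemble margin (votes for the true class minus the largest vote count for any other class, divided by the number of trees). The selection rules are assumptions made for illustration: `select_instances` keeps instances whose margin falls in a hypothetical range `[low, high]`, and `select_variables` keeps the `k` variables with the highest importance scores. The votes, labels, and importance values are toy data, not results from the paper.

```python
from collections import Counter


def ensemble_margin(votes, true_label):
    """Supervised margin of one instance, in [-1, 1].

    votes: class labels predicted by each tree of the ensemble.
    A negative value means the ensemble misclassifies the instance.
    """
    counts = Counter(votes)
    v_true = counts.get(true_label, 0)
    # Largest vote count among the classes other than the true one.
    v_other = max((c for lbl, c in counts.items() if lbl != true_label), default=0)
    return (v_true - v_other) / len(votes)


def select_instances(all_votes, labels, low=0.0, high=0.9):
    """Hypothetical rule: keep indices whose margin lies in [low, high].

    Assumption: margins below `low` flag likely noisy instances and
    margins above `high` flag instances that add little information.
    """
    kept = []
    for i, (votes, y) in enumerate(zip(all_votes, labels)):
        if low <= ensemble_margin(votes, y) <= high:
            kept.append(i)
    return kept


def select_variables(importances, k):
    """Hypothetical rule: keep the k variables with the highest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


# Toy data: votes of 5 trees for 4 pixels ("n" nucleus, "c" cytoplasm, "b" background).
toy_votes = [
    ["n", "n", "n", "n", "n"],
    ["n", "n", "n", "c", "c"],
    ["c", "c", "c", "c", "n"],
    ["c", "c", "c", "n", "b"],
]
toy_labels = ["n", "n", "n", "c"]
# Illustrative importance scores for five colour features (made-up values).
toy_importances = {"R": 0.05, "G": 0.40, "B": 0.10, "H": 0.30, "S": 0.15}

margins = [ensemble_margin(v, y) for v, y in zip(toy_votes, toy_labels)]
kept_instances = select_instances(toy_votes, toy_labels)
kept_variables = select_variables(toy_importances, k=2)
```

In practice the per-tree votes and the importance scores would come from a trained Random Forest; the thresholds `low`, `high`, and `k` are placeholders that would need tuning for a given data set.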
About this article
Cite this article
Settouti, N., Saidi, M., Bechar, M.E.A. et al. An instance and variable selection approach in pixel-based classification for automatic white blood cells segmentation. Pattern Anal Applic 23, 1709–1726 (2020). https://doi.org/10.1007/s10044-020-00873-w
DOI: https://doi.org/10.1007/s10044-020-00873-w