Fast feature selection algorithm for neighborhood rough set model based on Bucket and Trie structures

  • Rachid Benouini
  • Imad Batioua
  • Soufiane Ezghari
  • Khalid Zenkouar
  • Azeddine Zahi
Original Paper

Abstract

Feature selection is the problem of finding the smallest subset of features from an original feature set with minimal loss of information. Because feature selection is central to pattern recognition and data mining, fast and effective search algorithms are needed. In this paper, we introduce a novel fast feature selection algorithm for the neighborhood rough set model based on Bucket and Trie structures. Because it adopts a global search strategy, the proposed algorithm is guaranteed to find the optimal minimal reduct. The dependency degree is used to evaluate the relevance of each candidate attribute subset. The proposed algorithm is tested on several standard data sets from the UCI repository and compared with the most recent related approaches. The theoretical and experimental results show that the algorithm is effective and well suited to the feature selection problem, suggesting that it could be useful in many pattern recognition and data mining systems.
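The abstract rests on two ideas: the neighborhood dependency degree, gamma_B(D) = |POS_B(D)| / |U|, where a sample belongs to the positive region POS_B(D) when every sample within radius delta of it (on the attributes in B) shares its decision label; and a global search that is guaranteed to return a minimum-size reduct. The Python sketch below illustrates both under stated assumptions: numeric, normalized attributes, Euclidean distance, and a fixed radius `delta`. The bucket pass (binning samples by distance to a reference point so that only adjacent bins need to be scanned) is one common pruning idea and is not claimed to match the paper's Bucket structure. The plain level-wise exhaustive search stands in for the paper's Trie-based global search: it shares the minimality guarantee but none of the paper's speed-ups. All function names are hypothetical.

```python
import math
from itertools import combinations


def positive_region_size(X, y, attrs, delta):
    """Return |POS_B(D)|: samples whose delta-neighborhood on `attrs` is pure in y.

    Neighborhoods use Euclidean distance restricted to the attribute indices in
    `attrs` (assumed numeric and normalized). A bucket pass prunes the neighbor
    search: each sample is binned by its distance to a reference point (the
    per-attribute minimum), and by the triangle inequality any sample within
    `delta` of it lies in the same bin or an adjacent one.
    """
    assert delta > 0, "neighborhood radius must be positive"
    attrs = tuple(attrs)
    ref = [min(row[a] for row in X) for a in attrs]

    def dist(u, v):
        # Euclidean distance between two points given as coordinate sequences
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

    proj = [[row[a] for a in attrs] for row in X]  # samples projected onto B
    buckets, bucket_of = {}, []
    for i, point in enumerate(proj):
        k = int(dist(point, ref) // delta)
        bucket_of.append(k)
        buckets.setdefault(k, []).append(i)

    count = 0
    for i, point in enumerate(proj):
        k = bucket_of[i]
        candidates = (j for b in (k - 1, k, k + 1) for j in buckets.get(b, []))
        # x_i is in the positive region if no differently labeled sample is a neighbor
        if all(y[j] == y[i] or dist(point, proj[j]) > delta for j in candidates):
            count += 1
    return count


def dependency_degree(X, y, attrs, delta):
    """gamma_B(D) = |POS_B(D)| / |U|."""
    return positive_region_size(X, y, attrs, delta) / len(X)


def minimal_reduct(X, y, delta):
    """Level-wise exhaustive search: the first subset (by size) whose positive
    region matches that of the full attribute set is a minimum-size reduct."""
    m = len(X[0])
    target = positive_region_size(X, y, range(m), delta)
    for size in range(m + 1):
        for subset in combinations(range(m), size):
            if positive_region_size(X, y, subset, delta) == target:
                return list(subset)


# Toy decision table: the class is determined by attribute 0 alone.
X = [[0.1, 0.9, 0.5], [0.2, 0.1, 0.5], [0.8, 0.85, 0.5], [0.9, 0.15, 0.5]]
y = [0, 0, 1, 1]
print(minimal_reduct(X, y, delta=0.2))    # [0]
print(dependency_degree(X, y, [1], 0.2))  # 0.0
```

Because subsets are visited in order of increasing size, the first match is a reduct of minimum cardinality; the cost is exponential in the number of attributes, which is exactly the search space that dedicated structures such as the paper's Trie are meant to tame.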

Keywords

Feature selection; Rough set theory; Neighborhood rough set; Fast algorithm; Trie structure; Bucket structure

Notes

Acknowledgements

The authors gratefully acknowledge the Laboratory of Intelligent Systems and Applications (LSIA) for its support in carrying out this work.

Compliance with ethical standards

Conflict of interest

The authors declare no conflict of interest.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Laboratory of Intelligent Systems and Application (LSIA), Faculty of Sciences and Technology, Sidi Mohamed Ben Abdellah University, Fez, Morocco

Personalised recommendations