Discernible neighborhood counting based incremental feature selection for heterogeneous data

Abstract

Incremental feature selection updates a subset of informative features as new samples are added, without forgetting previously learned knowledge. However, most existing incremental feature selection algorithms have no explicit mechanism for handling heterogeneous data that contain both symbolic and real-valued features. This paper therefore presents an incremental feature selection method for heterogeneous data in which samples arrive sequentially in groups. Discernible neighborhood counting, which can measure features of different types, is first introduced to establish a framework for feature selection from heterogeneous data. Then, as new samples arrive, the discernible neighborhood counting of a feature subset is updated, which yields the incremental feature selection scheme. This scheme provides the criteria for efficiently adding informative features and deleting redundant ones. Based on this scheme, our incremental feature selection algorithm is formulated to select valuable features from heterogeneous data. Finally, extensive experiments demonstrate the effectiveness and efficiency of the proposed algorithm.
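The abstract names three ingredients: a pair-based counting measure for mixed features, an update of that count when a group of samples arrives, and add/delete criteria. The Python sketch below illustrates one plausible reading under loudly stated assumptions, since the exact definitions are not given in the abstract: (i) a feature subset B is assumed to discern two samples when some symbolic feature in B takes different values or some real-valued feature (assumed scaled to [0, 1]) differs by more than a radius `delta`; (ii) the discernible neighborhood counting of B is assumed to be the number of differently labelled sample pairs that B discerns; (iii) the add/delete rule is a generic greedy one. The functions `dnc`, `dnc_incremental` and `select` and the toy data are hypothetical, and this is not the paper's definition or algorithm.

```python
from itertools import combinations

# Assumed encoding: each sample is a tuple of feature values, and kinds[a] marks
# feature a as "sym" (symbolic) or "num" (real-valued, assumed scaled to [0, 1]).


def discernible(u, v, feats, kinds, delta):
    """Assumed relation: some feature in feats separates samples u and v."""
    for a in feats:
        if kinds[a] == "sym":
            if u[a] != v[a]:
                return True
        elif abs(u[a] - v[a]) > delta:  # outside the delta-neighborhood
            return True
    return False


def count_pairs(pairs, X, labels, feats, kinds, delta):
    """Count index pairs with different labels that feats can discern."""
    return sum(1 for i, j in pairs
               if labels[i] != labels[j]
               and discernible(X[i], X[j], feats, kinds, delta))


def dnc(X, labels, feats, kinds, delta):
    """Hypothetical discernible neighborhood counting of feats over all pairs."""
    return count_pairs(combinations(range(len(X)), 2), X, labels, feats, kinds, delta)


def dnc_incremental(old_value, n_old, X, labels, feats, kinds, delta):
    """Update a stored count after samples n_old, ..., len(X) - 1 were appended.

    Old-old pairs are already in old_value, so only pairs that involve at
    least one new sample (old-new and new-new) are examined.
    """
    new_pairs = ((i, j) for j in range(n_old, len(X)) for i in range(j))
    return old_value + count_pairs(new_pairs, X, labels, feats, kinds, delta)


def select(X, labels, kinds, delta, start=()):
    """Generic greedy add/delete rule, warm-started from a previous subset."""
    all_feats = list(range(len(kinds)))
    target = dnc(X, labels, all_feats, kinds, delta)
    B = list(start)
    # Adding features never lowers the count under this relation, so this loop
    # stops at the latest once B contains every feature.
    while dnc(X, labels, B, kinds, delta) < target:
        best = max((a for a in all_feats if a not in B),
                   key=lambda a: dnc(X, labels, B + [a], kinds, delta))
        B.append(best)
    # Delete features whose removal keeps the count at the target (redundant).
    for a in list(B):
        rest = [b for b in B if b != a]
        if dnc(X, labels, rest, kinds, delta) == target:
            B = rest
    return sorted(B)


# Toy heterogeneous data: feature 0 is symbolic, features 1 and 2 are numeric.
kinds = ["sym", "num", "num"]
X = [("a", 0.10, 0.50), ("a", 0.15, 0.90), ("b", 0.80, 0.52), ("b", 0.85, 0.10)]
y = [0, 1, 1, 0]
delta = 0.2
B = select(X, y, kinds, delta)          # [0, 2]
old = dnc(X, y, B, kinds, delta)

# A new group of two samples arrives.
X_new = X + [("a", 0.50, 0.55), ("b", 0.40, 0.88)]
y_new = y + [1, 0]
updated = dnc_incremental(old, len(X), X_new, y_new, B, kinds, delta)
assert updated == dnc(X_new, y_new, B, kinds, delta)
B_new = select(X_new, y_new, kinds, delta, start=B)  # [1, 2]
```

Because the assumed count is a sum over sample pairs, a newly arrived group contributes only the pairs that involve at least one new sample, so the stored count can be refreshed without revisiting old pairs. On the toy data, the new group forces feature 1 to be added and then makes feature 0 redundant, which mirrors the add-then-delete pattern the abstract describes.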

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. http://datam.i2r.a-star.edu.sg/datasets/krbd/index.html.

  2. http://www.ics.uci.edu/~mlearn/MLRepository.html.

Acknowledgements

This work was supported by the National Key R&D Program of China under Grant no. 2016YFB1200203, the National Natural Science Foundation of China under Grant nos. 61806108, 71471060 and 61602372, and a project funded by the China Postdoctoral Science Foundation under Grant no. 2018M631475.

Author information

Corresponding author

Correspondence to Yanyan Yang.

About this article

Cite this article

Yang, Y., Song, S., Chen, D. et al. Discernible neighborhood counting based incremental feature selection for heterogeneous data. Int. J. Mach. Learn. & Cyber. 11, 1115–1127 (2020). https://doi.org/10.1007/s13042-019-00997-4

Keywords

  • Incremental feature selection
  • Feature selection
  • Neighborhood rough set
  • Heterogeneous data