Abstract
Incremental feature selection updates a subset of informative features as new samples are added, without discarding previously learned knowledge. However, most existing incremental feature selection algorithms have no explicit mechanism for handling heterogeneous data that contain both symbolic and real-valued features. This paper therefore presents an incremental feature selection method for heterogeneous data in which samples arrive sequentially in groups. Discernible neighborhood counting, which measures different types of features, is first introduced to establish a framework for feature selection from heterogeneous data. As new samples arrive, the discernible neighborhood counting of a feature subset is then updated, which yields the incremental feature selection scheme. This scheme provides the criteria for efficiently adding informative features and deleting redundant ones. Based on this scheme, an incremental feature selection algorithm is formulated to select valuable features from heterogeneous data. Finally, extensive experiments demonstrate the effectiveness and efficiency of the proposed algorithm.
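The abstract does not state how discernible neighborhood counting is defined, so the following Python sketch illustrates the general idea under our own assumptions and is not the paper's formulation. It assumes a hypothetical neighborhood relation in which two samples are neighbors under a feature subset when they agree exactly on every symbolic feature and differ by at most a threshold `delta` on every real-valued feature, and it counts pairs of samples with different labels that are not neighbors. Because pairs among previously seen samples are unaffected when a new group arrives, that count can be updated by examining only old-new and new-new pairs. The functions `neighbors`, `discernible_count`, `update_count` and `greedy_select`, and the parameter `delta`, are hypothetical names chosen for illustration.

```python
from itertools import combinations, product

# Hypothetical sample encoding: each sample is a tuple; `kinds` maps a feature
# index to "symbolic" or "real". This is an illustrative assumption only.


def neighbors(a, b, features, kinds, delta):
    """Assumed neighborhood test under a feature subset: exact match on symbolic
    features, absolute difference <= delta on real-valued features."""
    for f in features:
        if kinds[f] == "symbolic":
            if a[f] != b[f]:
                return False
        elif abs(a[f] - b[f]) > delta:
            return False
    return True  # an empty subset makes every pair neighbors


def discernible_count(X, y, features, kinds, delta):
    """Count pairs with different labels that are not neighbors under `features`."""
    return sum(
        1
        for i, j in combinations(range(len(X)), 2)
        if y[i] != y[j] and not neighbors(X[i], X[j], features, kinds, delta)
    )


def update_count(old_count, X_old, y_old, X_new, y_new, features, kinds, delta):
    """Incremental update: old-old pairs are unchanged, so only old-new and
    new-new pairs are examined when a group of samples arrives."""
    cross = sum(
        1
        for (xo, yo), (xn, yn) in product(zip(X_old, y_old), zip(X_new, y_new))
        if yo != yn and not neighbors(xo, xn, features, kinds, delta)
    )
    within = discernible_count(X_new, y_new, features, kinds, delta)
    return old_count + cross + within


def greedy_select(X, y, kinds, delta):
    """Hypothetical forward-add / backward-delete selection driven by the count."""
    all_features = sorted(kinds)
    target = discernible_count(X, y, all_features, kinds, delta)
    selected, current = [], 0
    while current < target:
        # Add the feature that raises the count the most.
        best = max(
            (f for f in all_features if f not in selected),
            key=lambda f: discernible_count(X, y, selected + [f], kinds, delta),
        )
        selected.append(best)
        current = discernible_count(X, y, selected, kinds, delta)
    # Delete features whose removal keeps the count at its full-feature value.
    for f in list(selected):
        rest = [g for g in selected if g != f]
        if discernible_count(X, y, rest, kinds, delta) == target:
            selected = rest
    return selected


if __name__ == "__main__":
    kinds = {0: "real", 1: "symbolic"}
    X_old, y_old = [(0.10, "a"), (0.15, "a"), (0.90, "b")], [0, 0, 1]
    X_new, y_new = [(0.85, "a"), (0.20, "b")], [1, 0]
    old = discernible_count(X_old, y_old, [0, 1], kinds, 0.2)
    new = update_count(old, X_old, y_old, X_new, y_new, [0, 1], kinds, 0.2)
    print(old, new)  # 2 6
    print(greedy_select(X_old + X_new, y_old + y_new, kinds, 0.2))  # [0]
```

Because adding a feature can only shrink these assumed neighborhoods, the count never decreases as features are added, which is what lets the greedy loop stop once it reaches the full-feature count. The incremental update gives the same result as recounting all pairs from scratch, but only touches pairs that involve the new group.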
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant no. 2016YFB1200203, the National Natural Science Foundation of China under Grant nos. 61806108, 71471060 and 61602372, and a project funded by the China Postdoctoral Science Foundation under Grant no. 2018M631475.
About this article
Cite this article
Yang, Y., Song, S., Chen, D. et al. Discernible neighborhood counting based incremental feature selection for heterogeneous data. Int. J. Mach. Learn. & Cyber. 11, 1115–1127 (2020). https://doi.org/10.1007/s13042-019-00997-4
DOI: https://doi.org/10.1007/s13042-019-00997-4