A Novel Feature Selection-Based Sequential Ensemble Learning Method for Class Noise Detection in High-Dimensional Data
The large number of irrelevant or noisy features in high-dimensional data poses significant challenges to feature-selection-based methods for detecting mislabeled instances. Traditional methods typically perform two dependent steps: first, they search for a relevant feature subspace; second, they train a model on the subspace obtained in the first step. However, the selected feature subspace is not tied to the noise scores, which can degrade detection performance. In this paper, we propose SENF, a novel sequential ensemble method that integrates these two phases. SENF learns sequential ensembles that refine the feature subspace and improve detection accuracy through iterative sparse modeling, using the noise scores as the regression target. Through extensive experiments on 8 real-world high-dimensional datasets from the UCI machine learning repository, we show that SENF performs significantly better than, or at least comparably to, the individual baselines as well as the existing state-of-the-art label noise detection method.
Keywords: Noise filtering; Sequential ensemble; Feature selection
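The abstract describes SENF only at a high level. The following is a minimal sketch of the general idea (alternating noise scoring and sparse regression on the noise scores), under assumptions the abstract does not specify: the noise score of an instance is taken as the fraction of its k nearest neighbours carrying a different label (a hypothetical stand-in for SENF's actual scorer), the sparse model is an L1-regularized linear regression (Lasso) solved by plain coordinate descent, the features with nonzero coefficients form the next round's subspace, and the ensemble output is the mean of the per-round scores. The names `knn_noise_scores`, `lasso_cd` and `sequential_sparse_filter` and the parameters `k`, `alpha` and `n_rounds` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def knn_noise_scores(X, y, k=5):
    """Hypothetical noise score (an assumption, not SENF's scorer): fraction of
    each instance's k nearest neighbours (Euclidean, in the given subspace)
    whose label differs from the instance's own label."""
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d, np.inf)                    # an instance is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]              # indices of the k nearest neighbours
    return (y[nn] != y[:, None]).mean(axis=1)


def lasso_cd(X, t, alpha, n_iter=200):
    """Coordinate-descent Lasso minimizing (1/2n)||t - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    r = t.astype(float).copy()                     # residual t - Xw
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                           # constant column: leave w[j] = 0
            rho = X[:, j] @ r / n + col_sq[j] * w[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * (new - w[j])            # keep residual in sync with w
            w[j] = new
    return w


def sequential_sparse_filter(X, y, n_rounds=3, alpha=0.01, k=5):
    """Hypothetical SENF-style loop: score instances in the current subspace,
    regress the scores on the features with a Lasso, keep the features with
    nonzero weights, repeat, and average the per-round scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize features
    selected = np.arange(X.shape[1])
    round_scores = []
    for _ in range(n_rounds):
        s = knn_noise_scores(Xs[:, selected], y, k)
        round_scores.append(s)
        # Sparse regression with the noise scores as the target attribute.
        w = lasso_cd(Xs[:, selected], s - s.mean(), alpha)
        keep = np.flatnonzero(w)
        if keep.size == 0:
            break                                  # Lasso dropped everything: keep current subspace
        selected = selected[keep]
    return np.mean(round_scores, axis=0), selected
```

Instances with the highest averaged scores would be the candidate mislabeled instances; in this sketch the penalty `alpha` controls how aggressively each round shrinks the feature subspace, and any other noise scorer could replace `knn_noise_scores` without changing the loop.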
This research was supported by the National Natural Science Foundation of China (Grant No. 61672284), the Natural Science Foundation of Jiangsu Province (Grant No. BK20171418), the China Postdoctoral Science Foundation (Grant No. 2016M591841), and the Jiangsu Planned Projects for Postdoctoral Research Funds (No. 1601225C). It was also supported by the Fundamental Research Funds for the Central Universities (No. NS2016089) and by Zayed University Research Cluster Award No. R18038.