A Novel Feature Selection-Based Sequential Ensemble Learning Method for Class Noise Detection in High-Dimensional Data

  • Kai Chen
  • Donghai Guan
  • Weiwei Yuan
  • Bohan Li
  • Asad Masood Khattak
  • Omar Alfandi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11323)

Abstract

Irrelevant or noisy features in high-dimensional data pose significant challenges to feature selection-based methods for detecting mislabeled instances. Traditional methods typically perform two dependent steps: first, searching for a relevant feature subspace; second, training a detection model on the subspace obtained in the first step. However, the selected feature subspace may be unrelated to the noise scores, which degrades detection performance. In this paper, we propose a novel sequential ensemble method, SENF, that integrates these two phases. Our method learns sequential ensembles to obtain a refined feature subspace and improve detection accuracy through iterative sparse modeling with noise scores as the regression target attribute. Through extensive experiments on 8 real-world high-dimensional datasets from the UCI machine learning repository [3], we show that SENF performs significantly better than, or at least comparably to, the individual baselines as well as the existing state-of-the-art label noise detection method.
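
The loop described in the abstract (score instances, regress the scores on the features with a sparsity penalty, keep the surviving features, re-score) can be illustrated with a minimal sketch. This is not the authors' SENF implementation: the k-nearest-neighbour disagreement score, the ISTA lasso solver, the stopping rule, and every function and parameter name below (`knn_noise_scores`, `lasso_ista`, `sequential_noise_filter`, `k`, `lam`, `max_rounds`) are assumptions chosen only to make the idea concrete, and the data are synthetic.

```python
import numpy as np

def knn_noise_scores(X, y, k=5):
    """Assumed noise score: fraction of the k nearest neighbours with a different label."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k closest points
    return (y[nn] != y[:, None]).mean(axis=1)

def lasso_ista(X, t, lam=0.05, n_iter=500):
    """Lasso via proximal gradient: min_w 1/(2n) * ||Xw - t||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    if L == 0:
        return w
    for _ in range(n_iter):
        z = w - X.T @ (X @ w - t) / (n * L)  # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

def sequential_noise_filter(X, y, k=5, lam=0.05, max_rounds=5):
    """Alternate noise scoring and sparse regression on the scores (illustrative only)."""
    X = (X - X.mean(0)) / (X.std(0) + 1e-12)     # standardise the features
    features = np.arange(X.shape[1])
    scores = knn_noise_scores(X, y, k)
    for _ in range(max_rounds):
        target = scores - scores.mean()          # noise scores as regression target
        w = lasso_ista(X[:, features], target, lam)
        keep = features[np.abs(w) > 1e-8]        # features with non-zero coefficients
        if keep.size == 0 or keep.size == features.size:
            break                                # nothing left to prune, or nothing pruned
        features = keep
        scores = knn_noise_scores(X[:, features], y, k)
    return scores, features

# Synthetic example: 2 informative features, 30 irrelevant ones, 10% flipped labels.
rng = np.random.default_rng(0)
n, p = 200, 32
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flipped = rng.choice(n, size=20, replace=False)
y[flipped] = 1 - y[flipped]

scores, features = sequential_noise_filter(X, y)
suspects = np.argsort(scores)[::-1][:20]         # 20 highest-scoring instances
print("kept features:", features)
print("flipped labels among top-20 suspects:", np.intersect1d(suspects, flipped).size)
```

In this sketch the feature subspace is tied to the noise scores because the scores themselves are the regression target, which is the coupling the abstract contrasts with the two-step approach. How SENF combines the models from successive rounds into an ensemble is not specified in the abstract, so the sketch simply returns the scores from the last round.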

Keywords

  • Noise filtering
  • Sequential ensemble
  • Feature selection

Notes

Acknowledgements

This research was supported by the Natural Science Foundation of China (Grant No. 61672284), the Natural Science Foundation of Jiangsu Province (Grant No. BK20171418), the China Postdoctoral Science Foundation (Grant No. 2016M591841), and the Jiangsu Planned Projects for Postdoctoral Research Funds (No. 1601225C). It was also supported by the Fundamental Research Funds for the Central Universities (No. NS2016089) and by Zayed University Research Cluster Award # R18038.

References

  1. Angelova, A., Abu-Mostafa, Y., Perona, P.: Pruning training sets for learning of object categories. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 494–501. IEEE (2005)
  2. Brodley, C.E., Friedl, M.A.: Identifying mislabeled training data. J. Artif. Intell. Res. 11, 131–167 (1999)
  3. Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017)
  4. Folleco, A., Khoshgoftaar, T.M., Van Hulse, J., Bullard, L.: Identifying learners robust to low quality data. In: 2008 IEEE International Conference on Information Reuse and Integration, IRI 2008, pp. 190–195. IEEE (2008)
  5. Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 845–869 (2014)
  6. Gamberger, D., Lavrac, N., Groselj, C.: Experiments with noise filtering in a medical domain. In: ICML, pp. 143–151 (1999)
  7. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)
  8. Jeatrakul, P., Wong, K.W., Fung, C.C.: Data cleaning for classification using misclassification analysis. J. Adv. Comput. Intell. Intell. Inf. 14(3), 297–302 (2010)
  9. Khoshgoftaar, T.M., Rebours, P.: Generating multiple noise elimination filters with the ensemble-partitioning filter. In: 2004 Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration, IRI 2004, pp. 369–375. IEEE (2004)
  10. Khoshgoftaar, T.M., Rebours, P.: Improving software quality prediction by noise filtering techniques. J. Comput. Sci. Technol. 22(3), 387–396 (2007)
  11. Miranda, A.L.B., Garcia, L.P.F., Carvalho, A.C.P.L.F., Lorena, A.C.: Use of classification algorithms in noise detection and elimination. In: Corchado, E., Wu, X., Oja, E., Herrero, Á., Baruque, B. (eds.) HAIS 2009. LNCS (LNAI), vol. 5572, pp. 417–424. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02319-4_50
  12. Pang, G., Cao, L., Chen, L., Lian, D., Liu, H.: Sparse modeling-based sequential ensemble learning for effective outlier detection in high-dimensional numeric data (2018)
  13. Pechenizkiy, M., Tsymbal, A., Puuronen, S., Pechenizkiy, O.: Class noise and supervised learning in medical domains: the effect of feature extraction. In: 2006 19th IEEE International Symposium on CBMS 2006 Computer-Based Medical Systems, pp. 708–713. IEEE (2006)
  14. Sáez, J.A., Galar, M., Luengo, J., Herrera, F.: INFFC: an iterative class noise filter based on the fusion of classifiers with noise sensitivity control. Inf. Fusion 27, 19–32 (2016)
  15. Sánchez, J.S., Pla, F., Ferri, F.J.: Prototype selection for the nearest neighbour rule through proximity graphs. Pattern Recogn. Lett. 18(6), 507–513 (1997)
  16. Teng, C.M.: Dealing with data corruption in remote sensing. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds.) IDA 2005. LNCS, vol. 3646, pp. 452–463. Springer, Heidelberg (2005). https://doi.org/10.1007/11552253_41
  17. Thongkam, J., Xu, G., Zhang, Y., Huang, F.: Support vector machine for outlier detection in breast cancer survivability prediction. In: Ishikawa, Y., et al. (eds.) APWeb 2008. LNCS, vol. 4977, pp. 99–109. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89376-9_10
  18. Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 3, 408–421 (1972)
  19. Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-based learning algorithms. Mach. Learn. 38(3), 257–286 (2000)
  20. Yue, L., Chen, W., Li, X., Zuo, W., Yin, M.: A survey of sentiment analysis in social media. Knowl. Inf. Syst. 1–47 (2018)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Kai Chen (1)
  • Donghai Guan (1, 2)
  • Weiwei Yuan (1, 2)
  • Bohan Li (1, 2)
  • Asad Masood Khattak (3)
  • Omar Alfandi (3)

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
  2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Pittsburgh, USA
  3. College of Technological Innovation, Zayed University, Abu Dhabi, UAE
