
A Cost-Sensitive Learning Strategy for Feature Extraction from Imbalanced Data

  • Ali Braytee
  • Wei Liu
  • Paul Kennedy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9949)

Abstract

In this paper, novel cost-sensitive principal component analysis (CSPCA) and cost-sensitive non-negative matrix factorization (CSNMF) methods are proposed for feature extraction from imbalanced data. Highly imbalanced data misleads existing feature extraction techniques into producing biased features, which results in poor classification performance, especially on the minority class. To solve this problem, we propose a cost-sensitive learning strategy for feature extraction that uses the imbalance ratio of the classes to discount the majority samples. This strategy is adapted to popular feature extraction methods such as PCA and NMF. The main advantage of the proposed methods is that they lessen the inherent bias toward the majority class in the features extracted by existing PCA and NMF algorithms. Experiments on twelve public datasets with different imbalance ratios show that the proposed methods outperformed the state-of-the-art methods on multiple classifiers.
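The abstract does not give the CSPCA formulation, so the following is only a minimal sketch of the general idea of discounting majority samples by the imbalance ratio before extracting principal components. The function name `weighted_pca`, the weighting rule (minority count divided by class count), and the use of a weighted covariance matrix are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def weighted_pca(X, y, n_components=2):
    """Hypothetical sketch: PCA with majority samples down-weighted.

    Assumption: each sample gets the weight (minority count / own class count),
    so minority samples keep weight 1 and majority samples get a weight < 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Per-class weights derived from the class counts (imbalance ratio).
    classes, counts = np.unique(y, return_counts=True)
    class_weight = counts.min() / counts
    w = class_weight[np.searchsorted(classes, y)]

    # Weighted mean and weighted covariance of the features.
    mean = np.average(X, axis=0, weights=w)
    Xc = X - mean
    cov = (Xc * w[:, None]).T @ Xc / w.sum()

    # Eigenvectors of the symmetric covariance, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]

    # Project the centred data onto the selected components.
    return Xc @ components, components, w
```

With this weighting, every class contributes the same total weight to the covariance estimate, which is one simple way to keep the majority class from dominating the extracted directions.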

Keywords

Principal Component Analysis; Base Classifier; Minority Class; Imbalanced Class; Imbalanced Data
These keywords were added by machine and not by the authors.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Quantum Computation and Intelligent Systems, School of Software, University of Technology Sydney, Sydney, Australia
  2. Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia
