One-Sided Prototype Selection on Class Imbalanced Dissimilarity Matrices

  • Mónica Millán-Giraldo
  • Vicente García
  • J. Salvador Sánchez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7626)

Abstract

In the dissimilarity representation paradigm, several prototype selection methods have been used to choose a small representation set that generates a low-dimensional dissimilarity space. These methods have also been used to reduce the size of the dissimilarity matrix. However, they assume a relatively balanced class distribution, an assumption that is grossly violated in many real-life problems, where the ratios of prior probabilities between classes are often extremely skewed. In this paper, we study well-known prototype selection methods adapted to learning from an imbalanced dissimilarity matrix. More specifically, we propose using these methods to under-sample the majority class in the dissimilarity space. The experimental results demonstrate that the one-sided selection strategy performs better than the classical prototype selection methods applied over all classes.
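
The one-sided idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a two-class problem, applies an editing rule in the spirit of Wilson's edited nearest neighbor only to majority-class objects, and looks up neighbors directly in a precomputed dissimilarity matrix. The function name `one_sided_wilson_editing`, the parameter `k`, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np

def one_sided_wilson_editing(D, y, majority_label, k=3):
    """Return the indices kept after editing only the majority class.

    D: (n, n) dissimilarity matrix; y: (n,) class labels.
    Minority-class objects are never removed (assumed reading of "one-sided").
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y)
    keep = np.ones(len(y), dtype=bool)
    for i in np.flatnonzero(y == majority_label):
        d = D[i].copy()
        d[i] = np.inf  # exclude the object itself from its neighbors
        neighbors = np.argsort(d, kind="stable")[:k]
        labels, counts = np.unique(y[neighbors], return_counts=True)
        # Drop a majority object when its k nearest neighbors vote for another class.
        if labels[np.argmax(counts)] != majority_label:
            keep[i] = False
    return np.flatnonzero(keep)

# Toy data: five majority objects (label 0), one of which lies inside the
# minority cluster, and three minority objects (label 1).
points = np.array([0.0, 1.0, 2.0, 3.0, 10.1, 10.0, 10.2, 10.4])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1])
D = np.abs(points[:, None] - points[None, :])  # pairwise dissimilarities

kept = one_sided_wilson_editing(D, y, majority_label=0, k=3)
D_reduced = D[np.ix_(kept, kept)]  # dissimilarity matrix after under-sampling
```

In this toy case only the majority object at 10.1 is discarded, so the reduced matrix is 7 × 7 and every minority object survives. Applying the same rule to all classes would instead put the scarce minority objects at risk of removal, which is the behavior the one-sided strategy is meant to avoid.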

Keywords

Minority Class, Class Imbalance, Dissimilarity Matrix, Neighbor Rule, Pattern Recognition Letter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Mónica Millán-Giraldo (1, 2)
  • Vicente García (2)
  • J. Salvador Sánchez (2)
  1. Intelligent Data Analysis Laboratory, University of Valencia, Burjassot, Spain
  2. Institute of New Imaging Technologies, Department of Computer Languages and Systems, University Jaume I, Castelló de la Plana, Spain
