
A Novel Contrast Pattern Selection Method for Class Imbalance Problems

  • Octavio Loyola-González (email author)
  • José Fco. Martínez-Trinidad
  • Jesús Ariel Carrasco-Ochoa
  • Milton García-Borroto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10267)

Abstract

Selecting contrast patterns is an important task for pattern-based classifiers, especially in class imbalance problems. The main reason is that contrast pattern miners commonly extract many patterns with high support for the majority class and only a few patterns, with low support, for the minority class. This biases classification results toward the majority class and yields low accuracy for the minority class. In this paper, we introduce a contrast pattern selection method for class imbalance problems. Our proposal selects all the contrast patterns for the minority class and a certain percentage of the contrast patterns for the majority class. Our experiments, performed over several imbalanced databases, show that our proposal selects significantly better contrast patterns, and obtains better AUC results, than other approaches reported in the literature.
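
The selection idea stated in the abstract (keep every minority-class pattern, keep only a percentage of the majority-class patterns) can be illustrated with a minimal Python sketch. The abstract does not say how the majority-class patterns are ranked or how the percentage is chosen, so the names `Pattern` and `select_patterns`, the ranking by support, and the rounding up of the kept count are all assumptions made for illustration and may differ from the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical container for a mined contrast pattern."""
    description: str
    target_class: str
    support: float  # support of the pattern in its own class, in [0, 1]


def select_patterns(patterns, minority_class, majority_fraction):
    """Keep all minority-class patterns and a fraction of the others.

    Assumption: majority-class patterns are ranked by support and the
    top `majority_fraction` of them (rounded up) are kept. The paper may
    use a different quality measure or rounding rule.
    """
    if not 0.0 <= majority_fraction <= 1.0:
        raise ValueError("majority_fraction must be in [0, 1]")

    minority = [p for p in patterns if p.target_class == minority_class]
    majority = [p for p in patterns if p.target_class != minority_class]

    # Rank majority-class patterns from highest to lowest support.
    ranked = sorted(majority, key=lambda p: p.support, reverse=True)
    keep = math.ceil(majority_fraction * len(ranked))

    return minority + ranked[:keep]


# Toy data, invented purely for this example.
mined = [
    Pattern("age > 60", "negative", 0.80),
    Pattern("bmi <= 25", "negative", 0.70),
    Pattern("glucose <= 100", "negative", 0.60),
    Pattern("pressure <= 80", "negative", 0.50),
    Pattern("glucose > 160", "positive", 0.15),
]
selected = select_patterns(mined, minority_class="positive", majority_fraction=0.5)
```

With this toy input, the single "positive" pattern is always retained, while only the two highest-support "negative" patterns survive, which reduces the numerical dominance of majority-class patterns.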

Keywords

Supervised classification; Pattern selection; Contrast patterns; Imbalanced databases

Notes

Acknowledgment

This work was partly supported by the National Council of Science and Technology of Mexico under scholarship grant 370272.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Octavio Loyola-González (1, 2), email author
  • José Fco. Martínez-Trinidad (1)
  • Jesús Ariel Carrasco-Ochoa (1)
  • Milton García-Borroto (3)

  1. Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
  2. Centro de Bioplantas, Universidad de Ciego de Ávila, Ciego de Ávila, Cuba
  3. Instituto Superior Politécnico José Antonio Echeverría, Marianao, Cuba
