Comparing Quality Measures for Contrast Pattern Classifiers

  • Milton García-Borroto
  • Octavio Loyola-Gonzalez
  • José Francisco Martínez-Trinidad
  • Jesús Ariel Carrasco-Ochoa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8258)

Abstract

Contrast pattern miners and contrast pattern classifiers typically use a quality measure to evaluate the discriminative power of a pattern. Since many quality measures exist, it is important to perform comparative studies among them. Nevertheless, previous studies mostly compare measures by their impact on classification accuracy. In this paper, we present a comparative study of quality measures along several aspects: accuracy using the whole training set, accuracy using pattern subsets, and accuracy and compression for filtering patterns. Experiments with 10 quality measures on 25 repository databases show that different quality measures are highly correlated and that the most accurate quality measures are not appropriate in contexts such as pattern filtering.

Keywords

quality evaluation, contrast patterns, emerging patterns

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Milton García-Borroto (1)
  • Octavio Loyola-Gonzalez (1, 2)
  • José Francisco Martínez-Trinidad (2)
  • Jesús Ariel Carrasco-Ochoa (2)
  1. Centro de Bioplantas, Ciego de Avila, Cuba
  2. Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, México
