
Domain of Competency of Classifiers on Overlapping Complexity of Datasets Using Multi-label Classification with Meta-Learning

  • Shivani Gupta (corresponding author)
  • Atul Gupta
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1086)

Abstract

A classifier’s performance can be strongly influenced by the characteristics of the underlying dataset. We investigate the connection between the overlapping complexity of a dataset and the performance of a classifier in order to understand the domain of competence of machine learning classifiers. In this paper, we report the results and implications of a study of the connection between four overlapping measures and the performance of three classifiers, namely KNN, C4.5 and SVM. We first evaluated the performance of the three classifiers on 1060 binary classification datasets. Next, we constructed a multi-label classification dataset in which each of these 1060 datasets is described by its four overlapping measures as features and labeled with the classifiers that are competent on it. The generated multi-label dataset is then used to estimate the domain of competence of the three classifiers with respect to overlapping complexity, which allowed us to express that domain as a set of rules obtained through multi-label rule learning. We found that classifier performance invariably degraded on datasets with high values of the complexity measures N1 and N3. This suggests a strong negative correlation between classifier performance and the class overlap present in the data.
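
The construction of one row of the multi-label meta-dataset can be illustrated with a minimal Python sketch. It makes several assumptions that the abstract does not confirm. N3 is computed in its commonly cited form, the leave-one-out error rate of a 1-nearest-neighbour classifier, and the other three overlapping measures are omitted. The competence rule (accuracy within a tolerance of the best classifier) and the names `n3_loo_1nn_error`, `competent_labels`, `meta_row` and `tolerance` are hypothetical and may differ from the authors' procedure.

```python
import math


def n3_loo_1nn_error(X, y):
    """N3 (assumed definition): leave-one-out error rate of a 1-NN classifier."""
    errors = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave the query point itself out
            d = math.dist(xi, xj)  # Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        if y[best_j] != y[i]:
            errors += 1
    return errors / len(X)


def competent_labels(accuracies, tolerance=0.05):
    """Hypothetical rule: a classifier is 'competent' on a dataset if its
    accuracy is within `tolerance` of the best accuracy on that dataset."""
    best = max(accuracies.values())
    return {name: int(acc >= best - tolerance) for name, acc in accuracies.items()}


def meta_row(X, y, accuracies):
    """One meta-example: complexity features plus a multi-label target."""
    features = {"N3": n3_loo_1nn_error(X, y)}
    return features, competent_labels(accuracies)


# Toy binary datasets: one well separated, one with fully interleaved classes.
X_sep = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
y_sep = [0, 0, 1, 1]
X_mix = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
y_mix = [0, 1, 0, 1]

# Illustrative accuracies only; these are not results from the paper.
toy_accuracies = {"KNN": 0.90, "C4.5": 0.87, "SVM": 0.80}

if __name__ == "__main__":
    print(meta_row(X_sep, y_sep, toy_accuracies))
    print(meta_row(X_mix, y_mix, toy_accuracies))
```

In this sketch the interleaved toy dataset gets the highest possible N3 and the separated one the lowest, which is the kind of feature contrast a multi-label rule learner could use to form rules over the meta-dataset.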

Keywords

Multi-label classification, Multi-class classification, Class overlapping, Meta-learning


Copyright information

© Springer Nature Singapore Pte Ltd. 2021

Authors and Affiliations

  1. Manipal University, Jaipur, India
  2. Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, India
