On Measuring the Complexity of Classification Problems

  • Ana Carolina Lorena
  • Marcilio C. P. de Souto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9489)


There has been growing interest in describing how difficult a classification problem is to solve. This knowledge can be used, among other things, to support better-grounded decisions about data pre-processing and to guide the development of new data-driven pattern recognition techniques. A variety of measures that estimate the intrinsic complexity of a classification problem can be extracted from a training data set. This paper presents some of these measures and analyzes them theoretically.


Machine Learning; Complexity measures; Classification problems


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ana Carolina Lorena (1, corresponding author)
  • Marcilio C. P. de Souto (2)
  1. Instituto de Ciência e Tecnologia, Universidade Federal de São Paulo, São José dos Campos, Brazil
  2. Univ. Orléans, INSA Centre Val de Loire, LIFO EA 4022, Orléans, France
