Hierarchical Classification for Solving Multi-class Problems: A New Approach Using Naive Bayesian Classification

  • Esra’a Alshdaifat
  • Frans Coenen
  • Keith Dures
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8346)


A hierarchical classification ensemble methodology is proposed as a solution to the multi-class classification problem. The outputs of a collection of classifiers, arranged in a hierarchy, are combined to produce a composite global classification that is better than the one obtained when the classifiers making up the ensemble operate in isolation. A novel topology for arranging the classifiers in the hierarchy is proposed in which the leaf classifiers act as binary classifiers and the remaining classifiers (those at the root and intermediate nodes) address groupings of classes. The main challenge is how to address the general drawback of the hierarchical model: if a record is misclassified early in the classification process (near the root of the hierarchy), it will continue to be misclassified at deeper levels. Three different approaches, founded on Naive Bayes classification, are proposed whereby Bayesian probability values are used to indicate whether a single path or multiple paths should be followed within the hierarchy. Reported experimental results demonstrate that the proposed mechanism can improve classification performance, in terms of average AUC, on selected data sets.
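The general idea of letting Naive Bayes probabilities decide between single and multiple paths can be illustrated with a minimal Python sketch. It assumes categorical (discretised) attributes, add-one (Laplace) smoothing, a hand-specified hierarchy, and a hypothetical threshold `theta`: at each internal node the most probable child is always followed, and any other child whose posterior reaches `theta` is followed too. Leaf posteriors are weighted by the product of the posteriors along the path and summed per class. This is not any of the three approaches from the paper; `NaiveBayes`, `Node`, `predict`, `theta` and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_n = Counter(y)
        self.prior = {c: n / len(y) for c, n in self.class_n.items()}
        # counts[c][j][v]: how often attribute j takes value v in class c
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # distinct values seen per attribute
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row):
        """Return normalised posteriors P(c | row), computed in log space."""
        log_post = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for j, v in enumerate(row):
                num = self.counts[c][j][v] + 1
                den = self.class_n[c] + len(self.values[j])
                lp += math.log(num / den)
            log_post[c] = lp
        top = max(log_post.values())
        exp = {c: math.exp(lp - top) for c, lp in log_post.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


class Node:
    """Internal nodes predict a child group; leaves separate two classes."""

    def __init__(self, classes, children=()):
        self.classes = frozenset(classes)
        self.children = list(children)  # empty list means leaf
        self.model = NaiveBayes()

    def _child_of(self, c):
        return next(i for i, ch in enumerate(self.children) if c in ch.classes)

    def fit(self, X, y):
        rows = [(r, c) for r, c in zip(X, y) if c in self.classes]
        if self.children:
            # Internal node: the target is the index of the child holding c.
            targets = [self._child_of(c) for _, c in rows]
            for ch in self.children:
                ch.fit(X, y)  # each child keeps only its own classes
        else:
            targets = [c for _, c in rows]  # leaf: binary classifier
        self.model.fit([r for r, _ in rows], targets)
        return self

    def scores(self, row, theta, weight=1.0, out=None):
        """Accumulate path-weighted leaf posteriors per class."""
        out = defaultdict(float) if out is None else out
        post = self.model.predict_proba(row)
        if not self.children:
            for c, p in post.items():
                out[c] += weight * p
            return out
        best = max(post, key=post.get)
        for i, p in post.items():
            if i == best or p >= theta:  # multiple paths when p is high enough
                self.children[i].scores(row, theta, weight * p, out)
        return out


def predict(root, row, theta):
    s = root.scores(row, theta)
    return max(s, key=s.get)


# Toy example (invented data): attribute 0 separates {a, b} from {c, d},
# attribute 1 identifies the individual class.
X = [("x", "p"), ("x", "p"), ("x", "q"), ("x", "q"),
     ("y", "r"), ("y", "r"), ("y", "s"), ("y", "s")]
y = ["a", "a", "b", "b", "c", "c", "d", "d"]
root = Node(["a", "b", "c", "d"], [Node(["a", "b"]), Node(["c", "d"])]).fit(X, y)
print(predict(root, ("x", "p"), theta=0.2))  # a
```

With `theta=1.0` only the single most probable branch is effectively followed, which reproduces the ordinary hierarchical model and its early-misclassification weakness; lowering `theta` lets a record that is ambiguous near the root still reach the leaf holding its true class. With `theta=0.0` every branch is followed and the per-class scores sum to 1.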


Keywords: Hierarchical classification; multi-class classification; ensemble classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Esra’a Alshdaifat (1)
  • Frans Coenen (1)
  • Keith Dures (1)
  1. Department of Computer Science, University of Liverpool, United Kingdom
