A Deep Interpretation of Classifier Chains

  • Jesse Read
  • Jaakko Hollmén
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8819)


In the “classifier chains” (CC) approach for multi-label classification, the predictions of binary classifiers are cascaded along a chain as additional features. This method has attained high predictive performance and is receiving increasing analysis and attention in the recent multi-label literature, although a deep understanding of its performance is still taking shape. In this paper, we show that CC gets its predictive power from leveraging labels as additional stochastic features, in contrast to many other methods, such as stacking and error-correcting output codes, which use label dependence only as a kind of regularization. CC methods can learn a concept that these methods cannot, even supposing the same base classifier and hypothesis space. This leads us to connections with deep learning (indeed, we show that CC is competitive precisely because it is a deep learner), and we employ deep learning methods, showing that they can supplement or even replace a classifier chain. The results are convincing and offer new insight into promising future directions.
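The cascading idea can be illustrated with a small sketch. It is not taken from the paper: the toy data (two binary inputs with the labels OR, AND and XOR), the perceptron base learner, and all function names are assumptions chosen for illustration. Each classifier in the chain receives the original features plus the labels that precede it (true labels during training, predicted labels at prediction time), while the independent baseline trains one classifier per label on the original features alone.

```python
from itertools import product


def train_perceptron(X, y, epochs=200):
    """Train a linear threshold unit with the classic perceptron rule."""
    w = [0.0] * (len(X[0]) + 1)  # bias weight followed by feature weights
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            t = 1 if label == 1 else -1
            xa = [1.0] + list(x)
            score = sum(wi * xi for wi, xi in zip(w, xa))
            if t * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + t * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:  # every training point is now classified correctly
            break
    return w


def predict(w, x):
    xa = [1.0] + list(x)
    return 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else 0


def fit_chain(X, Y):
    """Classifier j is trained on the features plus the true labels 0..j-1."""
    models = []
    for j in range(len(Y[0])):
        Xj = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        models.append(train_perceptron(Xj, [y[j] for y in Y]))
    return models


def predict_chain(models, x):
    """At prediction time, earlier predictions are cascaded as extra features."""
    preds = []
    for w in models:
        preds.append(predict(w, list(x) + preds))
    return preds


def fit_independent(X, Y):
    """Baseline: one classifier per label, no label information shared."""
    return [train_perceptron(X, [y[j] for y in Y]) for j in range(len(Y[0]))]


def predict_independent(models, x):
    return [predict(w, x) for w in models]


# Hypothetical toy problem: labels are OR, AND and XOR of two binary inputs.
X = [list(p) for p in product([0, 1], repeat=2)]
Y = [[a | b, a & b, a ^ b] for a, b in X]

chain_preds = [predict_chain(fit_chain(X, Y), x) for x in X]
indep_preds = [predict_independent(fit_independent(X, Y), x) for x in X]


def label_accuracy(preds, j):
    return sum(p[j] == y[j] for p, y in zip(preds, Y)) / len(Y)


print("XOR accuracy, chain:      ", label_accuracy(chain_preds, 2))
print("XOR accuracy, independent:", label_accuracy(indep_preds, 2))
```

In this toy setting the XOR label is not linearly separable in the two inputs, so no independent linear classifier can fit it, but it becomes separable once the OR and AND labels are appended as features (XOR holds exactly when OR is true and AND is false). The same linear base learner therefore represents a concept in the chain that it cannot represent on its own, which is one way to read the claim about the hypothesis space.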


Keywords: Binary Rele; Extreme Learning Machine; Deep Learning; Hypothesis Space; Restricted Boltzmann Machine
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jesse Read (1, 2)
  • Jaakko Hollmén (1, 2)

  1. Department of Information and Computer Science, Aalto University, Aalto, Espoo, Finland
  2. Helsinki Institute for Information Technology (HIIT), Finland
