Learning and Inference in Probabilistic Classifier Chains with Beam Search

  • Abhishek Kumar
  • Shankar Vembu
  • Aditya Krishna Menon
  • Charles Elkan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)


Multilabel learning is an extension of binary classification that is both challenging and practically important. Recently, a method for multilabel learning called probabilistic classifier chains (PCCs) was proposed with numerous appealing properties, such as conceptual simplicity, flexibility, and theoretical justification. However, PCCs suffer from a computational issue, in that inference is exponential in the number of tags, and from a practical issue, in that they are sensitive to the ordering of the tags used during training. In this paper, we show how the classical technique of beam search may be used to solve both problems. Specifically, we show how to use beam search to perform tractable test-time inference, and how to integrate beam search with training to determine a suitable tag ordering. Experimental results on a range of multilabel datasets show that these proposed changes dramatically extend the practical viability of PCCs.
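The test-time inference step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical `cond_prob(prefix, j)` interface that returns the probability of tag j being 1 given the tags already fixed earlier in the chain (in practice this would come from a per-tag binary classifier applied to the features plus the prefix). Beam search keeps only the `beam_width` highest-probability partial tag vectors at each step, instead of enumerating all 2^L complete vectors.

```python
import math

def beam_search_inference(cond_prob, num_tags, beam_width):
    """Approximately find the most probable tag vector under a
    probabilistic classifier chain via beam search.

    cond_prob(prefix, j) -> P(y_j = 1 | y_1..y_{j-1} = prefix, x)
    is a hypothetical interface standing in for the chain's
    per-tag conditional models.
    """
    # Each beam entry is (log probability, tag prefix); start empty.
    beam = [(0.0, ())]
    for j in range(num_tags):
        candidates = []
        for log_p, prefix in beam:
            p1 = cond_prob(prefix, j)
            # Extend the prefix with both possible values of tag j.
            for tag, p in ((1, p1), (0, 1.0 - p1)):
                if p > 0.0:
                    candidates.append((log_p + math.log(p), prefix + (tag,)))
        # Prune: keep only the beam_width highest-scoring prefixes.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    log_p, tags = beam[0]
    return list(tags), math.exp(log_p)
```

With `beam_width = 2 ** num_tags` this recovers exact (exponential-time) inference; a small constant width gives the tractable approximation the paper advocates.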


Keywords: Beam Width · Kernel Matrix · Beam Search · Binary Relevance · Complete Binary Tree



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Abhishek Kumar (1)
  • Shankar Vembu (2)
  • Aditya Krishna Menon (1)
  • Charles Elkan (1)
  1. Department of Computer Science, UC San Diego, USA
  2. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
