Abstract
Although multi-label classification has become an increasingly important problem in machine learning, current approaches remain restricted to learning in the original label space (or in a simple linear projection of the original label space). Instead, we propose to use kernels on output label vectors to significantly expand the forms of label dependence that can be captured. The main challenge is to reformulate standard multi-label losses to handle kernels between output vectors. We first demonstrate how a state-of-the-art large margin loss for multi-label classification can be reformulated, exactly, to handle output kernels as well as input kernels. Importantly, the pre-image problem for multi-label classification can be easily solved at test time, while the training procedure can still be simply expressed as a quadratic program in a dual parameter space. We then develop a projected gradient descent training procedure for this new formulation. Our empirical results demonstrate the efficacy of the proposed approach on complex image labeling tasks.
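To make the dual training idea concrete, the following is a minimal sketch, not the paper's exact formulation, of projected gradient ascent on a generic box-constrained kernelized dual in which both inputs and outputs appear only through Gram matrices. The elementwise joint kernel, the objective, and all names (K_x, K_y, C, alpha) are illustrative assumptions, not the authors' derivation.

```python
import numpy as np

def projected_gradient_dual(K_x, K_y, C=1.0, lr=0.01, n_iter=500):
    """Illustrative projected gradient ascent on a generic kernelized dual:
        max_alpha  sum(alpha) - 0.5 * alpha^T (K_x * K_y) alpha
        s.t.       0 <= alpha <= C
    Inputs and outputs enter only via their Gram matrices K_x and K_y,
    combined here elementwise (a common joint-kernel choice). This is a
    hypothetical sketch, not the objective from the paper.
    """
    n = K_x.shape[0]
    Q = K_x * K_y                       # joint kernel over training pairs
    alpha = np.zeros(n)
    for _ in range(n_iter):
        grad = 1.0 - Q @ alpha          # gradient of the concave dual objective
        alpha += lr * grad              # unconstrained ascent step
        alpha = np.clip(alpha, 0.0, C)  # project back onto the box constraints
    return alpha

# Toy usage with Gram matrices built from random data (PSD by construction).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y = rng.integers(0, 2, size=(20, 3)).astype(float)
K_x = X @ X.T      # input kernel
K_y = Y @ Y.T      # output (label-vector) kernel
alpha = projected_gradient_dual(K_x, K_y)
```

At test time, a prediction under such a formulation would require solving a pre-image problem in the output kernel space; the abstract notes that for multi-label outputs this step can be solved easily, which is what makes the output-kernel view practical here.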