Abstract
Traditionally, Multi-task Learning (MTL) models optimize the average of the task-related objective functions, an intuitive approach that we refer to as Average MTL. A more general framework, which we call Conic MTL, can be formulated by instead considering conic combinations of the objective functions; Average MTL then arises as the special case in which all combination coefficients equal 1. Although the advantage of Conic MTL over Average MTL has been demonstrated experimentally in previous works, no theoretical justification has been provided to date. In this paper, we derive a generalization bound for Conic MTL and show that the tightest bound is not necessarily achieved when all combination coefficients equal 1; hence, Average MTL is not always the optimal choice, and it is important to consider the more general Conic MTL. As a byproduct, the generalization bound also theoretically explains the good experimental results of previous related works. Finally, we propose a new Conic MTL model whose conic combination coefficients minimize the generalization bound, rather than being chosen heuristically as in previous methods. The rationale and advantages of our model are demonstrated and verified through a series of experiments comparing it with several other methods.
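The relationship between the two objectives described above can be made concrete with a minimal numerical sketch. The code below is illustrative only and is not the paper's method: it assumes regularized squared-loss task objectives on synthetic data (the names `task_loss` and `conic_objective` and the coefficient values are hypothetical), and simply shows that a conic combination with all coefficients equal to 1 reduces to the Average MTL objective (the plain sum of the task objectives).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic linear-regression tasks (illustrative data only).
T, n, d = 2, 30, 5
X = [rng.standard_normal((n, d)) for _ in range(T)]
w_true = [rng.standard_normal(d) for _ in range(T)]
y = [X[t] @ w_true[t] + 0.1 * rng.standard_normal(n) for t in range(T)]

def task_loss(w, t):
    """Regularized squared loss of task t (a hypothetical choice of objective)."""
    return np.mean((X[t] @ w - y[t]) ** 2) + 0.1 * (w @ w)

def conic_objective(W, lambdas):
    """Conic MTL objective: a nonnegative combination of the task objectives."""
    return sum(lam * task_loss(W[t], t) for t, lam in enumerate(lambdas))

W = [np.zeros(d) for _ in range(T)]
avg = conic_objective(W, np.ones(T))               # Average MTL: all coefficients 1
conic = conic_objective(W, np.array([2.0, 0.5]))   # a general conic combination
print(avg, conic)
```

With `lambdas = np.ones(T)` the conic objective coincides with the sum of the task objectives; choosing other nonnegative coefficients reweights how much each task drives the joint minimization, which is the extra freedom the Conic MTL framework exploits.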
© 2014 Springer-Verlag Berlin Heidelberg
Li, C., Georgiopoulos, M., Anagnostopoulos, G.C. (2014). Conic Multi-task Classification. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science(), vol 8725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44851-9_13
Print ISBN: 978-3-662-44850-2
Online ISBN: 978-3-662-44851-9