International Journal of Computer Vision, Volume 126, Issue 8, pp 875–896

Multi-label Learning with Missing Labels Using Mixed Dependency Graphs

  • Baoyuan Wu
  • Fan Jia
  • Wei Liu
  • Bernard Ghanem
  • Siwei Lyu


This work focuses on the problem of multi-label learning with missing labels (MLML), which aims to label each test instance with multiple class labels given training instances that have an incomplete (partial) set of these labels, i.e., some of their labels are missing. The key to handling missing labels is to propagate label information from the provided labels to the missing ones through a dependency graph in which each label of each instance is treated as a node. We build this graph by utilizing different types of label dependencies. Specifically, instance-level similarity serves as undirected edges connecting label nodes across different instances, and the semantic label hierarchy serves as directed edges connecting different classes. This base graph is referred to as the mixed dependency graph, as it includes both undirected and directed edges. Furthermore, we present two additional types of label dependencies that connect label nodes across different classes. One is class co-occurrence, which is also encoded as undirected edges. Combining it with the base graph yields a new mixed graph, called the mixed graph with co-occurrence (MG-CO). The other is the sparse and low-rank decomposition of the whole label matrix, which embeds high-order dependencies over all labels. Combining it with the base graph yields the mixed graph with sparse and low-rank decomposition (MG-SL). Based on MG-CO and MG-SL, we further propose two convex transductive formulations of the MLML problem, denoted MLMG-CO and MLMG-SL, respectively. In both formulations, instance-level similarity is embedded through a quadratic smoothness term, while the semantic label hierarchy is used as a linear constraint. In MLMG-CO, class co-occurrence is also formulated as a quadratic smoothness term, whereas in MLMG-SL the sparse and low-rank decomposition is incorporated through two additional matrices (one assumed to be sparse, the other low rank) and an equality constraint between the sum of these two matrices and the original label matrix. Interestingly, two important applications, image annotation and tag-based image retrieval, can be jointly handled using our proposed methods. Experimental results on several benchmark datasets show that our methods lead to significant improvements in performance and robustness to missing labels over the state-of-the-art methods.
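
The ingredients named in the abstract (two quadratic smoothness terms and a linear hierarchy constraint) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the trade-off weights `beta` and `gamma`, the classes-by-instances layout of the label matrix `Z`, and the use of the unnormalized graph Laplacian D - W are all assumptions made for illustration, and the paper may use a different normalization or weighting.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian D - W of a symmetric similarity matrix (assumed form)."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def smoothness_objective(Z, L_inst, L_class, beta=1.0, gamma=1.0):
    """Sum of the two quadratic smoothness terms for a label matrix Z.

    Z has shape (num_classes, num_instances); this layout is an assumption.
    The instance-level term penalizes similar instances with different scores
    for the same class; the class-level term penalizes co-occurring classes
    with different scores on the same instance.
    """
    inst_term = np.trace(Z @ L_inst @ Z.T)
    class_term = np.trace(Z.T @ L_class @ Z)
    return beta * inst_term + gamma * class_term

def hierarchy_violation(Z, edges):
    """Total violation of the linear constraint Z[parent] >= Z[child].

    edges is a list of (parent, child) class-index pairs (hypothetical encoding
    of the directed edges of the semantic hierarchy).
    """
    return sum(float(np.maximum(Z[c] - Z[p], 0.0).sum()) for p, c in edges)

# Toy example: 2 classes, 3 instances arranged in a chain 0 - 1 - 2.
W_inst = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
W_class = np.array([[0, 1], [1, 0]])
Z = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])

objective = smoothness_objective(Z, graph_laplacian(W_inst), graph_laplacian(W_class))
violation = hierarchy_violation(Z, edges=[(0, 1)])  # class 0 is the parent of class 1
```

In a full transductive solver, the missing entries of `Z` would be the optimization variables and the hierarchy would be enforced as a hard constraint rather than measured after the fact; the sketch only evaluates the terms for a fixed `Z`.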


Keywords: Multi-label learning, Missing labels, Mixed dependency graphs, Image annotation, Image retrieval

Supplementary material

Supplementary material 1: 11263_2018_1085_MOESM1_ESM.pdf (PDF, 153 KB)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Tencent AI Lab, Shenzhen, China
  2. Visual Computing Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  3. Computer Science Department, University at Albany, State University of New York, Albany, USA
