Multi-label Learning with Missing Labels Using Mixed Dependency Graphs

International Journal of Computer Vision

Abstract

This work focuses on the problem of multi-label learning with missing labels (MLML), which aims to label each test instance with multiple class labels, given training instances that have an incomplete/partial set of these labels (i.e., some of their labels are missing). The key to handling missing labels is propagating label information from the provided labels to the missing labels through a dependency graph in which each label of each instance is treated as a node. We build this graph from different types of label dependencies. Specifically, the instance-level similarity serves as undirected edges connecting the label nodes across different instances, and the semantic label hierarchy serves as directed edges connecting different classes. This base graph is referred to as the mixed dependency graph, as it includes both undirected and directed edges. Furthermore, we present two further types of label dependencies that connect the label nodes across different classes. One is class co-occurrence, which is also encoded as undirected edges. Combined with the above base graph, it yields a new mixed graph, called the mixed graph with co-occurrence (MG-CO). The other is the sparse and low-rank decomposition of the whole label matrix, which embeds high-order dependencies over all labels. Combined with the base graph, it yields a new mixed graph called MG-SL (mixed graph with sparse and low-rank decomposition). Based on MG-CO and MG-SL, we further propose two convex transductive formulations of the MLML problem, denoted MLMG-CO and MLMG-SL, respectively. In both formulations, the instance-level similarity is embedded through a quadratic smoothness term, while the semantic label hierarchy is used as a linear constraint.
In MLMG-CO, the class co-occurrence is also formulated as a quadratic smoothness term, while in MLMG-SL the sparse and low-rank decomposition is incorporated through two additional matrices (one assumed to be sparse, the other assumed to be low rank) and an equality constraint requiring their sum to equal the original label matrix. Interestingly, two important applications, image annotation and tag-based image retrieval, can be handled jointly by our proposed methods. Experimental results on several benchmark datasets show that our methods lead to significant improvements in performance and in robustness to missing labels over state-of-the-art methods.
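As a minimal sketch of the quadratic smoothness idea, the pure-Python code below assumes the unnormalized graph Laplacian \(\mathbf {L}_\mathbf {X} = \mathbf {D} - \mathbf {W}_\mathbf {X}\), where \(\mathbf {W}_\mathbf {X}\) is a symmetric instance-similarity matrix and \(\mathbf {D}\) is its diagonal degree matrix; this construction is an assumption, since this preview does not spell it out. Under it, the term \(\mathrm {tr}(\mathbf {Z}\mathbf {L}_\mathbf {X}\mathbf {Z}^\top )\) of Problem (28) in the appendix equals \(\tfrac{1}{2}\sum _{i,j} (\mathbf {W}_\mathbf {X})_{ij} \Vert \mathbf {z}_i - \mathbf {z}_j\Vert ^2\), where \(\mathbf {z}_i\) is column \(i\) of \(\mathbf {Z}\) (the label scores of instance \(i\)), so strongly similar instances are pushed toward similar label vectors. The function names and toy data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(w: Matrix) -> Matrix:
    """Unnormalized Laplacian L = D - W of a symmetric similarity matrix W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def smoothness_trace(z: Matrix, lap: Matrix) -> float:
    """tr(Z L Z^T) = sum_{i,j} L_ij <z_i, z_j>, where z_i is column i of Z."""
    m, n = len(z), len(z[0])
    return sum(lap[i][j] * sum(z[k][i] * z[k][j] for k in range(m))
               for i in range(n) for j in range(n))


def smoothness_pairwise(z: Matrix, w: Matrix) -> float:
    """0.5 * sum_{i,j} W_ij ||z_i - z_j||^2: grows when similar instances disagree."""
    m, n = len(z), len(z[0])
    return 0.5 * sum(w[i][j] * sum((z[k][i] - z[k][j]) ** 2 for k in range(m))
                     for i in range(n) for j in range(n))


# Toy data (hypothetical): m = 2 classes (rows) x n = 3 instances (columns).
W = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 0.5, 0.0]]
Z = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
print(smoothness_trace(Z, laplacian(W)), smoothness_pairwise(Z, W))  # 1.5 1.5
```

Both functions return the same value because \(\mathbf {W}_\mathbf {X}\) is symmetric; the pairwise form makes explicit which instance pairs are penalized.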

Figs. 1–13

Notes

  1. http://lear.inrialpes.fr/people/guillaumin/data.php.

  2. http://www.vlfeat.org/matconvnet/pretrained/.

  3. http://mulan.sourceforge.net/datasets-mlc.html.

  4. The complete semantic hierarchies and the complete label matrices of all four datasets can be downloaded from https://sites.google.com/site/baoyuanwu2015/.

References

  • Agrawal, R., Gupta, A., Prabhu, Y., & Varma, M. (2013). Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In WWW (pp. 13–24).

  • Bi, W., & Kwok, J. T. (2011). Multi-label classification on tree- and DAG-structured hierarchies. In ICML (pp. 17–24).

  • Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.

  • Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.

  • Bucak, S. S., Jin, R., & Jain, A. K. (2011). Multi-label learning with incomplete class assignments. In CVPR (pp. 2801–2808). New York: IEEE.

  • Cabral, R. S., De la Torre, F., Costeira, J.P., & Bernardino, A. (2011). Matrix completion for multi-label image classification. In NIPS (pp. 190–198).

  • Chang, X., Xiang, T., & Hospedales, T. M. (2016). L1 graph based sparse model for label de-noising. In BMVC.

  • Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In BMVC.

  • Chen, Z., Chen, M., Weinberger, K.Q., & Zhang, W. (2015). Marginalized denoising for link prediction and multi-label learning. In AAAI.

  • Chen, G., Song, Y., Wang, F., & Zhang, C. (2008). Semi-supervised multi-label learning by solving a Sylvester equation. In SIAM international conference on data mining (pp. 410–419).

  • Chen, M., Zheng, A., & Weinberger, K. (2013). Fast image tagging. In ICML (pp. 1274–1282).

  • Chen, C., He, B., Ye, Y., & Yuan, X. (2016). The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1–2), 57–79.

  • Deng, J., Ding, N., Jia, Y., Frome, A., Murphy, K., Bengio, S., Li, Y., Neven, H., & Adam, H. (2014). Large-scale object classification using label relation graphs. In ECCV (pp. 48–64). Berlin: Springer.

  • Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR (pp. 248–255). IEEE.

  • Duygulu, P., Barnard, K., de Freitas, J.F., & Forsyth, D.A. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In ECCV (pp. 97–112). Berlin: Springer.

  • Fazel, M. (2002). Matrix rank minimization with applications. PhD thesis, Stanford University.

  • Fellbaum, C. (1998). WordNet. New York: Wiley Online Library.

  • Fürnkranz, J., Hüllermeier, E., Mencía, E. L., & Brinker, K. (2008). Multilabel classification via calibrated label ranking. Machine Learning, 73(2), 133–153.

  • Geng, B., Yang, L., Xu, C., & Hua, X. S. (2008). Collaborative learning for image and video annotation. In Proceedings of the 1st ACM international conference on multimedia information retrieval (pp. 443–450). New York: ACM.

  • Ghadimi, E., Teixeira, A., Shames, I., & Johansson, M. (2015). Optimal parameter selection for the alternating direction method of multipliers (ADMM): Quadratic problems. IEEE Transactions on Automatic Control, 60(3), 644–658.

  • Gibaja, E., & Ventura, S. (2014). Multi-label learning: A review of the state of the art and ongoing research. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(6), 411–444.

  • Goldberg, A. B., Zhu, X., Recht, B., Xu, J. M., & Nowak, R. D. (2010). Transduction with matrix completion: Three birds with one stone. In NIPS (pp. 757–765).

  • Grubinger, M., Clough, P., Müller, H., & Deselaers, T. (2006). The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage (pp. 13–23).

  • Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation. In ICCV (pp. 309–316).

  • Kapoor, A., Viswanathan, R., & Jain, P. (2012). Multilabel classification using Bayesian compressed sensing. In NIPS (pp. 2654–2662).

  • Li, X., Zhao, F., & Guo, Y. (2015). Conditional restricted Boltzmann machines for multi-label learning with incomplete labels. In AISTATS (pp. 635–643).

  • Lin, Z., Ding, G., Hu, M., Wang, J., & Ye, X. (2013). Image tag completion via image-specific and tag-specific linear sparse reconstructions. In CVPR (pp. 1618–1625). IEEE.

  • Li, Y., Wu, B., Ghanem, B., Zhao, Y., Yao, H., & Ji, Q. (2016). Facial action unit recognition under incomplete data based on multi-label learning with missing labels. Pattern Recognition, 60, 890–900.

  • Manning, C. D., Raghavan, P., Schütze, H., et al. (2008). Introduction to information retrieval (Vol. 1). Cambridge: Cambridge University Press.

  • Peng, Y., Ganesh, A., Wright, J., Xu, W., & Ma, Y. (2012). RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2233–2246.

  • Raghunathan, A. U., & Di Cairano, S. (2014). Optimal step-size selection in alternating direction method of multipliers for convex quadratic programs and model predictive control. In Proceedings of symposium on mathematical theory of networks and systems (pp. 807–814).

  • Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.

  • Rousu, J., Saunders, C., Szedmak, S., & Shawe-Taylor, J. (2005). Learning hierarchical multi-category text classification models. In ICML (pp. 744–751). New York: ACM.

  • Rousu, J., Saunders, C., Szedmak, S., & Shawe-Taylor, J. (2006). Kernel-based learning of hierarchical multilabel classification models. The Journal of Machine Learning Research, 7, 1601–1626.

  • Snoek, C. G., Worring, M., Van Gemert, J. C., Geusebroek, J. M., & Smeulders, A. W. (2006). The challenge problem for automated detection of 101 semantic concepts in multimedia. In Proceedings of the 14th annual ACM international conference on Multimedia (pp. 421–430). New York: ACM.

  • Sun, Y., Zhang, Y., & Zhou, Z. H. (2010). Multi-label learning with weak label. In AAAI (pp. 593–598).

  • Sun, H., Wang, J., & Deng, T. (2016). On the global and linear convergence of direct extension of ADMM for 3-block separable convex minimization models. Journal of Inequalities and Applications, 2016(1), 227.

  • Tousch, A. M., Herbin, S., & Audibert, J. Y. (2012). Semantic hierarchies for image annotation: A survey. Pattern Recognition, 45(1), 333–345.

  • Vasisht, D., Damianou, A., Varma, M., & Kapoor, A. (2014). Active learning for sparse Bayesian multilabel classification. In SIGKDD (pp. 472–481). New York: ACM.

  • Von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 319–326). New York: ACM.

  • Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.

  • Wang, Q., Shen, B., Wang, S., Li, L., & Si, L. (2014). Binary codes embedding for fast image tagging with incomplete labels. In ECCV (pp. 425–439). Berlin: Springer.

  • Wang, Q., Si, L., & Zhang, D. (2014). Learning to hash with partial tags: Exploring correlation between tags and hashing bits for large scale image retrieval. In ECCV (pp. 378–392). Berlin: Springer.

  • Weston, J., Chapelle, O., Vapnik, V., Elisseeff, A., & Schölkopf, B. (2002). Kernel dependency estimation. In NIPS (pp. 873–880).

  • Wu, B., Chen, W., Liu, W., Sun, P., Ghanem, B., & Lyu, S. (2018). Tagging like humans: Diverse and distinct image annotation. In CVPR. IEEE.

  • Wu, B., Jia, F., Liu, W., & Ghanem, B. (2017). Diverse image annotation. In CVPR (pp. 2559–2567). New York: IEEE.

  • Wu, B., Liu, Z., Wang, S., Hu, B.G., & Ji, Q. (2014). Multi-label learning with missing labels. In ICPR.

  • Wu, B., Lyu, S., & Ghanem, B. (2015a). ML-MG: Multi-label learning with missing labels using a mixed graph. In ICCV (pp. 4157–4165).

  • Wu, B., Lyu, S., & Ghanem, B. (2016). Constrained submodular minimization for missing labels and class imbalance in multi-label learning. In AAAI (pp. 2229–2236).

  • Wu, L., Jin, R., & Jain, A. K. (2013). Tag completion for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3), 716–727.

  • Wu, B., Lyu, S., Hu, B. G., & Ji, Q. (2015b). Multi-label learning with missing labels for image annotation and facial action unit recognition. Pattern Recognition, 48(7), 2279–2289.

  • Xu, M., Jin, R., & Zhou, Z.H. (2013). Speedup matrix completion with side information: Application to multi-label learning. In NIPS (pp. 2301–2309).

  • Xu, C., Tao, D., & Xu, C. (2016). Robust extreme multi-label learning. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 13–17).

  • Yu, H. F., Jain, P., Kar, P., & Dhillon, I. (2014). Large-scale multi-label learning with missing labels. In ICML (pp. 593–601).

  • Yu, G., Zhu, H., & Domeniconi, C. (2015). Predicting protein functions using incomplete hierarchical labels. BMC Bioinformatics, 16(1), 1.

  • Zehfuss, G. (1858). Über eine gewisse Determinante. Zeitschrift für Mathematik und Physik, 3, 298–301.

  • Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.

  • Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.

  • Zhang, T., Ghanem, B., Liu, S., & Ahuja, N. (2012). Low-rank sparse learning for robust visual tracking. In ECCV (pp. 470–484). Berlin: Springer.

  • Zhang, Y., & Zhou, Z. H. (2010). Multilabel dimensionality reduction via dependence maximization. ACM Transactions on Knowledge Discovery from Data, 4(3), 14.

Author information

Correspondence to Baoyuan Wu.

Additional information

Communicated by Svetlana Lazebnik.

This work is supported by Tencent AI Lab. The participation of Bernard Ghanem is supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research. The participation of Siwei Lyu is partially supported by National Science Foundation National Robotics Initiative (NRI) Grant (IIS-1537257).

Electronic supplementary material

Supplementary material 1 (pdf 153 KB)

Convexity Proof

For clarity and continuity, we rewrite the continuous optimization problem of MLMG-CO here,

$$\begin{aligned} \min _\mathbf {Z}&f_1(\mathbf {Z}) = -\mathrm {tr}( \overline{\mathbf {Y}}^\top \mathbf {Z}) + \beta \mathrm {tr}(\mathbf {Z}\mathbf {L}_\mathbf {X}\mathbf {Z}^\top ) + \gamma \mathrm {tr}(\mathbf {Z}^\top \mathbf {L}_{\mathbf {C}} \mathbf {Z}),\nonumber \\ \text {s.t.}&\mathbf {Z}\in [0, 1]^{m \times n}, \quad \varvec{\varPhi }^\top \mathbf {Z}\ge 0. \end{aligned}$$
(28)
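The objective and constraints of Problem (28) can be evaluated directly, which helps when checking a solver's output. In this sketch the structure of \(\varvec{\varPhi }\) is an assumption for illustration: each column encodes one directed parent-to-child edge of the label hierarchy, with \(+1\) at the parent class and \(-1\) at the child class, so \(\varvec{\varPhi }^\top \mathbf {Z}\ge 0\) says that no instance scores a child class above its parent. Rows of \(\mathbf {Z}\) index classes, columns index instances, and the encoding of \(\overline{\mathbf {Y}}\) in the toy data is likewise assumed; all function names and values are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def tr_z_l_zt(z: Matrix, lap_x: Matrix) -> float:
    """tr(Z L_X Z^T): instance-level smoothness (L_X is n x n, over columns of Z)."""
    m, n = len(z), len(z[0])
    return sum(lap_x[i][j] * sum(z[k][i] * z[k][j] for k in range(m))
               for i in range(n) for j in range(n))


def tr_zt_l_z(z: Matrix, lap_c: Matrix) -> float:
    """tr(Z^T L_C Z): class co-occurrence smoothness (L_C is m x m, over rows of Z)."""
    m, n = len(z), len(z[0])
    return sum(lap_c[a][b] * sum(z[a][k] * z[b][k] for k in range(n))
               for a in range(m) for b in range(m))


def f1(z: Matrix, y_bar: Matrix, lap_x: Matrix, lap_c: Matrix,
       beta: float, gamma: float) -> float:
    """-tr(Ybar^T Z) + beta * tr(Z L_X Z^T) + gamma * tr(Z^T L_C Z)."""
    fit = sum(y * v for y_row, z_row in zip(y_bar, z) for y, v in zip(y_row, z_row))
    return -fit + beta * tr_z_l_zt(z, lap_x) + gamma * tr_zt_l_z(z, lap_c)


def hierarchy_matrix(m: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Hypothetical Phi (m x |edges|): column e has +1 at the parent, -1 at the child."""
    phi = [[0.0] * len(edges) for _ in range(m)]
    for e, (parent, child) in enumerate(edges):
        phi[parent][e] = 1.0
        phi[child][e] = -1.0
    return phi


def is_feasible(z: Matrix, phi: Matrix, tol: float = 1e-12) -> bool:
    """Check Z in [0, 1]^{m x n} and Phi^T Z >= 0 (entrywise, up to tol)."""
    m, n = len(z), len(z[0])
    in_box = all(-tol <= v <= 1.0 + tol for row in z for v in row)
    hier_ok = all(sum(phi[k][e] * z[k][i] for k in range(m)) >= -tol
                  for e in range(len(phi[0])) for i in range(n))
    return in_box and hier_ok


# Toy problem (hypothetical): class 0 is the parent of class 1; two instances.
lap = [[1.0, -1.0], [-1.0, 1.0]]  # Laplacian of a 2-node graph with unit weight
phi = hierarchy_matrix(2, [(0, 1)])
y_bar = [[1.0, 0.0], [0.0, -1.0]]  # assumed encoding: +1 positive, -1 negative, 0 missing
z_ok = [[1.0, 0.5], [1.0, 0.0]]
z_bad = [[0.0, 0.5], [1.0, 0.0]]  # instance 0 scores the child above its parent
print(f1(z_ok, y_bar, lap, lap, beta=0.5, gamma=0.25))  # -0.3125
print(is_feasible(z_ok, phi), is_feasible(z_bad, phi))  # True False
```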

First, we introduce the following vectors and matrices:

$$\begin{aligned}&\mathbf {z}= \text {vec}(\mathbf {Z}) = [\mathbf {Z}_{11}, \ldots , \mathbf {Z}_{m1}, \ldots , \mathbf {Z}_{mn}]^\top \in [0, 1]^{mn \times 1}, \nonumber \\&\overline{\mathbf {y}} = \text {vec}(\overline{\mathbf {Y}})= [\overline{\mathbf {Y}}_{11}, \ldots , \overline{\mathbf {Y}}_{m1}, \ldots , \overline{\mathbf {Y}}_{mn}]^\top \in \{ -1, 0, +1 \}^{mn}, \nonumber \\&\mathbf {W}= \beta \cdot \mathbf {W}_\mathbf {X}^\top \otimes \mathbf {I}_m + \gamma \cdot \mathbf {I}_n \otimes \mathbf {W}_{\mathbf {C}}\in \mathbb {R}^{mn \times mn}, \nonumber \\&\mathbf {L} = \beta \cdot \mathbf {L}_\mathbf {X}^\top \otimes \mathbf {I}_m + \gamma \cdot \mathbf {I}_n \otimes \mathbf {L}_{\mathbf {C}} \in \mathbb {R}^{mn \times mn}, \nonumber \\&\overline{\varvec{\varPhi }} = \mathbf {I}_n \otimes \varvec{\varPhi }, \end{aligned}$$

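where \(\otimes \) denotes the Kronecker product (Zehfuss 1858). Using the identities \(\mathrm {tr}(\overline{\mathbf {Y}}^\top \mathbf {Z}) = \overline{\mathbf {y}}^\top \mathbf {z}\), \(\mathrm {tr}(\mathbf {Z}\mathbf {L}_\mathbf {X}\mathbf {Z}^\top ) = \mathbf {z}^\top (\mathbf {L}_\mathbf {X}^\top \otimes \mathbf {I}_m) \mathbf {z}\), and \(\mathrm {tr}(\mathbf {Z}^\top \mathbf {L}_{\mathbf {C}} \mathbf {Z}) = \mathbf {z}^\top (\mathbf {I}_n \otimes \mathbf {L}_{\mathbf {C}}) \mathbf {z}\), Problem (28) can be transformed into the following equivalent vector-based formulation: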

$$\begin{aligned} \min _\mathbf {z}&\quad f_2(\mathbf {z}) = -\overline{\mathbf {y}}^\top \mathbf {z}+ \mathbf {z}^\top \mathbf {L} \mathbf {z}, \nonumber \\ \text {s.t.}&\quad \mathbf {z}\in [0, 1]^{mn \times 1}, \quad \overline{\varvec{\varPhi }}^\top \mathbf {z}\ge 0. \end{aligned}$$
(29)
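The vectorization step rests on the identity \(\text {vec}(\mathbf {A}\mathbf {X}\mathbf {B}) = (\mathbf {B}^\top \otimes \mathbf {A})\,\text {vec}(\mathbf {X})\) for column-major \(\text {vec}\). The small pure-Python check below confirms the two quadratic terms on toy matrices and also shows that the order of the Kronecker factors matters for the hierarchy constraint: with column-major \(\text {vec}\), \(\text {vec}(\varvec{\varPhi }^\top \mathbf {Z}) = (\mathbf {I}_n \otimes \varvec{\varPhi })^\top \mathbf {z}\), which in general differs from \((\varvec{\varPhi } \otimes \mathbf {I}_n)^\top \mathbf {z}\). The toy matrices and helper functions are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: (A ⊗ B)[i][j] = A[i // p][j // q] * B[i % p][j % q]."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]


def eye(k: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def vec(a: Matrix) -> List[float]:
    """Column-major vectorization: [A_11, ..., A_m1, ..., A_mn]."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def trace(a: Matrix) -> float:
    return sum(a[i][i] for i in range(len(a)))


def quad(v: List[float], a: Matrix) -> float:
    """Quadratic form v^T A v."""
    return sum(v[i] * a[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


# Toy sizes (hypothetical values): m = 2 classes, n = 3 instances.
Z = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
L_X = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]  # path-graph Laplacian
L_C = [[1.0, -1.0], [-1.0, 1.0]]
PHI = [[1.0], [-1.0]]  # one hypothetical parent -> child edge
m, n = len(Z), len(Z[0])
z = vec(Z)

lhs_x = trace(matmul(matmul(Z, L_X), transpose(Z)))   # tr(Z L_X Z^T)
rhs_x = quad(z, kron(transpose(L_X), eye(m)))          # z^T (L_X^T ⊗ I_m) z
lhs_c = trace(matmul(matmul(transpose(Z), L_C), Z))   # tr(Z^T L_C Z)
rhs_c = quad(z, kron(eye(n), L_C))                     # z^T (I_n ⊗ L_C) z
hier = vec(matmul(transpose(PHI), Z))                  # vec(Phi^T Z)
hier_ok = matvec(transpose(kron(eye(n), PHI)), z)      # (I_n ⊗ Phi)^T z
hier_swapped = matvec(transpose(kron(PHI, eye(n))), z)  # (Phi ⊗ I_n)^T z
print(lhs_x, rhs_x, lhs_c, rhs_c)             # 4.0 4.0 27.0 27.0
print(hier == hier_ok, hier == hier_swapped)  # True False
```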

Lemma 1

\(\mathbf {L}\) is positive semi-definite (PSD).

Proof

Given two square matrices \(\mathbf {A} \in \mathbb {R}^{n_1 \times n_1}\) and \(\mathbf {B} \in \mathbb {R}^{n_2 \times n_2}\) with eigenvalues \(\lambda _1,\ldots ,\lambda _{n_1}\) and \(\mu _1,\ldots ,\mu _{n_2}\), respectively, a standard property of the Kronecker product is that the eigenvalues of \(\mathbf {A} \otimes \mathbf B \) are \(\lambda _i \mu _j, i=1,\ldots ,n_1; j = 1,\ldots ,n_2\). \(\mathbf {L}_\mathbf {X}\) is the Laplacian of the undirected instance-similarity graph, so \(\mathbf {L}_\mathbf {X}^\top \) is symmetric and PSD, while \(\mathbf {I}_m\) is positive definite (PD). Hence \(\mathbf {L}_\mathbf {X}^\top \otimes \mathbf {I}_m\) is symmetric, and each of its eigenvalues is the product of a non-negative eigenvalue of \(\mathbf {L}_\mathbf {X}^\top \) and the eigenvalue 1 of \(\mathbf {I}_m\); all its eigenvalues are therefore non-negative, so \(\mathbf {L}_\mathbf {X}^\top \otimes \mathbf {I}_m\) is a PSD matrix. Similarly, since \(\mathbf {L}_{\mathbf {C}}\) is symmetric and PSD, \(\mathbf {I}_n \otimes \mathbf {L}_{\mathbf {C}}\) is also PSD. Finally, since \(\beta , \gamma > 0\), \(\mathbf {L}\) is a positively weighted sum of two PSD matrices and is therefore PSD. \(\square \)
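An eigenvalue-free argument reaches the same conclusion through the mixed-product property \((\mathbf {A}\otimes \mathbf {B})(\mathbf {C}\otimes \mathbf {D}) = \mathbf {A}\mathbf {C}\otimes \mathbf {B}\mathbf {D}\). Every symmetric PSD matrix admits a factorization \(\mathbf {A} = \mathbf {P}^\top \mathbf {P}\) (for example, from its eigendecomposition), so if \(\mathbf {A} = \mathbf {P}^\top \mathbf {P}\) and \(\mathbf {B} = \mathbf {Q}^\top \mathbf {Q}\), then for any vector \(\mathbf {x}\):

$$\begin{aligned} \mathbf {A}\otimes \mathbf {B} = (\mathbf {P}^\top \otimes \mathbf {Q}^\top )(\mathbf {P}\otimes \mathbf {Q}) = (\mathbf {P}\otimes \mathbf {Q})^\top (\mathbf {P}\otimes \mathbf {Q}), \qquad \mathbf {x}^\top (\mathbf {A}\otimes \mathbf {B})\mathbf {x} = \Vert (\mathbf {P}\otimes \mathbf {Q})\mathbf {x}\Vert ^2 \ge 0. \end{aligned}$$

Taking \(\mathbf {A} = \mathbf {L}_\mathbf {X}^\top \), \(\mathbf {B} = \mathbf {I}_m\) (with \(\mathbf {Q} = \mathbf {I}_m\)), or \(\mathbf {A} = \mathbf {I}_n\), \(\mathbf {B} = \mathbf {L}_{\mathbf {C}}\), recovers the two PSD claims used above.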

Proposition 1

Problem (28) is convex.

Proof

The objective function in (29) is a quadratic function of \(\mathbf {z}\) whose Hessian is \(2\mathbf {L}\) (recall that \(\mathbf {L}\) is symmetric), which is PSD by Lemma 1. Thus, the objective function of (29) is convex in \(\mathbf {z}\). The box constraint and the linear inequality constraint define a convex polyhedral feasible set, which is nonempty (e.g., \(\mathbf {z} = \mathbf {0}\) is feasible); because all constraints are affine, Slater's condition holds in its refined form. Hence Problem (29) is a convex optimization problem. Finally, since Problems (28) and (29) are equivalent, Problem (28) is also convex. \(\square \)
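The convexity of \(f_2\) can also be read off directly. Because \(\mathbf {L}\) is symmetric, the gradient and Hessian take the form below (the factor 2 does not affect positive semidefiniteness), and expanding the quadratic gives an exact identity for any \(\mathbf {a}, \mathbf {b}\) and \(\theta \in [0, 1]\):

$$\begin{aligned} \nabla f_2(\mathbf {z})&= -\overline{\mathbf {y}} + 2\mathbf {L}\mathbf {z}, \qquad \nabla ^2 f_2(\mathbf {z}) = 2\mathbf {L}, \\ f_2(\theta \mathbf {a} + (1-\theta )\mathbf {b})&= \theta f_2(\mathbf {a}) + (1-\theta ) f_2(\mathbf {b}) - \theta (1-\theta )(\mathbf {a}-\mathbf {b})^\top \mathbf {L}(\mathbf {a}-\mathbf {b}). \end{aligned}$$

Since \(\mathbf {L}\) is PSD by Lemma 1, the last term is non-positive, which is exactly the defining inequality of a convex function; the linear term \(-\overline{\mathbf {y}}^\top \mathbf {z}\) cancels out of the identity.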

About this article

Cite this article

Wu, B., Jia, F., Liu, W. et al. Multi-label Learning with Missing Labels Using Mixed Dependency Graphs. Int J Comput Vis 126, 875–896 (2018). https://doi.org/10.1007/s11263-018-1085-3

  • DOI: https://doi.org/10.1007/s11263-018-1085-3
