Multi-label Learning with Missing Labels Using Mixed Dependency Graphs

Wu, Baoyuan; Jia, Fan; Liu, Wei; Ghanem, Bernard; Lyu, Siwei

doi:10.1007/s11263-018-1085-3

Published: 06 April 2018

International Journal of Computer Vision, Volume 126, pages 875–896 (2018)

2046 Accesses
62 Citations

Abstract

This work focuses on the problem of multi-label learning with missing labels (MLML), which aims to label each test instance with multiple class labels given training instances that have an incomplete (partial) set of these labels, i.e., some of their labels are missing. The key to handling missing labels is to propagate label information from the provided labels to the missing labels through a dependency graph in which each label of each instance is treated as a node. We build this graph by utilizing different types of label dependencies. Specifically, instance-level similarity serves as undirected edges that connect label nodes across different instances, and the semantic label hierarchy is used as directed edges that connect different classes. This base graph is referred to as the mixed dependency graph, as it includes both undirected and directed edges. Furthermore, we present two additional types of label dependencies to connect label nodes across different classes. One is class co-occurrence, which is also encoded as undirected edges. Combining it with the base graph yields a new mixed graph, called the mixed graph with co-occurrence (MG-CO). The other is the sparse and low-rank decomposition of the whole label matrix, which embeds high-order dependencies over all labels. Combining it with the base graph yields the mixed graph with sparse and low-rank decomposition (MG-SL). Based on MG-CO and MG-SL, we further propose two convex transductive formulations of the MLML problem, denoted MLMG-CO and MLMG-SL, respectively. In both formulations, instance-level similarity is embedded through a quadratic smoothness term, while the semantic label hierarchy is used as a linear constraint. In MLMG-CO, class co-occurrence is also formulated as a quadratic smoothness term, while in MLMG-SL the sparse and low-rank decomposition is incorporated through two additional matrices (one assumed to be sparse, the other low rank) and an equality constraint between the sum of these two matrices and the original label matrix. Interestingly, two important applications, image annotation and tag-based image retrieval, can be jointly handled using our proposed methods. Experimental results on several benchmark datasets show that our methods lead to significant improvements in performance and robustness to missing labels over the state-of-the-art methods.

Notes

http://lear.inrialpes.fr/people/guillaumin/data.php.
http://www.vlfeat.org/matconvnet/pretrained/.
http://mulan.sourceforge.net/datasets-mlc.html.
The complete semantic hierarchies and the complete label matrices of all four datasets can be downloaded from “https://sites.google.com/site/baoyuanwu2015/”.

References

Agrawal, R., Gupta, A., Prabhu, Y., & Varma, M. (2013). Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In WWW (pp. 13–24).
Bi, W., & Kwok, J. T. (2011). Multi-label classification on tree-and dag-structured hierarchies. In ICML (pp. 17–24).
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.
Bucak, S. S., Jin, R., & Jain, A. K. (2011). Multi-label learning with incomplete class assignments. In CVPR (pp. 2801–2808). New York: IEEE.
Cabral, R. S., De la Torre, F., Costeira, J.P., & Bernardino, A. (2011). Matrix completion for multi-label image classification. In NIPS (pp. 190–198).
Chang, X., Xiang, T., & Hospedales, T. M. (2016). L1 graph based sparse model for label de-noising. In BMVC.
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In BMVC.
Chen, Z., Chen, M., Weinberger, K.Q., & Zhang, W. (2015). Marginalized denoising for link prediction and multi-label learning. In AAAI.
Chen, G., Song, Y., Wang, F., & Zhang, C. (2008). Semi-supervised multi-label learning by solving a sylvester equation. In SIAM international conference on data mining (pp. 410–419).
Chen, M., Zheng, A., & Weinberger, K. (2013). Fast image tagging. In ICML (pp. 1274–1282).
Chen, C., He, B., Ye, Y., & Yuan, X. (2016). The direct extension of admm for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1–2), 57–79.
Deng, J., Ding, N., Jia, Y., Frome, A., Murphy, K., Bengio, S., Li, Y., Neven, H., & Adam, H. (2014). Large-scale object classification using label relation graphs. In ECCV (pp. 48–64). Berlin: Springer.
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR (pp. 248–255). IEEE.
Duygulu, P., Barnard, K., de Freitas, J.F., & Forsyth, D.A. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In ECCV (pp. 97–112). Berlin: Springer.
Fazel, M. (2002). Matrix rank minimization with applications. Ph.D. thesis, Stanford University.
Fellbaum, C. (1998). WordNet. New York: Wiley Online Library.
Fürnkranz, J., Hüllermeier, E., Mencía, E. L., & Brinker, K. (2008). Multilabel classification via calibrated label ranking. Machine Learning, 73(2), 133–153.
Geng, B., Yang, L., Xu, C., & Hua, X. S. (2008). Collaborative learning for image and video annotation. In Proceedings of the 1st ACM international conference on multimedia information retrieval (pp. 443–450). New York: ACM.
Ghadimi, E., Teixeira, A., Shames, I., & Johansson, M. (2015). Optimal parameter selection for the alternating direction method of multipliers (admm): Quadratic problems. IEEE Transactions on Automatic Control, 60(3), 644–658.
Gibaja, E., & Ventura, S. (2014). Multi-label learning: A review of the state of the art and ongoing research. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(6), 411–444.
Goldberg, A. B., Zhu, X., Recht, B., Xu, J. M., & Nowak, R. D. (2010). Transduction with matrix completion: Three birds with one stone. In NIPS (pp. 757–765).
Grubinger, M., Clough, P., Müller, H., & Deselaers, T. (2006). The iapr tc-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage (pp. 13–23).
Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation. In ICCV (pp. 309–316).
Kapoor, A., Viswanathan, R., & Jain, P. (2012). Multilabel classification using bayesian compressed sensing. In NIPS (pp. 2654–2662).
Li, X., Zhao, F., & Guo, Y. (2015). Conditional restricted boltzmann machines for multi-label learning with incomplete labels. In AISTATS (pp. 635–643).
Lin, Z., Ding, G., Hu, M., Wang, J., & Ye, X. (2013). Image tag completion via image-specific and tag-specific linear sparse reconstructions. In CVPR (pp. 1618–1625). IEEE.
Li, Y., Wu, B., Ghanem, B., Zhao, Y., Yao, H., & Ji, Q. (2016). Facial action unit recognition under incomplete data based on multi-label learning with missing labels. Pattern Recognition, 60, 890–900.
Manning, C. D., Raghavan, P., Schütze, H., et al. (2008). Introduction to information retrieval (Vol. 1). Cambridge: Cambridge University Press.
Peng, Y., Ganesh, A., Wright, J., Xu, W., & Ma, Y. (2012). Rasl: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2233–2246.
Raghunathan, A. U., & Di Cairano, S. (2014). Optimal step-size selection in alternating direction method of multipliers for convex quadratic programs and model predictive control. In Proceedings of symposium on mathematical theory of networks and systems (pp. 807–814).
Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.
Rousu, J., Saunders, C., Szedmak, S., & Shawe-Taylor, J. (2005). Learning hierarchical multi-category text classification models. In ICML (pp. 744–751). New York: ACM.
Rousu, J., Saunders, C., Szedmak, S., & Shawe-Taylor, J. (2006). Kernel-based learning of hierarchical multilabel classification models. The Journal of Machine Learning Research, 7, 1601–1626.
Snoek, C. G., Worring, M., Van Gemert, J. C., Geusebroek, J. M., & Smeulders, A. W. (2006). The challenge problem for automated detection of 101 semantic concepts in multimedia. In Proceedings of the 14th annual ACM international conference on Multimedia (pp. 421–430). New York: ACM.
Sun, Y., Zhang, Y., & Zhou, Z. H. (2010). Multi-label learning with weak label. In AAAI (pp. 593–598).
Sun, H., Wang, J., & Deng, T. (2016). On the global and linear convergence of direct extension of admm for 3-block separable convex minimization models. Journal of Inequalities and Applications, 2016(1), 227.
Tousch, A. M., Herbin, S., & Audibert, J. Y. (2012). Semantic hierarchies for image annotation: A survey. Pattern Recognition, 45(1), 333–345.
Vasisht, D., Damianou, A., Varma, M., & Kapoor, A. (2014). Active learning for sparse bayesian multilabel classification. In SIGKDD (pp. 472–481). New York: ACM.
Von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 319–326). New York: ACM.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
Wang, Q., Shen, B., Wang, S., Li, L., & Si, L. (2014). Binary codes embedding for fast image tagging with incomplete labels. In ECCV (pp. 425–439). Berlin: Springer.
Wang, Q., Si, L., & Zhang, D. (2014). Learning to hash with partial tags: Exploring correlation between tags and hashing bits for large scale image retrieval. In ECCV (pp. 378–392). Berlin: Springer.
Weston, J., Chapelle, O., Vapnik, V., Elisseeff, A., & Schölkopf, B. (2002). Kernel dependency estimation. In NIPS (pp. 873–880).
Wu, B., Chen, W., Liu, W., Sun, P., Ghanem, B., & Lyu, S. (2018). Tagging like humans: Diverse and distinct image annotation. In CVPR. IEEE.
Wu, B., Jia, F., Liu, W., & Ghanem, B. (2017). Diverse image annotation. In CVPR (pp. 2559–2567). New York: IEEE.
Wu, B., Liu, Z., Wang, S., Hu, B.G., & Ji, Q. (2014). Multi-label learning with missing labels. In ICPR.
Wu, B., Lyu, S., & Ghanem, B. (2015a). Ml-mg: multi-label learning with missing labels using a mixed graph. In ICCV (pp. 4157–4165).
Wu, B., Lyu, S., & Ghanem, B. (2016). Constrained submodular minimization for missing labels and class imbalance in multi-label learning. In AAAI (pp. 2229–2236).
Wu, L., Jin, R., & Jain, A. K. (2013). Tag completion for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3), 716–727.
Wu, B., Lyu, S., Hu, B. G., & Ji, Q. (2015b). Multi-label learning with missing labels for image annotation and facial action unit recognition. Pattern Recognition, 48(7), 2279–2289.
Xu, M., Jin, R., & Zhou, Z.H. (2013). Speedup matrix completion with side information: Application to multi-label learning. In NIPS (pp. 2301–2309).
Xu, C., Tao, D., & Xu, C. (2016). Robust extreme multi-label learning. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 13–17).
Yu, H. F., Jain, P., Kar, P., & Dhillon, I. (2014). Large-scale multi-label learning with missing labels. In ICML (pp. 593–601).
Yu, G., Zhu, H., & Domeniconi, C. (2015). Predicting protein functions using incomplete hierarchical labels. BMC Bioinformatics, 16(1), 1.
Zehfuss, G. (1858). Über eine gewisse determinante. Zeitschrift für Mathematik und Physik, 3, 298–301.
Zhang, M. L., & Zhou, Z. H. (2007). Ml-knn: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.
Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.
Zhang, T., Ghanem, B., Liu, S., & Ahuja, N. (2012). Low-rank sparse learning for robust visual tracking. In ECCV (pp. 470–484). Berlin: Springer.
Zhang, Y., & Zhou, Z. H. (2010). Multilabel dimensionality reduction via dependence maximization. ACM Transactions on Knowledge Discovery from Data, 4(3), 14.

Author information

Authors and Affiliations

Tencent AI Lab, Shenzhen, 518000, China
Baoyuan Wu, Fan Jia & Wei Liu
Visual Computing Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
Bernard Ghanem
Computer Science Department, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY, 12222, USA
Siwei Lyu

Corresponding author

Correspondence to Baoyuan Wu.

Additional information

Communicated by Svetlana Lazebnik.

This work is supported by Tencent AI Lab. The participation of Bernard Ghanem is supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research. The participation of Siwei Lyu is partially supported by National Science Foundation National Robotics Initiative (NRI) Grant (IIS-1537257).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 153 KB)

Convexity Proof

For clarity and continuity, we rewrite the continuous optimization problem of MLMG-CO here,

$$\begin{aligned} \min _\mathbf {Z}&f_1(\mathbf {Z}) = -\mathrm {tr}( \overline{\mathbf {Y}}^\top \mathbf {Z}) + \beta \mathrm {tr}(\mathbf {Z}\mathbf {L}_\mathbf {X}\mathbf {Z}^\top ) + \gamma \mathrm {tr}(\mathbf {Z}^\top \mathbf {L}_{\mathbf {C}} \mathbf {Z}),\nonumber \\ \text {s.t.}&\mathbf {Z}\in [0, 1]^{m \times n}, \quad \varvec{\varPhi }^\top \mathbf {Z}\ge 0. \end{aligned}$$

(28)

Firstly we introduce the following vector variables:

$$\begin{aligned}&\mathbf {z}= \text {vec}(\mathbf {Z}) = [\mathbf {Z}_{11}, \ldots , \mathbf {Z}_{m1}, \ldots , \mathbf {Z}_{mn}]^\top \in [0, 1]^{mn \times 1}, \nonumber \\&\overline{\mathbf {y}} = \text {vec}(\overline{\mathbf {Y}})= [\overline{\mathbf {Y}}_{11}, \ldots , \overline{\mathbf {Y}}_{m1}, \ldots , \overline{\mathbf {Y}}_{mn}]^\top \in \{ -1, 0, +1 \}^{mn \times 1}, \nonumber \\&\mathbf {W}= \beta \cdot \mathbf {W}_\mathbf {X}^\top \otimes \mathbf {I}_m + \gamma \cdot \mathbf {I}_n \otimes \mathbf {W}_{\mathbf {C}}\in \mathbb {R}^{mn \times mn}, \nonumber \\&\mathbf {L} = \beta \cdot \mathbf {L}_\mathbf {X}^\top \otimes \mathbf {I}_m + \gamma \cdot \mathbf {I}_n \otimes \mathbf {L}_{\mathbf {C}} \in \mathbb {R}^{mn \times mn}, \nonumber \\&\overline{\varvec{\varPhi }} = \mathbf {I}_n \otimes \varvec{\varPhi }, \end{aligned}$$

where $\otimes $ denotes the Kronecker product (Zehfuss 1858) and $\text {vec}(\cdot )$ stacks the columns of a matrix. Problem (28) can then be transformed into its equivalent vector-based formulation:

$$\begin{aligned} \arg \min _\mathbf {z}&\quad f_2(\mathbf {z}) = -\overline{\mathbf {y}}^\top \mathbf {z}+ \mathbf {z}^\top \mathbf {L} \mathbf {z}, \nonumber \\ \text {s.t.}&\quad \mathbf {z}\in [0, 1]^{mn \times 1}, \quad \overline{\varvec{\varPhi }}^\top \mathbf {z}\ge 0. \end{aligned}$$

(29)
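The equivalence between the matrix objective in (28) and the vector objective in (29) can be checked numerically. The following is a minimal sketch, not the authors' code: the random weight matrices, the unnormalized Laplacian, the sizes, and the values of $\beta$ and $\gamma$ are all assumptions chosen only for illustration. It relies on column-major vectorization, matching the ordering $[\mathbf {Z}_{11}, \ldots , \mathbf {Z}_{m1}, \ldots , \mathbf {Z}_{mn}]$ used above.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian D - W (an assumed choice for this check)."""
    return np.diag(W.sum(axis=1)) - W

def random_weights(k, rng):
    """Hypothetical symmetric, non-negative weight matrix with zero diagonal."""
    A = rng.random((k, k))
    W = (A + A.T) / 2.0
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(0)
m, n = 3, 4                 # m classes, n instances (illustrative sizes)
beta, gamma = 0.7, 0.3      # illustrative trade-off weights

L_X = laplacian(random_weights(n, rng))   # instance-level graph, n x n
L_C = laplacian(random_weights(m, rng))   # class-level graph, m x m
Z = rng.random((m, n))                    # a point in [0, 1]^{m x n}
Y_bar = rng.integers(-1, 2, size=(m, n)).astype(float)  # entries in {-1, 0, +1}

# Matrix form, as in (28).
f1 = (-np.trace(Y_bar.T @ Z)
      + beta * np.trace(Z @ L_X @ Z.T)
      + gamma * np.trace(Z.T @ L_C @ Z))

# Vector form, as in (29); order="F" stacks columns (column-major vec).
z = Z.flatten(order="F")
y_bar = Y_bar.flatten(order="F")
L = beta * np.kron(L_X.T, np.eye(m)) + gamma * np.kron(np.eye(n), L_C)
f2 = -y_bar @ z + z @ L @ z

print(np.isclose(f1, f2))
```

Using row-major flattening instead would require swapping the order of the Kronecker factors, which is why the vectorization order matters here.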

Lemma 1

$\mathbf {L}$ is positive semi-definite (PSD).

Proof

Given two square matrices $\mathbf {A} \in \mathbb {R}^{n_1 \times n_1}$ and $\mathbf {B} \in \mathbb {R}^{n_2 \times n_2}$, denote their eigenvalues by $\lambda _1,\ldots ,\lambda _{n_1}$ and $\mu _1,\ldots ,\mu _{n_2}$, respectively. By a standard property of the Kronecker product, the eigenvalues of $\mathbf {A} \otimes \mathbf {B}$ are $\lambda _i \mu _j, i=1,\ldots ,n_1; j = 1,\ldots ,n_2$. $\mathbf {L}_\mathbf {X}^\top $ is PSD and $\mathbf {I}_m$ is positive definite (PD), so all eigenvalues of $\mathbf {L}_\mathbf {X}^\top \otimes \mathbf {I}_m$ are non-negative, and $\mathbf {L}_\mathbf {X}^\top \otimes \mathbf {I}_m$ is therefore a PSD matrix. Similarly, $\mathbf {I}_n \otimes \mathbf {L}_{\mathbf {C}}$ is also PSD. Finally, as $\mathbf {L}$ is a positively weighted sum of two PSD matrices, $\mathbf {L}$ is a PSD matrix. $\square $
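The eigenvalue property used in this proof can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the PSD matrices are random stand-ins for $\mathbf {L}_\mathbf {X}$ and $\mathbf {L}_{\mathbf {C}}$ (built as $\mathbf {B}\mathbf {B}^\top $), and the sizes and weights are hypothetical.

```python
import numpy as np

def random_psd(k, rng):
    """Hypothetical symmetric PSD matrix B B^T, standing in for a Laplacian."""
    B = rng.standard_normal((k, k))
    return B @ B.T

rng = np.random.default_rng(1)
m, n = 2, 3
beta, gamma = 0.5, 2.0      # any positive weights

A = random_psd(n, rng)      # plays the role of L_X^T
C = random_psd(m, rng)      # plays the role of L_C

# Eigenvalues of A (x) I_m are all pairwise products lambda_i * mu_j.
eig_A = np.linalg.eigvalsh(A)
eig_I = np.linalg.eigvalsh(np.eye(m))          # all ones
products = np.sort(np.outer(eig_A, eig_I).ravel())
eig_kron = np.sort(np.linalg.eigvalsh(np.kron(A, np.eye(m))))
print(np.allclose(products, eig_kron))

# The positively weighted sum of the two Kronecker terms stays PSD.
L = beta * np.kron(A, np.eye(m)) + gamma * np.kron(np.eye(n), C)
min_eig_L = np.linalg.eigvalsh(L).min()
print(min_eig_L >= -1e-10)  # small tolerance for floating-point error
```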

Proposition 1

Problem (28) is convex.

Proof

The Hessian of the objective function in (29) with respect to $\mathbf {z}$ is $2\mathbf {L}$ (as $\mathbf {L}$ is symmetric), and $\mathbf {L}$ has been proven to be a PSD matrix in Lemma 1. Thus, the objective function in (29) is convex in $\mathbf {z}$. The box and linear inequality constraints define a convex feasible set (which satisfies Slater’s condition), so Problem (29) is a convex optimization problem. Finally, as Problems (28) and (29) are equivalent, Problem (28) is also convex. $\square $
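One practical consequence of convexity is that simple first-order methods decrease the objective monotonically. The sketch below uses projected gradient descent, which is a swapped-in technique chosen for brevity and not the solver described in the article. It also makes simplifying assumptions: the hierarchy constraint is omitted (only the box constraint is kept), and the Laplacians, sizes, and weights are random or hypothetical.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian D - W (assumed for illustration)."""
    return np.diag(W.sum(axis=1)) - W

def random_weights(k, rng):
    """Hypothetical symmetric, non-negative weights with zero diagonal."""
    A = rng.random((k, k))
    W = (A + A.T) / 2.0
    np.fill_diagonal(W, 0.0)
    return W

def objective(z, y_bar, L):
    return float(-y_bar @ z + z @ L @ z)

def projected_gradient(L, y_bar, num_iters=200):
    """Minimize -y_bar^T z + z^T L z over the box [0, 1]^d.

    Assumes L is symmetric PSD. The hierarchy constraint is NOT enforced.
    """
    lipschitz = 2.0 * np.linalg.eigvalsh(L)[-1]   # gradient is 2 L z - y_bar
    step = 1.0 / max(lipschitz, 1e-12)
    z = np.full(y_bar.shape, 0.5)
    history = [objective(z, y_bar, L)]
    for _ in range(num_iters):
        grad = -y_bar + 2.0 * (L @ z)
        z = np.clip(z - step * grad, 0.0, 1.0)    # projection onto the box
        history.append(objective(z, y_bar, L))
    return z, history

rng = np.random.default_rng(2)
m, n = 3, 5
beta, gamma = 1.0, 0.5
L = (beta * np.kron(laplacian(random_weights(n, rng)), np.eye(m))
     + gamma * np.kron(np.eye(n), laplacian(random_weights(m, rng))))
y_bar = rng.integers(-1, 2, size=m * n).astype(float)

z_hat, history = projected_gradient(L, y_bar)
print(history[0], history[-1])
```

With a step size of one over the Lipschitz constant of the gradient, each projected step cannot increase a convex quadratic objective, so `history` is non-increasing. Handling the linear hierarchy constraint would need a different projection or a constrained solver.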

About this article

Cite this article

Wu, B., Jia, F., Liu, W. et al. Multi-label Learning with Missing Labels Using Mixed Dependency Graphs. Int J Comput Vis 126, 875–896 (2018). https://doi.org/10.1007/s11263-018-1085-3

Received: 19 July 2017
Accepted: 28 March 2018
Published: 06 April 2018
Issue Date: August 2018
DOI: https://doi.org/10.1007/s11263-018-1085-3
