# Multi-label Learning with Missing Labels Using Mixed Dependency Graphs

## Abstract

This work addresses multi-label learning with missing labels (MLML), which aims to assign multiple class labels to each test instance given training instances whose label sets are incomplete (i.e., some of their labels are missing). The key to handling missing labels is to propagate label information from the provided labels to the missing ones through a dependency graph in which each label of each instance is treated as a node. We build this graph from different types of label dependencies. Specifically, instance-level similarity serves as undirected edges connecting label nodes across different instances, and the semantic label hierarchy serves as directed edges connecting different classes. This base graph is referred to as the mixed dependency graph, as it includes both undirected and directed edges. We then present two further types of label dependencies that connect label nodes across different classes. The first is class co-occurrence, which is also encoded as undirected edges. Combining it with the base graph yields a new mixed graph, called the mixed graph with co-occurrence (MG-CO). The second is a sparse and low-rank decomposition of the whole label matrix, which embeds high-order dependencies over all labels. Combining it with the base graph yields the mixed graph with sparse and low-rank decomposition (MG-SL).

Based on MG-CO and MG-SL, we propose two convex transductive formulations of the MLML problem, denoted MLMG-CO and MLMG-SL, respectively. In both formulations, instance-level similarity is embedded through a quadratic smoothness term, while the semantic label hierarchy is imposed as a linear constraint. In MLMG-CO, class co-occurrence is also formulated as a quadratic smoothness term. In MLMG-SL, the sparse and low-rank decomposition is incorporated through two additional matrices (one assumed sparse, the other assumed low rank) and an equality constraint between the sum of these two matrices and the original label matrix. Two important applications, image annotation and tag-based image retrieval, can be handled jointly by the proposed methods. Experimental results on several benchmark datasets show that our methods achieve significant improvements in performance and robustness to missing labels over state-of-the-art methods.
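To make the ingredients named in the abstract concrete, the following is a minimal sketch of a quadratic smoothness term over instances, a quadratic smoothness term over classes, and a check of a linear hierarchy constraint. It is not the paper's formulation: the label score matrix `Z` (classes × instances, scores in [0, 1]), the unnormalized Laplacian, the "parent score ≥ child score" form of the constraint, and all function names and toy data are assumptions made for illustration.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def smoothness_terms(Z, W_inst, W_class):
    """Quadratic smoothness of Z (classes x instances) over two undirected graphs.

    inst: tr(Z L_inst Z^T), small when similar instances get similar label scores.
    cls:  tr(Z^T L_class Z), small when co-occurring classes get similar scores.
    """
    inst = np.trace(Z @ laplacian(W_inst) @ Z.T)
    cls = np.trace(Z.T @ laplacian(W_class) @ Z)
    return float(inst), float(cls)

def hierarchy_violations(Z, edges):
    """List (parent, child, instance) triples where Z[child, i] > Z[parent, i].

    Assumed constraint form: a child class may not score higher than its parent.
    """
    violations = []
    for parent, child in edges:
        for i in np.nonzero(Z[child] > Z[parent])[0]:
            violations.append((parent, child, int(i)))
    return violations

# Hypothetical toy data: 3 classes, 2 instances.
Z = np.array([[1.0, 0.5],   # class 0 (parent of class 1)
              [0.8, 0.9],   # class 1
              [0.0, 0.2]])  # class 2
W_inst = np.array([[0.0, 1.0],
                   [1.0, 0.0]])        # the two instances are similar
W_class = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])  # classes 0 and 1 co-occur
edges = [(0, 1)]                       # directed edge: parent 0 -> child 1

inst, cls = smoothness_terms(Z, W_inst, W_class)
viol = hierarchy_violations(Z, edges)
print(inst, cls, viol)
```

On this toy input the instance term equals the squared difference between the two instance columns of `Z`, and the only hierarchy violation is at instance 1, where class 1 scores 0.9 but its parent scores 0.5. In a formulation like the one described above, such a violation would be ruled out by the constraint rather than detected after the fact.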

## Keywords

Multi-label learning, Missing labels, Mixed dependency graphs, Image annotation, Image retrieval
