Multi-label Image Annotation by Structural Grouping Sparsity

  • Yahong Han
  • Fei Wu
  • Yueting Zhuang

Abstract

High-dimensional heterogeneous features can be extracted from real-world images on photo-sharing websites such as Flickr. These features describe various aspects of an image's visual characteristics, such as color, texture, and shape. The heterogeneous features are often over-complete for describing a given semantic concept, so selecting a limited set of discriminative features for each semantic is crucial to making image understanding more interpretable. This chapter introduces an approach to multi-label image annotation with a regularized penalty, called Multi-label Image Boosting by the selection of heterogeneous features with structural Grouping Sparsity (MtBGS). MtBGS induces a (structural) sparse selection model to identify subgroups of homogeneous features for predicting a certain label. Moreover, MtBGS uses the correlations among multiple tags to boost the performance of multi-label annotation. Extensive experiments on public image datasets show that the proposed approach achieves better multi-label image annotation performance and yields an interpretable model for image understanding.
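As a rough illustration of how a structural grouping sparsity penalty can select features, the sketch below applies one proximal (shrinkage) step for a penalty that combines an L1 term with a group-wise L2 term. It is not the MtBGS algorithm from the chapter. It assumes non-overlapping feature groups, and the function names, group layout, weights, and penalty values are all hypothetical.

```python
import math

def soft_threshold(x, t):
    """Shrink a scalar toward zero by t (the proximal step for an L1 penalty)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def sparse_group_prox(weights, groups, lam_l1, lam_group):
    """Proximal step for lam_l1 * ||w||_1 + lam_group * sum_g ||w_g||_2.

    weights: list of floats, one per feature
    groups:  list of index lists, one per feature group (assumed non-overlapping)
    """
    out = [0.0] * len(weights)
    for idx in groups:
        # Step 1: element-wise shrinkage selects individual features in the group.
        shrunk = [soft_threshold(weights[i], lam_l1) for i in idx]
        # Step 2: group-wise shrinkage zeroes the whole group if its norm is small.
        norm = math.sqrt(sum(v * v for v in shrunk))
        scale = max(0.0, 1.0 - lam_group / norm) if norm > 0 else 0.0
        for i, v in zip(idx, shrunk):
            out[i] = scale * v
    return out

# Hypothetical example: a "color" group (indices 0-2) and a "texture" group (3-4).
w = [3.0, 0.2, -4.0, 0.3, -0.1]
groups = [[0, 1, 2], [3, 4]]
result = sparse_group_prox(w, groups, lam_l1=0.5, lam_group=1.0)
```

In this toy input the second group is removed entirely, while the first group survives with one of its three weights set to zero. That is the kind of two-level selection (whole feature groups, then features within a group) the abstract describes.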

Keywords

Canonical Correlation Analysis; Discriminative Feature; Image Annotation; Heterogeneous Feature; Group Lasso
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

This work is supported by the NSFC (90920303, 61070068), the 863 Program (2006 AA010107), and the Program for Changjiang Scholars and Innovative Research Team in University (IRT0652, PCSIRT).

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. College of Computer Science, Zhejiang University, Hangzhou, China
