Attribute Learning for Understanding Unstructured Social Activity

  • Yanwei Fu
  • Timothy M. Hospedales
  • Tao Xiang
  • Shaogang Gong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7575)


The rapid development of social video sharing platforms has created a huge demand for automatic video classification and annotation techniques, in particular for videos containing social activities of a group of people (e.g. a YouTube video of a wedding reception). Recently, attribute learning has emerged as a promising paradigm for transferring learning to sparsely labelled classes in object or single-object short action classification. In contrast to existing work, this paper tackles, for the first time, the problem of attribute learning for understanding group social activities with sparse labels. This problem is more challenging because of the complex multi-object nature of social activities and the unstructured nature of the activity context. To solve this problem, we (1) contribute an unstructured social activity attribute (USAA) dataset with both visual and audio attributes, (2) introduce the concept of a semi-latent attribute space, and (3) propose a novel model for learning the latent attributes, which alleviates the dependence of existing models on exact and exhaustive manual specification of the attribute space. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse-data learning tasks, including multi-task learning, N-shot transfer learning, learning with label noise and, importantly, zero-shot learning.
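To make the zero-shot setting mentioned above concrete, here is a minimal sketch of attribute-based zero-shot classification by nearest class prototype. It is not the model proposed in the paper (which learns latent attributes); it only illustrates how a class with no training videos can be recognised from a manually specified attribute description. The attribute names, class prototypes and the function `zero_shot_classify` are all hypothetical and chosen for illustration.

```python
from math import sqrt

# Hypothetical user-defined attributes (not taken from the USAA dataset).
ATTRIBUTES = ["indoor", "music", "candles", "clapping"]

# Hypothetical binary attribute prototypes, one per class. A class needs no
# training videos, only this attribute description.
PROTOTYPES = {
    "wedding_reception": [1, 1, 0, 1],
    "birthday_party":    [1, 0, 1, 1],
    "parade":            [0, 1, 0, 0],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def zero_shot_classify(attribute_scores, prototypes):
    """Return the class whose attribute prototype is closest to the scores.

    attribute_scores: per-attribute confidences in [0, 1], assumed to come
    from attribute detectors trained on other (seen) classes.
    """
    return min(prototypes, key=lambda c: euclidean(attribute_scores, prototypes[c]))


# Assumed detector outputs for one video: indoor, music, no candles, clapping.
video_scores = [0.9, 0.8, 0.1, 0.7]
predicted = zero_shot_classify(video_scores, PROTOTYPES)
print(predicted)
```

A prototype table like this depends entirely on the attribute list being exact and exhaustive, which is the dependence that the semi-latent attribute space described above is intended to alleviate.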


Keywords: Attribute Space, Latent Attribute, Topic Model, Latent Dirichlet Allocation, Transfer Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yanwei Fu
    • 1
  • Timothy M. Hospedales
    • 1
  • Tao Xiang
    • 1
  • Shaogang Gong
    • 1
  1. School of EECS, Queen Mary University of London, UK
