A Discriminative Latent Model of Object Classes and Attributes

  • Yang Wang
  • Greg Mori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6315)


We present a discriminatively trained model for joint modelling of object class labels (e.g. “person”, “dog”, “chair”) and their visual attributes (e.g. “has head”, “furry”, “metal”). We treat the attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using information from both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measures. Our experimental results provide quantitative evidence that attributes can improve object naming.
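
As a rough illustration of inferring a class label through latent attributes, the sketch below scores each (class, attribute configuration) pair and keeps the best one. The abstract does not specify the scoring function, so everything here is an assumption made for illustration: the linear form of `score`, the parameter names in `params`, the attribute graph `edges`, and the toy numbers. Exhaustive enumeration is used only to keep the example short; it is not presented as the authors' inference procedure.

```python
from itertools import product

def score(x, h, y, params, edges):
    """Assumed linear score of image features x, binary attributes h, class y."""
    dot = lambda w: sum(wi * xi for wi, xi in zip(w, x))
    s = dot(params["class"][y])                  # class vs. image
    for j, hj in enumerate(h):
        s += hj * dot(params["attr"][j])         # attribute j vs. image
        s += params["class_attr"][y][j][hj]      # class vs. attribute j
    for j, k in edges:                           # attribute vs. attribute
        s += params["pair"][(j, k)][h[j]][h[k]]
    return s

def predict(x, params, edges, n_attr):
    """Return (best score, class, attributes) by exhaustive search over h."""
    candidates = (
        (score(x, h, y, params, edges), y, h)
        for y in params["class"]
        for h in product((0, 1), repeat=n_attr)
    )
    return max(candidates, key=lambda c: c[0])

# Hypothetical toy setup: attribute 0 = "furry", attribute 1 = "metal".
x = [1.0, 0.0]
edges = [(0, 1)]
params = {
    "class": {"dog": [0.0, 0.0], "chair": [0.0, 0.0]},
    "attr": [[2.0, 0.0], [-1.0, 0.0]],
    "class_attr": {
        "dog": [[0.0, 1.0], [0.0, -1.0]],    # dogs tend to be furry, not metal
        "chair": [[0.0, -1.0], [0.0, 1.0]],  # chairs tend to be metal, not furry
    },
    "pair": {(0, 1): [[0.0, 0.0], [0.0, -2.0]]},  # furry and metal rarely co-occur
}

result = predict(x, params, edges, n_attr=2)
print(result)  # (3.0, 'dog', (1, 0))
```

In this toy example the class-versus-image weights are zero, so the predicted label is decided entirely by the inferred attributes, which is the effect the abstract describes. With many attributes, enumeration grows exponentially; a tree-structured attribute graph would allow exact inference by dynamic programming instead.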


Keywords: Training Data; Object Class; Object Category; Visual Attribute; Class Accuracy





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yang Wang (1)
  • Greg Mori (1)
  1. School of Computing Science, Simon Fraser University, Canada
