Discovering Multipart Appearance Models from Captioned Images

  • Michael Jamieson
  • Yulia Eskin
  • Afsaneh Fazly
  • Suzanne Stevenson
  • Sven Dickinson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6315)


Even a relatively unstructured captioned image set depicting a variety of objects in cluttered scenes contains strong correlations between caption words and repeated visual structures. We exploit these correlations to discover named objects and learn hierarchical models of their appearance. Revising and extending a previous technique for finding small, distinctive configurations of local features, our method assembles these co-occurring parts into graphs with greater spatial extent and flexibility. The resulting multipart appearance models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. We demonstrate improved annotation precision and recall on datasets to which the non-hierarchical technique was previously applied and show extended spatial coverage of detected objects.


Keywords: Part Model, Appearance Model, Image Annotation, Part Detection, National Hockey League
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Michael Jamieson (1)
  • Yulia Eskin (1)
  • Afsaneh Fazly (1)
  • Suzanne Stevenson (1)
  • Sven Dickinson (1)
  1. University of Toronto
