Reasoning about Object Affordances in a Knowledge Base Representation

  • Yuke Zhu
  • Alireza Fathi
  • Li Fei-Fei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)


Reasoning about objects and their affordances is a fundamental problem for visual intelligence. Most previous work casts this problem as a classification task in which separate classifiers are trained to label objects, recognize attributes, or assign affordances. In this work, we consider the problem of object affordance reasoning using a knowledge base representation. Diverse information about objects is first harvested from images and other meta-data sources. We then learn a knowledge base (KB) using a Markov Logic Network (MLN). Given the learned KB, we show that a diverse set of visual inference tasks, including zero-shot affordance prediction and object recognition given human poses, can be performed in this unified framework without training separate classifiers.
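To make the idea of inference in an MLN concrete, the following is a minimal sketch, not the authors' model. It assumes a toy world with three ground atoms and three weighted formulas; the predicate names (HasFlatSurface, IsRigid, Sittable) and all weights are hypothetical and chosen only for illustration. An MLN assigns each possible world a probability proportional to the exponential of the summed weights of the formulas it satisfies, and the sketch answers a query by brute-force enumeration of worlds, which is only feasible at this tiny scale.

```python
import itertools
import math

# Hypothetical ground atoms describing one object (illustrative names only).
ATOMS = ["HasFlatSurface", "IsRigid", "Sittable"]

# Hypothetical weighted formulas as (weight, world -> bool).
# An implication a => b is satisfied unless a is true and b is false.
FORMULAS = [
    (1.5, lambda w: (not w["HasFlatSurface"]) or w["Sittable"]),
    (0.8, lambda w: (not w["IsRigid"]) or w["Sittable"]),
    (-0.5, lambda w: w["Sittable"]),  # weak prior against the affordance
]


def score(world):
    """Unnormalized log-probability: sum of weights of satisfied formulas."""
    return sum(weight for weight, formula in FORMULAS if formula(world))


def conditional(query, evidence):
    """P(query is True | evidence), enumerating every world consistent with the evidence."""
    free = [atom for atom in ATOMS if atom not in evidence]
    numerator = 0.0
    denominator = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(evidence)
        world.update(zip(free, values))
        mass = math.exp(score(world))
        denominator += mass
        if world[query]:
            numerator += mass
    return numerator / denominator


# Query the affordance of an object from its observed attributes alone.
p_with = conditional("Sittable", {"HasFlatSurface": True, "IsRigid": True})
p_without = conditional("Sittable", {"HasFlatSurface": False, "IsRigid": False})
```

In this sketch the affordance is inferred purely from observed attributes, which loosely mirrors the zero-shot setting mentioned above: no classifier is trained for the queried object, and changing the evidence or the query atom reuses the same weighted formulas.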


Markov Random Fields; Categorical Attribute; Visual Attribute; Knowledge Graph; Partial Observation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yuke Zhu (1)
  • Alireza Fathi (1)
  • Li Fei-Fei (1)
  1. Computer Science Department, Stanford University, USA
