What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization

  • Li Fei-Fei
  • Li-Jia Li
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 285)

Abstract

We live in a richly visual world. More than one third of the entire human brain is involved in visual processing and understanding. Psychologists have shown that the human visual system is particularly efficient and effective at perceiving high-level meaning in cluttered real-world scenes, such as objects, scene classes, activities and the stories in the images. In this chapter, we discuss a generative model approach for classifying complex human activities (such as a croquet game or snowboarding) given a single static image. We observe that recognizing the objects in a scene and classifying the scene environment facilitate each other in the overall activity recognition task. We formulate this observation in a graphical model representation in which activity classification is achieved by combining information from both the object recognition and the scene classification pathways. To evaluate the robustness of our algorithm, we have assembled a challenging dataset consisting of real-world images of eight different sport events, most of them collected from the Internet. Experimental results show that our hierarchical model performs better than existing methods.
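As a rough illustration of what combining the two pathways can mean in a generative setting, the sketch below scores each activity by multiplying its prior with a scene term and an object term, each marginalized over its own labels. This is not the chapter's actual graphical model: the function name `classify_activity`, the label sets, the single-object simplification and every probability value are assumptions made up for this toy example.

```python
def classify_activity(prior, p_scene_given_act, p_obj_given_act,
                      scene_lik, obj_lik):
    """Toy two-pathway classifier (illustrative assumption, not the authors' model).

    prior[a]                 : P(activity = a)
    p_scene_given_act[a][s]  : P(scene = s | activity = a)
    p_obj_given_act[a][o]    : P(object = o | activity = a)
    scene_lik[s], obj_lik[o] : likelihood of the image evidence under each label
    """
    scores = {}
    for act, p_act in prior.items():
        # Scene pathway: marginalize over the possible scene labels.
        scene_term = sum(p_scene_given_act[act][s] * scene_lik[s]
                         for s in scene_lik)
        # Object pathway: marginalize over the possible object labels.
        obj_term = sum(p_obj_given_act[act][o] * obj_lik[o]
                       for o in obj_lik)
        # Both pathways contribute multiplicatively to the activity score.
        scores[act] = p_act * scene_term * obj_term

    total = sum(scores.values())
    posterior = {act: score / total for act, score in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Hypothetical numbers for a single image; none come from the chapter.
prior = {"croquet": 0.5, "snowboarding": 0.5}
p_scene_given_act = {
    "croquet": {"lawn": 0.9, "snow slope": 0.1},
    "snowboarding": {"lawn": 0.1, "snow slope": 0.9},
}
p_obj_given_act = {
    "croquet": {"mallet": 0.8, "snowboard": 0.2},
    "snowboarding": {"mallet": 0.1, "snowboard": 0.9},
}
scene_lik = {"lawn": 0.7, "snow slope": 0.3}   # evidence from the scene pathway
obj_lik = {"mallet": 0.6, "snowboard": 0.4}    # evidence from the object pathway

best, posterior = classify_activity(prior, p_scene_given_act,
                                    p_obj_given_act, scene_lik, obj_lik)
print(best, posterior)
```

In this toy setup both pathways favor the same activity, so their product sharpens the decision; when they disagree, the stronger evidence dominates the score.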

Keywords

Object Recognition, Object Class, Event Category, Foreground Object, Visual World
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Li Fei-Fei (1)
  • Li-Jia Li (1)
  1. Dept. of Computer Science, Stanford University, USA
