Semi-Latent Dirichlet Allocation: A Hierarchical Model for Human Action Recognition

  • Yang Wang
  • Payam Sabzmeydani
  • Greg Mori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4814)

Abstract

We propose a new method for human action recognition from video sequences using latent topic models. Video sequences are represented by a novel “bag-of-words” representation, where each frame corresponds to a “word”. The major difference between our model and previous latent topic models for recognition problems in computer vision is that our model is trained in a “semi-supervised” way. Our model has several advantages over other similar models. First, training is much easier because the model parameters are decoupled. Second, it naturally solves the problem of choosing the appropriate number of latent topics. Third, it achieves much better performance by using the information provided by the class labels in the training set. We present action classification and irregularity detection results, and show improvement over previous methods.
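The frame-as-word representation described above can be illustrated with a minimal sketch. The abstract does not say how a frame is mapped to a word, so the nearest-codeword assignment under Euclidean distance used here is an assumption, and the names `frames_to_words`, `bag_of_words`, `codebook`, and the toy descriptors are hypothetical, chosen only for illustration.

```python
import numpy as np

def frames_to_words(frame_descriptors, codebook):
    """Map each frame descriptor to the index of its nearest codeword.

    Assumption: nearest neighbour under squared Euclidean distance;
    the abstract does not specify the assignment rule.
    """
    # Pairwise differences, shape (num_frames, num_codewords, dim).
    diffs = frame_descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(dists, axis=1)

def bag_of_words(word_ids, vocab_size):
    """Count how often each codeword occurs in one video sequence."""
    return np.bincount(word_ids, minlength=vocab_size)

# Toy data: three hypothetical codewords in a 2-D descriptor space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
# One hypothetical video with four frames, one descriptor per frame.
video = np.array([[0.1, -0.1], [0.9, 1.2], [4.8, 5.1], [1.1, 0.9]])

words = frames_to_words(video, codebook)   # one word index per frame
hist = bag_of_words(words, len(codebook))  # word counts for the sequence
print(words, hist)
```

In this sketch the histogram discards frame order, which is what makes the representation a bag of words; a latent topic model would then be fit on such per-sequence counts.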

Keywords

Video Sequence, Class Label, Visual Word, Action Recognition, Latent Dirichlet Allocation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Yang Wang (1)
  • Payam Sabzmeydani (1)
  • Greg Mori (1)
  1. School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
