Action Recognition Using Topic Models

  • Xiaogang Wang

Abstract

This chapter introduces approaches to action recognition based on topic models. Topic models were originally developed for language processing. In recent years they have been applied to action recognition and other computer vision problems with great success. Topic models are unsupervised: models of actions are learned by exploring the co-occurrence of visual features, without manually labeled training examples. This is important when a large number of actions must be recognized across a large variety of scenes. Most topic models are hierarchical Bayesian models, which jointly model simple and complicated actions at different hierarchical levels. Knowledge and contextual information can be integrated into topic models as priors. We explain how topic models can be used in different ways for action recognition in different scenarios. For example, scenes may be sparse or crowded, there may be a single camera view or multiple camera views, and the camera setting may be near-field or far-field. Different scenarios call for different features, such as trajectories, local motions, and spatial-temporal interest points.
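
As a rough illustration of learning from feature co-occurrence without labels, the sketch below fits a basic topic model (latent Dirichlet allocation with collapsed Gibbs sampling) to toy data. It is not the chapter's own model. It assumes each video clip is a "document" and each quantized visual feature is a "word" id; the function name `lda_gibbs`, the toy `clips`, and the hyperparameters `alpha` and `beta` are hypothetical choices made for the example.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.5, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs: list of clips, each a list of visual-word ids in [0, vocab_size).
    Returns (doc_topic, topic_word), both lists of rows that sum to 1.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per clip
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total words per topic
    z = []                                              # topic label of every word

    # Random initialization of topic labels.
    for d, doc in enumerate(docs):
        labels = []
        for w in doc:
            k = rng.randrange(n_topics)
            labels.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(labels)

    # Resample each word's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimates of the clip-topic and topic-word distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(n_dk, docs)
    ]
    topic_word = [
        [(c + beta) / (n_k[k] + vocab_size * beta) for c in n_kw[k]]
        for k in range(n_topics)
    ]
    return doc_topic, topic_word

# Toy data (made up): visual words 0-2 tend to co-occur, as do words 3-5.
clips = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [1, 2, 0, 2, 1, 0, 2, 2],
    [3, 4, 5, 3, 4, 5, 5, 3],
    [4, 5, 3, 5, 4, 3, 4, 4],
    [0, 1, 2, 3, 4, 5, 0, 3],
]
doc_topic, topic_word = lda_gibbs(clips, n_topics=2, vocab_size=6)
for row in doc_topic:
    print([round(p, 2) for p in row])
```

Each row of `doc_topic` is a clip's mixture over topics, and each row of `topic_word` is a topic's distribution over visual words. No action labels are supplied at any point; any grouping comes only from which visual words appear together in the same clips.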

Keywords

Video Sequence, Video Clip, Action Class, Action Recognition, Interest Point
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Department of Electronic Engineering, Chinese University of Hong Kong, Hong Kong, China
