
Actionlets and Activity Prediction

  • Chapter
Human Activity Recognition and Prediction

Abstract

The growing ubiquity of multimedia information has made video a favored information vehicle and has given rise to an enormous volume of social media and surveillance footage. One important problem whose solution would significantly enhance semantic-level video analysis is activity understanding, which aims to describe video content accurately using key semantic elements, especially activities. We observe that when a time-critical decision is needed, the temporal structure of a video can be exploited for early prediction of an ongoing human activity.


Notes

  1. © Kang Li and Yun Fu, IEEE, 2014. This is a minor revision of the work published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1644–1657, http://dx.doi.org/10.1109/TPAMI.2013.2297321.

  2. The concepts “action” and “event” are often used interchangeably in computer vision and other AI fields. In our discussion, we prefer “action” when referring to human activity, and we use “event” for more general occurrences, such as “stock rising.”

  3. The Rand index is a measure of the similarity between a data clustering and the ground truth. It takes a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of points and 1 indicating that the two clusterings are exactly the same.

  4. In this chapter, we use eventlet to refer to an observed co-occurrence of an actionlet and objects. An eventlet is \(e = \left \langle \{a^{\ast}\} \cup \{o_{1},o_{2},\ldots,o_{m}\}\right \rangle\), where \(a^{\ast}\) represents a particular actionlet and \(o_{i}\) represents a particular object interacting with \(a^{\ast}\) within its segment. In our case, \(m\) is always 0, 1, or 2, meaning none, one, or two co-occurring interacting objects (we assume that one person can operate at most two different objects at the same time, using two hands).

  5. Note that in many situations a periodic action, e.g., “cut,” will be segmented into consecutive duplicate eventlets.
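
The Rand index described in Note 3 can be illustrated with a short sketch. This is only a minimal pair-counting version written for illustration, not the evaluation code used in the chapter; the function name `rand_index` and its two label-sequence arguments are assumptions introduced here. It counts the pairs of points on which the two clusterings agree (both place the pair in one cluster, or both separate it) and divides by the total number of pairs.

```python
from itertools import combinations


def rand_index(labels_pred, labels_true):
    """Fraction of point pairs on which two clusterings agree.

    Hypothetical helper for illustration: a pair (i, j) counts as an
    agreement when both labelings put i and j in the same cluster, or
    both put them in different clusters.
    """
    if len(labels_pred) != len(labels_true):
        raise ValueError("label sequences must have the same length")

    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two points to form a pair")

    agreements = 0
    for i, j in pairs:
        same_in_pred = labels_pred[i] == labels_pred[j]
        same_in_true = labels_true[i] == labels_true[j]
        if same_in_pred == same_in_true:
            agreements += 1

    return agreements / len(pairs)


if __name__ == "__main__":
    # Identical partitions (cluster names differ, grouping is the same).
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # One cluster versus all singletons: the labelings disagree on every pair.
    print(rand_index([0, 0, 0, 0], [0, 1, 2, 3]))
```

Because only pair membership is compared, the measure does not depend on how the clusters are named, which is why the first call treats the two labelings as identical. The loop is quadratic in the number of points, which is acceptable for a sketch but may be slow for large data sets.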

Author information

Corresponding author

Correspondence to Kang Li.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Li, K., Fu, Y. (2016). Actionlets and Activity Prediction. In: Fu, Y. (eds) Human Activity Recognition and Prediction. Springer, Cham. https://doi.org/10.1007/978-3-319-27004-3_7

  • DOI: https://doi.org/10.1007/978-3-319-27004-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27002-9

  • Online ISBN: 978-3-319-27004-3

  • eBook Packages: Engineering, Engineering (R0)
