Actionlets and Activity Prediction

Li, Kang; Fu, Yun

doi:10.1007/978-3-319-27004-3_7

Abstract

The growing ubiquity of multimedia information has made video a favored information vehicle and given rise to an astonishing volume of social media and surveillance footage. One important problem whose solution would significantly enhance semantic-level video analysis is activity understanding, which aims to describe video content accurately using key semantic elements, especially activities. We observe that when a time-critical decision is needed, the temporal structure of videos can be exploited for early prediction of ongoing human activity.

Notes

1. © Kang Li and Yun Fu | IEEE, 2014. This is a minor revision of the work published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1644–1657, http://dx.doi.org/10.1109/TPAMI.2013.2297321.
2. The concepts “action” and “event” are often used interchangeably in computer vision and other AI fields. In our discussion, we prefer “action” when referring to human activity, and use “event” for more general things, such as “stock rising.”
3. The Rand index is a measure of the similarity between a data clustering and the ground truth. It takes a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of points and 1 indicating that the two clusterings are exactly the same.
4. In this chapter, we use eventlet to refer to an observed co-occurrence of an actionlet and objects. An eventlet is \(e = \left\langle \{a^{\ast}\} \cup \{o_{1}, o_{2}, \ldots, o_{m}\} \right\rangle\), where \(a^{\ast}\) represents a particular actionlet and \(o_{i}\) represents a particular object interacting with \(a^{\ast}\) within its segment. In our case, \(m\) will always be 0, 1, or 2, meaning none, one, or two co-occurring interacting objects (we assume that one person can operate at most two different objects at the same time with two hands).
5. Note that in many situations a periodic action is segmented into consecutive duplicate eventlets, e.g., the action “cut.”
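
The Rand index described in Note 3 can be illustrated with a minimal sketch. It assumes the standard pairwise definition (the fraction of point pairs on which two labelings agree about "same cluster" versus "different cluster"); the function name `rand_index` and the toy labels are hypothetical and are not taken from the chapter.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    points in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points to form a pair")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Toy example (illustrative labels only).
predicted = [0, 0, 1, 1]
ground_truth = [0, 0, 1, 2]
score = rand_index(predicted, ground_truth)  # 5 of the 6 pairs agree
```

Cluster identifiers themselves do not matter, only which points are grouped together, so relabeling the clusters of either input leaves the score unchanged.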

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Northeastern University, 360 Huntington Avenue, Boston, MA, 02115, USA
Kang Li
Department of Electrical and Computer Engineering and College of Computer and Information Science (Affiliated), Northeastern University, 360 Huntington Avenue, Boston, MA, 02115, USA
Yun Fu

Corresponding author

Correspondence to Kang Li.

Editor information

Editors and Affiliations

Northeastern University, Boston, Massachusetts, USA
Yun Fu

About this chapter

Cite this chapter

Li, K., Fu, Y. (2016). Actionlets and Activity Prediction. In: Fu, Y. (ed.) Human Activity Recognition and Prediction. Springer, Cham. https://doi.org/10.1007/978-3-319-27004-3_7

DOI: https://doi.org/10.1007/978-3-319-27004-3_7
Published: 21 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27002-9
Online ISBN: 978-3-319-27004-3
eBook Packages: Engineering, Engineering (R0)