Abstract
The growing ubiquity of multimedia information has made video a favored vehicle for information and has given rise to an astonishing volume of social media and surveillance footage. One important problem whose solution would significantly enhance semantic-level video analysis is activity understanding, which aims to describe video content accurately using key semantic elements, especially activities. We observe that when a time-critical decision is needed, the temporal structure of videos can be exploited for early prediction of an ongoing human activity.
Notes
- 1.
© Kang Li and Yun Fu | IEEE, 2014. This is a minor revision of the work published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1644–1657, http://dx.doi.org/10.1109/TPAMI.2013.2297321.
- 2.
The concepts “action” and “event” are often used interchangeably in computer vision and other AI fields. In our discussion, we use “action” when referring to human activity and “event” when referring to more general things, such as “stock rising.”
- 3.
The Rand index measures the similarity between a data clustering and the ground truth. Its value lies between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of points and 1 indicating that the clusterings are exactly the same.
- 4.
In this chapter, we use eventlet to refer to an observed co-occurrence of an actionlet and objects. An eventlet is \(e = \left \langle \{a^{\ast}\}\cup \{o_{1},o_{2},\ldots,o_{m}\}\right \rangle\), where \(a^{\ast}\) represents a particular actionlet and \(o_{i}\) represents a particular object interacting with \(a^{\ast}\) within its segment. In our case, \(m\) will always be 0, 1, or 2, meaning none, one, or two co-occurring interacting objects (we assume one person can operate at most two different objects at the same time, one with each hand).
- 5.
Note that in many situations a periodic action, e.g., “cut,” will be segmented into consecutive duplicate eventlets.
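The Rand index described in note 3 can be illustrated with a minimal sketch, assuming the standard pair-counting definition (the fraction of point pairs on which two labelings agree about "same cluster" versus "different cluster"). The function name `rand_index` and the toy label lists are hypothetical and are not taken from the chapter.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Pair-counting Rand index between two labelings of the same points.

    Illustrative helper (hypothetical name): returns the fraction of point
    pairs that both labelings treat the same way, i.e. both place the pair
    in one cluster or both place it in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two points are needed to form a pair")

    agreeing = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreeing += 1
        total += 1
    return agreeing / total


if __name__ == "__main__":
    ground_truth = [0, 0, 1, 1]
    # Same partition with renamed cluster ids: the labelings agree on every pair.
    print(rand_index([1, 1, 0, 0], ground_truth))
    # A partition that cuts across the ground truth agrees on fewer pairs.
    print(rand_index([0, 1, 0, 1], ground_truth))
```

Because the measure compares pairs rather than label values, it does not depend on how the cluster ids are named, which is why it suits comparing an unsupervised segmentation against annotated ground truth.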
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Li, K., Fu, Y. (2016). Actionlets and Activity Prediction. In: Fu, Y. (eds) Human Activity Recognition and Prediction. Springer, Cham. https://doi.org/10.1007/978-3-319-27004-3_7
DOI: https://doi.org/10.1007/978-3-319-27004-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27002-9
Online ISBN: 978-3-319-27004-3
eBook Packages: Engineering, Engineering (R0)