Sequential Max-Margin Event Detectors
Abstract
Many applications in computer vision (e.g., games, human-computer interaction) require a reliable and early detector of visual events. Existing event detection methods rely on one-versus-all or multi-class classifiers that do not scale well to online detection of a large number of events. This paper proposes Sequential Max-Margin Event Detectors (SMMED) to efficiently detect an event in the presence of many event classes. SMMED sequentially discards classes until only one class remains and is identified as the detected class. This approach has two main benefits with respect to standard approaches: (1) it provides an efficient solution for early detection of events in the presence of a large number of classes, and (2) it is computationally efficient because only a subset of likely classes is evaluated. The benefits of SMMED in comparison with existing approaches are illustrated on three databases using different modalities: MSRDaily Activity (3D depth videos), UCF101 (RGB videos), and the CMU Multi-Modal Action Detection (MAD) database (depth, RGB, and skeleton). CMU-MAD was recorded to target the problem of event detection (not classification), and the data and labels are available at http://humansensing.cs.cmu.edu/mad/ .
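The sequential-elimination idea described above can be illustrated with a toy sketch. This is not the paper's max-margin formulation; it assumes simple per-class linear scoring models and a hypothetical margin threshold, purely to show how classes are discarded online until one remains and how only active classes incur computation.

```python
import numpy as np

def sequential_elimination_detect(frames, weights, margin=0.0):
    """Toy illustration of sequential class elimination (hypothetical,
    not the paper's SMMED objective). `frames` is an iterable of feature
    vectors; `weights` is a (num_classes, feat_dim) array of per-class
    linear models. Returns (detected_class, frame_index_of_decision)."""
    active = set(range(weights.shape[0]))      # classes still in play
    scores = np.zeros(weights.shape[0])        # cumulative per-class scores
    last_t = 0
    for t, x in enumerate(frames):
        last_t = t
        for c in list(active):                 # only active classes are scored
            scores[c] += weights[c] @ x
        best = max(scores[c] for c in active)
        # discard classes that trail the current leader by more than `margin`
        active = {c for c in active if scores[c] >= best - margin}
        if len(active) == 1:                   # early decision reached
            return active.pop(), t
    # fall back to the best remaining class if the sequence ends first
    return max(active, key=lambda c: scores[c]), last_t
```

With two classes and frames that clearly favor the second, the detector commits on the first frame; with a larger margin it would keep both classes active longer, trading earliness for caution.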
Keywords
Event Detection · Activity Recognition · Time Series Analysis · Multi-Modal Action Detection