Abstract
This chapter presents a systematic and generic approach to scalable video genre classification and event detection, and evaluates it experimentally. The system detects events in an input video through an ordered sequence of steps. First, local descriptors that do not depend on domain knowledge are extracted uniformly from the input video sequence. The video representation is then built with a bag-of-words (BoW) model. The video's genre is identified first, by applying k-nearest neighbor (k-NN) classifiers to this initial video representation; various dissimilarity measures are assessed and compared analytically. Then, for high-level event detection, a hidden conditional random field (HCRF) structured prediction model is used to detect events of interest. The input to this event detection stage comes from middle-level view agents that assign each frame of the video sequence to one of four view groups, namely close-up-view, mid-view, long-view and outer-field-view. An unsupervised approach based on probabilistic latent semantic analysis (PLSA) is applied to the histogram-based video representation to obtain these middle-level view groups. The framework demonstrates efficiency and generality in processing voluminous video collections and accomplishes several video analysis tasks. The effectiveness of the framework is supported by extensive experimentation, and results are compared with benchmarks and state-of-the-art algorithms. Little human expertise or effort is required, because the video representation is independent of domain knowledge and the unsupervised view labeling is annotation-free. As a result, this systematic and scalable approach can be applied generically to the processing of massive video collections.
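The first two stages described above (a bag-of-words histogram followed by k-NN genre identification) can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the k-means codebook, the chi-square dissimilarity, the function names, and the genre labels in the usage note are all assumptions chosen for illustration, and the abstract does not say which dissimilarity measures were evaluated.

```python
import numpy as np


def build_codebook(descriptors, k, iters=10, seed=0):
    """Cluster local descriptors into k visual words (plain Lloyd's k-means).

    Assumption: k-means is used here only as a common way to build a BoW
    codebook; the source does not specify the clustering method.
    """
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every center.
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, centers):
    """Quantize descriptors to their nearest visual word; return a normalized histogram."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()


def chi2_distance(p, q, eps=1e-12):
    """Chi-square dissimilarity between two histograms (one assumed choice of measure)."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))


def knn_genre(query_hist, train_hists, train_labels, k=3):
    """Return the majority genre label among the k nearest training histograms."""
    dists = np.array([chi2_distance(query_hist, h) for h in train_hists])
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

In use, `build_codebook` would be run once over descriptors pooled from training videos, `bow_histogram` once per video, and `knn_genre` once per query against labeled training histograms (for example with hypothetical labels such as "soccer" or "tennis"). Swapping `chi2_distance` for another measure is the only change needed to compare dissimilarity functions.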
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Muneesawang, P., Zhang, N., Guan, L. (2014). Scalable Video Genre Classification and Event Detection. In: Multimedia Database Retrieval. Multimedia Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-11782-9_9
DOI: https://doi.org/10.1007/978-3-319-11782-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11781-2
Online ISBN: 978-3-319-11782-9
eBook Packages: Computer Science (R0)