Scalable Video Genre Classification and Event Detection

Muneesawang, Paisarn; Zhang, Ning; Guan, Ling

doi:10.1007/978-3-319-11782-9_9


Part of the book series: Multimedia Systems and Applications (MMSA)


Abstract

This chapter presents a systematic and generic approach, evaluated on scalable video genre classification and event detection. The system carries an input video through an ordered sequence of stages that ends in event detection. First, local descriptors that require no domain knowledge are extracted uniformly from the input video sequence. A video representation is then built with a bag-of-words (BoW) model. The video's genre is identified first, by applying k-nearest neighbor (k-NN) classifiers to this initial representation; several dissimilarity measures are assessed and compared analytically. At the higher level, a hidden conditional random field (HCRF) structured prediction model is used to detect events of interest. The input to this event detector comes from mid-level view agents, which assign each frame of the video sequence to one of four view groups: close-up view, mid view, long view, and outer-field view. These mid-level view groups are obtained by applying an unsupervised approach based on probabilistic latent semantic analysis (PLSA) to the histogram-based video representation. The framework demonstrates efficiency and generality in processing voluminous video collections and in performing a variety of video analysis tasks. Its effectiveness is validated by extensive experiments, and the results are compared with benchmarks and state-of-the-art algorithms. Only limited human expertise and effort are involved, because the video representation is independent of domain knowledge and the view labeling is unsupervised and requires no annotation. As a result, this systematic and scalable approach can be applied generically to the processing of massive video collections.
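The genre-classification stage described above (quantize local descriptors into a BoW histogram, then classify with k-NN under a chosen dissimilarity measure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation. The abstract does not specify the descriptor type, codebook size, value of k, or which dissimilarity measures were compared. The sketch therefore assumes a precomputed codebook (for example from k-means) and uses three common histogram distances (L1, chi-square, histogram intersection) purely as examples. The function names and the toy data are hypothetical.

```python
from collections import Counter


def quantize(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word (squared Euclidean distance)."""
    return [
        min(range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        for d in descriptors
    ]


def bow_histogram(descriptors, codebook):
    """L1-normalised bag-of-words histogram of one video over the codebook."""
    counts = Counter(quantize(descriptors, codebook))
    total = sum(counts.values()) or 1  # guard against a video with no descriptors
    return [counts[k] / total for k in range(len(codebook))]


# Example dissimilarity measures between normalised histograms (assumed choices).
def l1_distance(h, g):
    return sum(abs(a - b) for a, b in zip(h, g))


def chi_square_distance(h, g):
    # 0.5 * sum (h_i - g_i)^2 / (h_i + g_i), skipping bins empty in both histograms
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)


def intersection_distance(h, g):
    # 1 - histogram intersection; 0 for identical normalised histograms
    return 1.0 - sum(min(a, b) for a, b in zip(h, g))


MEASURES = {
    "l1": l1_distance,
    "chi2": chi_square_distance,
    "intersection": intersection_distance,
}


def knn_genre(query, train, k=3, measure="chi2"):
    """Majority vote over the k training videos closest to `query`.

    `train` is a list of (histogram, genre_label) pairs.
    """
    dist = MEASURES[measure]
    neighbours = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    # Counter keeps first-seen order, so a tied vote goes to the closest neighbour's label.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: made-up 2-D descriptors and a 3-word codebook.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
TRAIN = [
    (bow_histogram([[0.1, 0.0], [0.0, 0.2], [1.1, 0.9]], CODEBOOK), "soccer"),
    (bow_histogram([[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]], CODEBOOK), "soccer"),
    (bow_histogram([[5.1, 4.9], [4.8, 5.2], [1.0, 1.2]], CODEBOOK), "tennis"),
    (bow_histogram([[5.0, 5.0], [4.9, 5.1], [5.2, 5.0]], CODEBOOK), "tennis"),
]

if __name__ == "__main__":
    query = bow_histogram([[0.0, 0.0], [0.3, 0.1], [0.9, 1.0]], CODEBOOK)
    for name in MEASURES:
        print(name, knn_genre(query, TRAIN, k=3, measure=name))
```

Keeping the distance as a pluggable `measure` argument is one simple way to compare dissimilarity measures at a fixed k. Sorting the whole training set per query is adequate for a sketch, but a large collection would likely call for an indexed nearest-neighbor search instead.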




Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Naresuan University, Muang, Phitsanulok, Thailand
Paisarn Muneesawang
Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada
Ning Zhang
Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada
Ling Guan



About this chapter

Cite this chapter

Muneesawang, P., Zhang, N., Guan, L. (2014). Scalable Video Genre Classification and Event Detection. In: Multimedia Database Retrieval. Multimedia Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-11782-9_9


DOI: https://doi.org/10.1007/978-3-319-11782-9_9
Published: 26 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11781-2
Online ISBN: 978-3-319-11782-9
eBook Packages: Computer Science, Computer Science (R0)
