
Scalable Video Genre Classification and Event Detection

  • Chapter
Multimedia Database Retrieval

Part of the book series: Multimedia Systems and Applications (MMSA)

Abstract

This chapter presents a systematic, generic approach to scalable video genre classification and event detection. The system detects events in an input video through an ordered sequence of stages. First, local descriptors that are independent of domain knowledge are extracted uniformly from the input video sequence, and a video representation is built using a bag-of-words (BoW) model. The video's genre is then identified by applying k-nearest neighbor (k-NN) classifiers to this representation; various dissimilarity measures are assessed and evaluated analytically. Next, for high-level event detection, a hidden conditional random field (HCRF) structured prediction model is used to detect events of interest. The input to this stage comes from middle-level view agents that assign each frame of the video sequence to one of four view groups: close-up view, mid view, long view, and outer-field view. These middle-level view groups are obtained by applying an unsupervised approach based on probabilistic latent semantic analysis (PLSA) to the histogram-based video representation. The framework demonstrates efficiency and generality in processing large video collections and performs several video analysis tasks. Its effectiveness is supported by extensive experiments, with results compared against benchmarks and state-of-the-art algorithms. Because both the domain-knowledge-independent video representation and the annotation-free unsupervised view labeling require little human expertise and effort, the approach can be applied generically to large-scale video processing.
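
The first two stages described in the abstract (a BoW histogram followed by k-NN genre classification) can be illustrated with a minimal sketch. This is not the chapter's implementation. It assumes a visual codebook is already available, picks the chi-square distance as just one possible dissimilarity measure, and uses hypothetical names throughout (`bow_histogram`, `chi_square`, `knn_predict`, `genre_a`, `genre_b`), with toy 2-D descriptors standing in for real local descriptors.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook; return an L1-normalized histogram.

    Assumes `descriptors` has at least one row (hypothetical helper, not from the chapter).
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def chi_square(p, q, eps=1e-12):
    """Chi-square dissimilarity between two histograms (one assumed choice of measure)."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def knn_predict(query, train_hists, train_labels, k=3, dissimilarity=chi_square):
    """Majority vote over the k training histograms closest to `query`."""
    dists = np.array([dissimilarity(query, h) for h in train_hists])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return labels[counts.argmax()]

# Toy data: a 4-word codebook in 2-D and four labeled training histograms.
codebook = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
train_hists = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.4, 0.5],
])
train_labels = ["genre_a", "genre_a", "genre_b", "genre_b"]

query_descriptors = np.array([[0.1, 0.0], [0.0, 0.9], [0.1, 0.1]])
query_hist = bow_histogram(query_descriptors, codebook)
print(query_hist, knn_predict(query_hist, train_hists, train_labels, k=3))
```

Passing the dissimilarity as a function argument makes it easy to swap in other measures for comparison, in the spirit of the evaluation the abstract mentions.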




Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Muneesawang, P., Zhang, N., Guan, L. (2014). Scalable Video Genre Classification and Event Detection. In: Multimedia Database Retrieval. Multimedia Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-11782-9_9


  • DOI: https://doi.org/10.1007/978-3-319-11782-9_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11781-2

  • Online ISBN: 978-3-319-11782-9

  • eBook Packages: Computer Science (R0)
