Multimedia Tools and Applications, Volume 30, Issue 3, pp 289–311

Audiovisual integration for tennis broadcast structuring

  • Ewa Kijak
  • Guillaume Gravier
  • Lionel Oisel
  • Patrick Gros


This paper addresses the integration of multimodal features for sports video structure analysis. The method relies on a statistical model that accounts for both the content of each shot and the interleaving of shots. This stochastic modelling is carried out within the framework of Hidden Markov Models (HMMs), which provide an efficient way to merge audio and visual cues. The approach is validated on tennis videos, and the model integrates prior information about tennis content and editing rules. The basic temporal unit is the video shot. Visual features characterize the type of shot view, and audio features describe the audio events within a shot. Two sets of audio features are used in this study: the first is extracted from a manual segmentation of the soundtrack and is the more reliable of the two; the second is produced by an automatic segmentation and classification process. As a result of the overall HMM process, typical tennis scenes are simultaneously segmented and identified. The experiments show that HMM-based fusion improves on indexing with the best single modality alone when both modalities are of similar quality.
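As an illustration of the kind of shot-level fusion the abstract describes, the following is a minimal sketch and not the authors' model. It assumes one visual label and one audio label per shot, treats the two cues as conditionally independent given the scene, and decodes the most likely scene sequence with the Viterbi algorithm. The scene names, observation labels, and every probability are hypothetical values chosen only to make the example run.

```python
import math

# Hypothetical scene labels; the paper's actual scene inventory may differ.
STATES = ["rally", "replay", "break"]

# All probabilities below are invented for illustration; none come from the paper.
START = {"rally": 0.5, "replay": 0.1, "break": 0.4}
TRANS = {  # P(next scene | current scene), standing in for editing rules
    "rally":  {"rally": 0.3, "replay": 0.4, "break": 0.3},
    "replay": {"rally": 0.6, "replay": 0.2, "break": 0.2},
    "break":  {"rally": 0.5, "replay": 0.1, "break": 0.4},
}
VISUAL = {  # P(shot view | scene)
    "rally":  {"global": 0.9, "other": 0.1},
    "replay": {"global": 0.3, "other": 0.7},
    "break":  {"global": 0.1, "other": 0.9},
}
AUDIO = {  # P(audio event | scene)
    "rally":  {"ball_hit": 0.8, "applause": 0.15, "none": 0.05},
    "replay": {"ball_hit": 0.1, "applause": 0.3, "none": 0.6},
    "break":  {"ball_hit": 0.05, "applause": 0.25, "none": 0.7},
}


def emission_logp(state, shot):
    """Fused log-likelihood of one shot, assuming the audio and visual cues
    are conditionally independent given the scene (a simplifying assumption)."""
    view, sound = shot
    return math.log(VISUAL[state][view]) + math.log(AUDIO[state][sound])


def viterbi(shots):
    """Return the most likely scene label for each shot in the sequence."""
    # delta[s] is the best log-probability of any path ending in state s.
    delta = {s: math.log(START[s]) + emission_logp(s, shots[0]) for s in STATES}
    backpointers = []
    for shot in shots[1:]:
        new_delta, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: delta[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_delta[s] = (delta[prev] + math.log(TRANS[prev][s])
                            + emission_logp(s, shot))
        delta = new_delta
        backpointers.append(pointers)
    # Backtrack from the best final state to recover the full path.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Each shot is a (visual label, audio label) pair.
shots = [("global", "ball_hit"), ("other", "none"),
         ("other", "applause"), ("global", "ball_hit")]
print(viterbi(shots))
```

Because segmentation and labelling both come out of the same decoding pass, consecutive shots that receive the same label form one scene, which is one way to read the abstract's statement that scenes are "simultaneously segmented and identified".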


Keywords: Video structure analysis; Macro-segmentation; Cross-modality; Hidden Markov models



Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Ewa Kijak (1)
  • Guillaume Gravier (2)
  • Lionel Oisel (3)
  • Patrick Gros (4)

  1. Université de Rennes I, Rennes Cedex, France
  2. CNRS, Rennes Cedex, France
  3. Thomson multimedia R&D, Cesson-Sévigné, France
  4. IRISA, Rennes Cedex, France
