Multimodal Semantic Analysis and Annotation for Basketball Video

  • Song Liu
  • Min Xu
  • Haoran Yi
  • Liang-Tien Chia
  • Deepu Rajan
Open Access
Research Article
Part of the following topical collections:
  1. Information Mining from Multimedia Databases


This paper presents a new multimodal method for extracting semantic information from basketball video. Visual, motion, and audio information are first extracted from the video to produce low-level segmentation and classification results. Domain knowledge is then exploited to detect interesting events. On the video side, both visual and motion prediction information are used for shot and scene boundary detection, which is followed by scene classification. On the audio side, audio keysounds are defined as sets of specific sounds related to semantic events, and a classification method based on the hidden Markov model (HMM) is used to identify them. Subsequently, by analyzing the multimodal information together with additional domain knowledge, the positions of potential semantic events, such as "foul" and "shot at the basket," are located. Finally, a video annotation is generated according to the MPEG-7 multimedia description schemes (MDSs). Experimental results demonstrate the effectiveness of the proposed method.
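As an illustration of the HMM-based keysound identification step, the following is a minimal sketch and not the authors' implementation. It assumes each audio clip has already been quantized into a sequence of discrete symbols and that one HMM per keysound is available; the clip is assigned to the model with the highest log-likelihood under the scaled forward algorithm. The keysound names ("whistle", "cheer"), the two-symbol alphabet, and all probabilities are hypothetical values chosen only for the example. The abstract does not specify the features or model topology actually used.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) via the scaled forward algorithm.

    obs   : non-empty list of discrete symbol indices
    start : start[i] is the initial probability of state i
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[i][k] is the probability that state i emits symbol k
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission probability.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scale accumulates.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def classify(obs, models):
    """Pick the keysound whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical toy models: symbol 1 = high-energy frame, 0 = low-energy frame.
MODELS = {
    "whistle": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
    "cheer": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
}

if __name__ == "__main__":
    print(classify([1, 1, 1, 0, 1], MODELS))
    print(classify([0, 0, 1, 0, 0], MODELS))
```

A practical system would more likely use continuous-density HMMs over frame-level acoustic features, trained with a toolkit; the discrete version here only shows the maximum-likelihood decision rule.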


Keywords: Hidden Markov Model, Domain Knowledge, Boundary Detection, Motion Prediction, Semantic Event



Copyright information

© Liu et al. 2006

Authors and Affiliations

  • Song Liu (1)
  • Min Xu (1)
  • Haoran Yi (1)
  • Liang-Tien Chia (1)
  • Deepu Rajan (1)
  1. School of Computer Engineering, Nanyang Technological University, Singapore
