Classification-oriented structure learning in Bayesian networks for multimodal event detection in videos

Multimedia Tools and Applications

Abstract

We investigate the use of structure learning in Bayesian networks for a complex multimodal task: action detection in soccer videos. We illustrate that classical score-oriented structure learning algorithms, such as K2, whose usefulness has been demonstrated on simple tasks, fail to provide a good network structure for classification tasks where many correlated observed variables are necessary to make a decision. We then compare several structure learning objective functions that aim at finding the structure yielding the best classification results, extending existing solutions in the literature. Experimental results on a comprehensive data set of 7 videos show that a discriminative objective function based on conditional likelihood yields the best results, while augmented approaches offer a good compromise between learning speed and classification accuracy.
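
To make the contrast drawn in the abstract concrete, the two kinds of objective can be written as below. This is a sketch in standard notation for Bayesian network classifiers, not necessarily the formulation used in the paper. The symbols are assumptions introduced here: $G$ is a candidate network structure, $D = \{(c_i, \mathbf{x}_i)\}_{i=1}^{N}$ is the training set with class (event) label $c_i$ and observed variables $\mathbf{x}_i$, and $P_G$ is the distribution encoded by $G$ with its fitted parameters.

```latex
\begin{align}
\mathrm{LL}(G \mid D)  &= \sum_{i=1}^{N} \log P_G(c_i, \mathbf{x}_i), \\
\mathrm{CLL}(G \mid D) &= \sum_{i=1}^{N} \log P_G(c_i \mid \mathbf{x}_i), \\
\mathrm{LL}(G \mid D)  &= \mathrm{CLL}(G \mid D) + \sum_{i=1}^{N} \log P_G(\mathbf{x}_i).
\end{align}
```

Read this way, a score that targets the joint distribution rewards structures that model the observations well overall, while only the conditional term bears directly on the classification decision. With many observed variables, the last sum can dominate the joint log-likelihood, which is one common explanation for why a structure with a good generative score may still classify poorly.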

Acknowledgements

This work was partially funded by OSEO, French state agency for innovation, in the framework of the Quaero project.

Author information

Corresponding author

Correspondence to Guillaume Gravier.

About this article

Cite this article

Gravier, G., Demarty, CH., Baghdadi, S. et al. Classification-oriented structure learning in Bayesian networks for multimodal event detection in videos. Multimed Tools Appl 70, 1421–1437 (2014). https://doi.org/10.1007/s11042-012-1169-y

  • DOI: https://doi.org/10.1007/s11042-012-1169-y
