Stochastic Modeling of Video Events

  • Milan Petković
  • Willem Jonker
Part of The Springer International Series in Engineering and Computer Science book series (MMSA, volume 25)


Although the previous chapter demonstrated that spatio-temporal formalization can be used to infer video semantics from low-level feature representations and to extract events such as net-playing and rally, the approach has some drawbacks. First, the range of events it can recognize is limited, because complex actions of non-rigid objects are difficult to formalize this way. This holds especially for an ordinary user who is not familiar with video features and spatio-temporal reasoning. An expert can help, but even then the approach will not give the best results for some events. Consider tennis strokes, for example: one can argue that they can be formalized as in the last section of the previous chapter, but doing so will not yield reasonable accuracy (see, for example, [1]). Introducing the ball position and other features into the event descriptions might increase accuracy, but it would make the descriptions too complicated. Furthermore, the ball is very difficult to find and track because of its high speed (which can exceed 200 km/h) and occlusion problems. Finally, the approach requires that someone, either a user or an expert, create the object and event descriptions, which can be time-consuming and error-prone.


Keywords: Hidden Markov Model, Bayesian Network, Audio Signal, Dynamic Bayesian Network, Text Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  [1] G. Sudhir, J. Lee, A. Jain, “Automatic Classification of Tennis Video for High-level Content-based Retrieval”, In Proceedings of the IEEE Workshop on Content-based Access of Image and Video Databases, Bombay, India, 1998, pp. 81–90.
  [2] J. Yamato, J. Ohya, K. Ishii, “Recognizing Human Action in Time-Sequential Images Using Hidden Markov Model”, In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 1992, pp. 379–385.
  [3] M. Naphade, T. Kristjansson, B. Frey, T.S. Huang, “Probabilistic multimedia objects (multijects): A novel approach to indexing and retrieval in multimedia systems”, In Proceedings of the IEEE International Conference on Image Processing (ICIP), Chicago, IL, 1998, vol. 3, pp. 536–540.
  [4] D. Mitrovic, “Machine Learning for Car Navigation”, In Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-2001), Budapest, June 2001, L. Monostori, J. Vancza and M. Ali (eds.), Springer-Verlag, LNAI 2070, pp. 670–675.
  [5] D. Moore, I. Essa, M. Hayes, “Exploiting Human Actions and Object Context for Recognition Tasks”, In Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 1, Corfu, Greece, 1999, pp. 80–86.
  [6] N. Vasconcelos, A. Lippman, “Bayesian Modeling of video editing and structure: Semantic features for video summarization and browsing”, In Proceedings of the IEEE International Conference on Image Processing (ICIP), vol. 2, Chicago, IL, 1998, pp. 550–555.
  [7] A.M. Ferman, A.M. Tekalp, “Probabilistic Analysis and Extraction of Video Content”, In Proceedings of the IEEE International Conference on Image Processing (ICIP), vol. 2, Tokyo, Japan, 1999, pp. 91–95.
  [8] T. Syeda-Mahmood, S. Srinivasan, “Detecting Topical Events in Digital Video”, In Proceedings of ACM Multimedia International Conference, Los Angeles, CA, 2000, pp. 85–94.
  [9] M. Naphade, T.S. Huang, “A Probabilistic Framework for Semantic Video Indexing, Filtering and Retrieval”, IEEE Transactions on Multimedia, vol. 3, no. 1, March 2001.
  [10] R.S. Jasinschi, N. Dimitrova, T. McGee, L. Agnihotri, J. Zimmerman, D. Li, “Integrated Multimedia Processing for Topic Segmentation and Classification”, In Proceedings of the IEEE International Conference on Image Processing (ICIP), Greece, 2001.
  [11] M. Petkovic, W. Jonker, Z. Zivkovic, “Recognizing Strokes in Tennis Videos Using Hidden Markov Models”, In Proceedings of the International Conference on Visualization, Imaging and Image Processing, Marbella, Spain, 2001, pp. 512–516.
  [12] M. Petkovic, V. Mihajlovic, W. Jonker, “Multi-Modal Extraction of Highlights from TV Formula 1 Programs”, In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Lausanne, Switzerland, 2002.
  [13] L.R. Rabiner, B.H. Juang, “An Introduction to Hidden Markov Models”, IEEE ASSP Magazine, January 1986.
  [14] L. Baum, T. Petrie, G. Soules, N. Weiss, “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains”, Annals of Mathematical Statistics, 41 (1), 1970, pp. 164–171.
  [15] T. Starner, J. Weaver, A. Pentland, “Real-time American Sign Language Recognition Using Desk and Wearable Computer Based Video”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20 (12), 1998, pp. 1371–1375.
  [16] S. Muller, S. Eickeler, G. Rigoll, “Pseudo 3-D HMMs for Image Sequence Recognition”, In Proceedings of the IEEE International Conference on Image Processing (ICIP), Tokyo, Japan, 1999, pp. 237–241.
  [17] S. Michaelson, M. Steedman, Hidden Markov Models for Speech Recognition, Edinburgh University Press, 1990.
  [18] F. Jensen, An Introduction to Bayesian Networks, Springer-Verlag, 1996.
  [19] D. Heckerman, “A Tutorial on Learning Bayesian Networks”, Microsoft Research Technical Report, MSR-TR-95–06, 1995.
  [20] X. Boyen, N. Friedman, D. Koller, “Discovering the Hidden Structure of Complex Dynamic Systems”, In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 1999, pp. 91–100.
  [21] Z. Ghahramani, “Learning Dynamic Bayesian Networks”, In Adaptive Processing of Temporal Information (C.L. Giles, M. Gori, eds.), Lecture Notes in Artificial Intelligence, Springer-Verlag, 1997.
  [22] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann Publishers, 1988.
  [23] V. Mihajlovic, M. Petkovic, Dynamic Bayesian Networks: A State of the Art, TR-CTIT01–34, 2001.
  [24] Z. Zivkovic, F. van der Heijden, M. Petkovic, W. Jonker, “Image processing and feature extraction for recognizing strokes in tennis game videos”, In Proceedings of the Seventh Annual Conference of the Advanced School for Computing and Imaging, the Netherlands, June 2001.
  [25] A. Rosenfeld (ed.), Multiresolution image processing and analysis, Springer-Verlag, 1984.
  [26] G. Johansson, “Visual perception of biological motion and a model for its analysis”, Perception and Psychophysics, 14 (2), 1973, pp. 210–11.
  [27] H. Fujiyoshi, A. Lipton, “Real-time Human Motion Analysis by Image Skeletonization”, In Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV), Princeton, NJ, 1998, pp. 15–21.
  [28] R. Rosales, S. Sclaroff, “Inferring Body Pose without Tracking Body Parts”, In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
  [29] R. Duda, R. Hart, Pattern Classification and Scene Analysis, John Wiley, 1973.
  [30] J. Yandell, Visual Tennis, Human Kinetics, 1999.
  [31] T-L. Liu, D. Geiger, “Approximate Tree Matching and Shape Similarity”, In Proceedings of the 7th International Conference on Computer Vision, Greece, 1999, pp. 456–462.
  [32] G. Pingali, Y. Jean, A. Opalach, I. Carlbom, “LucentVision: Converting Real World Events into Multimedia Experiences”, In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), New York City, vol. 3, 2000, pp. 1433–1436.
  [33] V. Mihajlovic, M. Petkovic, Automatic Annotation of Formula 1 Races for Content-based Video Retrieval, Technical Report, TR-CTIT-01–41, 2001.
  [34] W. Hess, Pitch Determination of Speech Signal, Springer-Verlag, New York, 1983.
  [35] L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition, Englewood Cliffs, Prentice Hall, New York, 1993.
  [36] J. Christie, Completion of TNO-Abbot Research Project, Technical Report Cambridge University, Cambridge, England, December 1996.
  [37] X. Boyen, D. Koller, “Tractable Inference for Complex Stochastic Processes”, In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, 1998.

Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • Milan Petković (1)
  • Willem Jonker (2)
  1. University of Twente, The Netherlands
  2. University of Twente and Philips Research, The Netherlands
