Video Analysis and Summarization at Structural and Semantic Levels

Multimedia Information Retrieval and Management

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In this chapter, we present an overview of research issues in three areas of audio-visual analysis: (a) segmentation, (b) event detection, and (c) summarization.

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sundaram, H., Chang, SF. (2003). Video Analysis and Summarization at Structural and Semantic Levels. In: Feng, D.D., Siu, WC., Zhang, HJ. (eds) Multimedia Information Retrieval and Management. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-05300-3_4

  • DOI: https://doi.org/10.1007/978-3-662-05300-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05533-1

  • Online ISBN: 978-3-662-05300-3

  • eBook Packages: Springer Book Archive
