Video Scene Retrieval with Sign Sequence Matching Based on Audio Features

Morisawa, Keisuke; Nitta, Naoko; Babaguchi, Noboru

doi:10.1007/978-3-540-30542-2_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3332)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

In this paper, we propose a method for quickly retrieving scenes that are semantically similar to a query video segment from large-scale videos using audio features. The method first classifies the sound of the target and query videos into voices and background sounds, and extracts feature vectors by focusing on the sound sources. The feature vectors are then clustered by the K-means algorithm, and the cluster ID, which we call a sign, is assigned to every feature vector in the corresponding cluster, so that each video segment is represented as a sign sequence. Finally, video scenes are retrieved by matching sign sequences using dynamic programming. The experimental results show that this method is potentially useful for scene retrieval.
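
The pipeline described in the abstract (cluster feature vectors, label each with its cluster ID or "sign", then match sign sequences by dynamic programming) can be illustrated with a minimal Python sketch. The abstract does not specify the audio features, the number of clusters, or the dynamic programming cost function, so everything concrete here is an assumption: the feature vectors are taken as already extracted, the matching cost is plain edit distance with unit costs, candidates are fixed-length windows of the target, and the names `kmeans`, `nearest`, `edit_distance`, and `retrieve` are hypothetical helpers, not the authors' implementation.

```python
import random


def nearest(v, centroids):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
    )


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, signs), where signs[i]
    is the cluster ID assigned to vectors[i]."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assignment step: label every vector with its nearest centroid.
        labels = [nearest(v, centroids) for v in vectors]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    signs = [nearest(v, centroids) for v in vectors]
    return centroids, signs


def edit_distance(a, b):
    """Edit distance between two sign sequences by dynamic programming
    (unit cost for insertion, deletion, and substitution; an assumption)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def retrieve(target_signs, query_signs, top=1):
    """Score every window of the target that has the query's length and
    return the `top` best (distance, start_index) pairs."""
    n = len(query_signs)
    scored = [
        (edit_distance(query_signs, target_signs[s:s + n]), s)
        for s in range(len(target_signs) - n + 1)
    ]
    return sorted(scored)[:top]
```

For example, `retrieve([0, 0, 1, 2, 1, 0, 0], [1, 2, 1])` returns `[(0, 2)]`: the window starting at index 2 matches the query sign sequence exactly. Because matching operates on short integer sequences rather than raw feature vectors, each comparison is cheap, which is consistent with the abstract's emphasis on quick retrieval.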

Author information

Authors and Affiliations

Graduate School of Engineering, Osaka University, 2-1 Yamadaoka Suita, 565-0871, Japan
Keisuke Morisawa, Naoko Nitta & Noboru Babaguchi

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan
Kiyoharu Aizawa
Tokyo Research Laboratory, IBM Research, 1623-14 Shimo-tsuruma, 242-0001, Yamato, Kanagawa, Japan
Yuichi Nakamura
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh

About this paper

Cite this paper

Morisawa, K., Nitta, N., Babaguchi, N. (2004). Video Scene Retrieval with Sign Sequence Matching Based on Audio Features. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds) Advances in Multimedia Information Processing - PCM 2004. PCM 2004. Lecture Notes in Computer Science, vol 3332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30542-2_16

DOI: https://doi.org/10.1007/978-3-540-30542-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23977-2
Online ISBN: 978-3-540-30542-2
eBook Packages: Computer Science, Computer Science (R0)
