Abstract
In this paper, we propose a method for quickly retrieving scenes that are semantically similar to a query video segment from large-scale video data using audio features. The method first classifies the sound of the target and query videos into voices and background sounds, and extracts feature vectors by focusing on the sound sources. The feature vectors are then clustered with the K-means algorithm, and the cluster ID, which we call a sign, is assigned to every feature vector in the corresponding cluster, so that each video segment is represented as a sign sequence. Finally, video scenes are retrieved by matching sign sequences using Dynamic Programming. The experimental results show that this method is potentially useful for scene retrieval.
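The clustering and matching steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the audio features, the number of clusters, or the Dynamic Programming cost, so the sketch assumes feature vectors are already extracted, uses a plain Lloyd-style K-means, and uses edit distance with unit costs as the DP matching score. All function names (`kmeans`, `to_signs`, `edit_distance`, `retrieve`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to v; this index plays the role of the sign."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))


def kmeans(vectors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd-style K-means (assumed variant; the abstract names only 'K-means')."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def to_signs(vectors: List[Vector], centroids: List[List[float]]) -> List[int]:
    """Represent a segment as a sign sequence: one cluster ID per feature vector."""
    return [nearest(v, centroids) for v in vectors]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """DP matching cost between two sign sequences (unit costs are an assumption)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # match or substitution
        prev = curr
    return prev[-1]


def retrieve(query: Sequence[int], target: Sequence[int]) -> Tuple[int, int]:
    """Slide a query-length window over the target; return (best start, cost)."""
    w = len(query)
    scored = [(edit_distance(query, target[s:s + w]), s)
              for s in range(len(target) - w + 1)]
    cost, start = min(scored)
    return start, cost


if __name__ == "__main__":
    # Toy 2-D "audio features" forming two well-separated groups.
    feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    cents = kmeans(feats, k=2)
    print(to_signs(feats, cents))
    print(retrieve([1, 2, 1], [0, 0, 1, 2, 1, 0, 0]))
```

Matching short integer sequences rather than raw feature vectors is what makes the search cheap: each comparison costs time proportional to the product of the two sequence lengths, independent of the feature dimension.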
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Morisawa, K., Nitta, N., Babaguchi, N. (2004). Video Scene Retrieval with Sign Sequence Matching Based on Audio Features. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds) Advances in Multimedia Information Processing - PCM 2004. PCM 2004. Lecture Notes in Computer Science, vol 3332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30542-2_16
DOI: https://doi.org/10.1007/978-3-540-30542-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23977-2
Online ISBN: 978-3-540-30542-2
eBook Packages: Computer Science, Computer Science (R0)