Audio Content Discovery: An Unsupervised Approach

Lu, Lie; Hanjalic, Alan

doi:10.1007/978-0-387-76569-3_4


Part of the book series: Signals and Communication Technology (SCT)


Abstract

Automatically extracting semantic content from audio streams can be helpful in many multimedia applications. Motivated by the known limitations of traditional supervised approaches to content extraction, which are hard to generalize and require suitable training data, we propose in this chapter a completely unsupervised approach to content discovery in composite audio signals. The approach adopts ideas from text analysis to find the fundamental and representative audio segments (analogous to words and keywords) and employs them to parse a general audio document into meaningful "paragraphs" and clusters of "paragraphs". In our approach, we first employ spectral clustering to discover natural semantic sound clusters (e.g. speech, music, noise, applause, speech mixed with music). These clusters are referred to as audio elements and are analogous to words in text analysis. From the obtained set of audio elements, we then select the key audio elements, which are the most prominent in characterizing the content of the input audio data. The obtained (key) audio elements are then used to detect potential boundaries of semantic audio "paragraphs", denoted as auditory scenes. Finally, the auditory scenes are clustered in terms of the audio elements appearing in them, by investigating the relations between audio elements and auditory scenes with an information-theoretic co-clustering scheme. Evaluations of the proposed approach on 5 hours of diverse audio data indicate that promising results can be achieved for both audio element discovery and auditory scene segmentation/clustering.
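
The last step of the abstract, clustering auditory scenes through their audio elements, can be illustrated with a small sketch. It is not the chapter's implementation. It only shows the quantity that information-theoretic co-clustering is usually formulated to minimize: the loss in mutual information between scenes and audio elements when both are merged into clusters. The scene names, element names, counts, and cluster assignments below are hypothetical toy data, and the helper names `mutual_information` and `coclustered` are introduced here for illustration only.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint distribution {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def coclustered(joint, row_cluster, col_cluster):
    """Aggregate the joint distribution over row clusters and column clusters."""
    merged = defaultdict(float)
    for (x, y), p in joint.items():
        merged[(row_cluster[x], col_cluster[y])] += p
    return dict(merged)


# Hypothetical counts: occurrences of each audio element in each auditory scene.
counts = {
    ("scene1", "speech"): 8, ("scene1", "applause"): 2,
    ("scene2", "speech"): 7, ("scene2", "applause"): 3,
    ("scene3", "music"): 9, ("scene3", "noise"): 1,
    ("scene4", "music"): 6, ("scene4", "noise"): 4,
}
total = sum(counts.values())
joint = {pair: n / total for pair, n in counts.items()}

# Two candidate scene groupings (assumed), evaluated with one element grouping.
good_rows = {"scene1": 0, "scene2": 0, "scene3": 1, "scene4": 1}
bad_rows = {"scene1": 0, "scene2": 1, "scene3": 0, "scene4": 1}
cols = {"speech": 0, "applause": 0, "music": 1, "noise": 1}

full_mi = mutual_information(joint)
loss_good = full_mi - mutual_information(coclustered(joint, good_rows, cols))
loss_bad = full_mi - mutual_information(coclustered(joint, bad_rows, cols))

print(f"loss (similar scenes grouped):    {loss_good:.3f} bits")
print(f"loss (dissimilar scenes grouped): {loss_bad:.3f} bits")
```

In this toy setting, grouping scenes that share the same audio elements loses less mutual information than mixing speech-dominated and music-dominated scenes. A co-clustering procedure would search over both assignments for a low loss; the search itself is omitted here.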



Author information

Authors and Affiliations

Speech Group, Microsoft Research Asia, 5F, Beijing Sigma Center, 100080, No.49, Zhichun Road, Haidian District, Beijing, P.R.China
Lie Lu


Corresponding author

Correspondence to Lie Lu.

Editor information

Editors and Affiliations

Vision & Visualization Group, Sarnoff Corporation, Washington Rd. 201, Princeton, 08543, U.S.A.
Ajay Divakaran


About this chapter

Cite this chapter

Lu, L., Hanjalic, A. (2009). Audio Content Discovery: An Unsupervised Approach. In: Divakaran, A. (eds) Multimedia Content Analysis. Signals and Communication Technology. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76569-3_4


DOI: https://doi.org/10.1007/978-0-387-76569-3_4
Published: 22 September 2008
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-76567-9
Online ISBN: 978-0-387-76569-3
eBook Packages: Engineering, Engineering (R0)
