Spectral similarity metrics for sound source formation based on the common variation cue

Lagrange, Mathieu; Raspaud, Martin

doi:10.1007/s11042-009-0382-9

Published: 08 October 2009

Volume 48, pages 185–205, (2010)
Multimedia Tools and Applications

Mathieu Lagrange¹ &
Martin Raspaud²

Abstract

Scene analysis is a useful way of gathering information about the structure of an audio stream. For content extraction, it also provides prior knowledge that can make standard classification approaches more robust. We believe that the notion of temporality is important for such scene analysis. Consequently, this paper studies a new way of modeling how the frequency and amplitude parameters of spectral components evolve over time. We evaluate its benefits through its ability to automatically group the components belonging to the same sound source. The evaluation shows that the proposed metric achieves good performance and takes better account of micro-modulations.

Acknowledgements

This work was initiated while the authors were at the LaBRI (UMR-CNRS 5800, University of Bordeaux 1) and was partly funded by the OSEO project Quaero under task 6.4, “Music Search by Similarity”, and by the French GIP ANR DESAM under contract ANR-06-JCJC-0027-01.

Author information

Authors and Affiliations

Telecom ParisTech, 46, rue Barrault, 75634, Paris Cedex 13, France
Mathieu Lagrange
Linköping University, Bredgatan 33, 60174, Norrköping, Sweden
Martin Raspaud

Corresponding author

Correspondence to Mathieu Lagrange.

About this article

Cite this article

Lagrange, M., Raspaud, M. Spectral similarity metrics for sound source formation based on the common variation cue. Multimed Tools Appl 48, 185–205 (2010). https://doi.org/10.1007/s11042-009-0382-9

Published: 08 October 2009
Issue Date: May 2010
DOI: https://doi.org/10.1007/s11042-009-0382-9
