Abstract
Scene analysis is a useful way of gathering information about the structure of an audio stream. For content extraction, it also provides prior knowledge that standard classification approaches can exploit to produce more robust results. We believe that temporality is central to such scene analysis. In this paper, we therefore study a new way of modeling how the frequency and amplitude parameters of spectral components evolve over time. We evaluate its benefits by assessing its ability to automatically group the components that belong to the same sound source. The evaluation shows that the proposed metric performs well and better accounts for micro-modulations.
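The grouping task described above relies on the common variation cue: partials emitted by the same source tend to share their frequency and amplitude micro-modulations (for example, vibrato). The sketch below illustrates that cue only. It is not the metric proposed in the paper. As a stand-in, it assumes a simple dissimilarity, one minus the Pearson correlation of the frame-to-frame relative frequency variations of two partials, and groups partials by single-linkage clustering cut at a threshold. All function names, the threshold, and the synthetic vibrato data are hypothetical.

```python
import math
from typing import List, Sequence


def relative_variation(track: Sequence[float]) -> List[float]:
    """Frame-to-frame relative change of a parameter trajectory (e.g. frequency in Hz).

    Dividing by the previous value makes harmonics of one source, whose
    frequencies scale together, yield the same variation sequence.
    """
    return [(b - a) / a for a, b in zip(track, track[1:])]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # a flat trajectory carries no common-variation evidence
    return cov / math.sqrt(vx * vy)


def common_variation_dissimilarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical dissimilarity in [0, 2]; 0 means perfectly correlated variations."""
    return 1.0 - pearson(relative_variation(p), relative_variation(q))


def group_partials(tracks: List[Sequence[float]], threshold: float) -> List[int]:
    """Single-linkage grouping: partials linked by a dissimilarity below `threshold`
    (directly or through a chain of such links) receive the same source label."""
    labels = list(range(len(tracks)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while labels[i] != i:
            labels[i] = labels[labels[i]]
            i = labels[i]
        return i

    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if common_variation_dissimilarity(tracks[i], tracks[j]) < threshold:
                labels[find(j)] = find(i)
    return [find(i) for i in range(len(tracks))]


def demo_tracks(n_frames: int = 100) -> List[List[float]]:
    """Synthetic example: source A (220 Hz, 3 harmonics) and source B (330 Hz,
    2 harmonics), each with its own 1% vibrato at a different rate."""
    vib_a = [1 + 0.01 * math.sin(2 * math.pi * 4 * t / n_frames) for t in range(n_frames)]
    vib_b = [1 + 0.01 * math.sin(2 * math.pi * 7 * t / n_frames + 1.0) for t in range(n_frames)]
    src_a = [[220.0 * k * v for v in vib_a] for k in (1, 2, 3)]
    src_b = [[330.0 * k * v for v in vib_b] for k in (1, 2)]
    return src_a + src_b


tracks = demo_tracks()
print(group_partials(tracks, threshold=0.5))  # [0, 0, 0, 3, 3]
```

The three harmonics of source A share one label and the two harmonics of source B share another, because only partials modulated by the same vibrato have correlated relative variations. Using relative rather than absolute variations is what makes the cue independent of harmonic rank.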
Acknowledgements
This work was initiated while the authors were at LaBRI (UMR CNRS 5800, University of Bordeaux 1) and was partly funded by the OSEO project Quaero, within task 6.4 “Music Search by Similarity”, and by the French GIP ANR DESAM project under contract ANR-06-JCJC-0027-01.
About this article
Cite this article
Lagrange, M., Raspaud, M. Spectral similarity metrics for sound source formation based on the common variation cue. Multimed Tools Appl 48, 185–205 (2010). https://doi.org/10.1007/s11042-009-0382-9
DOI: https://doi.org/10.1007/s11042-009-0382-9