Abstract
We propose two speech/music discrimination methods based on timbre models and measure their performance on a 3-hour database of BBC radio podcasts. In the first method, the machine-estimated classifications obtained with an automatic timbre recognition (ATR) model are post-processed using median filtering. The classification system (LSF/K-means) was trained at two taxonomic levels: a high-level one (speech, music) and a lower-level one (male and female speech, classical, jazz, rock & pop). The second method combines automatic structural segmentation and timbre recognition (ASS/ATR). The ASS evaluates the similarity between feature distributions (MFCC, RMS) using HMM and soft K-means algorithms. Both methods were evaluated at the semantic level (relative correct overlap, RCO) and at the temporal level (boundary retrieval F-measure). The ASS/ATR method obtained the best results (average RCO of 94.5% and boundary F-measure of 50.1%). These results compare favourably with those obtained by an SVM-based technique that provides a good benchmark of the state of the art.
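As a rough illustration of the first method's post-processing step, the sketch below applies a sliding median to a sequence of frame-level labels and then scores the result against a reference annotation. It is a minimal sketch under stated assumptions, not the chapter's implementation: the binary coding (0 for speech, 1 for music), the window width, the toy label sequences, and the function names `median_filter_labels` and `frame_overlap` are all hypothetical, and `frame_overlap` is only a simple frame-level stand-in for an overlap score, since the abstract does not give the exact RCO definition.

```python
from statistics import median_low


def median_filter_labels(labels, width=5):
    """Smooth integer class labels with a sliding median.

    Assumes binary labels (0 = speech, 1 = music); a median over
    arbitrary multi-class codes would not be meaningful.
    The window is truncated at the sequence edges.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        # median_low always returns a value present in the window
        smoothed.append(median_low(window))
    return smoothed


def frame_overlap(predicted, reference):
    """Fraction of frames whose predicted label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy data (hypothetical): six speech frames followed by six music frames,
# with two isolated misclassifications in the raw classifier output.
reference = [0] * 6 + [1] * 6
raw = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]
smoothed = median_filter_labels(raw, width=3)

print(frame_overlap(raw, reference))
print(frame_overlap(smoothed, reference))
```

On this toy sequence the filter removes both isolated errors, which is the intended effect of the smoothing; on real data a wider window trades fewer spurious label flips against less precise segment boundaries.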
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barthet, M., Hargreaves, S., Sandler, M. (2011). Speech/Music Discrimination in Audio Podcast Using Structural Segmentation and Timbre Recognition. In: Ystad, S., Aramaki, M., Kronland-Martinet, R., Jensen, K. (eds) Exploring Music Contents. CMMR 2010. Lecture Notes in Computer Science, vol 6684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23126-1_10
DOI: https://doi.org/10.1007/978-3-642-23126-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23125-4
Online ISBN: 978-3-642-23126-1
eBook Packages: Computer Science (R0)