Score Level versus Audio Level Fusion for Voice Pathology Detection on the Saarbrücken Voice Database

Martínez, David; Lleida, Eduardo; Ortega, Alfonso; Miguel, Antonio

doi:10.1007/978-3-642-35292-8_12

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This article presents a set of experiments on pathological voice detection using the Saarbrücken Voice Database (SVD). The SVD is freely available online and contains voice recordings covering a range of pathologies, both functional and organic. It includes recordings from more than 2000 speakers, in which the sustained vowels /a/, /i/, and /u/ are produced with normal, low, high, and low-high-low intonations. This variety of sounds makes it possible to design different experiments. In this paper, two systems are compared. In the first, which we call audio level fusion, all vowels and intonations are pooled together to train a single model per class. In the second, which we call score level fusion, a separate model per class is trained for each vowel and intonation, and the scores of the subsystems are fused at the end. For classification, a generative Gaussian mixture model is trained on mel-frequency cepstral coefficients, harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio. The results show that score level fusion is far more effective than audio level fusion.
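The difference between the two fusion strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several simplifications are assumed: a single 1-D Gaussian per class stands in for the Gaussian mixture model, one synthetic scalar stands in for the acoustic feature vector, the condition names (`a_normal`, `i_high`) and all function names are hypothetical, and the per-condition scores are fused with equal weights rather than with a trained fusion back end.

```python
import math
import random
from statistics import mean, pstdev


def fit_gaussian(values):
    """Return (mean, std) of a 1-D Gaussian; a stand-in for a GMM (assumption)."""
    return mean(values), max(pstdev(values), 1e-6)


def log_pdf(x, params):
    """Log density of a 1-D Gaussian with parameters (mean, std)."""
    mu, sigma = params
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def llr(x, patho, healthy):
    """Log-likelihood ratio; positive values favour the pathological class."""
    return log_pdf(x, patho) - log_pdf(x, healthy)


def train_audio_level(train):
    """Audio level fusion: pool every condition, fit one model per class."""
    pooled = {"healthy": [], "patho": []}
    for per_class in train.values():
        for label, values in per_class.items():
            pooled[label].extend(values)
    return {label: fit_gaussian(v) for label, v in pooled.items()}


def train_score_level(train):
    """Score level fusion: fit one model per class for each condition."""
    return {cond: {label: fit_gaussian(v) for label, v in per_class.items()}
            for cond, per_class in train.items()}


def score_audio_level(models, recording):
    """Average the pooled-model LLR over one speaker's recordings."""
    return mean(llr(x, models["patho"], models["healthy"])
                for x in recording.values())


def score_score_level(models, recording, weights=None):
    """Weighted sum of per-condition LLRs (equal weights unless given)."""
    conds = sorted(recording)
    if weights is None:
        weights = {c: 1.0 / len(conds) for c in conds}
    return sum(weights[c] * llr(recording[c], models[c]["patho"], models[c]["healthy"])
               for c in conds)


# Synthetic data: each condition shifts the feature by its own offset,
# and pathological voices are shifted by a further 1.5 (all values invented).
random.seed(0)
offsets = {"a_normal": 0.0, "i_high": 3.0}
train = {c: {"healthy": [random.gauss(off, 1.0) for _ in range(200)],
             "patho": [random.gauss(off + 1.5, 1.0) for _ in range(200)]}
         for c, off in offsets.items()}

audio_models = train_audio_level(train)
score_models = train_score_level(train)

# One hypothetical speaker per class, with one recording per condition.
patho_speaker = {c: off + 1.5 for c, off in offsets.items()}
healthy_speaker = {c: off for c, off in offsets.items()}

audio_scores = [score_audio_level(audio_models, s)
                for s in (patho_speaker, healthy_speaker)]
fused_scores = [score_score_level(score_models, s)
                for s in (patho_speaker, healthy_speaker)]
```

In this toy setup the pooled models must absorb the between-condition offsets as extra variance, whereas the per-condition models do not. The sketch shows only the mechanics of the two approaches and makes no claim about the performance figures reported in the paper.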

Author information

Authors and Affiliations

Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain
David Martínez, Eduardo Lleida, Alfonso Ortega & Antonio Miguel

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Doroteo Torre Toledano
Centro Politécnico Superior, Edificio Ada Byron, C/ María de Luna nº 1, 50018, Zaragoza, Spain
Alfonso Ortega Giménez
Universidade de Aveiro, Campus Universitário Aveiro, 3810-193, Aveiro, Portugal
António Teixeira
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Joaquín González Rodríguez
E.T.S.I.Telecomunicacion, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain
Luis Hernández Gómez & Rubén San Segundo Hernández
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Daniel Ramos Castro

About this paper

Cite this paper

Martínez, D., Lleida, E., Ortega, A., Miguel, A. (2012). Score Level versus Audio Level Fusion for Voice Pathology Detection on the Saarbrücken Voice Database. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_12

DOI: https://doi.org/10.1007/978-3-642-35292-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)
