Score Level versus Audio Level Fusion for Voice Pathology Detection on the Saarbrücken Voice Database

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This article presents a set of experiments on pathological voice detection using the Saarbrücken Voice Database (SVD). The SVD is freely available online and contains voice recordings covering different pathologies, both functional and organic. It includes recordings of more than 2000 speakers producing the sustained vowels /a/, /i/, and /u/ with normal, low, high, and low-high-low intonations. This variety of sounds makes it possible to design different experiments. In this paper we compare two systems: one in which all vowels and intonations are pooled to train a single model per class, and one in which a separate model per class is trained for each vowel and intonation and the scores of the resulting subsystems are fused at the end. We call the first approach audio level fusion and the second score level fusion. For classification we use a generative Gaussian mixture model trained on mel-frequency cepstral coefficients, harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio. The results show that score level fusion is far more effective than audio level fusion.
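The structural difference between the two fusion strategies can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a single diagonal Gaussian stands in for each Gaussian mixture model, the two-dimensional feature frames are synthetic rather than real MFCC and noise measures, the function names are hypothetical, and the subsystem scores are fused with equal weights, whereas a real system would more likely learn calibrated fusion weights.

```python
import math
import random
from statistics import fmean, pvariance

VOWELS = ["a", "i", "u"]
INTONATIONS = ["normal", "low", "high", "low-high-low"]
CONDITIONS = [(v, t) for v in VOWELS for t in INTONATIONS]  # 12 recording types
CLASSES = ["healthy", "pathological"]


def fit_gaussian(frames):
    """Fit one diagonal Gaussian to feature vectors (toy stand-in for a GMM)."""
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d, mu), 1e-6) for d, mu in zip(dims, means)]
    return means, variances


def avg_loglik(model, frames):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, mu, var in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total / len(frames)


def llr(models, frames):
    """Detection score: log-likelihood ratio, pathological versus healthy."""
    return avg_loglik(models["pathological"], frames) - avg_loglik(models["healthy"], frames)


def train_audio_level(train):
    """Audio level fusion: one model per class on all conditions pooled."""
    return {c: fit_gaussian([f for cond in CONDITIONS for f in train[c][cond]])
            for c in CLASSES}


def train_score_level(train):
    """Score level fusion: one model per class for every condition."""
    return {cond: {c: fit_gaussian(train[c][cond]) for c in CLASSES}
            for cond in CONDITIONS}


def score_audio_level(models, speaker):
    """Pool all recordings of the test speaker and score them once."""
    pooled = [f for cond in CONDITIONS for f in speaker[cond]]
    return llr(models, pooled)


def score_score_level(models, speaker):
    """Score each condition with its own subsystem, then fuse the scores."""
    scores = [llr(models[cond], speaker[cond]) for cond in CONDITIONS]
    return fmean(scores)  # equal-weight fusion: an assumption of this sketch


def synth(cls, n_frames, rng):
    """Synthetic 2-D frames: each condition has its own offset, pathology adds a shift."""
    shift = 2.0 if cls == "pathological" else 0.0
    return {cond: [[rng.gauss(0.5 * k + shift, 1.0) for _ in range(2)]
                   for _ in range(n_frames)]
            for k, cond in enumerate(CONDITIONS)}


rng = random.Random(0)
train = {c: synth(c, 200, rng) for c in CLASSES}
test_speaker = synth("pathological", 50, rng)

audio_models = train_audio_level(train)
score_models = train_score_level(train)
audio_score = score_audio_level(audio_models, test_speaker)
fused_score = score_score_level(score_models, test_speaker)
print(f"audio level LLR:        {audio_score:.3f}")
print(f"score level fused LLR:  {fused_score:.3f}")
```

In both cases a positive score favours the pathological class. The sketch only shows where the pooling happens (on the audio frames before training, or on the subsystem scores after scoring); it is not meant to reproduce the reported results.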

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martínez, D., Lleida, E., Ortega, A., Miguel, A. (2012). Score Level versus Audio Level Fusion for Voice Pathology Detection on the Saarbrücken Voice Database. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_12

  • DOI: https://doi.org/10.1007/978-3-642-35292-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35291-1

  • Online ISBN: 978-3-642-35292-8

  • eBook Packages: Computer Science, Computer Science (R0)
