Whispered speech recognition based on gammatone filterbank cepstral coefficients

  • B. Marković
  • J. Galić
  • Ð. Grozdić
  • S. T. Jovičić
  • M. Mijić
Theory and Methods of Signal Processing

Abstract

This paper presents results on whispered speech recognition using gammatone filterbank cepstral coefficients in speaker-dependent mode. The isolated words used in the experiments are taken from the Whi-Spe database. Recognition is based on two methods: dynamic time warping and hidden Markov models. The experiments cover the following modes: normal speech, whispered speech, and their mixed combinations (normal/whispered and whispered/normal). The results demonstrate a marked improvement in recognition after applying cepstral mean subtraction, especially in the mixed train/test scenarios.
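
The effect of cepstral mean subtraction on a template-matching recognizer can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-coefficient toy feature vectors, and the constant offset standing in for a mismatch between speech modes are all assumptions made for illustration, and real differences between normal and whispered speech are not a pure additive shift in the cepstral domain.

```python
from math import sqrt, inf


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, computed over one utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def euclidean(a, b):
    """Local distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref, test):
    """Accumulated cost of the best alignment (basic symmetric DTW)."""
    n, m = len(ref), len(test)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], test[j - 1])
            # Allowed moves: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy "cepstral" frames (stand-ins for real feature vectors).
template = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
# Same trajectory with a constant offset added to every coefficient,
# an assumed, simplified model of a train/test mismatch.
shifted = [[c + 5.0 for c in frame] for frame in template]

raw = dtw_distance(template, shifted)
normalized = dtw_distance(cepstral_mean_subtraction(template),
                          cepstral_mean_subtraction(shifted))
print(raw > normalized)
```

In this toy case the offset inflates the raw DTW distance, while after mean subtraction the two utterances align with zero cost. That is consistent with the abstract's observation that the benefit is largest in mixed train/test scenarios, although the sketch says nothing about the size of the gain on real data.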

Copyright information

© Pleiades Publishing, Inc. 2017

Authors and Affiliations

  • B. Marković (1), corresponding author
  • J. Galić (1)
  • Ð. Grozdić (1)
  • S. T. Jovičić (1)
  • M. Mijić (1)

  1. Telecommunication Department, School of Electrical Engineering, University of Belgrade, Belgrade, Serbia
