This paper presents results on whispered speech recognition using gammatone filterbank cepstral coefficients in speaker-dependent mode. The isolated words used in the experiments are taken from the Whi-Spe database. Recognition is based on two methods: dynamic time warping and hidden Markov models. The experiments cover the following train/test scenarios: normal speech, whispered speech, and their combinations (normal/whispered and whispered/normal). The results show a substantial improvement in recognition after applying cepstral mean subtraction, especially in the mixed train/test scenarios.
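Cepstral mean subtraction, named in the abstract, can be illustrated with a minimal sketch. It assumes the mean is taken per coefficient over all frames of a single utterance; the abstract does not say whether the authors normalize per utterance or per speaker, and the function name and toy values below are hypothetical. The sketch shows only why the operation removes a constant offset in the cepstral domain. The normal/whispered mismatch is not purely such an offset, so this is not a reproduction of the paper's results.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean, computed over all frames of one utterance.

    Assumption: per-utterance normalization (not confirmed by the source).
    `frames` is a list of feature vectors, one per analysis frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across time.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    # Remove that mean from every frame.
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy data (hypothetical values): the same feature trajectory, once as is and
# once with a constant offset added to every coefficient in every frame.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
shifted = [[c + 0.5 for c in frame] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_shifted = cepstral_mean_subtraction(shifted)
```

After normalization the two toy sequences coincide, because a time-invariant offset in the cepstral domain is absorbed entirely by the mean.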
The article is published in the original.
Marković, B., Galić, J., Grozdić, Ð. et al. Whispered speech recognition based on gammatone filterbank cepstral coefficients. J. Commun. Technol. Electron. 62, 1255–1261 (2017). https://doi.org/10.1134/S1064226917110134