Whispered speech recognition based on gammatone filterbank cepstral coefficients

Journal of Communications Technology and Electronics


This paper presents results on speaker-dependent whispered speech recognition using gammatone filterbank cepstral coefficients. The isolated words used in the experiments are taken from the Whi-Spe database. Recognition is based on two methods: dynamic time warping and hidden Markov models. The experiments cover four scenarios: normal speech, whispered speech, and the two mixed combinations (normal/whispered and whispered/normal). The results show a substantial improvement in recognition after applying cepstral mean subtraction, especially in the mixed train/test scenarios.
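As a rough illustration of two techniques named in the abstract, the sketch below applies cepstral mean subtraction to a sequence of feature frames and compares two sequences with a basic dynamic time warping distance. It is not the authors' implementation: the function names, the Euclidean local distance, the unconstrained step pattern, and the toy two-dimensional frames are all assumptions made for the example, and real gammatone cepstral features would come from a separate front end.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def cepstral_mean_subtraction(frames: Sequence[Frame]) -> List[List[float]]:
    """Subtract the per-coefficient mean over the utterance from every frame."""
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def euclidean(a: Frame, b: Frame) -> float:
    """Local distance between two feature frames (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref: Sequence[Frame], test: Sequence[Frame]) -> float:
    """Accumulated cost of the best alignment between two frame sequences.

    Uses the basic symmetric step pattern (insertion, deletion, match)
    with no slope constraints or path-length normalisation.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy data (hypothetical): the "test" utterance equals the template plus a
# constant offset in every frame, mimicking a fixed spectral mismatch.
template = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
offset = [0.5, -0.3]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in template]

raw_distance = dtw_distance(template, shifted)
cms_distance = dtw_distance(
    cepstral_mean_subtraction(template),
    cepstral_mean_subtraction(shifted),
)
print(raw_distance, cms_distance)
```

In this toy case the constant offset is removed entirely by the mean subtraction, so the distance after normalisation drops to (numerically) zero. Real normal/whispered mismatch is not a pure constant offset, so the effect there would be partial rather than complete.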


Author information

Corresponding author

Correspondence to B. Marković.

Additional information

The article is published in the original.

About this article

Cite this article

Marković, B., Galić, J., Grozdić, Ð. et al. Whispered speech recognition based on gammatone filterbank cepstral coefficients. J. Commun. Technol. Electron. 62, 1255–1261 (2017).
