Whispered speech recognition based on gammatone filterbank cepstral coefficients

Abstract

This paper presents results of whispered speech recognition in speaker-dependent mode using gammatone filterbank cepstral coefficients (GFCC). The isolated words used in the experiments are taken from the Whi-Spe database. Recognition is performed with dynamic time warping (DTW) and hidden Markov model (HMM) methods. The experiments cover the following train/test modes: normal speech, whispered speech, and their combinations (normal/whispered and whispered/normal). The results show a significant improvement in recognition after cepstral mean subtraction (CMS), especially in the mixed train/test scenarios.
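The abstract does not specify the DTW variant, the local frame distance, or how CMS is applied. The sketch below assumes a basic length-normalized DTW with Euclidean frame distance, plus per-utterance mean subtraction over the cepstral vectors; all function names are hypothetical and the feature values are toy numbers, not real GFCCs. It illustrates one reason CMS can help in mismatched train/test conditions: a constant offset in the cepstral domain (which is how a fixed convolutional channel effect appears after taking the log spectrum) is removed before template matching.

```python
from typing import Dict, List, Sequence

Frame = List[float]


def cepstral_mean_subtraction(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Subtract each coefficient's mean over all frames of one utterance."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two cepstral vectors (assumed local metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> float:
    """Basic DTW with (i-1,j), (i,j-1), (i-1,j-1) steps, normalized by n + m."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def recognize(test: Sequence[Sequence[float]],
              templates: Dict[str, Sequence[Sequence[float]]],
              use_cms: bool = True) -> str:
    """Return the label of the reference template with the smallest DTW distance."""
    prep = cepstral_mean_subtraction if use_cms else (lambda f: [list(v) for v in f])
    t = prep(test)
    return min(templates, key=lambda label: dtw_distance(t, prep(templates[label])))


if __name__ == "__main__":
    # Toy 2-D "cepstral" templates for two hypothetical words.
    templates = {
        "da": [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]],
        "ne": [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]],
    }
    # Test utterance: "da" shifted by a constant offset in the second coefficient.
    test = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    print(recognize(test, templates, use_cms=False))  # ne (misrecognized)
    print(recognize(test, templates, use_cms=True))   # da
```

In this toy case the offset alone makes the wrong template closer, and subtracting the utterance mean restores the correct match. Real whispered/normal mismatch is not a pure constant offset, so this only shows the mechanism, not the size of the gain reported in the paper.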

Author information

Corresponding author

Correspondence to B. Marković.

Additional information

The article is published in the original.

About this article

Cite this article

Marković, B., Galić, J., Grozdić, Ð. et al. Whispered speech recognition based on gammatone filterbank cepstral coefficients. J. Commun. Technol. Electron. 62, 1255–1261 (2017). https://doi.org/10.1134/S1064226917110134