Whispered speech recognition based on gammatone filterbank cepstral coefficients

Marković, B.; Galić, J.; Grozdić, Ð.; Jovičić, S. T.; Mijić, M.

doi:10.1134/S1064226917110134

Theory and Methods of Signal Processing
Published: 22 November 2017

Volume 62, pages 1255–1261, (2017)

Journal of Communications Technology and Electronics

Abstract

This paper presents results on whispered speech recognition using gammatone filterbank cepstral coefficients in speaker-dependent mode. The isolated words used in the experiments are taken from the Whi-Spe database. Recognition is based on two methods: dynamic time warping and hidden Markov models. The experiments cover four train/test scenarios: normal speech, whispered speech, and their two mixed combinations (normal/whispered and whispered/normal). The results demonstrate a marked improvement in recognition after applying cepstral mean subtraction, especially in the mixed train/test scenarios.
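As a rough illustration of two of the techniques named in the abstract, the following minimal sketch applies cepstral mean subtraction to a sequence of cepstral frames and compares two sequences with a textbook dynamic time warping distance. It is not the authors' implementation: the function names, the Euclidean local cost, the unconstrained step pattern, and the toy feature values are all assumptions made for illustration, and the constant offset only mimics a fixed bias in the cepstral domain, not the actual difference between normal and whispered speech.

```python
import math


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, computed over all frames of one utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def dtw_distance(a, b):
    """Textbook DTW: Euclidean local cost, steps (i-1, j), (i, j-1), (i-1, j-1)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy data (hypothetical): three frames of two cepstral coefficients each.
template = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
# The same utterance with a constant bias added to every coefficient.
shifted = [[c + 0.5 for c in frame] for frame in template]

raw_distance = dtw_distance(template, shifted)
cms_distance = dtw_distance(
    cepstral_mean_subtraction(template),
    cepstral_mean_subtraction(shifted),
)
```

In this toy case the constant bias makes the raw distance positive, while subtracting the per-utterance mean removes the bias and the two sequences match. Real cepstral features would differ in more than a constant offset, so the effect there is a reduction of mismatch rather than its elimination.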

Author information

Authors and Affiliations

Telecommunication Department, School of Electrical Engineering, University of Belgrade, Belgrade, 11000, Serbia
B. Marković, J. Galić, Ð. Grozdić, S. T. Jovičić & M. Mijić

Corresponding author

Correspondence to B. Marković.

Additional information

The article is published in the original.

About this article

Cite this article

Marković, B., Galić, J., Grozdić, Ð. et al. Whispered speech recognition based on gammatone filterbank cepstral coefficients. J. Commun. Technol. Electron. 62, 1255–1261 (2017). https://doi.org/10.1134/S1064226917110134

Received: 28 September 2015
Published: 22 November 2017
Issue Date: November 2017
DOI: https://doi.org/10.1134/S1064226917110134
