Efficient Distribution of Feature Parameters for Speech Recognition in Network Environments

Yoon, Jae Sam; Lee, Gil Ho; Kim, Hong Kook

doi:10.1007/11581772_42

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3767)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

In network or ubiquitous environments, it is difficult for a small device to perform large-vocabulary speech recognition because of its limited power. Distributed speech recognition (DSR), which splits the processing modules of automatic speech recognition between the device and a server, has therefore become an attractive approach. Among the processing modules of DSR, quantization of the feature parameters plays the main role in determining both the transmission bandwidth and the recognition performance. In this paper, we propose an efficient quantizer for feature parameters that exploits the correlation between successive analysis frames of speech. The proposed quantizer is based on a predictive multi-stage vector quantization scheme and is designed at several bit rates, trading bit rate against speech recognition performance. Speech recognition experiments show that a DSR system employing the proposed quantization method reduces the bit rate by 20% while achieving recognition performance comparable to that of the ETSI DSR standard.
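The abstract gives no details of the quantizer, so the following is only a minimal sketch of what a generic predictive multi-stage vector quantizer looks like, not the authors' design. It assumes a first-order predictor with a single hypothetical coefficient `alpha`, and small hand-made `codebooks` (in practice they would be trained on speech features). Each frame is predicted from the previous reconstructed frame, the prediction residual is quantized stage by stage, and only the codeword indices are transmitted.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def bits_per_frame(codebooks: Sequence[Codebook]) -> int:
    """Bits needed to send one index per stage (fixed-length coding)."""
    return sum((len(cb) - 1).bit_length() for cb in codebooks)


def encode(
    frames: Sequence[Vector], codebooks: Sequence[Codebook], alpha: float
) -> Tuple[List[Tuple[int, ...]], List[Vector]]:
    """Return per-frame stage indices and the encoder's own reconstructions."""
    prev = [0.0] * len(frames[0])  # assumed initial predictor state
    all_indices, all_recon = [], []
    for x in frames:
        # Predict from the previous *reconstructed* frame so that the
        # decoder, which never sees x, can form the same prediction.
        recon = [alpha * p for p in prev]
        residual = [a - b for a, b in zip(x, recon)]
        indices = []
        for cb in codebooks:
            i = nearest(cb, residual)
            indices.append(i)
            # Each stage quantizes what the previous stages left over.
            residual = [r - c for r, c in zip(residual, cb[i])]
            recon = [r + c for r, c in zip(recon, cb[i])]
        all_indices.append(tuple(indices))
        all_recon.append(recon)
        prev = recon
    return all_indices, all_recon


def decode(
    indices: Sequence[Tuple[int, ...]],
    codebooks: Sequence[Codebook],
    alpha: float,
    dim: int,
) -> List[Vector]:
    """Rebuild frames from the indices using the same predictor as the encoder."""
    prev = [0.0] * dim
    frames = []
    for idx in indices:
        recon = [alpha * p for p in prev]
        for cb, i in zip(codebooks, idx):
            recon = [r + c for r, c in zip(recon, cb[i])]
        frames.append(recon)
        prev = recon
    return frames


# Toy, hand-made two-stage codebooks for 2-dimensional features (illustrative only).
STAGE1 = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
STAGE2 = [[-0.25, -0.25], [-0.25, 0.25], [0.25, -0.25], [0.25, 0.25]]

if __name__ == "__main__":
    feats = [[1.0, 1.0], [0.75, 0.5], [0.5, 0.25]]
    idx, enc_recon = encode(feats, [STAGE1, STAGE2], alpha=0.5)
    dec_recon = decode(idx, [STAGE1, STAGE2], alpha=0.5, dim=2)
    print(idx, bits_per_frame([STAGE1, STAGE2]), dec_recon == enc_recon)
```

In this sketch the bit rate is set by the number and size of the stage codebooks, which is one plausible way a quantizer could be "designed at several bit rates"; how the paper actually allocates bits is not stated in the abstract.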

Author information

Authors and Affiliations

Dept. of Information and Communications, Gwangju Institute of Science and Technology (GIST), Gwangju, 500-712, Korea
Jae Sam Yoon, Gil Ho Lee & Hong Kook Kim

Editor information

Editors and Affiliations

Gwangju Institute of Science and Technology (GIST), 1 Oryong-dong Buk-gu, 500-712, Gwangju, Korea
Yo-Sung Ho
Multimedia Security Lab, Korea University, Science Campus, 136-701, Seoul, Korea
Hyoung Joong Kim

About this paper

Cite this paper

Yoon, J.S., Lee, G.H., Kim, H.K. (2005). Efficient Distribution of Feature Parameters for Speech Recognition in Network Environments. In: Ho, YS., Kim, H.J. (eds) Advances in Multimedia Information Processing - PCM 2005. PCM 2005. Lecture Notes in Computer Science, vol 3767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581772_42

DOI: https://doi.org/10.1007/11581772_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30027-4
Online ISBN: 978-3-540-32130-9
eBook Packages: Computer Science, Computer Science (R0)
