Abstract
In network or ubiquitous environments, it is difficult to perform large-vocabulary speech recognition on a small device because of its limited power. Distributed speech recognition (DSR), which splits the processing modules of automatic speech recognition between a device and a server, has therefore attracted attention. Of all the processing modules of DSR, the quantization of feature parameters plays a central role in determining both the transmission bandwidth and the recognition performance. In this paper, we propose an efficient quantizer of feature parameters that exploits the correlation between successive analysis frames of speech. The proposed quantizer is based on a predictive multi-stage vector quantization scheme and is designed for several bit rates, trading bit rate against speech recognition performance. Speech recognition experiments show that a DSR system employing the proposed quantization method can reduce the bit rate by 20% while achieving recognition performance comparable to that of the ETSI DSR standard.
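The general idea of predictive multi-stage vector quantization can be illustrated with a minimal sketch. It is not the quantizer proposed in the paper: the first-order predictor, its coefficient `ALPHA`, the two 2-D codebooks, and all function names are hypothetical values chosen only to show how inter-frame prediction and stage-wise residual coding fit together.

```python
# Minimal sketch of predictive multi-stage vector quantization.
# Assumption: ALPHA, both codebooks, and all names are illustrative,
# not taken from the paper or from the ETSI standard.

ALPHA = 0.5  # hypothetical first-order inter-frame prediction coefficient

# Hypothetical 2-D codebooks: stage 1 is coarse, stage 2 refines the residual.
STAGE1 = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
STAGE2 = [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)]
CODEBOOKS = [STAGE1, STAGE2]


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))


def encode_frame(frame, prev_recon):
    """Quantize the prediction residual of one frame, stage by stage."""
    predicted = [ALPHA * p for p in prev_recon]
    residual = [x - p for x, p in zip(frame, predicted)]
    indices = []
    for codebook in CODEBOOKS:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Pass what this stage could not represent on to the next stage.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def decode_frame(indices, prev_recon):
    """Rebuild a frame from its stage indices and the previous reconstruction."""
    recon = [ALPHA * p for p in prev_recon]
    for codebook, idx in zip(CODEBOOKS, indices):
        recon = [r + c for r, c in zip(recon, codebook[idx])]
    return recon


def bits_per_frame():
    """Bits needed to transmit one index per stage."""
    return sum((len(cb) - 1).bit_length() for cb in CODEBOOKS)


def run(frames):
    """Encode then decode a frame sequence with synchronized predictor state."""
    enc_prev = [0.0, 0.0]
    dec_prev = [0.0, 0.0]
    all_indices, decoded = [], []
    for frame in frames:
        idx = encode_frame(frame, enc_prev)
        # The encoder predicts from the reconstruction, as the decoder does.
        enc_prev = decode_frame(idx, enc_prev)
        dec_prev = decode_frame(idx, dec_prev)
        all_indices.append(idx)
        decoded.append(dec_prev)
    return all_indices, decoded


if __name__ == "__main__":
    indices, decoded = run([[1.2, 0.8], [1.3, 0.9]])
    print(indices, decoded, bits_per_frame())
```

In this sketch only the stage indices are transmitted, so the bit rate is set by the codebook sizes, while the predictor removes the part of each frame that the previous reconstruction already explains. Shrinking or enlarging the codebooks is one way such a design could trade bit rate against accuracy.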
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yoon, J.S., Lee, G.H., Kim, H.K. (2005). Efficient Distribution of Feature Parameters for Speech Recognition in Network Environments. In: Ho, YS., Kim, H.J. (eds) Advances in Multimedia Information Processing - PCM 2005. PCM 2005. Lecture Notes in Computer Science, vol 3767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581772_42
DOI: https://doi.org/10.1007/11581772_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30027-4
Online ISBN: 978-3-540-32130-9
eBook Packages: Computer Science, Computer Science (R0)