
Efficient Distribution of Feature Parameters for Speech Recognition in Network Environments

  • Conference paper
Advances in Multimedia Information Processing - PCM 2005 (PCM 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3767)


Abstract

In network or ubiquitous environments, it is difficult for a small device to perform large-vocabulary speech recognition because of its limited power. Distributed speech recognition (DSR), an approach that splits the processing modules of automatic speech recognition between the device and a server, has therefore attracted attention. Among the processing modules of DSR, the quantization of feature parameters plays a major role in determining both the transmission bandwidth and the recognition performance. In this paper, we propose an efficient quantizer for feature parameters that exploits the correlation between successive analysis frames of speech. The proposed quantizer is based on a predictive multi-stage vector quantization scheme and is designed at several bit rates, trading bit rate against speech recognition performance. Speech recognition experiments show that a DSR system employing the proposed quantization method reduces the bit rate by 20% while achieving recognition performance comparable to that of the ETSI DSR standard.
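The abstract names the technique but gives no details, so the following is only a minimal sketch of how predictive multi-stage vector quantization works in general, not the authors' quantizer. It assumes a first-order predictor with a fixed coefficient (ALPHA), two toy 2-D codebooks (STAGES), and made-up feature frames (FRAMES); all of these names and values are hypothetical. Each frame is predicted from the previous reconstructed frame, and only the prediction residual is quantized, stage by stage, so that the later stage refines the error left by the earlier one.

```python
from typing import List, Tuple

Vector = List[float]
Codebook = List[Vector]

# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
STAGES: List[Codebook] = [
    [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]],
    [[-0.25, -0.25], [-0.25, 0.25], [0.25, -0.25], [0.25, 0.25]],
]
ALPHA = 0.5  # assumed first-order prediction coefficient


def bits_per_frame(stages: List[Codebook]) -> int:
    """Bits needed to send one index per stage."""
    return sum((len(cb) - 1).bit_length() for cb in stages)


def nearest(codebook: Codebook, target: Vector) -> int:
    """Index of the codeword with the smallest squared error to target."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - t) ** 2 for c, t in zip(codebook[i], target)),
    )


def sum_codewords(stages: List[Codebook], indices: List[int], dim: int) -> Vector:
    """Quantized residual: the sum of the selected codeword of every stage."""
    total = [0.0] * dim
    for cb, i in zip(stages, indices):
        total = [t + c for t, c in zip(total, cb[i])]
    return total


def encode(frames: List[Vector], stages: List[Codebook],
           alpha: float) -> Tuple[List[List[int]], List[Vector]]:
    """Return per-frame stage indices and the encoder-side reconstructions."""
    dim = len(frames[0])
    prev = [0.0] * dim  # previous reconstructed frame, shared with the decoder
    all_indices, recon = [], []
    for x in frames:
        pred = [alpha * p for p in prev]
        residual = [xi - pi for xi, pi in zip(x, pred)]
        indices = []
        for cb in stages:  # each stage quantizes what the previous one left
            i = nearest(cb, residual)
            indices.append(i)
            residual = [r - c for r, c in zip(residual, cb[i])]
        quantized = sum_codewords(stages, indices, dim)
        prev = [pi + qi for pi, qi in zip(pred, quantized)]
        all_indices.append(indices)
        recon.append(prev)
    return all_indices, recon


def decode(all_indices: List[List[int]], stages: List[Codebook],
           alpha: float, dim: int) -> List[Vector]:
    """Rebuild frames from indices using the same predictor as the encoder."""
    prev = [0.0] * dim
    out = []
    for indices in all_indices:
        pred = [alpha * p for p in prev]
        quantized = sum_codewords(stages, indices, dim)
        prev = [pi + qi for pi, qi in zip(pred, quantized)]
        out.append(prev)
    return out


FRAMES = [[0.9, -1.1], [1.0, -0.8], [0.7, -0.6]]  # made-up feature frames
INDICES, ENCODER_RECON = encode(FRAMES, STAGES, ALPHA)
DECODED = decode(INDICES, STAGES, ALPHA, dim=2)
print(bits_per_frame(STAGES), INDICES, DECODED)
```

Because the encoder predicts from its own reconstruction rather than from the original frame, the decoder can reproduce the same prediction from the indices alone. In a scheme of this kind the bit rate is set by the number and size of the stage codebooks, which is the knob that trades bit rate against recognition performance.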




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yoon, J.S., Lee, G.H., Kim, H.K. (2005). Efficient Distribution of Feature Parameters for Speech Recognition in Network Environments. In: Ho, YS., Kim, H.J. (eds) Advances in Multimedia Information Processing - PCM 2005. PCM 2005. Lecture Notes in Computer Science, vol 3767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581772_42


  • DOI: https://doi.org/10.1007/11581772_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30027-4

  • Online ISBN: 978-3-540-32130-9

  • eBook Packages: Computer Science, Computer Science (R0)
