In distributed and network speech recognition the actual recognition task is not carried out on the user’s terminal but on a remote server in the network. While there are good reasons for doing so, a clear disadvantage of this client-server architecture is that the communication medium may introduce errors, which then impair speech recognition accuracy. Even sophisticated channel coding cannot completely prevent residual bit errors under temporarily adverse channel conditions, and in packet-oriented transmission, packets may arrive too late for the given real-time constraints and have to be declared lost. The goal of error concealment is to reduce the detrimental effect of such errors on the recipient of the transmitted speech signal by exploiting residual redundancy in the bit stream at the source coder output. In classical speech transmission the recipient is a human, and erroneous data are reconstructed so as to reduce the subjectively annoying effect of corrupted bits or lost packets. Here, however, a statistical classifier is at the receiving end, and it can benefit from knowledge about the quality of the reconstruction. In this book chapter we show how the classical Bayesian decision rule needs to be modified to account for uncertain features, and illustrate how the required feature posterior density can be estimated in the case of distributed speech recognition. Some other techniques for error concealment can be related to this approach. Experimental results are given for a small and a medium vocabulary recognition task, each over a channel exhibiting bit errors and over a packet erasure channel.
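As a rough illustration of how a classifier can use knowledge about reconstruction quality, the sketch below assumes that both the state observation density and the feature posterior are diagonal Gaussians. Under that assumption, integrating the state density over the feature posterior amounts to evaluating the state Gaussian at the posterior mean with the posterior variance added to the state variance. This is a common simplification and not necessarily the decision rule derived in the chapter; the function names and all numbers are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def uncertain_log_likelihood(post_mean, post_var, state_mean, state_var):
    """Log-likelihood of a diagonal-Gaussian state for an uncertain feature vector.

    Assumption: the feature posterior is N(post_mean, post_var) per dimension,
    so the integral of N(x; post_mean, post_var) * N(x; state_mean, state_var)
    over x equals N(post_mean; state_mean, state_var + post_var).
    """
    return sum(
        log_gauss(m, mu, s2 + v)
        for m, v, mu, s2 in zip(post_mean, post_var, state_mean, state_var)
    )


# Two hypothetical one-dimensional states and one reconstructed feature.
state_a = ([0.0], [1.0])  # (means, variances)
state_b = ([3.0], [1.0])
feature_mean = [0.5]

# Reliable feature: zero posterior variance reduces to the classical likelihood.
reliable_gap = (
    uncertain_log_likelihood(feature_mean, [0.0], *state_a)
    - uncertain_log_likelihood(feature_mean, [0.0], *state_b)
)

# Unreliable feature (e.g. reconstructed after a lost packet): large variance.
unreliable_gap = (
    uncertain_log_likelihood(feature_mean, [9.0], *state_a)
    - uncertain_log_likelihood(feature_mean, [9.0], *state_b)
)
```

With these numbers the log-likelihood gap between the two states shrinks as the posterior variance grows, so an unreliable frame contributes less to the decoder's decision than a reliable one.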
Copyright information
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
Haeb-Umbach, R., Ion, V. (2008). Error Concealment. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_9
DOI: https://doi.org/10.1007/978-1-84800-143-5_9
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science (R0)