Error Resilient Speech Coding Using Sub-band Hilbert Envelopes

  • Conference paper
Text, Speech and Dialogue (TSD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5729)

Abstract

Frequency Domain Linear Prediction (FDLP) is a technique for auto-regressive modelling of the Hilbert envelopes of a signal. In this paper, we propose a speech coding technique that applies FDLP in Quadrature Mirror Filter (QMF) sub-bands of short segments (25 ms) of the speech signal. The Line Spectral Frequency parameters of the auto-regressive models and the spectral components of the residual signals are transmitted. To simulate the effects of lossy transmission channels, bit-packets are dropped randomly. In objective and subjective quality evaluations, the proposed FDLP speech codec is judged to be more resilient to bit-packet losses than the state-of-the-art Adaptive Multi-Rate Wide-Band (AMR-WB) codec at 12 kbps.
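The core idea of FDLP named in the abstract can be illustrated with a minimal sketch: linear prediction is applied to the DCT coefficients of a segment, and the resulting all-pole model, evaluated along the time axis, approximates the squared Hilbert envelope. The sketch below is an illustration under stated assumptions, not the codec described in the paper: it works on the full band (no QMF sub-bands), omits Line Spectral Frequency conversion, quantization and residual coding, and the function names, the model order and the small regularization term are all hypothetical choices made for this example.

```python
import math

def dct2(x):
    """Unnormalized DCT-II, computed directly (O(N^2), adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def autocorr(c, order):
    """Autocorrelation of the sequence c for lags 0..order."""
    return [sum(c[i] * c[i + lag] for i in range(len(c) - lag))
            for lag in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error
    return a, err

def fdlp_envelope(x, order=6, reg=1e-6):
    """Approximate the squared temporal envelope of x with an all-pole model.

    `order` and `reg` are illustrative assumptions, not values from the paper.
    The returned envelope is defined only up to an arbitrary overall scale.
    """
    n = len(x)
    c = dct2(x)                             # frequency-domain representation
    r = autocorr(c, order)
    r[0] *= 1.0 + reg                       # small white-noise correction for stability
    a, err = levinson(r, order)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n     # sample t maps to angle theta in (0, pi)
        re = sum(a[k] * math.cos(theta * k) for k in range(order + 1))
        im = sum(a[k] * math.sin(theta * k) for k in range(order + 1))
        env.append(err / (re * re + im * im))
    return env
```

Calling `fdlp_envelope(frame)` on a 200-sample frame (25 ms at an assumed 8 kHz sampling rate) returns 200 positive values that follow the frame's temporal energy contour. A codec along the lines of the abstract would transmit a compact representation of the model coefficients `a` rather than the envelope itself.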

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ganapathy, S., Motlicek, P., Hermansky, H. (2009). Error Resilient Speech Coding Using Sub-band Hilbert Envelopes. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2009. Lecture Notes in Computer Science, vol 5729. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04208-9_49

  • DOI: https://doi.org/10.1007/978-3-642-04208-9_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04207-2

  • Online ISBN: 978-3-642-04208-9

  • eBook Packages: Computer Science (R0)
