Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes

  • Conference paper
Text, Speech and Dialogue (TSD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Abstract

We describe a novel speech/audio coding technique designed to operate at medium bit rates. Unlike classical state-of-the-art coders, which are based on short-term spectra, our approach uses relatively long temporal segments of the audio signal in critical-band-sized sub-bands. We apply an auto-regressive model to approximate the Hilbert envelopes in the frequency sub-bands. The residual signals (Hilbert carriers) are demodulated, and thresholding functions are applied in the spectral domain. The Hilbert envelopes and carriers are quantized and transmitted to the decoder. Our experiments focused on designing a speech/audio coder that provides broadcast-radio-like audio quality at around 15–25 kbps. Objective quality measures, obtained on standard speech recordings, were compared with those of the state-of-the-art 3GPP-AMR speech coding system.
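To make the envelope/carrier terminology concrete, the following is a minimal sketch of splitting a signal into its Hilbert envelope and Hilbert carrier through the analytic signal. It is an illustration under stated assumptions, not the coder described in the abstract: the sub-band decomposition, the auto-regressive modelling of the envelope, the carrier demodulation, the spectral thresholding and the quantization are all omitted. The function names (`dft`, `analytic_signal`, `envelope_and_carrier`) and the test signal are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short demo signals."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Build the analytic signal: keep DC (and Nyquist), double positive
    frequencies, and zero negative frequencies."""
    n = len(x)
    spectrum = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)


def envelope_and_carrier(x):
    """Hilbert envelope = magnitude of the analytic signal;
    Hilbert carrier = cosine of its instantaneous phase."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    carrier = [math.cos(cmath.phase(v)) for v in z]
    return envelope, carrier


# Hypothetical test signal: a slowly varying envelope modulating a faster cosine.
n = 128
true_envelope = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
signal = [true_envelope[t] * math.cos(2 * math.pi * 20 * t / n) for t in range(n)]

envelope, carrier = envelope_and_carrier(signal)

# Both errors should sit at floating-point rounding level for this signal.
max_env_error = max(abs(e - r) for e, r in zip(envelope, true_envelope))
max_rec_error = max(abs(e * c - s) for e, c, s in zip(envelope, carrier, signal))
```

Because the envelope times the carrier equals the real part of the analytic signal, the product reproduces the input exactly; any loss in a coder of this kind would come from how the two components are modelled and quantized, which this sketch does not attempt.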

This work was partially supported by grants from ICSI Berkeley, USA; the Swiss National Center of Competence in Research (NCCR) on “Interactive Multi-modal Information Management (IM2)”, managed by the IDIAP Research Institute on behalf of the Swiss Federal Authorities; and the European Commission 6th Framework DIRAC Integrated Project.


Editor information

Václav Matoušek, Pavel Mautner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Motlicek, P., Hermansky, H., Ganapathy, S., Garudadri, H. (2007). Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_46

  • DOI: https://doi.org/10.1007/978-3-540-74628-7_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74627-0

  • Online ISBN: 978-3-540-74628-7

  • eBook Packages: Computer Science, Computer Science (R0)
