Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes

Motlicek, Petr; Hermansky, Hynek; Ganapathy, Sriram; Garudadri, Harinath

doi:10.1007/978-3-540-74628-7_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Included in the following conference series: International Conference on Text, Speech and Dialogue

Abstract

We describe a novel speech/audio coding technique designed to operate at medium bit-rates. Unlike classical state-of-the-art coders, which are based on short-term spectra, our approach uses relatively long temporal segments of the audio signal in critical-band-sized sub-bands. We apply an auto-regressive model to approximate the Hilbert envelopes in the frequency sub-bands. The residual signals (Hilbert carriers) are demodulated, and thresholding functions are applied in the spectral domain. The Hilbert envelopes and carriers are quantized and transmitted to the decoder. Our experiments focused on designing a speech/audio coder that provides broadcast-radio-like audio quality at around 15 to 25 kbps. Objective quality measures obtained on standard speech recordings were compared with those of the state-of-the-art 3GPP-AMR speech coding system.
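The envelope-modeling step named in the abstract can be illustrated with a minimal sketch. It assumes one common way of fitting an auto-regressive model to a Hilbert envelope: the squared envelope is treated like a power spectrum sampled along time, and the usual Yule-Walker equations are solved on its inverse transform (the dual of ordinary linear prediction). The function name `ar_envelope`, the model order, and the test signal are hypothetical. The chapter's actual sub-band decomposition, segment lengths, carrier processing, and quantization are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import hilbert


def ar_envelope(x, order=20):
    """Fit an all-pole (AR) model to the squared Hilbert envelope of x.

    Returns (env_sq, model), both of length len(x).
    Illustrative sketch only, not the authors' implementation.
    """
    n = len(x)
    env_sq = np.abs(hilbert(x)) ** 2  # squared Hilbert envelope

    # Treat env_sq like a one-sided power spectrum: time index 0..n-1
    # plays the role of angle 0..pi. irfft then yields real,
    # autocorrelation-like lags of the dual (frequency-domain) sequence.
    r = np.fft.irfft(env_sq)[: order + 1]

    # Yule-Walker equations, solved with a Toeplitz (Levinson) solver.
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    gain = r[0] - np.dot(a, r[1 : order + 1])  # prediction error energy

    # Evaluate gain / |A|^2 on the same n-point grid as env_sq.
    A = np.fft.rfft(np.concatenate(([1.0], -a)), 2 * (n - 1))
    model = gain / np.abs(A) ** 2
    return env_sq, model


# Hypothetical test signal: a 1 kHz tone with a slow 4 Hz amplitude modulation.
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env_sq, model = ar_envelope(x, order=20)
# Dividing by the modeled envelope leaves a carrier-like residual.
carrier = x / np.sqrt(model)
```

In this sketch only the `order` predictor coefficients and the gain describe the envelope of the whole segment, which is what makes long temporal segments attractive for coding. The carrier-like residual would still need its own representation.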

This work was partially supported by grants from ICSI Berkeley, USA; the Swiss National Center of Competence in Research (NCCR) on “Interactive Multi-modal Information Management (IM)2”, managed by the IDIAP Research Institute on behalf of the Swiss Federal Authorities; and the European Commission 6th Framework DIRAC Integrated Project.

Author information

Authors and Affiliations

IDIAP Research Institute, Rue du Simplon 4, CH-1920, Martigny, Switzerland
Petr Motlicek, Hynek Hermansky & Sriram Ganapathy
Faculty of Information Technology, Brno University of Technology, Božetěchova 2, Brno, 612 66, Czech Republic
Petr Motlicek & Hynek Hermansky
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Hynek Hermansky & Sriram Ganapathy
Qualcomm Inc., San Diego, California, USA
Harinath Garudadri

Editor information

Václav Matoušek, Pavel Mautner

About this paper

Cite this paper

Motlicek, P., Hermansky, H., Ganapathy, S., Garudadri, H. (2007). Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_46

DOI: https://doi.org/10.1007/978-3-540-74628-7_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)
