Abstract
Traditionally, algorithms for speech coding exploit the features of speech signals by employing models of the human vocal tract. More recently, the use of generic audio coders for coding of speech signals has gained increasing importance. Based on the properties of human hearing, such perceptual audio coders offer attractive features including full-bandwidth audio output, increased naturalness, and good handling of any type of non-speech material. The chapter discusses the principles of perceptual audio coding, some relevant standards, and a number of perceptual audio coders that find application in speech and audio transmission and storage.
Abbreviations
- ACELP: algebraic code excited linear prediction
- AMR-WB+: extended wide-band adaptive multirate coder
- AMR-WB: wide-band AMR speech coder
- CELP: code-excited linear prediction
- DSP: digital signal processing
- ERB: equivalent rectangular bandwidth
- FSS: frequency selective switch
- GSM: Groupe Spécial Mobile
- HPF: high-pass filter
- IFSS: inverse frequency selective switch
- IMDCT: inverse MDCT
- IQMF: QMF synthesis filterbank
- ITU: International Telecommunication Union
- LPC: linear predictive coding
- LSR: low sampling rates
- LTP: long term prediction
- MDCT: modified discrete cosine transform
- MPEG: Moving Picture Experts Group
- MSE: mean-square error
- NLMS: normalized least-mean-square
- QMF: quadrature mirror filter
- SNR: signal-to-noise ratio
- TCX: transform coded excitation
- TDAC: time-domain aliasing cancelation
- TDBWE: time-domain bandwidth extension
- TNS: temporal noise shaping
- ULD: ultra-low delay
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Herre, J., Lutzky, M. (2008). Perceptual Audio Coding of Speech Signals. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_18
DOI: https://doi.org/10.1007/978-3-540-49127-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)