Abstract
This chapter introduces speech compression techniques and describes in detail the speech/audio compression standards for narrowband, wideband and fullband codecs. We start with the fundamental concepts of speech signal digitisation, speech signal characteristics (such as voiced and unvoiced speech) and speech signal representation. We then discuss three key speech compression techniques: waveform, parametric and hybrid compression. This is followed by a consideration of the concepts of narrowband, wideband and fullband speech/audio compression. Key features of the standards for narrowband, wideband and fullband codecs are then summarised. These include ITU-T, ETSI and IETF speech/audio codecs such as G.726, G.728, G.729, G.723.1, G.722.1, G.719, GSM/AMR, iLBC and SILK. Many of these codecs are widely used in VoIP applications, and some have also been used in teleconferencing and telepresence applications. Understanding the principles of speech compression and the main parameters of speech codecs, such as frame size, codec delay and bitstream, is important for following the later chapters on Media Transport, Signalling and Quality of Experience (QoE) for VoIP applications.
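As a rough illustration of how frame size and bit rate together determine the bitstream, the sketch below computes the payload carried by one codec frame. It is a minimal example, not taken from the chapter: the helper name `frame_payload` is hypothetical, the bit rates and frame durations are the nominal values from the respective ITU-T Recommendations, and the 20 ms interval for G.711 is an assumed packetisation choice, since G.711 is sample-based and has no fixed frame.

```python
import math


def frame_payload(bit_rate_bps: int, frame_ms: float) -> tuple[int, int]:
    """Return (bits, bytes) carried by one codec frame.

    bits  = bit rate (bit/s) * frame duration (s)
    bytes = bits rounded up to whole octets
    """
    bits = round(bit_rate_bps * frame_ms / 1000)
    return bits, math.ceil(bits / 8)


# name -> (bit rate in bit/s, frame duration in ms)
# The G.711 entry uses an assumed 20 ms packetisation interval.
EXAMPLES = {
    "G.711 (20 ms packetisation, assumed)": (64_000, 20),
    "G.729": (8_000, 10),
    "G.723.1 (6.3 kbit/s mode)": (6_300, 30),
}

for name, (rate, frame_ms) in EXAMPLES.items():
    bits, nbytes = frame_payload(rate, frame_ms)
    print(f"{name}: {bits} bits = {nbytes} bytes per {frame_ms} ms frame")
```

A longer frame lets a codec exploit more redundancy and lower its bit rate, but it also raises the minimum codec delay, which is why these parameters are usually considered together.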
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Sun, L., Mkwawa, IH., Jammeh, E., Ifeachor, E. (2013). Speech Compression. In: Guide to Voice and Video over IP. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4905-7_2
DOI: https://doi.org/10.1007/978-1-4471-4905-7_2
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4904-0
Online ISBN: 978-1-4471-4905-7
eBook Packages: Computer Science, Computer Science (R0)