Speech Compression

Sun, Lingfen; Mkwawa, Is-Haka; Jammeh, Emmanuel; Ifeachor, Emmanuel

doi:10.1007/978-1-4471-4905-7_2

Part of the book series: Computer Communications and Networks (CCN)

Abstract

This chapter presents an introduction to speech compression techniques, together with a detailed description of speech/audio compression standards, including narrowband, wideband and fullband codecs. We start with the fundamental concepts of speech signal digitisation, speech signal characteristics (such as voiced and unvoiced speech) and speech signal representation. We then discuss three key speech compression techniques: waveform compression, parametric compression and hybrid compression. This is followed by a consideration of the concepts of narrowband, wideband and fullband speech/audio compression. Key features of the standards for narrowband, wideband and fullband codecs are then summarised. These include ITU-T, ETSI and IETF speech/audio codecs, such as G.726, G.728, G.729, G.723.1, G.722.1, G.719, GSM/AMR, iLBC and SILK. Many of these codecs are widely used in VoIP applications, and some have also been used in teleconferencing and telepresence applications. Understanding the principles of speech compression and the main parameters of speech codecs, such as frame size, codec delay and bitstream, is important for the later chapters on Media Transport, Signalling and Quality of Experience (QoE) for VoIP applications.

Author information

Authors and Affiliations

School of Computing and Mathematics, University of Plymouth, Plymouth, UK
Lingfen Sun, Is-Haka Mkwawa, Emmanuel Jammeh & Emmanuel Ifeachor

About this chapter

Cite this chapter

Sun, L., Mkwawa, IH., Jammeh, E., Ifeachor, E. (2013). Speech Compression. In: Guide to Voice and Video over IP. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4905-7_2

DOI: https://doi.org/10.1007/978-1-4471-4905-7_2
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4904-0
Online ISBN: 978-1-4471-4905-7
eBook Packages: Computer Science, Computer Science (R0)
