
Part of the book series: Computer Communications and Networks (CCN)

Abstract

This chapter introduces speech compression techniques and describes in detail the speech/audio compression standards for narrowband, wideband and fullband codecs. We start with the fundamental concepts of speech signal digitisation, speech signal characteristics (such as voiced and unvoiced speech) and speech signal representation. We then discuss three key speech compression techniques: waveform compression, parametric compression and hybrid compression. This is followed by a consideration of the concepts of narrowband, wideband and fullband speech/audio compression. Key features of the standards for narrowband, wideband and fullband codecs are then summarised. These include ITU-T, ETSI and IETF speech/audio codecs, such as G.726, G.728, G.729, G.723.1, G.722.1, G.719, GSM/AMR, iLBC and SILK. Many of these codecs are widely used in VoIP applications, and some have also been used in teleconferencing and telepresence applications. Understanding the principles of speech compression and the main parameters of speech codecs, such as frame size, codec delay and bitstream, is important for a deeper understanding of the later chapters on Media Transport, Signalling and Quality of Experience (QoE) for VoIP applications.
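
As a rough illustration of how digitisation parameters and frame size relate to bit rate, the following minimal sketch computes the bit rate of uncompressed PCM and the payload size of one codec frame. The function names are hypothetical, and the specific figures used (8 kHz sampling with 8 bits per sample for G.711-style PCM, and a 10 ms frame for the 8 kbit/s G.729 codec) are assumptions brought in for the example rather than values stated in the abstract.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bit/s of uncompressed PCM: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def frame_bytes(bit_rate_bps: int, frame_ms: float) -> float:
    """Payload bytes produced per frame by a constant-bit-rate codec."""
    bits_per_frame = bit_rate_bps * frame_ms / 1000
    return bits_per_frame / 8


# Assumed narrowband PCM parameters: 8000 samples/s, 8 bits per sample.
print(pcm_bit_rate(8000, 8))

# Assumed G.729 parameters: 8000 bit/s, 10 ms frames.
print(frame_bytes(8000, 10))
```

Under these assumptions the sketch shows why a low-bit-rate codec produces much smaller frames than PCM over the same interval, which is the kind of trade-off the frame size and codec delay parameters capture.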

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Sun, L., Mkwawa, IH., Jammeh, E., Ifeachor, E. (2013). Speech Compression. In: Guide to Voice and Video over IP. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4905-7_2

  • DOI: https://doi.org/10.1007/978-1-4471-4905-7_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4904-0

  • Online ISBN: 978-1-4471-4905-7

  • eBook Packages: Computer Science, Computer Science (R0)
