Recent Speech Coding Technologies and Standards

Chapter in: Speech and Audio Processing for Coding, Enhancement and Recognition

Abstract

This chapter presents an overview of recent developments in conversational speech coding technologies, important new algorithmic advances, and recent standardization activities in ITU-T, 3GPP, 3GPP2, MPEG and IETF that offer a significantly improved user experience during voice calls on existing and future communication systems. Because user experience is determined by speech quality, network operators are very concerned about the quality of speech coders. Operators are also concerned about capacity, so coding efficiency is another important measure. Advanced speech coding technologies can improve both coding efficiency and user experience. One way to improve quality is to extend the audio bandwidth from traditional narrowband to wideband (16 kHz sampling) and super-wideband (32 kHz sampling). Another is to increase the robustness of the coder against transmission errors: error concealment algorithms substitute the missing parts of the audio signal as far as possible. In packet-switched applications (VoIP systems), jitter buffer management (JBM) algorithms include special mechanisms to maximize sound quality. It is highly important to ensure the standardization and deployment of speech coders that meet quality expectations. As an example, we refer to the Enhanced Voice Services (EVS) project in 3GPP, which is developing the next-generation 3GPP speech coder. The basic motivation for 3GPP to start the EVS project was to extend the path of codec evolution by providing a super-wideband experience at around 13 kb/s and better quality for music and mixed content in conversational applications. Optimized behavior in VoIP applications is achieved through high error robustness, jitter buffer management, source-controlled variable bit rate operation, support for various audio bandwidths, and stereo.
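To make the idea of error concealment concrete, the following is a minimal sketch of one of the simplest possible strategies: a lost frame is replaced by an attenuated copy of the previous output frame. This is an illustration only, not the method of EVS or of any codec discussed in the chapter; the function name `conceal_losses`, the `decay` parameter, and the use of `None` to mark a lost frame are all assumptions made for this example.

```python
from typing import List, Optional, Sequence

Frame = List[float]


def conceal_losses(
    frames: Sequence[Optional[Frame]],
    frame_len: int,
    decay: float = 0.5,
) -> List[Frame]:
    """Replace each lost frame (None) with a faded copy of the last output frame.

    Hypothetical helper for illustration; real codecs use far more
    elaborate, signal-dependent concealment.
    """
    output: List[Frame] = []
    # Until the first good frame arrives, the only safe substitute is silence.
    last: Frame = [0.0] * frame_len
    for frame in frames:
        if frame is None:
            # Lost frame: repeat the previous output, attenuated by `decay`,
            # so that a long burst of losses fades out instead of buzzing.
            frame = [decay * sample for sample in last]
        output.append(list(frame))
        last = output[-1]
    return output


received = [[1.0, -1.0], None, None, [0.5, 0.5]]
concealed = conceal_losses(received, frame_len=2)
```

Here the two lost frames are filled with progressively quieter copies of the first frame, and normal decoding resumes with the fourth frame.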

Author information

Correspondence to Daniel J. Sinder.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Sinder, D.J., Varga, I., Krishnan, V., Rajendran, V., Villette, S. (2015). Recent Speech Coding Technologies and Standards. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_4

  • DOI: https://doi.org/10.1007/978-1-4939-1456-2_4

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-1455-5

  • Online ISBN: 978-1-4939-1456-2

  • eBook Packages: Engineering, Engineering (R0)