Recent Speech Coding Technologies and Standards

Sinder, Daniel J.; Varga, Imre; Krishnan, Venkatesh; Rajendran, Vivek; Villette, Stéphane

doi:10.1007/978-1-4939-1456-2_4



Abstract

This chapter presents an overview of recent developments in conversational speech coding technologies, important new algorithmic advances, and recent standardization activities in ITU-T, 3GPP, 3GPP2, MPEG and IETF that offer a significantly improved user experience during voice calls on existing and future communication systems. User experience is largely determined by speech quality; hence network operators are very concerned about the quality of speech coders. Operators are also concerned about capacity, so coding efficiency is another important measure. Advanced speech coding technologies can improve both coding efficiency and user experience. One way to improve quality is to extend the audio bandwidth from traditional narrowband (8 kHz sampling) to wideband (16 kHz sampling) and super-wideband (32 kHz sampling). Another is to increase the robustness of the coder against transmission errors: error concealment algorithms substitute the missing parts of the audio signal as far as possible. In packet-switched applications (VoIP systems), jitter buffer management (JBM) algorithms include special mechanisms to maximize sound quality. It is therefore essential to standardize and deploy speech coders that meet quality expectations. As an example, we refer to the Enhanced Voice Services (EVS) project, in which 3GPP is developing its next-generation speech coder. The basic motivation for 3GPP to start the EVS project was to extend the path of codec evolution by providing a super-wideband experience at around 13 kb/s and better quality for music and mixed content in conversational applications. Optimized behavior in VoIP applications is achieved through high error robustness, jitter buffer management, source-controlled variable bit rate operation, support for various audio bandwidths, and stereo.
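
The link between sampling rate and audio bandwidth follows from the Nyquist limit: a signal sampled at f_s can only represent frequencies up to f_s/2. The sketch below uses hypothetical helper names that are not taken from the chapter. It tabulates that limit for the three bandwidth classes and computes the per-frame bit budget at a given bitrate. It assumes the 20 ms frame length that is common in conversational speech codecs such as AMR-WB and EVS.

```python
# Hypothetical helpers (not from the chapter): Nyquist limit, samples per
# frame, and bits per frame for the bandwidth classes named in the abstract.

SAMPLING_RATES_HZ = {
    "narrowband": 8_000,
    "wideband": 16_000,
    "super-wideband": 32_000,
}


def nyquist_limit_hz(sampling_rate_hz: int) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2


def samples_per_frame(sampling_rate_hz: int, frame_ms: float = 20.0) -> int:
    """Number of PCM samples in one frame (20 ms frames assumed)."""
    return int(sampling_rate_hz * frame_ms / 1000)


def bits_per_frame(bitrate_bps: float, frame_ms: float = 20.0) -> float:
    """Bits available per frame at a constant bitrate (20 ms frames assumed)."""
    return bitrate_bps * frame_ms / 1000.0


if __name__ == "__main__":
    for name, fs in SAMPLING_RATES_HZ.items():
        print(f"{name:>15}: fs = {fs} Hz, audio up to {nyquist_limit_hz(fs):.0f} Hz, "
              f"{samples_per_frame(fs)} samples per 20 ms frame")
    # Around 13 kb/s, the rate at which EVS targets super-wideband quality.
    print(f"13 kb/s -> {bits_per_frame(13_000):.0f} bits per 20 ms frame")
```

At 13 kb/s, a 20 ms frame carries 260 bits. The same frame holds 640 super-wideband samples, which works out to roughly 0.4 bits per sample. This gives a sense of how much modeling a super-wideband coder must do at that rate.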
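
The interplay of jitter buffer management and error concealment can be illustrated with a toy model. This is not the EVS JBM or any standardized algorithm; it is a minimal sketch under stated assumptions:

- frames arrive at a fixed 20 ms interval;
- network jitter is estimated with the interarrival-jitter estimator from RFC 3550 (J += (|D| - J)/16);
- the playout delay target is a hypothetical multiple of that estimate;
- a frame absent from the buffer when requested is concealed by repeating the previous output frame with attenuation, a deliberately naive form of concealment.

The class and parameter names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FRAME_MS = 20.0  # assumed frame interval


@dataclass
class ToyJitterBuffer:
    """Hypothetical adaptive playout buffer with naive concealment (illustration only)."""
    safety_factor: float = 3.0      # assumed: delay margin as a multiple of jitter
    min_delay_ms: float = FRAME_MS  # assumed lower bound on playout delay
    jitter_ms: float = 0.0          # smoothed interarrival jitter estimate
    _prev_transit: Optional[float] = None
    _frames: Dict[int, List[float]] = field(default_factory=dict)
    _last_out: List[float] = field(default_factory=list)

    def on_packet(self, seq: int, send_ms: float, arrival_ms: float,
                  payload: List[float]) -> None:
        """Store a frame and update jitter with the RFC 3550 estimator."""
        transit = arrival_ms - send_ms
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)  # |D(i-1, i)|
            self.jitter_ms += (d - self.jitter_ms) / 16.0
        self._prev_transit = transit
        self._frames[seq] = payload

    def target_delay_ms(self) -> float:
        """Playout delay with enough margin to absorb the estimated jitter."""
        return max(self.min_delay_ms, self.safety_factor * self.jitter_ms)

    def playout(self, seq: int, attenuation: float = 0.5) -> Tuple[List[float], bool]:
        """Return (frame, concealed). A missing frame is replaced by the
        previous output frame scaled by `attenuation`."""
        frame = self._frames.pop(seq, None)
        if frame is not None:
            self._last_out = frame
            return frame, False
        concealed = [attenuation * x for x in self._last_out]
        self._last_out = concealed
        return concealed, True


if __name__ == "__main__":
    jb = ToyJitterBuffer()
    # Frames 0..3 are sent every 20 ms; frame 2 is lost in the network.
    for seq, arrival in {0: 50.0, 1: 72.0, 3: 125.0}.items():
        jb.on_packet(seq, seq * FRAME_MS, arrival, payload=[1.0, -1.0])
    print(f"jitter ~ {jb.jitter_ms:.2f} ms, target delay {jb.target_delay_ms():.1f} ms")
    for seq in range(4):
        frame, concealed = jb.playout(seq)
        print(seq, frame, "concealed" if concealed else "received")
```

Practical JBMs typically adapt the delay more gradually and hide the adjustments, for example by time-scaling the decoded speech. Their concealment also extrapolates the signal rather than simply repeating and attenuating a frame. The fixed jitter multiple used here is only a placeholder policy.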



Author information

Authors and Affiliations

Qualcomm Technologies, Inc., 5775 Morehouse Dr., San Diego, CA, 92121, USA
Daniel J. Sinder, Venkatesh Krishnan, Vivek Rajendran & Stéphane Villette
QUALCOMM CDMA Technologies GmbH, Franziskaner Str. 14, D-81669, Munich, Germany
Imre Varga


Corresponding author

Correspondence to Daniel J. Sinder.

Editor information

Editors and Affiliations

Dept. of Electrical Engineering, Santa Clara University, Santa Clara, California, USA
Tokunbo Ogunfunmi
School of EE&C Engineering, The University of Western Australia, Crawley, Western Australia, Australia
Roberto Togneri
Qualcomm Inc., Santa Clara, California, USA
Madihally (Sim) Narasimha

About this chapter

Cite this chapter

Sinder, D.J., Varga, I., Krishnan, V., Rajendran, V., Villette, S. (2015). Recent Speech Coding Technologies and Standards. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_4


DOI: https://doi.org/10.1007/978-1-4939-1456-2_4
Published: 18 September 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1455-5
Online ISBN: 978-1-4939-1456-2
eBook Packages: Engineering; Engineering (R0)
