Abstract
This chapter presents an overview of recent developments in conversational speech coding technologies, important new algorithmic advances, and recent standardization activities in ITU-T, 3GPP, 3GPP2, MPEG, and IETF that offer a significantly improved user experience during voice calls on existing and future communication systems. User experience is determined by speech quality, so network operators are very concerned about the quality of speech coders. Operators are also concerned about capacity, so coding efficiency is another important measure. Advanced speech coding technologies can improve both coding efficiency and user experience. One option for improving quality is to extend the audio bandwidth from traditional narrowband to wideband (16 kHz sampling) and super-wideband (32 kHz sampling). Another is to increase the robustness of the coder against transmission errors: error concealment algorithms substitute the missing parts of the audio signal as far as possible. In packet-switched applications (VoIP systems), jitter buffer management (JBM) algorithms include special mechanisms to maximize sound quality. It is highly important to ensure the standardization and deployment of speech coders that meet quality expectations. As an example, we refer to the Enhanced Voice Services (EVS) project in 3GPP, which is developing the next-generation 3GPP speech coder. The basic motivation for 3GPP to start the EVS project was to extend the path of codec evolution by providing a super-wideband experience at around 13 kb/s and better quality for music and mixed content in conversational applications. Optimized behavior in VoIP applications is achieved through high error robustness, jitter buffer management, source-controlled variable bit rate operation, support for various audio bandwidths, and stereo.
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Sinder, D.J., Varga, I., Krishnan, V., Rajendran, V., Villette, S. (2015). Recent Speech Coding Technologies and Standards. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_4
DOI: https://doi.org/10.1007/978-1-4939-1456-2_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1455-5
Online ISBN: 978-1-4939-1456-2
eBook Packages: Engineering, Engineering (R0)