Scalable and Multi-Rate Speech Coding for Voice-over-Internet Protocol (VoIP) Networks

Ogunfunmi, Tokunbo; Seto, Koji

doi:10.1007/978-1-4939-1456-2_3

Abstract

Speech remains a popular and effective means of conveying information from one person to another, and speech signals are the basic medium of human communication. The information conveyed in this case is verbal, or auditory. The methods used for speech coding are numerous and continue to evolve.

Speech coding can be defined as the means by which the information-bearing speech signal is coded to remove redundancy, thereby reducing transmission bandwidth requirements, improving storage efficiency, and enabling many other applications that rely on speech coding techniques.

The medium of speech transmission has also changed over the years. Currently, a large percentage of speech is communicated over channels that use Internet protocols. Voice-over-Internet Protocol (VoIP) channels present challenges that must be overcome to enable error-free, robust speech communication.

There are several advantages to using multi-rate, scalable bit-streams on time-varying VoIP channels. In this chapter, we present methods for scalable, multi-rate speech coding for VoIP channels.

Author information

Authors and Affiliations

Department of Electrical Engineering, Santa Clara University, Santa Clara, CA, 95053, USA
Tokunbo Ogunfunmi & Koji Seto

Corresponding author

Correspondence to Tokunbo Ogunfunmi.

Editor information

Editors and Affiliations

Dept. of Electrical Engineering, Santa Clara University, Santa Clara, California, USA
Tokunbo Ogunfunmi
School of EE&C Engineering, The University of Western Australia, Crawley, West Australia, Australia
Roberto Togneri
Qualcomm Inc., Santa Clara, California, USA
Madihally (Sim) Narasimha

About this chapter

Cite this chapter

Ogunfunmi, T., Seto, K. (2015). Scalable and Multi-Rate Speech Coding for Voice-over-Internet Protocol (VoIP) Networks. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_3

DOI: https://doi.org/10.1007/978-1-4939-1456-2_3
Published: 18 September 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1455-5
Online ISBN: 978-1-4939-1456-2
eBook Packages: Engineering, Engineering (R0)
