Abstract
Speech remains one of the most popular and effective means of conveying information from one person to another, and speech signals form the basic method of human communication. The information conveyed is verbal, or auditory. Methods for speech coding are extensive and continue to evolve.
Speech coding can be defined as the means by which the information-bearing speech signal is coded to remove redundancy, thereby reducing transmission bandwidth requirements, improving storage efficiency, and enabling the many other applications that rely on speech coding techniques.
The medium of speech transmission has also changed over the years. A large percentage of speech is now communicated over channels that use Internet protocols. Voice-over-Internet Protocol (VoIP) channels present challenges that must be overcome to enable error-free, robust speech communication.
There are several advantages to using multi-rate, scalable bit-streams over time-varying VoIP channels. In this chapter, we present methods for scalable, multi-rate speech coding for VoIP channels.
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Ogunfunmi, T., Seto, K. (2015). Scalable and Multi-Rate Speech Coding for Voice-over-Internet Protocol (VoIP) Networks. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_3
DOI: https://doi.org/10.1007/978-1-4939-1456-2_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1455-5
Online ISBN: 978-1-4939-1456-2
eBook Packages: Engineering, Engineering (R0)