Scalable and Multi-Rate Speech Coding for Voice-over-Internet Protocol (VoIP) Networks

  • Chapter in Speech and Audio Processing for Coding, Enhancement and Recognition

Abstract

Speech remains a popular and effective means of transmitting information from one person to another, and speech signals form the basic method of human communication. The information conveyed is verbal or auditory. The methods used for speech coding are extensive and continue to evolve.

Speech coding can be defined as the means by which the information-bearing speech signal is coded to remove redundancy, thereby reducing transmission bandwidth requirements, improving storage efficiency, and enabling many other applications that rely on speech coding techniques.

The medium of speech transmission has also changed over the years. Currently, a large percentage of speech is communicated over channels that use Internet protocols. Voice-over-Internet Protocol (VoIP) channels present challenges that must be overcome to enable robust, error-free speech communication.

There are several advantages to using multi-rate, scalable bit-streams over time-varying VoIP channels. In this chapter, we present methods for scalable, multi-rate speech coding for VoIP channels.
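
As a rough illustration of why a scalable bit-stream suits a time-varying channel, the sketch below models one coded frame as a core layer followed by enhancement layers, and drops trailing layers until the frame fits the available rate. The layer names and bit-rates are hypothetical values chosen for the example; they are not taken from the chapter or from any particular codec.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    rate_kbps: float  # bit-rate added by this layer (hypothetical value)


# Hypothetical embedded frame: the core layer comes first, and each
# enhancement layer refines the layers before it.
LAYERS = [
    Layer("core", 8.0),
    Layer("enh1", 4.0),
    Layer("enh2", 4.0),
    Layer("enh3", 8.0),
]


def truncate_to_rate(layers: List[Layer], available_kbps: float) -> List[Layer]:
    """Return the longest prefix of layers whose total rate fits the channel.

    An empty list means even the core layer does not fit.
    """
    kept: List[Layer] = []
    total = 0.0
    for layer in layers:
        if total + layer.rate_kbps > available_kbps:
            break  # later layers depend on this one, so stop here
        kept.append(layer)
        total += layer.rate_kbps
    return kept


def total_rate(layers: List[Layer]) -> float:
    """Sum of the bit-rates of the given layers, in kbit/s."""
    return sum(layer.rate_kbps for layer in layers)


if __name__ == "__main__":
    for available in (7.0, 12.0, 17.0, 24.0):
        kept = truncate_to_rate(LAYERS, available)
        print(available, [layer.name for layer in kept], total_rate(kept))
```

Because the layers are ordered, any node along the path can shorten the frame without re-encoding the speech, which is the property that makes this kind of bit-stream attractive when the available rate changes.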



Author information

Correspondence to Tokunbo Ogunfunmi.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Ogunfunmi, T., Seto, K. (2015). Scalable and Multi-Rate Speech Coding for Voice-over-Internet Protocol (VoIP) Networks. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_3

  • DOI: https://doi.org/10.1007/978-1-4939-1456-2_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-1455-5

  • Online ISBN: 978-1-4939-1456-2

  • eBook Packages: Engineering, Engineering (R0)
