Modeling Techniques for Speech Coding: A Selected Survey

  • C. García-Mateo
  • D. Docampo-Amoedo


Classifying speech coding algorithms is not an easy task, and it can be done according to very different criteria. We usually distinguish them by target bandwidth: telephonic speech coding, or simply speech coding, refers to coding a speech signal between 300 and 3400 Hz, while wideband speech coding refers to a 7 kHz bandwidth. Up to now the two categories have been considered separately, and wideband speech coding has often been handled by extending telephonic speech algorithms through subband filtering. The cross-fertilization that arises from taking a global view of both scenarios may one day lead to the discovery of more efficient procedures. The objective of the present paper is to describe the underlying ideas behind the main coding methods.


Keywords: Speech Signal, Linear Prediction, Speech Code, Synthetic Speech, Pitch Period





Copyright information

© Springer-Verlag London Limited 1996

Authors and Affiliations

  • C. García-Mateo (1)
  • D. Docampo-Amoedo (1)
  1. E.T.S.I. Telecomunicación, Universidad de Vigo, Spain
