Codage de la parole à bas et très bas débits

  • Geneviève Baudoin
  • Jan Cernocky
  • Philippe Gournay
  • Gérard Chollet
Article

Speech coding at low and very low bit rates

Abstract

This paper reviews the main techniques for speech coding at low and very low bit rates, from 50 bps to 4000 bps. It then presents in detail the HSX technique for coding at 1200 bps and a new segmental approach, for bit rates below 400 bps, that uses acoustic units derived in an unsupervised manner.

Keywords

Speech coding; Low rate coding; Predictive coding; Vocoder; Sound quality; State of the art; Coder; System description; System evaluation; Speech processing

Copyright information

© Springer-Verlag 2000

Authors and Affiliations

  1. Département Signaux et Télécommunications, ESIEE, Noisy Le
  2. Institut de Radioélectronique, Université Technique de Brno, Brno, Czech Republic
  3. Thomson-CSF Communications, Gennevilliers cedex
  4. CNRS-URA-820, ENST-TSI, Paris cedex 13