Sine-Wave Amplitude Coding at Low Data Rates

Advances in Speech Coding

Abstract

An analysis/synthesis system based on the sinusoidal speech model has been developed [1]. In that system, the sine-wave amplitudes and frequencies are obtained by locating the peaks of the magnitude of the short-time Fourier transform (STFT) of the input speech, and the phases are computed from the real and imaginary parts of the STFT at the measured frequencies. The frequencies on successive frames are matched, passed through a cubic phase interpolator, and applied to a sine-wave generator. Each sine wave is amplitude-modulated by a linear interpolation of the matched sine-wave amplitudes. At a 10 ms frame rate, this system produces speech that is perceptually indistinguishable from the original [1].

Because it is not possible to code all of the sine-wave parameters at low data rates, a system has been developed that codes the sine-wave frequencies by fitting a harmonic set of sine waves to the input waveform using a modified mean-squared-error criterion [2], and codes the phase information implicitly using a voicing-adaptive transition frequency that provides a mixed voiced/unvoiced phase excitation model [3]. Provided a postfilter is used at the synthesizer to attenuate the noise in the formant nulls, the speech synthesized by this system is of high quality, achieving a Diagnostic Acceptability Measure (DAM) score of 63.0 in the uncoded mode. Since the fundamental frequency can be coded using ≈ 7 bits and the voicing measure using ≈ 3 bits, good speech quality at low data rates is possible provided the sine-wave amplitudes can be coded efficiently. This paper describes the zero-phase, harmonic analysis/synthesis system and the postfilter design methodology, and then discusses the various techniques that have been examined for coding the sine-wave amplitudes.
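
The peak-picking analysis step described in the abstract can be illustrated for a single frame. The following is a minimal sketch and not the authors' implementation: the function names (`periodic_hamming`, `dft_half`, `pick_sine_waves`, `synthesize`), the periodic Hamming window, the `floor_ratio` threshold, and the choice to measure phase relative to the first sample of the frame are all assumptions made for illustration. Frame-to-frame frequency matching, cubic phase interpolation, and amplitude interpolation are omitted.

```python
import cmath
import math


def periodic_hamming(n):
    """Periodic (DFT-even) Hamming window of length n (an assumed window choice)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_half(samples):
    """Naive DFT for bins 0..n/2; O(n^2), adequate for one short frame."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]


def pick_sine_waves(frame, sample_rate, floor_ratio=0.05):
    """Return (amplitude, frequency_hz, phase_rad) for each peak of |STFT|.

    floor_ratio is a hypothetical threshold, relative to the largest
    magnitude, used only to ignore numerical noise.
    """
    n = len(frame)
    window = periodic_hamming(n)
    gain = sum(window)  # normalizes the window so amplitudes are in signal units
    spectrum = dft_half([x * w for x, w in zip(frame, window)])
    mags = [abs(c) for c in spectrum]
    threshold = floor_ratio * max(mags)
    peaks = []
    for k in range(1, len(mags) - 1):
        is_peak = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_peak and mags[k] > threshold:
            amplitude = 2.0 * mags[k] / gain
            frequency = k * sample_rate / n  # quantized to the DFT bin spacing
            # Phase from the real and imaginary parts of the STFT at the peak.
            phase = math.atan2(spectrum[k].imag, spectrum[k].real)
            peaks.append((amplitude, frequency, phase))
    return peaks


def synthesize(peaks, n, sample_rate):
    """Rebuild one frame as a sum of constant-parameter cosines."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * t / sample_rate + p) for a, f, p in peaks)
        for t in range(n)
    ]
```

Calling `pick_sine_waves(frame, 8000)` on a 256-sample frame returns one tuple per spectral peak, and `synthesize(peaks, 256, 8000)` rebuilds the frame from those tuples. Because frequencies here are restricted to DFT bin centres, the reconstruction is accurate only when the input components fall on those bins; a practical analyzer would need finer frequency resolution.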

This work was sponsored by the Department of the Air Force.

References

  1. R. J. McAulay and T. F. Quatieri, “Speech Analysis/Synthesis Based on a Sinusoidal Representation,” IEEE Trans. Acoust., Speech and Signal Proc., Vol. ASSP-34, No. 4, p. 744, August 1986.

  2. R. J. McAulay and T. F. Quatieri, “Pitch Estimation and Voicing Detection Based on a Sinusoidal Speech Model,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’90), Albuquerque, NM, April 1990.

  3. J. Makhoul, R. Viswanathan, R. Schwartz, and A. W. F. Huggins, “A Mixed-Source Model for Speech Compression and Synthesis,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’78), Tulsa, OK, p. 163, April 1978.

  4. R. J. McAulay and T. F. Quatieri, “Phase Modeling and Its Application to Sinusoidal Transform Coding,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’86), Tokyo, Japan, p. 1713, April 1986.

  5. R. J. McAulay and T. F. Quatieri, “Phase Coherence in Speech Reconstruction for Enhancement and Coding Applications,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’89), Glasgow, Scotland, p. 207, May 1989.

  6. J.-H. Chen and A. Gersho, “Real-Time Vector APC Speech Coding at 4800 b/s with Adaptive Postfiltering,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’87), Dallas, TX, p. 51.3.1, April 1987.

  7. D. B. Paul, “The Spectral Envelope Estimation Vocoder,” IEEE Trans. Acoust., Speech and Signal Proc., Vol. ASSP-29, No. 4, p. 786, August 1981.

  8. J. N. Holmes, “The JSRU Channel Vocoder,” Proc. Inst. Elect. Eng., Vol. 127, Pt. F, February 1980.

  9. M. J. Sabin, “DPCM Coding of Spectral Amplitudes without Positive Slope Overload,” IEEE Trans. Acoust., Speech and Signal Proc. (to appear).

  10. F. Itakura and S. Saito, “A Statistical Method for Estimation of Speech Spectral Density and Formant Frequencies,” Electron. Commun. Japan, Vol. 53-A, p. 36, 1970.

  11. R. J. McAulay and T. Champion, “Improved Interoperable 2.4 kb/s LPC Using Sinusoidal Transform Coder Techniques,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’90), Albuquerque, NM, April 1990.

Copyright information

© 1991 Springer Science+Business Media New York

About this chapter

Cite this chapter

McAulay, R., Parks, T., Quatieri, T., Sabin, M. (1991). Sine-Wave Amplitude Coding at Low Data Rates. In: Atal, B.S., Cuperman, V., Gersho, A. (eds) Advances in Speech Coding. The Springer International Series in Engineering and Computer Science, vol 114. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-3266-8_20

  • DOI: https://doi.org/10.1007/978-1-4615-3266-8_20

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-6437-5

  • Online ISBN: 978-1-4615-3266-8

  • eBook Packages: Springer Book Archive
