
Spectral Envelope and Perceptual Masking Models

  • Tom Bäckström
Chapter
Part of the Signals and Communication Technology book series (SCT)

Abstract

Envelope models describe the gross shape of a signal, such as the magnitude spectrum of a speech signal. An envelope model of the spectrum is thus a source model of the speech signal. Perceptual (frequency) masking models, in contrast, are evaluation models: they describe how perceptually detrimental errors are in different parts of the spectrum. The two models tend to have similar shapes, and they are therefore described jointly in this chapter. In CELP-type codecs, envelope models are usually based on linear prediction, which is consequently the main theme of this chapter.
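To make the idea of a linear-predictive envelope model concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's own implementation. It assumes the autocorrelation method with a rectangular window, solves the normal equations with the Levinson-Durbin recursion, and evaluates the model spectrum err/|A(e^{jω})|² on a uniform grid over [0, π]. All function names (`autocorrelation`, `levinson_durbin`, `lp_envelope`, `bandwidth_expand`, `simulate_ar2`) are hypothetical. The bandwidth-expansion step A(z/γ) is included only to hint at how a smoother, masking-like shape can be derived from the same coefficients, as CELP perceptual weighting filters commonly do. The value of γ used below is an assumption.

```python
import cmath
import math
import random


def autocorrelation(x, order):
    """Unnormalized autocorrelation r[0..order] of frame x (rectangular window)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    Returns (a, err): a[0] == 1.0 and err is the residual (prediction error) energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient of stage m
        k = -sum(a[i] * r[m - i] for i in range(m)) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k  # the error energy shrinks at every stage
    return a, err


def lp_envelope(a, err, num_bins=257):
    """Model power spectrum err / |A(e^{jw})|^2 sampled at num_bins points on [0, pi].

    The overall scale follows from the unnormalized autocorrelation and is
    arbitrary for this sketch; only the shape matters here.
    """
    env = []
    for b in range(num_bins):
        w = math.pi * b / (num_bins - 1)
        a_w = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        env.append(err / abs(a_w) ** 2)
    return env


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i, which smooths the envelope peaks."""
    return [c * gamma ** i for i, c in enumerate(a)]


def simulate_ar2(n, a1, a2, seed=0):
    """Hypothetical test signal x[t] = a1 x[t-1] + a2 x[t-2] + white Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]
```

For example, with `x = simulate_ar2(4000, 1.5, -0.9)` and `a, err = levinson_durbin(autocorrelation(x, 2), 2)`, the coefficients `a` should come out close to `[1.0, -1.5, 0.9]`. `lp_envelope(a, err)` should then peak roughly at ω ≈ 0.66 rad, the angle of the resonance. `lp_envelope(bandwidth_expand(a, 0.9), err)` keeps the overall shape but has a lower, broader peak.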

Keywords

Speech Signal, Linear Prediction, Vocal Tract, Tube Model, Spectral Envelope
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. International Audio Laboratories Erlangen (AudioLabs), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany