Principles of Entropy Coding with Perceptual Quality Evaluation

Chapter in the book Speech Coding, part of the book series Signals and Communication Technology (SCT).

Abstract

The objective of speech coding is to transmit speech at the highest possible quality with the lowest possible amount of resources. To achieve the best compromise, we can use available information about (1) the source, which is the speech production system, (2) the quality measure or evaluation criterion, which depends on the performance of the human hearing system, and (3) the statistical frequency and distribution of the involved parameters. By developing models for all such information, we can optimise the system to perform efficiently. In practice, the three approaches overlap in the sense that it is often difficult to make a clear-cut separation between them. While source modelling was already discussed in Chap. 2, this chapter reviews entropy coding methods and the associated perceptual modelling methods.


References

  1. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, New Jersey (2013)

  2. Bosi, M., Goldberg, R.E.: Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, Dordrecht (2003)

  3. Bäckström, T.: Vandermonde factorization of Toeplitz matrices and applications in filtering and warping. IEEE Trans. Signal Process. 61(24), 6257–6263 (2013)

  4. Bäckström, T., Helmrich, C.R.: Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrix. In: Proceedings of Interspeech, pp. 2794–2798 (2014)

  5. Edler, B.: Coding of audio signals with overlapping block transform and adaptive window functions. Frequenz 43(9), 252–256 (1989)

  6. Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Springer, New York (1992)

  7. Gibson, J.D., Sayood, K.: Lattice quantization. Adv. Electron. Electron Phys. 72, 259–330 (1988)

  8. Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Maryland (1996)

  9. Gray, R.M., Neuhoff, D.L.: Quantization. IEEE Trans. Inf. Theory 44(6), 2325–2383 (1998)

  10. Jayant, N.S., Noll, P.: Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall, Englewood Cliffs, New Jersey (1984)

  11. Mitra, S.K.: Digital Signal Processing: A Computer-Based Approach. McGraw-Hill, Boston (1998)

  12. Pisoni, D., Remez, R.: The Handbook of Speech Perception. Wiley, New Jersey (2008)

  13. Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs (1978)

  14. Sanchez, V.E., Adoul, J.-P.: Low-delay wideband speech coding using a new frequency domain approach. In: Proceedings of ICASSP, vol. 2, pp. 415–418. IEEE (1993)

  15. Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)


Author information

Correspondence to Tom Bäckström.

Appendix

1.1 Entropy of a Zero-mean Multivariate Normal Distribution

Recall that the zero-mean multivariate normal distribution of an \(N\times 1\) variable x is defined as

$$\begin{aligned} f(x) = \frac{\exp \left( -\frac{1}{2} x^T R_x^{-1} x\right) }{\sqrt{\left( 2\pi \right) ^{N}\det (R_x)}} . \end{aligned}$$
(3.24)

The differential entropy (also known as the continuous entropy, since it applies to continuous-valued variables x) is defined as

$$\begin{aligned} H(x) = -\int _x f(x) \log _2 f(x)\, dx. \end{aligned}$$
(3.25)

Substituting Eq. 3.24 into 3.25 yields

$$\begin{aligned} \begin{aligned} H(x)&= \int _x f(x)\left[ \log _2\sqrt{\left( 2\pi \right) ^{N}\det (R_x)}+ \frac{\log _2e}{2} x^T R_x^{-1} x\right] \, dx\\&= \frac{1}{2}\left[ \log _2\left[ {\left( 2\pi \right) ^{N}\det (R_x)}\right] + \log _2e\int _x f(x) x^T R_x^{-1} x\, dx\right] , \end{aligned} \end{aligned}$$
(3.26)

since \(\int _x f(x)\, dx = 1\).

We recognise that the autocorrelation is defined as

$$\begin{aligned} R_x = {\mathscr {E}}\{xx^H\} = \int _x f(x)\, xx^H dx. \end{aligned}$$
(3.27)

Moreover, since for the trace operator we have \({{\mathrm{tr}}}(AB)={{\mathrm{tr}}}(BA)\), it follows that

$$\begin{aligned} \begin{aligned} {\mathscr {E}}\{x^HR_x^{-1}x\}&= {\mathscr {E}}\{{{\mathrm{tr}}}(x^HR_x^{-1}x)\} = {\mathscr {E}}\{{{\mathrm{tr}}}(xx^HR_x^{-1})\} = {{\mathrm{tr}}}({\mathscr {E}}\{xx^HR_x^{-1}\}) \\&={{\mathrm{tr}}}({\mathscr {E}}\{xx^H\}R_x^{-1}) = {{\mathrm{tr}}}(I) = N. \end{aligned} \end{aligned}$$
(3.28)

On the other hand, from the definition of the expectation, we obtain

$$\begin{aligned} {\mathscr {E}}\{x^HR_x^{-1}x\} = \int _x f(x)\, x^T R_x^{-1} x\, dx = N . \end{aligned}$$
(3.29)

Substituting into Eq. 3.26 yields

$$\begin{aligned} \boxed {H(x) = \frac{1}{2}\log _2\left[ \left( e2\pi \right) ^N\det (R_x)\right] .} \end{aligned}$$
(3.30)
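
As a quick numerical check of Eq. 3.30, the following sketch (a Python illustration assuming NumPy and SciPy; the covariance construction and variable names are only examples) evaluates the closed form and compares it with SciPy's differential entropy of the same zero-mean Gaussian, converted from nats to bits.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)
    N = 4
    # Example symmetric positive-definite autocorrelation matrix R_x.
    B = rng.standard_normal((N, N))
    Rx = B @ B.T + N * np.eye(N)

    # Closed form of Eq. 3.30: H(x) = 0.5 * log2((2*pi*e)^N * det(R_x)).
    _, logdet = np.linalg.slogdet(Rx)            # log-determinant in nats
    H_formula = 0.5 * (N * np.log2(2 * np.pi * np.e) + logdet / np.log(2))

    # SciPy returns the differential entropy in nats; convert to bits.
    H_scipy = multivariate_normal(mean=np.zeros(N), cov=Rx).entropy() / np.log(2)

    print(H_formula, H_scipy)                    # agree to numerical precision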

Let us then assume that x is quantised such that each x belongs to a unique quantisation cell \(Q_k\), that is, \(x\in Q_k\). The probability that x is within \(Q_k\) is by definition

$$\begin{aligned} p\left( x\in Q_k\right) = \int _{x\in Q_k} f(x)\, dx. \end{aligned}$$
(3.31)

Due to the mean-value theorem, we know that there exists an \(x_k\in Q_k\) such that

$$\begin{aligned} p\left( x\in Q_k\right) = \int _{x\in Q_k} f(x)\, dx = V(Q_k) f(x_k), \end{aligned}$$
(3.32)

where \(V(Q_k)\) is the volume of the quantisation cell.

If we assume that \(V(Q_k)\) is equal for all k, \(V(Q_k)=V(Q)\), then the entropy of this quantisation scheme is

$$\begin{aligned} \begin{aligned} H(Q)&= -\sum _k P(x\in Q_k) \log _2 P(x\in Q_k) = -\sum _k V(Q)f(x_k) \log _2 \left[ V(Q)f(x_k)\right] \\ {}&=-\sum _k V(Q)f(x_k) \log _2 \frac{\exp \left( -\frac{1}{2} x_k^T R_x^{-1} x_k\right) V(Q)}{\sqrt{\left( 2\pi \right) ^{N}\det (R_x)}} \\ {}&= \frac{1}{2}\sum _k V(Q)f(x_k) \left[ x_k^T R_x^{-1} x_k\log _2 e + \log _2\frac{\left( 2\pi \right) ^{N}\det (R_x)}{\left[ V(Q)\right] ^2}\right] . \end{aligned} \end{aligned}$$
(3.33)

When the quantisation cells become small, \(V(Q)\rightarrow 0\), then due to Eq. 3.29

$$\begin{aligned} \lim _{V(Q)\rightarrow 0}\sum _k P\left( x\in Q_k\right) \, x_k^H R_x^{-1} x_k = \lim _{V(Q)\rightarrow 0}\sum _k V(Q) f(x_k)\, x_k^H R_x^{-1} x_k = \int _x f(x)\, x^H R_x^{-1} x\, dx = N. \end{aligned}$$
(3.34)

Using Eq. 3.34, together with the fact that \(\sum _k P\left( x\in Q_k\right) =1\), it follows that

$$\begin{aligned} \boxed { H(Q) \approx \frac{1}{2}\log _2\frac{\left( e 2\pi \right) ^{N}\det (R_x)}{\left[ V(Q)\right] ^2} . } \end{aligned}$$
(3.35)

The remaining component is then to determine the volume of quantisation cells  V(Q). By direct (uniform) quantisation of a sample \(\xi _k\) with accuracy \(\varDelta \xi \), we refer to an operation \(\hat{\xi }_k = \varDelta \xi {{\mathrm{round}}}(\xi _k/\varDelta \xi )\), where \({{\mathrm{round}}}()\) denotes rounding to the nearest integer. The quantisation cells are then \(Q_k=[\varDelta \xi (k-\frac{1}{2}),\,\varDelta \xi (k+\frac{1}{2})]\), whereby the length (the 1-dimensional volume) of the cell is \(V(Q)=\varDelta \xi \).
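
As an illustration of the two results above, the following sketch (again only a Python illustration with example values) applies this rounding quantiser to samples of a correlated Gaussian and compares the empirical entropy of the resulting cell indices with the approximation of Eq. 3.35; the two values approach each other as \(\varDelta \xi \) becomes small.

    import numpy as np

    rng = np.random.default_rng(1)
    N, delta = 2, 0.25                           # dimension and accuracy (Delta xi)
    Rx = np.array([[1.0, 0.6],
                   [0.6, 1.0]])                  # example autocorrelation matrix
    x = rng.multivariate_normal(np.zeros(N), Rx, size=200_000)

    # Direct uniform quantisation: cell index k = round(xi / delta) per dimension.
    cells = np.round(x / delta).astype(int)

    # Empirical entropy of the discrete cell distribution, in bits.
    _, counts = np.unique(cells, axis=0, return_counts=True)
    p = counts / counts.sum()
    H_empirical = -np.sum(p * np.log2(p))

    # Approximation of Eq. 3.35 with V(Q) = delta**N.
    _, logdet = np.linalg.slogdet(Rx)
    H_formula = (0.5 * (N * np.log2(2 * np.pi * np.e) + logdet / np.log(2))
                 - N * np.log2(delta))

    print(H_empirical, H_formula)                # close for small delta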

If we then apply direct quantisation to an \(N\times 1\) vector x, then clearly the quantisation cell size will be \(V(Q)=(\varDelta \xi )^N\), assuming that all dimensions are quantised with the same accuracy. Note that here we assumed that the quantisation cells are hyper-cubes, which keeps the analysis simple. It can, however, be shown that better efficiency can be achieved by lattice quantisation, where the quantisation cells are arranged in something like a honeycomb structure. Such methods are beyond the scope of this work; for more details we refer to [7].

Now suppose that we use an \(N\times N\) orthonormal transform \(x=Ay\) and we quantise the vector y with direct uniform quantisation. Since A is orthonormal, \(A^HA=AA^H=I\) and

$$\begin{aligned} \Vert x\Vert ^2=\Vert Ay\Vert ^2=y^HA^HAy = y^H I y = y^H y = \Vert y\Vert ^2. \end{aligned}$$
(3.36)

It follows that if \(\varDelta y=y-\hat{y}\) is the quantisation error then

$$\begin{aligned} \begin{aligned} {\mathscr {E}}\{\Vert A(y+\varDelta y)\Vert ^2\}&= {\mathscr {E}}\{\Vert y+\varDelta y\Vert ^2\} \\&= {\mathscr {E}}\{\Vert y\Vert ^2\} + 2{\mathscr {E}}\{y^H\varDelta y \} + {\mathscr {E}}\{\Vert \varDelta y\Vert ^2\}\\&= {\mathscr {E}}\{\Vert y\Vert ^2\} + {\mathscr {E}}\{\Vert \varDelta y\Vert ^2\}, \end{aligned} \end{aligned}$$
(3.37)

since the expectation of the correlation between \(\varDelta y\) and y is zero, \({\mathscr {E}}\{y^H\varDelta y \}=0\). In other words, an orthonormal transform does not change the magnitude or distribution of white noise, since it is an isometry. Moreover, with orthonormal transforms, the quantisation cell volume remains \(V(Q)=(\varDelta \xi )^N\).
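
The argument is easy to verify numerically. The sketch below (illustrative Python; the random orthonormal matrix is drawn via a QR factorisation) checks that an orthonormal transform preserves both the signal norm (Eq. 3.36) and the energy of the quantisation error (Eq. 3.37).

    import numpy as np

    rng = np.random.default_rng(2)
    N, delta = 8, 0.1

    # Random orthonormal transform A from a QR factorisation.
    A, _ = np.linalg.qr(rng.standard_normal((N, N)))

    y = rng.standard_normal((N, 10_000))
    y_hat = delta * np.round(y / delta)          # direct uniform quantisation of y
    dy = y - y_hat                               # quantisation error

    # Eq. 3.36: the norm is preserved, ||Ay||^2 = ||y||^2.
    print(np.allclose(np.sum((A @ y) ** 2, axis=0), np.sum(y ** 2, axis=0)))

    # Eq. 3.37: the expected error energy is the same in both domains.
    print(np.mean(np.sum((A @ dy) ** 2, axis=0)),
          np.mean(np.sum(dy ** 2, axis=0)))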

The situation is however different if the transform A is not orthonormal;

$$\begin{aligned} \begin{aligned} {\mathscr {E}}\{\Vert A(y+\varDelta y)\Vert ^2\}&= {\mathscr {E}}\{\Vert Ay\Vert ^2\} + 2{\mathscr {E}}\{y^HA^{H}A\varDelta y \} + {\mathscr {E}}\{\Vert A\varDelta y\Vert ^2\}\\&= {\mathscr {E}}\{\Vert Ay\Vert ^2\} + {\mathscr {E}}\{\Vert A\varDelta y\Vert ^2\}. \end{aligned} \end{aligned}$$
(3.38)

Using Eq. 3.28 we have

$$\begin{aligned} \begin{aligned} {\mathscr {E}}\{\Vert A(y+\varDelta y)\Vert ^2\}&= {\mathscr {E}}\{\Vert Ay\Vert ^2\} + {{\mathrm{tr}}}(A^{H}A){\mathscr {E}}\{\Vert \varDelta y\Vert ^2\}. \end{aligned} \end{aligned}$$
(3.39)

It follows that \({\mathscr {E}}\left\{ \Vert \varDelta x\Vert ^2\right\} ={\mathscr {E}}\left\{ \Vert \varDelta y\Vert ^2\right\} {{\mathrm{tr}}}(A^{H}A)\), whereby

$$\begin{aligned} V(Q)=\left[ \frac{\Vert \varDelta x\Vert ^2}{N}\right] ^{N/2} =\left[ \frac{\Vert \varDelta y\Vert ^2 {{\mathrm{tr}}}(A^{H}A)}{N}\right] ^{N/2} \end{aligned}$$
(3.40)

and

$$\begin{aligned} \boxed { H(Q) \approx \frac{N}{2}\log _2\frac{ e 2\pi \left[ \det (R_x)\right] ^{1/N}}{\Vert \varDelta y\Vert ^2 \frac{1}{N}{{\mathrm{tr}}}(A^{H}A)} . } \end{aligned}$$
(3.41)

Observe that this formula relies on two assumptions: (i) the quantisation cells are small, which means that \(\varDelta y\) must be small (the approximation becomes an equality as \(\varDelta y\rightarrow 0\), in which case, however, the entropy diverges since the denominator goes to zero), and (ii) the quantisation cells are hyper-cubes, that is, the quantisation accuracy \(\varDelta y\) is equal in every dimension of y.
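
To make Eq. 3.41 concrete, the sketch below (an illustration only; the matrices and the error energy are hypothetical examples) evaluates the formula for an orthonormal transform, where \(\frac{1}{N}{{\mathrm{tr}}}(A^HA)=1\), and for a non-orthonormal one, where the factor \(\frac{1}{N}{{\mathrm{tr}}}(A^HA)\) rescales the effective quantisation noise in the x domain and hence shifts the entropy estimate.

    import numpy as np

    def entropy_estimate(Rx, A, err_energy):
        """Approximate entropy of Eq. 3.41 for x = A y, given the autocorrelation
        R_x of x and the quantisation-error energy ||dy||^2 in the y domain."""
        N = Rx.shape[0]
        _, logdet = np.linalg.slogdet(Rx)
        num = np.log2(2 * np.pi * np.e) + logdet / (N * np.log(2))
        den = np.log2(err_energy * np.trace(A.conj().T @ A) / N)
        return 0.5 * N * (num - den)

    rng = np.random.default_rng(3)
    N = 4
    B = rng.standard_normal((N, N))
    Rx = B @ B.T + N * np.eye(N)                       # example autocorrelation matrix

    Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthonormal transform
    A = Q @ np.diag([2.0, 1.0, 0.5, 0.25])             # non-orthonormal transform

    print(entropy_estimate(Rx, Q, err_energy=1e-3))    # tr(Q^H Q)/N = 1
    print(entropy_estimate(Rx, A, err_energy=1e-3))    # tr(A^H A)/N != 1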

Optimal Gain

The objective function

$$\begin{aligned} \eta (\hat{x}',\,\gamma )= \Vert W(x-\gamma \hat{x}')\Vert ^2 \end{aligned}$$
(3.42)

can be readily minimised with respect to \(\gamma \) by setting the partial derivative to zero

$$\begin{aligned} 0 = \frac{\partial }{\partial \gamma }\Vert W(x-\gamma \hat{x}')\Vert ^2 = -2 (\hat{x}')^HW^HW(x-\gamma \hat{x}'). \end{aligned}$$
(3.43)

Solving for \(\gamma \) yields its optimal value

$$\begin{aligned} \gamma _\text {opt}=\frac{x^HW^HW\hat{x}'}{\Vert W\hat{x}'\Vert ^2}. \end{aligned}$$
(3.44)

Substituting \(\gamma _\text {opt}\) for \(\gamma \) in Eq. 3.42 yields

$$\begin{aligned} \begin{aligned} \eta (\hat{x}',\,\gamma _\text {opt})&=\Vert W(x-\gamma _\text {opt}\hat{x}')\Vert ^2 = (x-\gamma _\text {opt}\hat{x}')^HW^HW(x-\gamma _\text {opt}\hat{x}') \\&= \Vert Wx\Vert ^2-2\gamma _\text {opt}x^HW^HW\hat{x}'+\gamma _\text {opt}^2\Vert W\hat{x}'\Vert ^2 \\&= \Vert Wx\Vert ^2-2\frac{x^HW^HW\hat{x}'}{\Vert W\hat{x}'\Vert ^2} x^HW^HW\hat{x}'+\left[ \frac{x^HW^HW\hat{x}'}{\Vert W\hat{x}'\Vert ^2}\right] ^2\Vert W\hat{x}'\Vert ^2 \\&= \Vert Wx\Vert ^2-\frac{(x^HW^HW\hat{x}')^2}{\Vert W\hat{x}'\Vert ^2}. \end{aligned} \end{aligned}$$
(3.45)

Since \(\Vert Wx\Vert ^2\) is a constant, we further have

$$\begin{aligned} \begin{aligned} \hat{x}_\text {opt}'&= \arg \min _{\hat{x}'}\eta (\hat{x}',\,\gamma _\text {opt})= \arg \min _{\hat{x}'}\left[ \Vert Wx\Vert ^2-\frac{(x^HW^HW\hat{x}')^2}{\Vert W\hat{x}'\Vert ^2}\right] \\&=\arg \max _{\hat{x}'}\left[ \frac{(x^HW^HW\hat{x}')^2}{\Vert W\hat{x}'\Vert ^2}\right] . \end{aligned} \end{aligned}$$
(3.46)

The optimal \(\hat{x}'\) is therefore obtained as a solution to

$$\begin{aligned} \boxed { \hat{x}_\text {opt}' =\arg \max _{\hat{x}'}\left[ \frac{(x^HW^HW\hat{x}')^2}{\Vert W\hat{x}'\Vert ^2}\right] .} \end{aligned}$$
(3.47)

Observe that the objective function is thus the normalised correlation between \(Wx\) and \(W\hat{x}'\).

However, if \(\hat{x}_\text {opt}'\) is a solution of the above problem, then its negative \(-\hat{x}_\text {opt}'\) gives the same error, which means that the solution is not unique. The optimal gain would then also have the opposite sign. To obtain an objective function with a unique optimum, we can instead maximise the signed square root of the objective in Eq. 3.47

$$\begin{aligned} \hat{x}_\text {opt}' =\arg \max _{\hat{x}'}\left[ \frac{x^HW^HW\hat{x}'}{\Vert W\hat{x}'\Vert }\right] . \end{aligned}$$
(3.48)

The original objective function has, however, only these two local minima, whereby we can simply change the sign of \(\hat{x}_\text {opt}'\) if the gain is negative.

The final step is to find the optimal quantisation of the gain \(\gamma \), which can be determined by minimising, in the least-squares sense,

$$\begin{aligned} \min _{\gamma }\eta (\hat{x}_\text {opt}',\,\gamma )= \min _{\gamma }\Vert W(x-\gamma \hat{x}_\text {opt}')\Vert ^2. \end{aligned}$$
(3.49)

By writing out the above norm, we find that the objective function is a second-order polynomial in \(\gamma \), whereby its evaluation is computationally simple.
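
Putting the whole procedure together, the following sketch (purely illustrative: the codebook, the weighting matrix W and the gain levels are hypothetical placeholders) searches a small codebook by maximising the normalised correlation of Eq. 3.48, computes the optimal gain of Eq. 3.44 for the winning codevector, and finally quantises the gain by minimising the weighted error of Eq. 3.49 over a discrete set of gain values.

    import numpy as np

    def search_codebook(x, codebook, W, gain_levels):
        """Weighted codebook-and-gain search as derived above (real-valued case)."""
        Wx = W @ x
        best, best_corr, Wbest = None, -np.inf, None
        for cand in codebook:                          # candidate vectors x_hat'
            Wc = W @ cand
            corr = (Wx @ Wc) / np.linalg.norm(Wc)      # objective of Eq. 3.48
            if corr > best_corr:
                best, best_corr, Wbest = cand, corr, Wc
        gain_opt = (Wx @ Wbest) / (Wbest @ Wbest)      # Eq. 3.44
        # Eq. 3.49: choose the quantised gain with the smallest weighted error.
        errors = [np.sum((Wx - g * Wbest) ** 2) for g in gain_levels]
        gain_hat = gain_levels[int(np.argmin(errors))]
        return best, gain_opt, gain_hat

    rng = np.random.default_rng(4)
    N = 16
    x = rng.standard_normal(N)                         # target vector
    codebook = rng.standard_normal((32, N))            # hypothetical codebook
    W = np.diag(1.0 / (1.0 + np.arange(N)))            # hypothetical weighting matrix
    gain_levels = np.linspace(0.0, 2.0, 33)            # hypothetical gain quantiser
    print(search_codebook(x, codebook, W, gain_levels))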


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Bäckström, T. (2017). Principles of Entropy Coding with Perceptual Quality Evaluation. In: Speech Coding. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-50204-5_3

  • DOI: https://doi.org/10.1007/978-3-319-50204-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50202-1

  • Online ISBN: 978-3-319-50204-5