Abstract
The objective of speech coding is to transmit speech at the highest possible quality with the lowest possible amount of resources. To achieve the best compromise, we can use available information about (1) the source, which is the speech production system, (2) the quality measure or evaluation criteria, which depend on the performance of the human hearing system, and (3) the statistical frequency and distribution of the involved parameters. By developing models for all such information, we can optimise the system to perform efficiently. In practice, the three approaches overlap in the sense that it is often difficult to make a clear-cut separation between them. While source modelling was already discussed in Chap. 2, this chapter reviews entropy coding methods and the associated perceptual modelling methods.
References
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, New Jersey (2013)
Bosi, M., Goldberg, R.E.: Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, Dordrecht (2003)
Bäckström, T.: Vandermonde factorization of Toeplitz matrices and applications in filtering and warping. IEEE Trans. Signal Process. 61(24), 6257–6263 (2013)
Bäckström, T., Helmrich, C.R.: Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrix. In: Proceedings of the Interspeech, pp. 2794–2798 (2014)
Edler, B.: Coding of audio signals with overlapping block transform and adaptive window functions. Frequenz 43(9), 252–256 (1989)
Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Springer, New York (1992)
Gibson, J.D., Sayood, K.: Lattice quantization. Adv. Electron. Electron Phys. 72, 259–330 (1988)
Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Maryland (1996)
Gray, R.M., Neuhoff, D.L.: Quantization. IEEE Trans. Inf. Theory 44(6), 2325–2383 (1998)
Jayant, N.S., Noll, P.: Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall, Englewood Cliffs (1984)
Mitra, S.K.: Digital Signal Processing: A Computer-Based Approach. McGraw-Hill, Boston (1998)
Pisoni, D., Remez, R.: The Handbook of Speech Perception. Wiley, New Jersey (2008)
Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs (1978)
Sanchez, V.E., Adoul, J.-P.: Low-delay wideband speech coding using a new frequency domain approach. In: Proceedings of the ICASSP, vol. 2, pp. 415–418. IEEE (1993)
Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)
Appendix
1.1 Entropy of a Zero-mean Multivariate Normal Distribution
Recall that the zero-mean multivariate normal distribution of an \(N\times 1\) variable x is defined as
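\[ f(x) = \frac{1}{\sqrt{(2\pi )^N \det (R_x)}} \exp \left( -\tfrac{1}{2}\, x^H R_x^{-1} x\right) . \]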
The differential entropy (also known as the continuous entropy, since it applies to continuous-valued variables x) is defined as
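\[ H(x) = -\int _x f(x) \log f(x)\, dx . \]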
Substituting Eq. 3.24 into 3.25 yields
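\[ H(x) = \tfrac{1}{2}\log \left[ (2\pi )^N \det (R_x)\right] + \tfrac{1}{2}(\log e)\, {\mathscr {E}}\left\{ x^H R_x^{-1} x\right\} , \]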
since \(\int _x f(x)\, dx = 1\).
We recognise that the autocorrelation is defined as
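\[ R_x = {\mathscr {E}}\left\{ x x^H\right\} . \]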
Moreover, since for the trace operator we have \({{\mathrm{tr}}}(AB)={{\mathrm{tr}}}(BA)\), it follows that
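\[ x^H R_x^{-1} x = {{\mathrm{tr}}}\left( x^H R_x^{-1} x\right) = {{\mathrm{tr}}}\left( R_x^{-1} x x^H\right) . \]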
On the other hand, from the definition of the expectation, we obtain
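\[ {\mathscr {E}}\left\{ x^H R_x^{-1} x\right\} = {{\mathrm{tr}}}\left( R_x^{-1}\, {\mathscr {E}}\left\{ x x^H\right\} \right) = {{\mathrm{tr}}}\left( R_x^{-1} R_x\right) = {{\mathrm{tr}}}(I) = N . \]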
Substituting into Eq. 3.26 yields
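\[ H(x) = \tfrac{1}{2}\log \left[ (2\pi )^N \det (R_x)\right] + \tfrac{N}{2}\log e = \tfrac{1}{2}\log \left[ (2\pi e)^N \det (R_x)\right] . \]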
Let us then assume that x is quantised such that for each x we have a unique quantisation cell \(Q_k\) such that \(x\in Q_k\). The probability that x is within \(Q_k\) is by definition
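\[ P(x\in Q_k) = \int _{Q_k} f(x)\, dx . \]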
Due to the mean-value theorem, we know that there exists an \(x_k\in Q_k\) such that
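\[ \int _{Q_k} f(x)\, dx = f(x_k)\, V(Q_k), \]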
where \(V(Q_k)\) is the volume of the quantisation cell.
Assuming that \(V(Q_k)\) is equal for all k, \(V(Q_k)=V(Q)\), then the entropy of this quantisation scheme is
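\[ H_Q = -\sum _k P(Q_k) \log P(Q_k) . \]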
When the quantisation cells are small \(V(Q)\rightarrow 0\), then due to Eq. 3.29
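\[ H_Q = -\sum _k f(x_k) V(Q) \log \left[ f(x_k) V(Q)\right] \approx -\int _x f(x) \log \left[ f(x) V(Q)\right] dx = H(x) - \log V(Q) . \]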
Using the result from Eqs. 3.28 and 3.29, it follows that
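\[ H_Q \approx \tfrac{1}{2}\log \left[ (2\pi e)^N \det (R_x)\right] - \log V(Q) . \]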
The remaining component is then to determine the volume of quantisation cells V(Q). By direct (uniform) quantisation of a sample \(\xi _k\) with accuracy \(\varDelta \xi \), we refer to an operation \(\hat{\xi }_k = \varDelta \xi {{\mathrm{round}}}(\xi _k/\varDelta \xi )\), where \({{\mathrm{round}}}()\) denotes rounding to the nearest integer. The quantisation cells are then \(Q_k=[\varDelta \xi (k-\frac{1}{2}),\,\varDelta \xi (k+\frac{1}{2})]\), whereby the length (the 1-dimensional volume) of the cell is \(V(Q)=\varDelta \xi \).
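A minimal numerical sketch of this operation follows (the step size \(\varDelta \xi =0.1\) and the random test data are arbitrary choices for illustration):

```python
import numpy as np

def uniform_quantize(xi, step):
    """Direct uniform quantisation: map each sample to the nearest multiple of `step`."""
    return step * np.round(xi / step)

# Arbitrary test data and step size, for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
step = 0.1
x_hat = uniform_quantize(x, step)

# Each quantisation cell is an interval of length `step`, so the error
# magnitude never exceeds half the cell length V(Q) = step.
assert np.all(np.abs(x - x_hat) <= step / 2 + 1e-12)
```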
If we then apply direct quantisation to an \(N\times 1\) vector x, then clearly the quantisation cell size will be \(V(Q)=(\varDelta \xi )^N\), assuming that all dimensions are quantised with the same accuracy. Note that here we assumed that the quantisation cells are hyper-cubes, which makes the analysis simple. It can, however, be shown that better efficiency can be achieved by lattice quantisation, where the quantisation cells are arranged in something like a honeycomb structure. Such methods are, however, beyond the scope of this work; for more details we refer to [7].
Now suppose that we use an \(N\times N\) orthonormal transform \(x=Ay\) and we quantise the vector y with direct uniform quantisation. Since A is orthonormal, we have \(A^HA=AA^H=I\) and
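\[ \Vert x\Vert ^2 = \Vert Ay\Vert ^2 = y^H A^H A\, y = \Vert y\Vert ^2 . \]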
It follows that if \(\varDelta y=y-\hat{y}\) is the quantisation error then
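\[ {\mathscr {E}}\left\{ \Vert \varDelta x\Vert ^2\right\} = {\mathscr {E}}\left\{ \Vert A\,\varDelta y\Vert ^2\right\} = {\mathscr {E}}\left\{ \varDelta y^H A^H A\, \varDelta y\right\} = {\mathscr {E}}\left\{ \Vert \varDelta y\Vert ^2\right\} , \]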
since the expectation of the correlation between \(\varDelta y\) and y is zero, \({\mathscr {E}}\{y^H\varDelta y \}=0\). In other words, an orthonormal transform does not change the magnitude or distribution of white noise, since it is an isometry. Moreover, with orthonormal transforms, the quantisation cell volume remains \(V(Q)=(\varDelta \xi )^N\).
The situation is however different if the transform A is not orthonormal;
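\[ \Vert \varDelta x\Vert ^2 = \Vert A\,\varDelta y\Vert ^2 = \varDelta y^H A^H A\, \varDelta y \ne \Vert \varDelta y\Vert ^2 \quad \text {in general.} \]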
Using Eq. 3.28 we have
It follows that \({\mathscr {E}}\left\{ \Vert \varDelta x\Vert ^2\right\} ={\mathscr {E}}\left\{ \Vert \varDelta \upsilon \Vert ^2\right\} {{\mathrm{tr}}}(A^{H}A)\) whereby
and
Observe that this formula assumes two things: (i) Quantisation cells are small, which means that \(\varDelta y\) must be small (the approximation becomes an equality when \(\varDelta y\rightarrow 0\), whereby, however, the entropy diverges since we have a zero in the denominator) and (ii) Quantisation cells are hyper-cubes, that is, the quantisation accuracy \(\varDelta y\) is equal for each dimension of y.
Optimal Gain
The objective function
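(written here with \(x\) denoting the target vector, \(W\) the perceptual weighting matrix, \(\hat{x}'\) the quantised codebook vector and \(\gamma \) the scalar gain)
\[ \Vert W\left( x - \gamma \hat{x}'\right) \Vert ^2 \]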
can be readily minimised with respect to \(\gamma \) by setting the partial derivative to zero
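\[ \frac{\partial }{\partial \gamma } \Vert W\left( x - \gamma \hat{x}'\right) \Vert ^2 = -2\, \hat{x}'^H W^H W\left( x - \gamma \hat{x}'\right) = 0 . \]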
Solving for \(\gamma \) yields its optimal value
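\[ \gamma _\text {opt} = \frac{\hat{x}'^H W^H W x}{\hat{x}'^H W^H W \hat{x}'} = \frac{\hat{x}'^H W^H W x}{\Vert W\hat{x}'\Vert ^2} . \]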
Substituting \(\gamma _\text {opt}\) for \(\gamma \) in Eq. 3.42 yields
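\[ \Vert W\left( x - \gamma _\text {opt} \hat{x}'\right) \Vert ^2 = \Vert Wx\Vert ^2 - \frac{\left( \hat{x}'^H W^H W x\right) ^2}{\Vert W\hat{x}'\Vert ^2} . \]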
Since \(\Vert Wx\Vert ^2\) is a constant, we further have
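\[ \min _{\hat{x}',\, \gamma } \Vert W\left( x - \gamma \hat{x}'\right) \Vert ^2 = \Vert Wx\Vert ^2 - \max _{\hat{x}'} \frac{\left( \hat{x}'^H W^H W x\right) ^2}{\Vert W\hat{x}'\Vert ^2} . \]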
The optimal \(\hat{x}'\) is therefore obtained as a solution to
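\[ \hat{x}_\text {opt}' = \arg \max _{\hat{x}'} \frac{\left( \hat{x}'^H W^H W x\right) ^2}{\Vert W\hat{x}'\Vert ^2} . \]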
Observe that the objective function is thus the normalised correlation between \(Wx\) and \(W\hat{x}'\).
However, if \(\hat{x}_\text {opt}'\) is a solution of the above problem, then the negative \(-\hat{x}_\text {opt}'\) will give the same error, which means that the solution is not unique. The optimal gain would then also have an opposite sign. To obtain an objective function with a unique optimum, we can instead maximise the square root of Eq. 3.47
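\[ \hat{x}_\text {opt}' = \arg \max _{\hat{x}'} \frac{\hat{x}'^H W^H W x}{\Vert W\hat{x}'\Vert } . \]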
The original objective function has, however, only two local minima, whereby we can simply change the sign of \(\hat{x}_\text {opt}'\) if the gain is negative.
Finally, the last step is to find the optimal quantisation of the gain \(\gamma \), which can be determined by minimising in a least squares sense
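\[ \hat{\gamma }_\text {opt} = \arg \min _{\hat{\gamma }} \Vert W\left( x - \hat{\gamma }\, \hat{x}_\text {opt}'\right) \Vert ^2 , \]
where \(\hat{\gamma }\) runs over the available quantised gain values.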
By writing out the above norm we find that the objective function is a second-order polynomial of \(\gamma \), whereby evaluation is computationally simple.
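As a sketch of this evaluation (the names `W`, `x`, `x_hat_prime` and the gain codebook `gain_levels` below are illustrative placeholders, not quantities defined in the text), the quadratic can be expanded once and then evaluated cheaply for every candidate gain:

```python
import numpy as np

def best_quantized_gain(W, x, x_hat_prime, gain_levels):
    """Pick the quantised gain minimising ||W(x - gamma * x_hat_prime)||^2.

    The squared error expands to c - 2*gamma*b + gamma^2*a, a second-order
    polynomial in gamma, so each candidate gain costs only a few operations.
    """
    Wx = W @ x
    Ws = W @ x_hat_prime
    a = Ws @ Ws          # ||W x_hat_prime||^2
    b = Ws @ Wx          # x_hat_prime^H W^H W x
    c = Wx @ Wx          # ||W x||^2 (constant offset)
    errors = c - 2.0 * gain_levels * b + gain_levels ** 2 * a
    k = int(np.argmin(errors))
    return gain_levels[k], errors[k]

# Illustrative usage with random placeholder data
rng = np.random.default_rng(1)
N = 8
W = rng.standard_normal((N, N))
x = rng.standard_normal(N)
x_hat_prime = rng.standard_normal(N)
gain_levels = np.linspace(-2.0, 2.0, 33)   # hypothetical scalar gain codebook
gamma_hat, err = best_quantized_gain(W, x, x_hat_prime, gain_levels)
```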