Abstract
The objective of speech coding is to transmit speech at the highest possible quality with the lowest possible amount of resources. To achieve the best compromise, we can use available information about (1) the source, which is the speech production system, (2) the quality measure or evaluation criteria, which depend on the performance of the human hearing system, and (3) the statistical frequency and distribution of the involved parameters. By developing models for all such information, we can optimise the system to perform efficiently. In practice, the three approaches overlap in the sense that it is often difficult to make a clear-cut separation between them. While source modelling was already discussed in Chap. 2, this chapter reviews entropy coding methods and the associated perceptual modelling methods.
References
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, New Jersey (2013)
Bosi, M., Goldberg, R.E.: Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, Dordrecht (2003)
Bäckström, T.: Vandermonde factorization of Toeplitz matrices and applications in filtering and warping. IEEE Trans. Signal Process. 61(24), 6257–6263 (2013)
Bäckström, T., Helmrich, C.R.: Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrix. In: Proceedings of the Interspeech, pp. 2794–2798 (2014)
Edler, B.: Coding of audio signals with overlapping block transform and adaptive window functions. Frequenz 43(9), 252–256 (1989)
Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Springer, New York (1992)
Gibson, J.D., Sayood, K.: Lattice quantization. Adv. Electron. Electron Phys. 72, 259–330 (1988)
Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Maryland (1996)
Gray, R.M., Neuhoff, D.L.: Quantization. IEEE Trans. Inf. Theory 44(6), 2325–2383 (1998)
Jayant, N.S., Noll, P.: Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall, Englewood Cliffs (1984)
Mitra, S.K.: Digital Signal Processing: A Computer-Based Approach. McGraw-Hill, Boston (1998)
Pisoni, D., Remez, R.: The Handbook of Speech Perception. Wiley, New Jersey (2008)
Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs (1978)
Sanchez, V.E., Adoul, J.-P.: Low-delay wideband speech coding using a new frequency domain approach. In: Proceedings of the ICASSP, vol. 2, pp. 415–418. IEEE (1993)
Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)
Appendix
1.1 Entropy of a Zero-mean Multivariate Normal Distribution
Recall that the zero-mean multivariate normal distribution of an \(N\times 1\) variable x is defined as
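\[ f(x) = \frac{1}{\sqrt{(2\pi )^N \det (R_x)}} \exp \left( -\tfrac{1}{2}\, x^H R_x^{-1} x\right) . \]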
The differential entropy (also known as the continuous entropy, since it applies to continuous-valued variables x) is defined as
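\[ H(x) = -\int _x f(x) \log f(x)\, dx . \]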
Substituting Eq. 3.24 into 3.25 yields
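\[ H(x) = \tfrac{1}{2}\log \left[ (2\pi )^N \det (R_x)\right] + \tfrac{1}{2}(\log e)\, {\mathscr {E}}\left\{ x^H R_x^{-1} x\right\} , \]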
since \(\int _x f(x)\, dx = 1\).
We recognise that the autocorrelation is defined as
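\[ R_x = {\mathscr {E}}\left\{ x x^H\right\} . \]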
Moreover, since for the trace operator we have \({{\mathrm{tr}}}(AB)={{\mathrm{tr}}}(BA)\), it follows that
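\[ x^H R_x^{-1} x = {{\mathrm{tr}}}\left( x^H R_x^{-1} x\right) = {{\mathrm{tr}}}\left( R_x^{-1} x x^H\right) . \]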
On the other hand, from the definition of the expectation, we obtain
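\[ {\mathscr {E}}\left\{ x^H R_x^{-1} x\right\} = {{\mathrm{tr}}}\left( R_x^{-1}\, {\mathscr {E}}\left\{ x x^H\right\} \right) = {{\mathrm{tr}}}\left( R_x^{-1} R_x\right) = {{\mathrm{tr}}}(I) = N . \]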
Substituting into Eq. 3.26 yields
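\[ H(x) = \tfrac{1}{2}\log \left[ (2\pi )^N \det (R_x)\right] + \tfrac{N}{2}\log e = \tfrac{1}{2}\log \left[ (2\pi e)^N \det (R_x)\right] . \]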
Let us then assume that x is quantised such that for each x we have a unique quantisation cell \(Q_k\) such that \(x\in Q_k\). The probability that x is within \(Q_k\) is by definition
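\[ P(x\in Q_k) = \int _{Q_k} f(x)\, dx . \]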
Due to the mean-value theorem, we know that there exists an \(x_k\in Q_k\) such that
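\[ \int _{Q_k} f(x)\, dx = f(x_k)\, V(Q_k), \]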
where \(V(Q_k)\) is the volume of the quantisation cell.
Assuming that \(V(Q_k)\) is equal for all k, \(V(Q_k)=V(Q)\), then the entropy of this quantisation scheme is
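\[ H_Q = -\sum _k P(Q_k) \log P(Q_k) . \]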
When the quantisation cells are small \(V(Q)\rightarrow 0\), then due to Eq. 3.29
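\[ H_Q = -\sum _k f(x_k) V(Q) \log \left[ f(x_k) V(Q)\right] \approx -\int _x f(x) \log \left[ f(x) V(Q)\right] dx = H(x) - \log V(Q) . \]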
Using the result from Eqs. 3.28 and 3.29, it follows that
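\[ H_Q \approx \tfrac{1}{2}\log \left[ (2\pi e)^N \det (R_x)\right] - \log V(Q) . \]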
The remaining component is then to determine the volume of quantisation cells V(Q). By direct (uniform) quantisation of a sample \(\xi _k\) with accuracy \(\varDelta \xi \), we refer to an operation \(\hat{\xi }_k = \varDelta \xi {{\mathrm{round}}}(\xi _k/\varDelta \xi )\), where \({{\mathrm{round}}}()\) denotes rounding to the nearest integer. The quantisation cells are then \(Q_k=[\varDelta \xi (k-\frac{1}{2}),\,\varDelta \xi (k+\frac{1}{2})]\), whereby the length (the 1-dimensional volume) of the cell is \(V(Q)=\varDelta \xi \).
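A minimal numerical sketch of this operation follows (the step size \(\varDelta \xi =0.1\) and the random test data are arbitrary choices for illustration):

```python
import numpy as np

def uniform_quantize(xi, step):
    """Direct uniform quantisation: map each sample to the nearest multiple of `step`."""
    return step * np.round(xi / step)

# Arbitrary test data and step size, for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
step = 0.1
x_hat = uniform_quantize(x, step)

# Each quantisation cell is an interval of length `step`, so the error
# magnitude never exceeds half the cell length V(Q) = step.
assert np.all(np.abs(x - x_hat) <= step / 2 + 1e-12)
```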
If we then apply direct quantisation to an \(N\times 1\) vector x, then clearly the quantisation cell size will be \(V(Q)=(\varDelta \xi )^N\), assuming that all dimensions are quantised with the same accuracy. Note that here we assumed that the quantisation cells are hyper-cubes, which makes the analysis simple. It can, however, be shown that better efficiency can be achieved by lattice quantisation, where the quantisation cells are arranged in something like a honeycomb structure. Such methods are, however, beyond the scope of this work; for more details we refer to [7].
Now suppose that we use an \(N\times N\) orthonormal transform \(x=Ay\) and we quantise the vector y with direct uniform quantisation. Since A is orthonormal, we have \(A^HA=AA^H=I\) and
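\[ \Vert x\Vert ^2 = \Vert Ay\Vert ^2 = y^H A^H A\, y = \Vert y\Vert ^2 . \]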
It follows that if \(\varDelta y=y-\hat{y}\) is the quantisation error then
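\[ {\mathscr {E}}\left\{ \Vert \varDelta x\Vert ^2\right\} = {\mathscr {E}}\left\{ \Vert A\,\varDelta y\Vert ^2\right\} = {\mathscr {E}}\left\{ \varDelta y^H A^H A\, \varDelta y\right\} = {\mathscr {E}}\left\{ \Vert \varDelta y\Vert ^2\right\} , \]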
since the expectation of the correlation between \(\varDelta y\) and y is zero, \({\mathscr {E}}\{y^H\varDelta y \}=0\). In other words, an orthonormal transform does not change the magnitude or distribution of white noise, since it is an isometry. Moreover, with orthonormal transforms, the quantisation cell volume remains \(V(Q)=(\varDelta \xi )^N\).
The situation is however different if the transform A is not orthonormal;
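\[ \Vert \varDelta x\Vert ^2 = \Vert A\,\varDelta y\Vert ^2 = \varDelta y^H A^H A\, \varDelta y \ne \Vert \varDelta y\Vert ^2 \quad \text {in general.} \]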
Using Eq. 3.28 we have
It follows that \({\mathscr {E}}\left\{ \Vert \varDelta x\Vert ^2\right\} ={\mathscr {E}}\left\{ \Vert \varDelta \upsilon \Vert ^2\right\} {{\mathrm{tr}}}(A^{H}A)\) whereby
and
Observe that this formula assumes two things: (i) Quantisation cells are small, which means that \(\varDelta y\) must be small (the approximation becomes an equality when \(\varDelta y\rightarrow 0\), whereby, however, the entropy diverges since we have a zero in the denominator) and (ii) Quantisation cells are hyper-cubes, that is, the quantisation accuracy \(\varDelta y\) is equal for each dimension of y.
Optimal Gain
The objective function
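(written here with \(x\) denoting the target vector, \(W\) the perceptual weighting matrix, \(\hat{x}'\) the quantised codebook vector and \(\gamma \) the scalar gain)
\[ \Vert W\left( x - \gamma \hat{x}'\right) \Vert ^2 \]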
can be readily minimised with respect to \(\gamma \) by setting the partial derivative to zero
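\[ \frac{\partial }{\partial \gamma } \Vert W\left( x - \gamma \hat{x}'\right) \Vert ^2 = -2\, \hat{x}'^H W^H W\left( x - \gamma \hat{x}'\right) = 0 . \]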
Solving for \(\gamma \) yields its optimal value
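\[ \gamma _\text {opt} = \frac{\hat{x}'^H W^H W x}{\hat{x}'^H W^H W \hat{x}'} = \frac{\hat{x}'^H W^H W x}{\Vert W\hat{x}'\Vert ^2} . \]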
Substituting \(\gamma _\text {opt}\) for \(\gamma \) in Eq. 3.42 yields
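\[ \Vert W\left( x - \gamma _\text {opt} \hat{x}'\right) \Vert ^2 = \Vert Wx\Vert ^2 - \frac{\left( \hat{x}'^H W^H W x\right) ^2}{\Vert W\hat{x}'\Vert ^2} . \]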
Since \(\Vert Wx\Vert ^2\) is a constant, we further have
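\[ \min _{\hat{x}',\, \gamma } \Vert W\left( x - \gamma \hat{x}'\right) \Vert ^2 = \Vert Wx\Vert ^2 - \max _{\hat{x}'} \frac{\left( \hat{x}'^H W^H W x\right) ^2}{\Vert W\hat{x}'\Vert ^2} . \]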
The optimal \(\hat{x}'\) is therefore obtained as a solution to
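\[ \hat{x}_\text {opt}' = \arg \max _{\hat{x}'} \frac{\left( \hat{x}'^H W^H W x\right) ^2}{\Vert W\hat{x}'\Vert ^2} . \]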
Observe that the objective function is thus the normalised correlation between \(Wx\) and \(W\hat{x}'\).
However, if \(\hat{x}_\text {opt}'\) is a solution of the above problem, then the negative \(-\hat{x}_\text {opt}'\) will give the same error, which means that the solution is not unique. The optimal gain would then also have an opposite sign. To obtain an objective function with a unique optimum, we can instead maximise the square root of Eq. 3.47
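\[ \hat{x}_\text {opt}' = \arg \max _{\hat{x}'} \frac{\hat{x}'^H W^H W x}{\Vert W\hat{x}'\Vert } . \]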
The original objective function has, however, only two local minima, whereby we can simply change the sign of \(\hat{x}_\text {opt}'\) if the gain is negative.
Finally, the last step is to find the optimal quantisation of the gain \(\gamma \), which can be determined by minimising in a least squares sense
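\[ \hat{\gamma }_\text {opt} = \arg \min _{\hat{\gamma }} \Vert W\left( x - \hat{\gamma }\, \hat{x}_\text {opt}'\right) \Vert ^2 , \]
where \(\hat{\gamma }\) runs over the available quantised gain values.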
By writing out the above norm we find that the objective function is a second-order polynomial of \(\gamma \), whereby evaluation is computationally simple.
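As a sketch of this evaluation (the names `W`, `x`, `x_hat_prime` and the gain codebook `gain_levels` below are illustrative placeholders, not quantities defined in the text), the quadratic can be expanded once and then evaluated cheaply for every candidate gain:

```python
import numpy as np

def best_quantized_gain(W, x, x_hat_prime, gain_levels):
    """Pick the quantised gain minimising ||W(x - gamma * x_hat_prime)||^2.

    The squared error expands to c - 2*gamma*b + gamma^2*a, a second-order
    polynomial in gamma, so each candidate gain costs only a few operations.
    """
    Wx = W @ x
    Ws = W @ x_hat_prime
    a = Ws @ Ws          # ||W x_hat_prime||^2
    b = Ws @ Wx          # x_hat_prime^H W^H W x
    c = Wx @ Wx          # ||W x||^2 (constant offset)
    errors = c - 2.0 * gain_levels * b + gain_levels ** 2 * a
    k = int(np.argmin(errors))
    return gain_levels[k], errors[k]

# Illustrative usage with random placeholder data
rng = np.random.default_rng(1)
N = 8
W = rng.standard_normal((N, N))
x = rng.standard_normal(N)
x_hat_prime = rng.standard_normal(N)
gain_levels = np.linspace(-2.0, 2.0, 33)   # hypothetical scalar gain codebook
gamma_hat, err = best_quantized_gain(W, x, x_hat_prime, gain_levels)
```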