Ambisonics, pp. 131-152

# Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)

## Abstract

Unlike pressure-gradient transducers, single-transducer microphones with higher-order directivity have apparently turned out to be difficult to manufacture at reasonable audio quality. Nowadays, higher-order Ambisonic recording with compact devices is therefore based on compact spherical arrays of pressure transducers. To prepare for higher-order Ambisonic recording based on arrays, we first need a model of the sound pressure that the individual transducers of such an array would receive in an arbitrary surrounding sound field. The lossless, linear wave equation is the most suitable model to describe how sound propagates when the sound field is composed of surrounding sound sources. Fundamentally, the wave equation models sound propagation by how small packages of air react (i) when being expanded or compressed, by a change of the internal pressure, and (ii) to directional differences in the outside pressure, by starting to move. Based thereupon, the inhomogeneous solutions of the wave equation describe how an entire free sound field builds up when excited by an omnidirectional sound source, as a simplified model of an arbitrary physical source, such as a loudspeaker, human talker, or musical instrument. After addressing these basics, the chapter shows a way to get Ambisonic signals of high spatial and timbral quality from the array signals, considering the necessary diffuse-field equalization, side-lobe suppression, and trade-off between spatial resolution and low-frequency noise boost. The chapter concludes with application examples.

Gary Elko and Jens Meyer are the well-known inventors of the first commercially available compact spherical microphone array that is able to record higher-order Ambisonics [2], the Eigenmike. There are several inspiring scientific works with valuable contributions that can be recommended for further reading [3, 4, 5, 6, 7, 8, 9, 10, 11, 12], above all Boaz Rafaely’s excellent introductory book [13].

This mathematical theory might appear extensive, but it cannot be avoided when aiming at an in-depth understanding of higher-order Ambisonic microphones. The theory enables processing of the microphone signals received such that the surrounding sound field excitation is retrieved in terms of an Ambisonic signal. Some readers may want to skip the physical introduction and resume in Sect. 6.5 on spherical scattering or Sect. 6.6 on the processing of the array signals.

## 6.1 Equation of Compression

*reversible* short-term temperature fluctuations becoming effective when air is being compressed by sound, causing the specific stiffness of air in sound propagation. Appendix A.6.1 shows how to derive this adiabatic compression relation based on the first law of thermodynamics and the ideal gas law. It relates the relative volume change \(\frac{V}{V_0}\) to the pressure change \(p=-K\,\frac{V}{V_0}\) by the bulk modulus of air. After expressing the bulk modulus by more common constants^{1} \(K=\rho \,c^2\) and differentially formulating the volume change over time using the change of the sound particle velocity in space, e.g. in one dimension \(\dot{p} = -\rho \,c^2\;\frac{\partial v_x}{\partial x}\), cf. Appendix A.6.1, we get the three-dimensional compression equation:
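Written out as a sketch, the one-dimensional relation extends to three dimensions by summing the spatial velocity changes over all three directions. The velocity vector \(\varvec{v}=[v_x,v_y,v_z]^\mathrm{T}\) and the vector of spatial derivatives \(\nabla\) are assumed notation here, since neither symbol is defined in the text above:

```latex
\[
\frac{\partial p}{\partial t}
 = -\rho\,c^2\left(\frac{\partial v_x}{\partial x}
   +\frac{\partial v_y}{\partial y}
   +\frac{\partial v_z}{\partial z}\right)
 = -\rho\,c^2\,\nabla^\mathrm{T}\varvec{v}.
\]
```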

*Independently of whether the outer boundaries of a small package of air are traveling at a common velocity: If there are directions into which their velocity is spatially increasing, the resulting gradual volume expansion over time causes a proportional decrease of interior pressure over time.*

## 6.2 Equation of Motion

*x* direction, \(F_\mathrm {x}=m\,\frac{\partial v_\mathrm {x}}{\partial t}\) equates the external force to mass *m* times acceleration, i.e. increase in velocity \(\frac{\partial v}{\partial t}\). For a small package of air with constant volume \(V_0=\Delta x\Delta y\Delta z\), the mass is obtained by the air density \(m=\rho \,V_0\), and the force equals the decrease in pressure over the three space directions, times the corresponding partial surface, e.g. for the *x* direction \(F_\mathrm {x}=-[p(x+\Delta x)-p(x)]\Delta y\Delta z\). For the *x* direction, this yields after expanding by \(\frac{\Delta x}{\Delta x}\)
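As a sketch of this step, inserting \(m=\rho\,V_0\) and the force \(F_\mathrm{x}\) into Newton's law and letting \(\Delta x\rightarrow 0\) gives the relation below. The limit and the gradient symbol \(\nabla\) are assumed notation, not taken from the text above:

```latex
\[
\rho\,\Delta x\Delta y\Delta z\,\frac{\partial v_\mathrm{x}}{\partial t}
 = -\frac{p(x+\Delta x)-p(x)}{\Delta x}\,\Delta x\Delta y\Delta z
\quad\Rightarrow\quad
\rho\,\frac{\partial v_\mathrm{x}}{\partial t} = -\frac{\partial p}{\partial x}
\qquad(\Delta x\rightarrow 0),
\]
\[
\text{and for all three directions:}\qquad
\rho\,\frac{\partial \varvec{v}}{\partial t} = -\nabla p .
\]
```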

*Independently of the common exterior pressure load on all the outer boundaries of a small air package, an outer pressure decrease into any direction implies a corresponding pushing force on the package causing a proportional acceleration into this direction.*

## 6.3 Wave Equation

*p* is a pure sinusoidal oscillation \(\sin (\omega \,t+\phi _0)\), the second derivative in time corresponds to a factor \(-\omega ^2\), and by substitution with the wave-number \(k=\frac{\omega }{c}\), we can write the frequency-domain wave equation as
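A short derivation sketch, assuming the three-dimensional forms \(\dot{p}=-\rho\,c^2\,\nabla^\mathrm{T}\varvec{v}\) and \(\rho\,\dot{\varvec{v}}=-\nabla p\) implied by Sects. 6.1 and 6.2 (the operator symbols are assumed notation): differentiating the compression equation in time and inserting the equation of motion eliminates the velocity, and the factor \(-\omega^2\) then leads to the Helmholtz equation used in Sect. 6.4.

```latex
\[
\frac{\partial^2 p}{\partial t^2}
 = -\rho\,c^2\,\nabla^\mathrm{T}\frac{\partial \varvec{v}}{\partial t}
 = c^2\,\nabla^\mathrm{T}\nabla p
 = c^2\,\bigtriangleup p
\quad\Rightarrow\quad
\Bigl(\bigtriangleup-\frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Bigr)\,p=0,
\]
\[
\frac{\partial^2}{\partial t^2}\rightarrow-\omega^2,\quad k=\frac{\omega}{c}:
\qquad (\bigtriangleup+k^2)\,p=0 .
\]
```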

### 6.3.1 Elementary Inhomogeneous Solution: Green’s Function (Free Field)

*q* of the equation can be represented by its convolution with the Dirac delta distribution \(\int q(\varvec{s})\,\delta (\varvec{r}-\varvec{s})\, \mathrm {d}V(\varvec{s})=q(\varvec{r})\). As the wave equation is linear, the general solution must therefore also equal the convolution of the Green’s function with the excitation function \(p(\varvec{r})=\int q(\varvec{s})\,G(\varvec{r}-\varvec{s})\,\mathrm {d}V(\varvec{s})\) over space and, if formulated in the time domain, also over time. The integral superimposes acoustical responses of any point in time and space of the source phenomenon, weighted by the corresponding source strength in space and time.

Acoustic source phenomena are characterized by the behavior of the Green’s function: far away, the amplitude decays with \(\frac{1}{r}\) and the phase \(-kr=-\omega \frac{r}{c}\) corresponds to the radially increasing delay \(\frac{r}{c}\). Both are expressed in Sommerfeld’s radiation condition \(\lim _{r\rightarrow \infty }r\bigl (\frac{\partial }{\partial r}p+\mathrm {i}k\,p\bigr )=0\).
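Both far-field properties can be checked numerically. The sketch below assumes the usual free-field form \(G=\frac{e^{-\mathrm{i}kr}}{4\pi r}\), whose \(\frac{1}{4\pi}\) normalization is an assumption here, and evaluates the Sommerfeld expression at growing radii with a finite-difference derivative; the function names are hypothetical.

```python
import cmath
import math

def green(k, r):
    """Free-field Green's function exp(-i k r) / (4 pi r), an assumed normalization."""
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)

def sommerfeld_residual(k, r, dr=1e-6):
    """r * (dG/dr + i k G), with the radial derivative taken by central differences."""
    dG = (green(k, r + dr) - green(k, r - dr)) / (2 * dr)
    return r * (dG + 1j * k * green(k, r))

c = 343.0                       # speed of sound in m/s (footnote 1)
k = 2 * math.pi * 1000.0 / c    # wave number at 1 kHz

# The residual equals -G analytically, so its magnitude decays like 1 / (4 pi r).
residuals = [abs(sommerfeld_residual(k, r)) for r in (1.0, 10.0, 100.0)]

# The phase -k r corresponds to the delay r / c (unwrapped only while k r < pi).
phase_delay = -cmath.phase(green(k, 0.1)) / (k * c)
```

The residual shrinks by a factor of ten per decade of radius, which is the \(\frac{1}{r}\) decay, and `phase_delay` reproduces \(\frac{r}{c}\) for the chosen radius of 10 cm.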

**Plane waves.** The radius coordinate of the Green’s function is the distance between two Cartesian position vectors \(\varvec{r}_\mathrm {s}\) and \(\varvec{r}\), the source and receiver location. Letting one of them become large is denoted by re-expressing it in terms of radius and direction vector \(\varvec{r}_\mathrm {s}=r_\mathrm {s}\varvec{\theta }_\mathrm {s}\). This permits far-field approximation. Concerning the *phase approximation*, for instance at a wave-length of 30 cm, we notice that even a relatively small distance difference, e.g. between 15 m and 15 m \(+\) 15 cm, could change the sign of the wave. To approximate the phase of the Green’s function, we must therefore at least use \(r_\mathrm {s}-\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r}\) as approximation. By contrast, this level of precision is irrelevant for the *magnitude approximation*, e.g., it would be negligible if we used \(\frac{1}{15\,\mathrm {m}}\) instead of the magnitude \(\frac{1}{15\,\mathrm {m}+15\,\mathrm {cm}}\). The far-field approximation thus yields a *plane wave* from the source direction \(\varvec{\theta }_\mathrm {s}\).
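The numbers of this example can be reproduced with a few lines. The sketch assumes the free-field Green's function \(\frac{e^{-\mathrm{i}kr}}{4\pi r}\) (normalization assumed) and a source on the positive *x* axis; `plane_wave` is a hypothetical helper that keeps \(r_\mathrm{s}-\varvec{\theta}_\mathrm{s}^\mathrm{T}\varvec{r}\) in the phase and only \(r_\mathrm{s}\) in the magnitude.

```python
import cmath
import math

def green(k, r):
    """Free-field Green's function exp(-i k r) / (4 pi r), an assumed normalization."""
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)

wavelength = 0.30                       # m
k = 2 * math.pi / wavelength
r_s = 15.0                              # source distance in m
theta_s = (1.0, 0.0, 0.0)               # unit vector pointing to the source
source = tuple(r_s * t for t in theta_s)

def plane_wave(receiver):
    """Far-field form: magnitude from r_s alone, phase from r_s - theta_s^T r."""
    proj = sum(t * x for t, x in zip(theta_s, receiver))
    return cmath.exp(-1j * k * (r_s - proj)) / (4 * math.pi * r_s)

receiver = (-0.15, 0.0, 0.0)            # 15 cm farther away from the source
exact = green(k, math.dist(source, receiver))
ratio_plane = plane_wave(receiver) / exact    # phase kept, magnitude off by 1 %
ratio_crude = green(k, r_s) / exact           # 15 cm dropped from the phase as well

off_axis = (0.0, 0.15, 0.0)
phase_error = k * (math.dist(source, off_axis) - r_s)   # rad, neglected by the projection
```

Dropping the 15 cm from the phase flips the sign (`ratio_crude` is close to \(-1.01\)), whereas the plane-wave form only deviates by the 1 % magnitude error. For the off-axis receiver the neglected phase stays below 0.02 rad.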

Plane waves are an invaluable tool to locally approximate sound fields from sources that are sufficiently far away, within a small region.^{2}

## 6.4 Basis Solutions in Spherical Coordinates

*r*, azimuth \(\varphi \), and zenith \(\vartheta \). For simplification, zenith is replaced by \(\zeta =\cos \vartheta =\frac{z}{r}\), here. We may solve the Helmholtz equation \((\bigtriangleup +k^2)p=0\) in spherical coordinates by the radial and directional parts of the Laplacian \(\bigtriangleup =\bigtriangleup _\mathrm {r}+\bigtriangleup _{\upvarphi ,\upzeta }\), as identified in Appendix A.3

*R* of this, so-called, *spherical Bessel differential equation*: spherical Hankel functions of the second kind \(h_n^{(2)}(kr)\) able to represent radiation (radially outgoing into every direction), consistently with Green’s function *G*, diverging with an \((n+1)\)-fold pole at \(kr=0\), a physical behavior that would also be observed after spatially differentiating *G*, see Fig. 6.1; spherical Bessel functions \(j_n(kr)=\mathfrak {R}\{h_n^{(2)}(kr)\}\) are real-valued, converge everywhere, exhibit an *n*-fold zero at \(kr=0\), and cannot represent radiation. Implementations typically rely on the accurate standard libraries implementing cylindrical Bessel and Hankel functions:
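As an illustration, assuming SciPy is available, the usual half-integer-order relation \(\sqrt{\frac{\pi}{2x}}\,Z_{n+\frac{1}{2}}(x)\) between cylindrical and spherical functions can be checked against SciPy's own spherical routines. `scipy.special.jv` and `scipy.special.hankel2` are the cylindrical functions; the helper names `sph_jn` and `sph_hn2` are hypothetical.

```python
import numpy as np
from scipy.special import hankel2, jv, spherical_jn, spherical_yn

def sph_jn(n, x):
    """Spherical Bessel function from the cylindrical one of order n + 1/2."""
    return np.sqrt(np.pi / (2 * x)) * jv(n + 0.5, x)

def sph_hn2(n, x):
    """Spherical Hankel function of the second kind from the cylindrical one."""
    return np.sqrt(np.pi / (2 * x)) * hankel2(n + 0.5, x)

x = np.linspace(0.5, 20.0, 40)          # arguments kr > 0
max_rel_err = 0.0
for n in range(5):
    # Reference: h_n^(2) = j_n - i y_n from SciPy's spherical routines.
    reference = spherical_jn(n, x) - 1j * spherical_yn(n, x)
    err = np.abs(sph_hn2(n, x) - reference) / np.abs(reference)
    max_rel_err = max(max_rel_err, float(err.max()))

# The real part of h_n^(2) is the spherical Bessel function j_n.
real_part_is_jn = all(np.allclose(sph_hn2(n, x).real, sph_jn(n, x)) for n in range(5))
```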

**Wave spectra and spherical basis solutions.** Any sound field evaluated at a radius *r* where the air is source-free and homogeneous in any direction can be represented by spherical basis functions for enclosed \(j_n(kr)Y_n^m(\varvec{\theta })\) and radiating fields \(h_n^{(2)}(kr)Y_n^m(\varvec{\theta })\)

\[
p=\sum _{n=0}^{\infty }\sum _{m=-n}^{n}\bigl [b_{nm}\,j_n(kr)+c_{nm}\,h_n^{(2)}(kr)\bigr ]\,Y_n^m(\varvec{\theta }), \tag{6.13}
\]

where \(b_{nm}\) are the coefficients of *incoming waves* that pass through and emanate from radii larger than *r* and \(c_{nm}\) are the coefficients of *outgoing waves* radiating from sources at radii smaller than *r*; the coefficients are called *wave spectra* of the incoming and outgoing waves, cf. [16].

**Ambisonic plane-wave spectrum, plane wave.** Plane waves only use the coefficients \(b_{nm}\), while \(c_{nm}=0\) in Eq. (6.13). The sum of incoming plane waves from all directions, whose amplitudes are given by the spherical harmonics coefficients \(\chi _{nm}\) as a set of Ambisonic signals, is described by the *incoming* wave spectrum, see Appendix A.6.5, Eq. (A.119)
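As a sketch of what this implies: with the \(e^{-\mathrm{i}kr}\) convention for outgoing waves, a plane wave arriving from direction \(\varvec{\theta}_\mathrm{s}\) reads \(e^{\mathrm{i}k\,\varvec{\theta}_\mathrm{s}^\mathrm{T}\varvec{r}}\). Assuming real-valued, orthonormal spherical harmonics, its standard expansion suggests the following incoming wave spectrum; the authoritative statement remains Eq. (A.119) of the appendix.

```latex
\[
e^{\mathrm{i}k\,\varvec{\theta}_\mathrm{s}^\mathrm{T}\varvec{r}}
 = 4\pi\sum_{n=0}^{\infty}\sum_{m=-n}^{n}
   \mathrm{i}^n\,j_n(kr)\,Y_n^m(\varvec{\theta}_\mathrm{s})\,Y_n^m(\varvec{\theta})
\quad\Rightarrow\quad
b_{nm}=4\pi\,\mathrm{i}^n\,\chi_{nm},\qquad c_{nm}=0 .
\]
```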

## 6.5 Scattering by Rigid Higher-Order Microphone Surface

**Relation of recorded sound pressure to Ambisonic signal.** The scattering equation relates the recorded sound pressure expanded in spherical harmonics to the Ambisonic signal of the surround sound scene, see frequency responses in Fig. 6.5.

It is formally convenient that as soon as the sound pressure is given in terms of its spherical harmonic coefficient signals \(\psi _{nm}\), the Ambisonic signals \(\chi _{nm}\) of a concentric playback system are obviously just an inversely filtered version thereof, with no need for further unmixing/matrixing.
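A minimal numerical sketch of such a per-order relation, under stated assumptions: on a rigid sphere of radius \(a\), the incident term \(j_n(ka)\) plus the scattered term \(-\frac{j_n'(ka)}{h_n'^{(2)}(ka)}h_n^{(2)}(ka)\) collapses via the Wronskian to \(\frac{-\mathrm{i}}{(ka)^2\,h_n'^{(2)}(ka)}\). Combined with an assumed plane-wave factor \(4\pi\,\mathrm{i}^n\), this gives \(\psi_{nm}/\chi_{nm}=\frac{4\pi\,\mathrm{i}^{n-1}}{(ka)^2\,h_n'^{(2)}(ka)}\). All function names are hypothetical, and the closed-form Hankel sum is used only to keep the example free of dependencies.

```python
import cmath
import math

def sph_hn2(n, x):
    """Spherical Hankel function of the second kind for real x > 0 (finite closed-form sum)."""
    s = sum((-1j) ** m * math.factorial(n + m)
            / (math.factorial(m) * math.factorial(n - m) * (2 * x) ** m)
            for m in range(n + 1))
    return 1j ** (n + 1) * cmath.exp(-1j * x) / x * s

def sph_hn2_prime(n, x):
    """Radial derivative from the recurrence h_n' = (n / x) h_n - h_{n+1}."""
    return n / x * sph_hn2(n, x) - sph_hn2(n + 1, x)

def total_field_on_sphere(n, ka):
    """Incident plus scattered radial term on a rigid sphere: j_n - j_n' h_n / h_n'."""
    h, hp = sph_hn2(n, ka), sph_hn2_prime(n, ka)
    return h.real - hp.real * h / hp          # j_n = Re{h_n} for real arguments

def rigid_sphere_response(n, ka):
    """Assumed per-order factor psi_nm / chi_nm = 4 pi i^(n-1) / ((ka)^2 h_n'(ka))."""
    return 4 * math.pi * 1j ** (n - 1) / (ka ** 2 * sph_hn2_prime(n, ka))

def level_db(n, ka):
    return 20 * math.log10(abs(rigid_sphere_response(n, ka)))

# The Wronskian collapses the total field to -i / ((ka)^2 h_n'(ka)), checked at ka = 2.
identity_error = max(
    abs(total_field_on_sphere(n, 2.0) + 1j / (4.0 * sph_hn2_prime(n, 2.0)))
    for n in range(5))

# Level increase per frequency decade at small ka, orders 0..3.
slopes_db = [level_db(n, 0.01) - level_db(n, 0.001) for n in range(4)]
```

At small \(ka\) the response of order \(n\) rises by about \(20\,n\) dB per decade, so its inverse falls accordingly towards high frequencies and needs a large boost at low frequencies, which is the motivation for the regularization discussed in Sect. 6.8.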

## 6.6 Higher-Order Microphone Array Encoding

The block diagram of Ambisonic encoding of higher-order microphone array signals is shown in Fig. 6.7. The first processing step decomposes the pressure samples \(\varvec{p}(t) \) from the microphone array into their spherical harmonics coefficients \(\varvec{\psi }_\mathrm {N}(t)\): to which amount do the samples contain omnidirectional, figure-of-eight, and other spherical harmonic patterns, up to the order that the microphone arrangement allows to decompose? The frequency-independent matrix \((\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger \) does the conversion. It is the left inverse of the spherical harmonics sampled at the microphone positions, as shown in the upcoming section.

*E* normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the sidelobes.

## 6.7 Discrete Sound Pressure Samples in Spherical Harmonics

**Left inverse (MMSE).** The equation can be (pseudo-)inverted if the matrix \(\varvec{Y}_\mathrm {N}\) is well conditioned. Typically more microphones are used than coefficients searched \(\mathrm {M}\ge (\mathrm {N}+1)^2\). Inversion is a matter of mean-square error minimization: As the \(\mathrm {M}\) dimensions may contain more degrees of freedom than \((\mathrm {N}+1)^2\), the coefficient vector \(\varvec{\psi }_\mathrm {N}\) giving the closest model \(\varvec{p}_\mathrm {N}\) to the measurement \(\varvec{p}\) is searched.

If the microphones are arranged in a *t*-design and the order \(\mathrm {N}\) is chosen suitably, then the transpose matrix times \(\frac{4\pi }{\mathrm {M}}\) is equivalent to the left inverse. A more thorough discussion on spherical point sets can be found in [17, 18, 19].
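A minimal sketch of this decomposition with NumPy, under simplifying assumptions: order \(\mathrm{N}=1\), real-valued orthonormal spherical harmonics, and six hypothetical microphone directions on the coordinate axes (an octahedron, which is a spherical 3-design). The model \(\varvec{p}=\varvec{Y}_\mathrm{N}^\mathrm{T}\varvec{\psi}_\mathrm{N}\) and all variable names are assumptions for illustration.

```python
import numpy as np

def real_sh_order1(dirs):
    """Orthonormal real spherical harmonics up to N = 1; rows (0,0), (1,-1), (1,0), (1,1)."""
    x, y, z = dirs.T
    rows = [np.ones_like(x), np.sqrt(3) * y, np.sqrt(3) * z, np.sqrt(3) * x]
    return np.vstack(rows) / np.sqrt(4 * np.pi)

# Six microphone directions on the coordinate axes (octahedron).
mics = np.array([[1, 0, 0], [-1, 0, 0],
                 [0, 1, 0], [0, -1, 0],
                 [0, 0, 1], [0, 0, -1]], dtype=float)
M = len(mics)
Y = real_sh_order1(mics)                # shape ((N+1)^2, M) = (4, 6)

psi_true = np.array([1.0, 0.2, -0.5, 0.3])
p = Y.T @ psi_true                      # pressure samples, p = Y^T psi

left_inv = np.linalg.pinv(Y.T)          # least-squares left inverse of Y^T
psi_est = left_inv @ p                  # recovered coefficients
```

For this arrangement `left_inv` coincides with `4 * np.pi / M * Y`, so a plain scaled matrix product replaces the pseudo-inverse, as stated above for *t*-designs.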

The *maximum determinant points* [20] are a particular kind of *critical* directional sampling scheme that allows using exactly as many microphones \(\mathrm {M}=(\mathrm {N}+1)^2\) as spherical harmonic coefficients are obtained, yielding a well-conditioned square matrix \(\varvec{Y}_\mathrm {N}\), so that it can be inverted directly without left/pseudo-inversion. The 25 maximum-determinant points for \(\mathrm {N}=4\) are used in the simulation example below.^{3}

**Finite-order assumption and spatial aliasing.** An important implication of estimating \(\psi _{nm}\) is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be done by restricting the frequency range, as high-order harmonics are attenuated well enough below suitable frequency limits, cf. Fig. 6.5. However, low-pass filtered signals are unacceptable in practice. Instead, one has to accept *spatial aliasing* at high frequencies, i.e. directional mapping errors and direction-specific comb filters. Figure 6.8 shows spatial aliasing of \(\varvec{\psi }_\mathrm {N}=(\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\,\varvec{p}\) in the angular domain \(p=\sum \psi _{nm}Y_n^m\).

## 6.8 Regularizing Filter Bank for Radial Filters

*n*-fold (unstable) pole at 0 Hz. Considering that microphone self noise and array imperfection cause erroneous signals louder than the acoustically expected \(n\mathrm {th}\)-order vanishing signals around 0 Hz, filter shapes will moreover cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders *n* must be stabilized by high-pass slopes of at least the order *n*, see also [6, 9, 21, 22, 23, 24, 25]. With \((n+1)\mathrm {th}\)-order high-pass slopes, see Fig. 6.9, such errors are cut off by first-order high-pass slopes at exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4, yielding a noise boost of at most 20 dB for a \(4\mathrm {th}\)-order microphone with \(\mathrm {a}=4.2\) cm. However, just cutting on the frequencies of each order is not enough: every cut-on frequency causes a noticeable loudness drop below it due to the discarded signal contributions. It is better to design a filter bank with crossovers instead, which allows compensation for the loudness loss in every band. A zero-phase, \(n\mathrm {th}\)-order Butterworth high-pass response is defined by \(H_\mathrm {hi}=\frac{\omega ^n}{1+\omega ^n}\), with \(\omega \) normalized to the cut-on frequency, and is amplitude-complementary to the low pass \(H_\mathrm {lo}=\frac{1}{1+\omega ^n}\), so that \(H_\mathrm {hi}+H_\mathrm {lo}=1\).
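The amplitude-complementary pair lends itself to a simple crossover filter bank. The sketch below is one possible construction and not necessarily the design used in the chapter: each cut-on frequency splits off a low band and passes the remainder on, so the band gains add up to one at every frequency. The cascade structure, the choice of slope order \(n+1\) per cut-on, and all names are assumptions.

```python
CUT_ON_HZ = (90.0, 680.0, 1650.0, 2600.0)   # exemplary cut-on frequencies, orders 1..4

def hi(f, fc, n):
    """Zero-phase high-pass gain w^n / (1 + w^n), w = f / fc."""
    w = (f / fc) ** n
    return w / (1.0 + w)

def lo(f, fc, n):
    """Amplitude-complementary low-pass gain 1 / (1 + w^n)."""
    return 1.0 / (1.0 + (f / fc) ** n)

def band_gains(f):
    """Gains of bands b = 0..4 at frequency f in Hz (hypothetical cascade)."""
    gains, passed = [], 1.0
    for b, fc in enumerate(CUT_ON_HZ):
        slope = b + 2                    # (n+1)th-order slope for Ambisonic order n = b + 1
        gains.append(passed * lo(f, fc, slope))
        passed *= hi(f, fc, slope)
    gains.append(passed)                 # top band, all orders active
    return gains

sums = [sum(band_gains(f)) for f in (20.0, 90.0, 1000.0, 10000.0)]
```

Because every stage satisfies \(H_\mathrm{hi}+H_\mathrm{lo}=1\), the sum telescopes to one, and each band can then be weighted and equalized on its own without changing the crossover behavior.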

This filter bank design moreover allows adjusting loudness and sidelobe suppression separately in every frequency band.

## 6.9 Loudness-Normalized Sub-band Side-Lobe Suppression

The filter bank design shown above would only yield Ambisonic signals whose order increases with the frequency band. Ideally, this variation of the order comes with the necessity of individual max-\(\varvec{r}_\mathrm {E}\) sidelobe suppression in every band. Moreover, Ambisonic signals of different orders are differently loud, so also diffuse-field equalization of the *E* measure is desirable in every band.

*b* in which the Ambisonic orders retrieved are \(0\le n\le b\)
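A minimal sketch of per-band weighting, under stated assumptions: the max-\(\varvec{r}_\mathrm{E}\) weights are taken from the common approximation \(a_n=P_n\bigl(\cos \frac{137.9^\circ}{b+1.51}\bigr)\), the *E* measure of a band is modeled as \(\sum_n (2n+1)\,a_n^2\), and every band is scaled to the *E* value of the highest band. The choice of reference band and the function names are hypothetical.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) by the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for m in range(1, n):
        p_prev, p = p, ((2 * m + 1) * x * p - m * p_prev) / (m + 1)
    return p

def max_re_weights(order):
    """Approximate max-r_E order weights a_0..a_order."""
    x = math.cos(math.radians(137.9) / (order + 1.51))
    return [legendre(n, x) for n in range(order + 1)]

def energy(weights):
    """Assumed diffuse-field E measure: sum over n of (2n + 1) a_n^2."""
    return sum((2 * n + 1) * a * a for n, a in enumerate(weights))

def normalized_band_weights(b, top_order):
    """Weights for band b (orders 0..b), scaled to the E measure of the top band."""
    a = max_re_weights(b)
    g = math.sqrt(energy(max_re_weights(top_order)) / energy(a))
    return [g * w for w in a]

bands = [normalized_band_weights(b, 4) for b in range(5)]
band_energies = [energy(w) for w in bands]
```

All entries of `band_energies` are equal by construction, so the bands are equally loud in this simplified measure while each band keeps its own side-lobe suppression.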

## 6.10 Influence of Gain Matching, Noise, Side-Lobe Suppression

If regularization filters were set to 50, 160, 500, 1600 and sidelobe suppression turned off for testing, one would get the poor image as in Fig. 6.14a, where high-order signals at low frequencies are highly boosted.

If a noise-free case is assumed, and only the max-\(\varvec{r}_\mathrm {E}\) side-lobe suppression of the highest band is used for all bands, one gets the image in Fig. 6.14b, which improves with individual max-\(\varvec{r}_\mathrm {E}\) weights in Fig. 6.14c.

**Self-noise behavior.** Assuming that self-noise of the microphones is uncorrelated, it will also remain uncorrelated and of equal strength after decomposing the \(\mathrm {M}\) microphone signals \(p_i=\mathcal {N}\) into the \((\mathrm {N}+1)^2\) spherical harmonic coefficient signals \(\psi _{nm}=\frac{(\mathrm {N}+1)^2}{\mathrm {M}}\mathcal {N}\), if \(\mathrm {M}\approx (\mathrm {N}+1)^2\) and the microphone arrangement permits a well-conditioned pseudo inversion \(\varvec{Y}_\mathrm {N}^\dagger \). The spectral change of the microphone self noise due to the radial filters \(\rho _n(\omega )\) can be described by the noise of the \((2n+1)\) signals of the same order, amplified by \(|\rho _n(\omega )|^2\), in comparison to the zeroth-order signal.

Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided under the link http://phaidra.kug.ac.at/o:69292. They are measured on a \(12^\circ \times 11.25^\circ \) azimuth \(\times \) zenith grid, yielding \(480\times 256\) pt impulse responses for each of the 32 transducers.

## 6.11 Practical Free-Software Examples

### 6.11.1 Eigenmike Em32 Encoding Using Mcfx and IEM Plug-In Suites

As found in [28], the em32 transducers exhibit a frequency response that favors low frequencies and attenuates high frequencies. This behavior is sufficiently well equalized in practice using two parametric shelving filters, a low shelf at 500 Hz with a gain of \(-5\) dB, and a high shelf at 5 kHz using a gain of \(+5\) dB, see Fig. 6.18.

### 6.11.2 SPARTA Array2SH

The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both encoding of the signals and calculation and application of radial-focusing filters based on the geometry of the array. It supports rigid and open arrays and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows adjusting the radial filters in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.

## Footnotes

- 1. Typical constants are: density \(\rho =1.2\) kg/m\(^3\), speed of sound \(c=343\) m/s.

- 2. This is because, strictly speaking, an entire plane-wave sound field is unphysical and of infinite energy: either the exhaustive in-phase vibration of an infinite plane is required, or an infinite-amplitude point-source infinitely far away is required with infinite anticipation \(t_\mathrm {s}\rightarrow +\infty \) (non-causal).

- 3.

## References

- 1. J. Daniel, Evolving views on HOA: from technological to pragmatic concerns, in *Proceedings of the 1st Ambisonics Symposium* (Graz, 2009)
- 2. G.W. Elko, R.A. Kubli, J. Meyer, Audio system based on at least second-order eigenbeams, in *PCT Patent*, vol. WO 03/061336, no. A1 (2003)
- 3. G.W. Elko, Superdirectional microphone arrays, in *Acoustic Signal Processing for Telecommunication*, ed. by J. Benesty, S.L. Gay (Kluwer Academic Publishers, Dordrecht, 2000)
- 4. J. Meyer, G. Elko, A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield, in *IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002. Proceedings. (ICASSP’02)*, vol. 2 (Orlando, 2002)
- 5. G.W. Elko, Differential microphone arrays, in *Audio Signal Processing for Next-Generation Multimedia Communication Systems*, ed. by Y. Huang, J. Benesty (Springer, Berlin, 2004)
- 6. J. Daniel, S. Moreau, Further study of sound field coding with higher order ambisonics, in *116th AES Convention* (2004)
- 7. S.-O. Petersen, Localization of sound sources using 3d microphone array, M. Thesis, University of South Denmark, Odense (2004). www.oscarpetersen.dk/speciale/Thesis.pdf
- 8. B. Rafaely, Analysis and design of spherical microphone arrays. IEEE Trans. Speech Audio Process. (2005)
- 9. S. Moreau, Étude et réalisation d’outils avancés d’encodage spatial pour la technique de spatialisation sonore Higher Order Ambisonics: microphone 3d et contrôle de distance, Ph.D. Thesis, Université du Maine (2006)
- 10. H. Teutsch, *Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition* (Springer, Berlin, 2007)
- 11. Z. Li, R. Duraiswami, Flexible and optimal design of spherical microphone arrays for beamforming. IEEE Trans. ASLP **15**(2) (2007)
- 12. W. Song, W. Ellermeier, J. Hald, Using beamforming and binaural synthesis for the psychoacoustical evaluation of target sources in noise. J. Acoust. Soc. Am. **123**(2) (2008)
- 13. B. Rafaely, *Fundamentals of Spherical Array Processing*, 2nd edn. (Springer, Berlin, 2019)
- 14. ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology (1978)
- 15. ISO 80000-2, Quantities and units - Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
- 16. E.G. Williams, *Fourier Acoustics* (Academic, Cambridge, 1999)
- 17. B. Rafaely, B. Weiss, E. Bachmat, Spatial aliasing in spherical microphone arrays. IEEE Trans. Signal Process. **55**(3) (2007)
- 18. F. Zotter, Sampling strategies for acoustic holography/holophony on the sphere, in *NAG-DAGA, Rotterdam* (2009)
- 19. P. Lecomte, P.-A. Gauthier, C. Langrenne, A. Berry, A. Garcia, A fifty-node Lebedev grid and its applications to ambisonics. J. Audio Eng. Soc. **64**(11) (2016)
- 20. I.H. Sloan, R.S. Womersley, Extremal systems of points and numerical integration on the sphere. Adv. Comput. Math. **21**, 107–125 (2004)
- 21. B. Bernschütz, C. Pörschmann, S. Spors, Soft-limiting bei modaler amplitudenverstärkung bei sphärischen mikrofonarrays im plane-wave decomposition verfahren, in *Fortschritte der Akustik - DAGA* (2011)
- 22. T. Rettberg, S. Spors, On the impact of noise introduced by spherical beamforming techniques on data-based binaural synthesis, in *Fortschritte der Akustik - DAGA* (2013)
- 23. T. Rettberg, S. Spors, Time-domain behaviour of spherical microphone arrays at high orders, in *Fortschritte der Akustik - DAGA* (2014)
- 24. B. Rafaely, *Fundamentals of Spherical Array Processing*, 1st edn. (Springer, Berlin, 2015)
- 25. D.L. Alon, B. Rafaely, Spatial decomposition by spherical array processing, in *Parametric Time-Frequency Domain Spatial Audio*, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley, New Jersey, 2017)
- 26. S. Lösler, F. Zotter, Comprehensive radial filter design for practical higher-order ambisonic recording, in *Fortschritte der Akustik - DAGA Nürnberg* (2015)
- 27. F. Zotter, M. Zaunschirm, M. Frank, M. Kronlachner, A beamformer to play with wall reflections: The icosahedral loudspeaker. Comput. Music J. **41**(3) (2017)
- 28. F. Zotter, M. Frank, C. Haar, Spherical microphone array equalization for ambisonics, in *Fortschritte der Akustik - DAGA* (Nürnberg, 2015)

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.