While surrounding Ambisonic loudspeaker arrays play sound from outside the listening area into the audience, compact spherical loudspeaker arrays play sound into the room from a single position. Directivity adjustable in orientation and shape can be used to steer sound beams in order to excite wall reflections in the given acoustic environment. The directional shapes and orientations of such beams are all controlled by Ambisonic signals. Despite the huge practical difference, the two applications share more than the spherical harmonics that lend their shapes to Ambisonic signals: the control of radiating sound beams employs nearly the same model- or measurement-based radial steering filters as those of compact higher-order Ambisonic microphones.

The works of Warusfel [3], Kassakian [4], Avizienis [5], Zotter [6, 7], Pomberger [8], Pollow [9], and Mattioli Pasqual [10] established the electroacoustic background technology required to describe compact spherical loudspeaker arrays built with electrodynamic transducers. The early works on auditory objects were written by Schmeder [11], Sharma, Frank, and Zotter [2, 12, 13]. Some contemporary results were found in the project “Orchestrating the Space by Icosahedral Loudspeaker” (OSIL) between 2015 and 2018 [14,15,16,17,18,19].

7.1 Auditory Events of Ambisonically Controlled Directivity

7.1.1 Perceived Distance

Laitinen showed in [20] that increasing the directivity of a listener-facing loudspeaker array from omnidirectional to second order was able to create auditory events that were perceptually closer than the physical distance to the loudspeaker array. The experimental results can be explained by the increase of the direct-to-reverberant energy ratio, as the sound beam of the directional source excites room reflections less strongly.

Wendt extended Laitinen’s work by experiments employing a simulation of a third-order directional source in a virtual room (third-order image source model) played back by a loudspeaker ring in an anechoic room [16]. He could show that the perceived distance between the listener and the higher-order directional source could not only be controlled by the order of the directivity pattern but also by the orientation of the source (towards the listener, away from the listener). Beams projecting sounds away from the listener were perceived behind the source, cf. Fig. 7.1. Again, the perceptual results could be modeled by simple measures known from room acoustics.

Fig. 7.1

Perceived distance (medians and 95% confidence intervals) depends on order and direction of a static source emitting sound with max-\(\varvec{r}_\mathrm {E}\) directivity (\(180^\circ \) towards listener, \(0^\circ \) away from listener) in a simulated environment

7.1.2 Perceived Direction

Using a similar room simulation, the study in [21] asked participants to indicate the perceived direction of an auditory event created by a third-order directional source. The results showed that for different source orientations, listeners perceived auditory objects at directions that often did not coincide with the sound source, but with the delayed reflection paths, cf. Fig. 7.2. Perceived directions focused on the direct sound and the first three reflections after 6, 8, and 9 ms. For some orientations, even the second-order reflections at 12 and 14 ms dominated localization. However, the influence of later reflections is reduced by the precedence effect. The perceived directions can be modeled by the extended energy vector originally developed for off-center listening positions in surrounding loudspeaker arrangements, as also shown in [17]. Experiments in [22] showed that panning between a reflection and the direct sound creates auditory objects in between. When applying the appropriate delay and gain to the direct sound to compensate for the longer path of the reflection, the localization curves are similar to those of standard stereo using a pair of loudspeakers.
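
The delay and gain compensation mentioned above can be sketched numerically. This is only an illustration under stated assumptions: a speed of sound of 343 m/s, spherical spreading with a 1/r amplitude law, and a lossless reflection; the function name and the example path lengths are hypothetical and not taken from [22]. In practice the beam pattern also changes the level of each path, which this sketch ignores.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def direct_sound_compensation(direct_path_m, reflection_path_m, c=SPEED_OF_SOUND):
    """Return (delay in seconds, linear gain) to apply to the direct sound.

    The delay equals the extra travel time of the reflection; the gain assumes
    1/r spreading and a lossless wall (both are simplifying assumptions).
    """
    extra_path = reflection_path_m - direct_path_m
    delay_s = extra_path / c
    gain = direct_path_m / reflection_path_m
    return delay_s, gain

# Hypothetical geometry: 3 m direct path, reflection path about 2.06 m longer
delay_s, gain = direct_sound_compensation(3.0, 5.058)
print(f"delay = {delay_s * 1e3:.2f} ms, gain = {gain:.3f}")
```

With these hypothetical path lengths the extra delay is about 6 ms, which is the order of magnitude of the first reflection in the simulated room.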

Fig. 7.2

IKO perceived directions (black circles, radii indicate relative amount of answers) and modeling (gray crosses), \(3\mathrm {rd}\)-order max-\(\varvec{r}_\mathrm {E}\) beam, \(2\mathrm {nd}\)-order image source model. Gray shading in the background indicates level of each path

7.2 First-Order Compact Loudspeaker Arrays and Cubes

The simplest practical way of creating a loudspeaker array with adjustable directivity is a cube with loudspeakers on its plane surfaces, as suggested by Misdariis [23]. Restricting the directivity control to two dimensions reduces the number of loudspeaker drivers to four and makes it easy to equip the array with a carrying handle on top and a flange adapter at the bottom, cf. [24] and Fig. 7.3.

Directivity control. First-order Ambisonics utilizes monopole and dipole modes, which directly translate to the corresponding far-field radiation patterns. Thanks to the cubic shape, these modes can easily be created by either playing all four drivers in phase or the opposing drivers out of phase, cf. Fig. 7.4. Nevertheless, the frequency responses of such monopole and dipole modes need to be equalized to enable their phase- and magnitude-aligned superposition in the far field. Filters and measurement data of cube loudspeakers built at IEM [24] are freely available at http://phaidra.kug.ac.at/o:67631.

Fig. 7.3

Design of a loudspeaker cube: prototype, and vertical and horizontal cross section plots

Fig. 7.4

System controlling the monopole and dipole modes of the loudspeaker cubes, to accomplish first-order beamforming with the shape parameter \(\alpha \) and beam direction \(\varphi _0\)

To overcome the compressive effort of interior volume changes at low frequencies, the filter \(H_\mathrm {bctl}\) in Fig. 7.4 equalizes the smaller velocity of the loudspeaker cones when driven omnidirectionally to the velocity when driven in dipoles as a first step, and as a second step, it attenuates the monopole pattern slightly to account for its more efficient radiation at low frequencies. The filter \(H_\mathrm {EQ}\) is a general equalizer required to obtain a flat frequency response, \(0\le \alpha \le 1\) is a first-order omni to dipole beam-shape parameter, and \(\varphi _0\) is the beam direction. The filter \(H_\mathrm {bctl}\) can be specified as a \(5\mathrm {th}\)-order IIR filter purely based on geometric and electroacoustic parameters [19].
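
The gain structure of Fig. 7.4 can be sketched as follows, leaving out the filters \(H_\mathrm {bctl}\) and \(H_\mathrm {EQ}\). The sketch rests on two assumptions that the text does not fix: the first-order pattern is taken as \((1-\alpha )+\alpha \cos (\varphi -\varphi _0)\), and the four drivers are assumed to face \(0^\circ \), \(90^\circ \), \(180^\circ \), and \(270^\circ \). All function names are hypothetical.

```python
import math

DRIVER_AZIMUTHS_DEG = (0.0, 90.0, 180.0, 270.0)  # assumed driver orientations

def mode_gains(alpha, phi0_deg):
    """Gains of the monopole mode and the two dipole modes (x, y)."""
    phi0 = math.radians(phi0_deg)
    return 1.0 - alpha, alpha * math.cos(phi0), alpha * math.sin(phi0)

def driver_gains(alpha, phi0_deg):
    """Superimpose the in-phase monopole mode and the out-of-phase dipole modes."""
    w, x, y = mode_gains(alpha, phi0_deg)
    gains = []
    for az_deg in DRIVER_AZIMUTHS_DEG:
        az = math.radians(az_deg)
        # Opposing drivers receive the dipole contribution with opposite sign.
        gains.append(w + x * math.cos(az) + y * math.sin(az))
    return gains

def pattern(alpha, phi0_deg, phi_deg):
    """Idealized far-field pattern, assuming perfectly equalized modes."""
    return (1.0 - alpha) + alpha * math.cos(math.radians(phi_deg - phi0_deg))

print(driver_gains(0.5, 0.0))   # cardioid-type beam towards 0 degrees
print(pattern(0.5, 0.0, 180.0)) # null at the rear of that beam
```

With \(\alpha =0\) all drivers move in phase (omnidirectional), and with \(\alpha =1\) only opposing drivers are driven out of phase (dipole).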

Direct and indirect sound with two cubes. The study in [19] examined the width of the listening area for the creation of a central auditory object between a pair of loudspeaker cubes, cf. Fig. 7.5. Steering the two beams directly at the listener yielded a narrow listening area that increased with the distance to the loudspeakers, similar to what is known from typical stereo applications, cf. Fig. 2.9. A much wider listening area is achieved by steering the beams to the front wall to excite reflections. To this end, max-\(\varvec{r}_\mathrm {E}\) (super-cardioid) beams were chosen and oriented in a way to ideally suppress direct sound from the loudspeaker cubes at the listening position. The proposed setup of two loudspeaker cubes can be used to play back stable L, C, R channels of a surround production without the need of an actual center loudspeaker.

Fig. 7.5

Width of the listening area for a central auditory object at two distances from a pair of loudspeaker cubes with different orientation of max-\(\varvec{r}_\mathrm {E}\)/super-cardioid beams

Surround with depth. Together with the distance control described by Laitinen [20], the stable in-between auditory image has been used in [19] to establish a surround-with-depth system consisting of a quadraphonic setup of four loudspeaker cubes. As a first layer, it uses the direct sounds from the 4 loudspeakers from \({\pm }45^\circ \) and \({\pm } 135^\circ \) together with the 4 in-between images at \(0^\circ \), \({\pm }90^\circ \), and \(180^\circ \) to obtain 8 directions for third-order Ambisonic surround panning. As a second layer for depth, surround with depth uses 4 cardioid beams pointing into the 4 room corners to provide the impression of distant sounds. Blending between these two layers is used to control the distance impression of surround sounds.

7.3 Higher-Order Compact Spherical Loudspeaker Arrays and IKO

With transducers mounted on spheres or polyhedra, higher-order radiators can be built. Typically, those are Platonic solids such as dodecahedra or icosahedra, as they can easily be manufactured from equal-sided polygons, cf. Fig. 7.6. Often, the loudspeakers are also mounted onto a common interior volume. Hereby, the higher-order modes can be controlled at reduced impedance of the inner stiffness; however, this also causes acoustic coupling of the transducer motions. Typically, multiple-input-multiple-output (MIMO) crosstalk cancellers are employed to suppress the coupling and to control the velocity of the transducer cones. If this is accomplished, the acoustic radiation can be modeled and equalized by the spherical cap model, cf. [6, 15, 25, 26].

Fig. 7.6

Powerful icosahedral loudspeaker array (IKO by IEM and Sonible) and reflecting baffles in Ligeti concert hall, in preparation of an electroacoustic music concert

Cap model. Higher-order loudspeaker arrays on a compact spherical housing are modeled by the spherical cap model. It assumes for the exterior air that the radial surface velocity is a boundary condition consisting of separated spherical cap shapes of the size \(\alpha \) centered around the directions \(\{\varvec{\uptheta }_l\}\), each unity in value. These idealized transducer shapes driven by the transducer velocities \(v_l\) compose the surface velocity

$$\begin{aligned} v(\varvec{\theta })&=\sum _{l=0}^\mathrm {L}u(\varvec{\uptheta }_l^\mathrm {T}\varvec{\theta }-\cos {\textstyle \frac{\alpha }{2}})\,v_l. \end{aligned}$$
(7.1)

Here, \(u(\zeta )\) denotes the unit step function that is unity for \(\zeta \ge 0\) and zero otherwise. The surface velocity distribution can be decomposed into spherical harmonics as

$$\begin{aligned} v(\varvec{\theta })&=\sum _{n=0}^\infty \sum _{m=-n}^nY_n^m(\varvec{\theta })\,\sum _{l=0}^\mathrm {L}w_{nm}^{(l)}\,v_l\,. \end{aligned}$$
(7.2)

The coefficients \(w_{nm}^{(l)}\) of the \(l\mathrm {th}\) cap are defined by spherical convolution Eq. (A.56) of a Dirac delta \(\delta (\varvec{\uptheta }_l^\mathrm {T}\varvec{\theta }-1)\) pointing to the cap center with a zenithal cap \(u(\cos \vartheta -\cos \frac{\alpha }{2})\):

$$\begin{aligned} w_{nm}^{(l)}&=w_n\;Y_n^m(\varvec{\uptheta }_l), \end{aligned}$$
(7.3)

where \(Y_n^m(\varvec{\uptheta }_l)\) are the coefficients expressing the Dirac delta, extended to a cap by weighting with \(w_n\). The term \(w_n=2\pi \int _{\cos \frac{\alpha }{2}}^1P_n(\zeta )\;\mathrm {d}\zeta \) is derived in Eq. (A.60)

$$\begin{aligned} w_n&=2\pi {\left\{ \begin{array}{ll} -\frac{P_{n+1}(\cos {\frac{\alpha }{2}})-\cos {\frac{\alpha }{2}}\,P_n(\cos {\frac{\alpha }{2}})}{n}, &{}\text {for n>0},\\ 1-\cos {\textstyle \frac{\alpha }{2}}, &{}\text {for n=0}. \end{array}\right. } \end{aligned}$$
(7.4)
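
As a plausibility check of Eq. (7.4), the closed form can be compared with a direct numerical evaluation of \(w_n=2\pi \int _{\cos \frac{\alpha }{2}}^1P_n(\zeta )\,\mathrm {d}\zeta \). The sketch below uses only the Python standard library; the function names and the example cap angle of \(40^\circ \) are arbitrary choices for illustration, not values from the text.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def cap_weight(n, alpha):
    """Closed form of Eq. (7.4); alpha is the cap opening angle in radians."""
    c = math.cos(alpha / 2)
    if n == 0:
        return 2 * math.pi * (1 - c)
    return -2 * math.pi * (legendre(n + 1, c) - c * legendre(n, c)) / n

def cap_weight_numeric(n, alpha, steps=2000):
    """2*pi times the integral of P_n over [cos(alpha/2), 1], Simpson rule."""
    a, b = math.cos(alpha / 2), 1.0
    h = (b - a) / steps  # steps must be even for the composite Simpson rule
    total = legendre(n, a) + legendre(n, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * legendre(n, a + i * h)
    return 2 * math.pi * total * h / 3

alpha = math.radians(40.0)  # arbitrary example cap size
for n in range(4):
    print(n, cap_weight(n, alpha), cap_weight_numeric(n, alpha))
```

For \(\alpha =2\pi \) the cap covers the whole sphere and \(w_0\) becomes \(4\pi \), as expected for the surface of the unit sphere.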

Decoder. Without radiation control yet, any low-order target spherical harmonic \(n\le \mathrm {N}\) can be synthesized as velocity pattern \(\phi _{nm}\) by superimposing the spherical cap coefficients \(w_{nm}^{(l)}\) with suitable transducer velocities \(v_l\), i.e. \(\phi _{nm}=\sum _l w_n\,Y_n^m(\varvec{\uptheta }_l)\,v_l\). We write a matrix/vector notation with the matrix \(\varvec{Y}=[\varvec{y}(\varvec{\uptheta }_1),\dots ,\varvec{y}(\varvec{\uptheta }_\mathrm {L})]\) containing the spherical harmonics \(\varvec{y}(\varvec{\theta })=[Y_n^m(\varvec{\theta })]_{nm}\) sampled at the transducer positions \(\{\varvec{\uptheta }_l\}\) to represent Dirac deltas pointing there, and \(\varvec{w}=[w_n]_{nm}\) to represent the cap shape,

$$\begin{aligned} \varvec{\phi }&=\mathrm {diag}\{\varvec{w}\}\varvec{Y}\,\varvec{v}. \end{aligned}$$
(7.5)

As long as the order \(\mathrm {N}\) up to which coefficients are controlled is low enough, i.e. \(\mathrm {L}\ge (\mathrm {N}+1)^2\), and the transducers are well-distributed, perfect control is feasible. The corresponding velocities are found by solving a least-squares problem, see Appendix A.4, Eq. (A.63), yielding the right inverse of the \(\mathrm {N{th}}\)-order cap-coefficient matrix,

$$\begin{aligned} \varvec{v}&=\varvec{Y}_\mathrm {N}^\mathrm {T}(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\,\mathrm {diag}\{\varvec{w}_\mathrm {N}\}^{-1}\varvec{\phi }_\mathrm {N}=\varvec{D}\,\mathrm {diag}\{\varvec{w}_\mathrm {N}\}^{-1}\varvec{\phi }_\mathrm {N}. \end{aligned}$$
(7.6)

The right inverse \(\varvec{D}=\varvec{Y}_\mathrm {N}^\mathrm {T}(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\) is a mode-matching decoder, cf. Eq. (4.40).
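
A minimal numerical sketch of the right inverse \(\varvec{D}\) follows. It assumes real-valued orthonormal spherical harmonics (without Condon-Shortley phase) and uses the 20 vertex directions of a dodecahedron as a stand-in for the face centers of an icosahedral array; these conventions, the helper names, and the idealized geometry are assumptions for illustration and not the measured IKO layout.

```python
import numpy as np
from math import factorial, pi, sqrt

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x), m >= 0, by upward recurrence."""
    p_mm = np.ones_like(x)
    s = np.sqrt(1.0 - x * x)
    for k in range(1, m + 1):
        p_mm = p_mm * (2 * k - 1) * s
    if n == m:
        return p_mm
    p_prev, p = p_mm, x * (2 * m + 1) * p_mm
    for k in range(m + 2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k + m - 1) * p_prev) / (k - m)
    return p

def real_sh_matrix(order, azi, zen):
    """Matrix Y with (order+1)^2 rows (harmonics) and one column per direction."""
    x = np.cos(zen)
    rows = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            a = abs(m)
            norm = sqrt((2 * n + 1) / (4 * pi) * factorial(n - a) / factorial(n + a))
            if m > 0:
                trig = sqrt(2) * np.cos(a * azi)
            elif m < 0:
                trig = sqrt(2) * np.sin(a * azi)
            else:
                trig = np.ones_like(azi)
            rows.append(norm * assoc_legendre(n, a, x) * trig)
    return np.array(rows)

def dodecahedron_directions():
    """Azimuth and zenith angles of the 20 dodecahedron vertices (assumed layout)."""
    g = (1 + sqrt(5)) / 2
    pts = [(sx, sy, sz) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    for s1 in (-1, 1):
        for s2 in (-1, 1):
            pts += [(0, s1 / g, s2 * g), (s1 / g, s2 * g, 0), (s1 * g, 0, s2 / g)]
    xyz = np.array(pts, dtype=float)
    xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)
    return np.arctan2(xyz[:, 1], xyz[:, 0]), np.arccos(xyz[:, 2])

def mode_matching_decoder(Y):
    """Right inverse D = Y^T (Y Y^T)^{-1}, so that Y D is the identity."""
    return Y.T @ np.linalg.inv(Y @ Y.T)

azi, zen = dodecahedron_directions()
Y3 = real_sh_matrix(3, azi, zen)          # 16 x 20
D = mode_matching_decoder(Y3)             # 20 x 16
print(np.linalg.cond(Y3 @ Y3.T))          # conditioning of the inversion
```

Multiplying \(\varvec{Y}_\mathrm {N}\) with \(\varvec{D}\) returns the identity, which is the sense in which the velocities \(\varvec{v}\) reproduce the target coefficients exactly for \(n\le \mathrm {N}\).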

Exterior problem. The radiated sound pressure is described by the exterior problem denoted by the coefficients \(c_{nm}\) in Eq. (6.13) and the spherical Hankel functions \(h_n^{(2)}(kr)\). To relate it to a time-derived surface velocity at the array radius \(r=\mathrm {a}\), we differentiate the exterior solution with respect to the radius, \(\frac{\partial p}{\partial r}=k\frac{\partial p}{\partial kr}=-\mathrm {i}kc\,\rho v\), cf. Eq. (6.2),

$$\begin{aligned} v(\varvec{\theta })&= \frac{\mathrm {i}}{\rho c}\sum _{n=0}^\infty \sum _{m=-n}^nh_n'^{(2)}(k\mathrm {a})\;Y_n^m(\varvec{\theta })\;c_{nm}. \end{aligned}$$
(7.7)

Comparing Eq. (7.2) to Eq. (7.7) yields \(c_{nm}=\rho c[\mathrm {i}{h_n'^{(2)}(k\mathrm {a})}]^{-1}{\sum _{l=0}^\mathrm {L}w_n\,Y_n^m(\varvec{\uptheta }_l)\,v_l}\), the coefficients to calculate the radiated pressure. Far away, we replace the spherical Hankel function that approaches \(h_n^{(2)}(kr)\rightarrow \mathrm {i}^{n+1}k^{-1}e^{-\mathrm {i}kr}\) by the term \({\mathrm {i}^{n+1}}{k^{-1}}\) in Eq. (6.13) so that the radiated far-field sound pressure \(p\propto \sum {\mathrm {i}^{n+1}}{k^{-1}}Y_n^m c_{nm}\) becomes

$$\begin{aligned} p(\varvec{\theta })\propto \sum _{n=0}^\infty \sum _{m=-n}^n \;Y_n^m(\varvec{\theta })\; \frac{\mathrm {i}^{n}\;w_n}{k\,h_n'^{(2)}(k\mathrm {a})}\sum _{l=0}^\mathrm {L}Y_n^m(\varvec{\uptheta }_l)\,v_l. \end{aligned}$$
(7.8)

7.3.1 Directivity Control

The spherical harmonics coefficients of the far-field sound pressure pattern in Eq. (7.8) are controlled by the cap velocities \(v_l\)

$$\begin{aligned} \psi _{nm}&=\frac{\mathrm {i}^{n}\;w_n}{k\,h_n'^{(2)}(k\mathrm {a})}\sum _{l=0}^\mathrm {L}Y_n^m(\varvec{\uptheta }_l)\,v_l, \end{aligned}$$
(7.9)

and we desire to form the directional sound beam they represent according to a max-\(\varvec{r}_\mathrm {E}\) pattern \(a_n\,Y_n^m(\varvec{\theta }_0)\) yielding radiation focused towards \(\varvec{\theta }_0\)

$$\begin{aligned} \psi _{nm}=a_n\,Y_n^m(\varvec{\theta }_0). \end{aligned}$$
(7.10)

To find suitable cap velocities \(v_l\), we equate the model Eqs. (7.9) and (7.10). In matrix/vector notation, the equation is

$$\begin{aligned} \mathrm {diag}\{[&\mathrm {i}^{n}w_n\,k^{-1}/h_n'^{(2)}(k\mathrm {a})]_{nm}\}\;\varvec{Y}\varvec{v}=\mathrm {diag}\{[a_n]_{nm}\}\varvec{y}(\varvec{\theta }_0). \end{aligned}$$
(7.11)

The diagonal matrix on the left is easy to invert, and for patterns up to the order \(n\le \mathrm {N}\), the mode-matching decoder \(\varvec{D}\) of Eq. (7.6) already gives us a way to define velocities inverting the matrix \(\varvec{Y}_\mathrm {N}\) from the right. The preliminary solution becomes

$$\begin{aligned} \Rightarrow \varvec{v}&=\varvec{D}\,\mathrm {diag}\{[\mathrm {i}^{-n}w_n^{-1}\,k\,h_n'^{(2)}(k\mathrm {a})\,a_n]_{nm}\}\;\varvec{y}_\mathrm {N}(\varvec{\theta }_0). \end{aligned}$$
(7.12)

On-axis equalized, sidelobe-suppressing directivity control limiting the excursion. The inverse cap shape coefficient \(w_n^{-1}\) and the max-\(\varvec{r}_\mathrm {E}\) weight \(a_n\) can be regarded as a part of the radiation control filters \(\mathrm {i}^{-n}\,k\,h_n'^{(2)}(k\mathrm {a})\). The expression \({\mathrm {i}^{-n-1}}(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})\) of compact spherical microphone arrays (Sect. 6.6) qualitatively differs by a factor k. Practical implementation of radiation control filters and their regularization is therefore quite similar to radial filters of spherical microphone arrays. There are three main differences, as explained in [15]:

  • With loudspeaker arrays, it is rather the excursion that is limited, which primarily entails a different strategy of adjusting the filter bank cut-on frequencies, which due to size are at lower frequencies where group-delay distortions are less disturbing, and linear-phase implementations would cause avoidably long delays.

  • Moreover, instead of cut-on filter slopes of \((n+1)\mathrm {th}\) order required for noise removal in signals obtained from spherical microphone arrays, limited excursion requires cut-on slopes of at least \((n+3)\mathrm {th}\) order, i.e. \(4\mathrm {th}\) order to cut on the \(1\mathrm {st}\)-order Ambisonic signals. Thereof, one additional order is caused by the qualitative difference of \(k^{-1}\) in radial filters, and another order by the conversion of velocity to excursion by a factor \((\mathrm {i}\omega )^{-1}\).

  • Finally, instead of diffuse-field equalization that is useful for surround sound playback of spherical microphone array signals, it is more useful to equalize spherical sound beams on-axis (free field).
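
The slope requirement in the second item can be made plausible numerically. The sketch below evaluates the spherical Hankel function of the second kind from its finite closed-form sum and estimates how the unregularized term \(k\,h_n'^{(2)}(k\mathrm {a})\), optionally divided by \(k\) to represent the conversion from velocity to excursion, grows towards low frequencies. The function names and the evaluation point \(k\mathrm {a}=0.01\) are choices made for illustration. Under these assumptions the excursion grows like \((k\mathrm {a})^{-(n+2)}\), so a cut-on slope of order \(n+3\) is the lowest one that lets it decay below the cut-on frequency.

```python
import cmath
import math

def sph_hankel2(n, x):
    """Spherical Hankel function of the second kind via its finite sum."""
    total = 0j
    for k in range(n + 1):
        coeff = math.factorial(n + k) / (math.factorial(k) * math.factorial(n - k))
        total += coeff * (-1j / (2 * x)) ** k
    return 1j ** (n + 1) * cmath.exp(-1j * x) / x * total

def sph_hankel2_deriv(n, x):
    """Derivative using h_n'(x) = (n/x) h_n(x) - h_{n+1}(x)."""
    return n / x * sph_hankel2(n, x) - sph_hankel2(n + 1, x)

def low_freq_exponent(n, ka=0.01, with_excursion=True):
    """Estimate p in |term| ~ (ka)^p from one octave step at small ka."""
    def magnitude(x):
        m = abs(x * sph_hankel2_deriv(n, x))   # |k h_n'(ka)| up to the factor 1/a
        return m / x if with_excursion else m  # excursion: extra factor 1/omega
    return math.log(magnitude(2 * ka) / magnitude(ka), 2)

for n in range(4):
    print(n, round(low_freq_exponent(n, with_excursion=False), 3),
          round(low_freq_exponent(n), 3))
```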

Fig. 7.7

Signal processing for higher-order compact spherical loudspeaker array control: the Ambisonic directivity signals \(\varvec{\chi }_\mathrm {N}(t)\) run through radiation control filters \(\rho _n(\omega )\), a decoder yielding desired velocities \(\varvec{v}(t)\), and a crosstalk canceller/equalizer provides suitable output voltages \(\varvec{u}(t)\)

On-axis equalization yields a different scaling of the sub-band max-\(\varvec{r}_\mathrm {E}\) weights

$$\begin{aligned} a_{n,b}&={\left\{ \begin{array}{ll} P_n\bigl (\cos \frac{137.9^\circ }{b+1.51}\bigr )\;\frac{\sum _{n=0}^\mathrm {N}(2n+1)P_n\bigl (\cos \frac{137.9^\circ }{\mathrm {N}+1.51}\bigr )}{\sum _{n=0}^b(2n+1)P_n\bigl (\cos \frac{137.9^\circ }{b+1.51}\bigr )} , &{} \text {for}\, n\le b,\\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(7.13)
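
The effect of the scaling in Eq. (7.13) can be checked with a short sketch: the on-axis amplitude of a beam with order weights \(a_n\) is proportional to \(\sum _n(2n+1)\,a_n\), and this sum should come out identical for every sub-band \(b\). The function names are hypothetical; the standard library suffices.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def max_re_weights(b, N):
    """On-axis equalized sub-band weights a_{n,b} of Eq. (7.13) for n = 0..N."""
    x_b = math.cos(math.radians(137.9 / (b + 1.51)))
    x_N = math.cos(math.radians(137.9 / (N + 1.51)))
    num = sum((2 * n + 1) * legendre(n, x_N) for n in range(N + 1))
    den = sum((2 * n + 1) * legendre(n, x_b) for n in range(b + 1))
    return [legendre(n, x_b) * num / den if n <= b else 0.0 for n in range(N + 1)]

def on_axis_gain(weights):
    """Quantity proportional to the on-axis amplitude of the weighted beam."""
    return sum((2 * n + 1) * a for n, a in enumerate(weights))

N = 3
for b in range(N + 1):
    w = max_re_weights(b, N)
    print(b, [round(a, 3) for a in w], round(on_axis_gain(w), 6))
```

For \(b=\mathrm {N}\) the scaling factor is one and the weights reduce to the plain max-\(\varvec{r}_\mathrm {E}\) weights.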

Typically, cut-on frequencies for compact spherical loudspeaker arrays are low, and linear-phase filterbanks would require long pre-delays. It is useful to employ Linkwitz-Riley filters for the crossovers, to get a low-latency implementation. To emphasize the similarity to Eq. (6.22), we write Linkwitz-Riley filters [27] as combination of an all-pass \(A^{m}\) with twice the phase response of an \(m\mathrm {th}\)-order Butterworth low-pass combined either with the magnitude-squared low-pass response \([1+(\omega /\omega _c)^{2m}]^{-1}\) or high-pass response \((\omega /\omega _c)^{2m}[1+(\omega /\omega _c)^{2m}]^{-1}\) per crossover. Such a minimum-phase crossover is of even order, so that the minimum-order cut-on slope must be rounded up to the next even order \(2\lceil \frac{b+3}{2}\rceil \). Plain high/low crossovers would be in-phase unless combined with further crossovers to form narrower bands. However, an in-phase filterbank is obtained after inserting the product of all all-passes in every band, cf. [28]. Although non-minimum-phase, this is still low-latency. For the band b containing Ambisonic orders \(0\le n\le b\), the modified filterbank is

$$\begin{aligned} H_b(\omega )&=\frac{\bigl (\frac{\omega }{\omega _b}\bigr )^{2\lceil \frac{b+3}{2}\rceil }}{1+\bigl (\frac{\omega }{\omega _b}\bigr )^{2\lceil \frac{b+3}{2}\rceil }} \frac{1}{1+\bigl (\frac{\omega }{\omega _{b+1}}\bigr )^{2\lceil \frac{b+4}{2}\rceil }} \prod _{b'=0}^\mathrm {N}A_{\omega _{b'}}^{\lceil \frac{b'+3}{2}\rceil }(\omega ). \end{aligned}$$
(7.14)
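
Because all bands of Eq. (7.14) share the same product of all-passes, the flatness of \(\sum _b H_b(\omega )\) is determined by the sum of the band magnitudes alone. The sketch below evaluates that sum for the cut-on frequencies (38, 75, 125, 210) Hz reported for the IKO in Sect. 7.3.2. Treating the highest band as having no upper low-pass is an assumption, since \(\omega _{\mathrm {N}+1}\) is not defined in the text; the function names are hypothetical.

```python
import math

CUT_ON_HZ = (38.0, 75.0, 125.0, 210.0)  # omega_b / (2 pi) for b = 0..3

def slope_order(b):
    """Even Linkwitz-Riley order 2*ceil((b+3)/2) of the cut-on slope of band b."""
    return 2 * math.ceil((b + 3) / 2)

def band_magnitude(b, f_hz, cut_on=CUT_ON_HZ):
    """Magnitude of H_b without the all-pass product (which has unit magnitude)."""
    x = (f_hz / cut_on[b]) ** slope_order(b)
    mag = x / (1 + x)                      # high-pass at omega_b
    if b + 1 < len(cut_on):                # assumed: top band has no low-pass
        lp_order = 2 * math.ceil((b + 4) / 2)
        mag *= 1 / (1 + (f_hz / cut_on[b + 1]) ** lp_order)
    return mag

def summed_magnitude(f_hz):
    """Magnitude of the in-phase sum over all bands."""
    return sum(band_magnitude(b, f_hz) for b in range(len(CUT_ON_HZ)))

print([slope_order(b) for b in range(4)])
for f in (38.0, 75.0, 125.0, 210.0, 1000.0):
    print(f, round(20 * math.log10(summed_magnitude(f)), 2), "dB")
```

Well above the highest cut-on frequency the sum approaches unity, while small deviations remain around the crossover frequencies, consistent with the sum being only approximately flat.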

The sum \(\sum _b H_b(\omega )\) is considered to be sufficiently flat, so that the radial filters for compact spherical loudspeaker arrays using Eqs. (7.8), (7.13), (7.14) become

$$\begin{aligned} \rho _n(\omega )&=\left[ \sum _{b=n}^\mathrm {N}a_{n,b} H_b(\omega )\right] \,\mathrm {i}^{-n}\,w_n^{-1}\,h_n'^{(2)}(k\mathrm {a})\,e^{\mathrm {i}k\mathrm {a}}. \end{aligned}$$
(7.15)

Figure 7.7 shows the block diagram to control compact spherical loudspeaker arrays by Ambisonic input signals, including the radiation control filters, the decoder, and a voltage-equalizing crosstalk canceller feeding the loudspeakers.

7.3.2 Control System and Verification Based on Measurements

Velocity equalization/crosstalk cancellation. In the frequency domain, laser vibrometer measurements, cf. Fig. 7.8a, characterize the physical multiple-input-multiple-output (MIMO) system of transducer input voltages \(u_l(\omega )\) to transducer velocities \(v_l(\omega )\)

$$\begin{aligned} \varvec{v}(\omega )&=\varvec{T}(\omega )\,\varvec{u}(\omega ), \end{aligned}$$
(7.16)

including the effect of acoustic coupling through the common enclosure. Corresponding open measurement data sets can be found online, as described in [18]. Theoretically, the frequency-domain inverse of the matrix \(\varvec{T}(\omega )\) can be used to equalize and control the transducer velocities with acoustic crosstalk cancelled, as indicated in Fig. 7.7,

$$\begin{aligned} \varvec{u}(\omega )&=\varvec{T}^{-1}(\omega )\,\varvec{v}(\omega ). \end{aligned}$$
(7.17)

In practice, this is only useful up to the frequency at which the loudspeaker cone vibration breaks up into modes, so typically below 1 kHz.

Control system: The entire control system with Ambisonic signals \(\varvec{\chi }_\mathrm {N}(\omega )\) as inputs uses Eqs. (7.6), (7.15), (7.17)

$$\begin{aligned} \varvec{u}(\omega )&=\varvec{T}^{-1}(\omega )\,\varvec{D}\, \mathrm {diag}\{\varvec{\rho }(\omega )\}\,\varvec{\chi }_\mathrm {N}(\omega ). \end{aligned}$$
(7.18)
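
The chain of Eq. (7.18) can be sketched for a single frequency bin. All matrices below are synthetic placeholders with the right dimensions, not measured IKO data: \(\varvec{T}\) is a random, diagonally dominated matrix that mimics weak acoustic coupling, and the decoder and radiation control filter values are random numbers. The sketch only illustrates the order of operations and that solving with \(\varvec{T}\) yields voltages that reproduce the target velocities.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 20, 3                 # transducers, Ambisonic order
n_sh = (N + 1) ** 2          # 16 spherical harmonic channels

# Synthetic stand-ins for one frequency bin (assumptions, not measurements)
T = (1.0 + 0.2j) * np.eye(L) + 0.05 * rng.standard_normal((L, L))
D = rng.standard_normal((L, n_sh))                       # placeholder decoder
rho_n = rng.standard_normal(N + 1) + 1j * rng.standard_normal(N + 1)
chi = rng.standard_normal(n_sh)                          # Ambisonic input

def control_voltages(T, D, rho_n, chi):
    """u = T^{-1} D diag(rho) chi, with rho_n repeated for all m of each order n."""
    order = len(rho_n) - 1
    rho = np.repeat(rho_n, [2 * n + 1 for n in range(order + 1)])
    v_target = D @ (rho * chi)             # desired transducer velocities
    u = np.linalg.solve(T, v_target)       # crosstalk cancellation/equalization
    return u, v_target

u, v_target = control_voltages(T, D, rho_n, chi)
print(np.max(np.abs(T @ u - v_target)))    # residual of the velocity control
```

Solving the linear system instead of forming \(\varvec{T}^{-1}\) explicitly is a numerical design choice of this sketch; in a real-time system the inverse would typically be precomputed per frequency and stored as filters.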

Directivity measurement. It is useful to characterize the directivity obtained by measurements to verify the results; high-resolution \(648\times 20\) measurements \(\varvec{G}(\omega )\) of the IKO are found online. With the known directional sampling, the sound pressure can be decomposed by left-inversion of a spherical harmonics matrix \(\varvec{Y}_{17}^\mathrm {T}\), see Appendix A.4, Eq. (A.65), which is possible up to \(17\mathrm {th}\) order on a \(10^\circ \times 10^\circ \) grid in azimuth and zenith:

$$\begin{aligned} \varvec{p}(\omega )&=\varvec{G}(\omega )\,\varvec{u}(\omega ),&\Rightarrow \varvec{\psi }_{17}(\omega )&=(\varvec{Y}_{17}^\mathrm {T})^\dagger \varvec{p}(\omega ). \end{aligned}$$
(7.19)

With the highly resolved spherical harmonics coefficients, polar diagrams or balloon diagrams can be evaluated at any direction

$$\begin{aligned} p(\varvec{\theta },\omega )&=\varvec{y}_{17}(\varvec{\theta })^\mathrm {T}\varvec{\psi }_{17}(\omega ), \end{aligned}$$
(7.20)

given any control system delivering suitable voltages \(\varvec{u}\) for beamforming, as e.g. obtained by Eq. (7.18).

Fig. 7.8

Measurements on the IKO as a MIMO system in terms of transducer output velocities (left) and radiation patterns (right) depending on the transducer input voltages

To inspect the frequency-dependent directivity, a horizontal cross section is shown in Fig. 7.9. The beamforming becomes effective above 100 Hz and a beam width of \({\pm }30^\circ \) is held until 2 kHz. The filterbank starts the \(0\mathrm {th}\) order above 38 Hz, and with 75, 125, 210 Hz, \(1\mathrm {st}\), \(2\mathrm {nd}\), and \(3\mathrm {rd}\) order are successively added including on-axis equalized max-\(\varvec{r}_\mathrm {E}\) weightings. Above 2 kHz both spatial aliasing and modal breakup of the transducer cones affect directivity. However, these beamforming-direction-dependent distortions are often negligible in typical rooms.

Fig. 7.9

Horizontal cross section of the IKO’s directivity/dB over frequency/Hz and azimuth/degrees when beamforming to \(0^\circ \) azimuth on the horizon, with radiation control filters above, with filterbank frequencies (38, 75, 125, 210) Hz

Fig. 7.10

Distance control using the IKO in the IEM CUBE

7.4 Auditory Objects of the IKO

7.4.1 Static Auditory Objects

The study in [16] showed that distance control by changing the directivity and its orientation can also be achieved with the IKO in a real room, cf. Fig. 7.10. The experiments used stationary pink noise and could create auditory objects nearly 2 m behind the IKO, which corresponds to the distance between the IKO and the front wall of the playback room.

Fig. 7.11

Signal-dependent distance of auditory objects created by the IKO: markers indicate short or long fade-in and fade-out of pink noise bursts, \(\cdot \) indicates transient click sound

The maximum distance of auditory objects created by the IKO is strongly signal-dependent. Experiments in [14] showed that the auditory distance of pink noise bursts decreased for shorter fade-in times, while the fade-out time had no influence, cf. Fig. 7.11. A transient click sound was perceived even closer to the IKO. This can be explained by the precedence effect, which favors the earlier direct sound over the reflected sound from the walls. While this effect is strong for transient sounds, it is inhibited for stationary sounds with long fade-in times.

However, the precedence effect can even be reduced for transient click sounds by simultaneous playback of a masker sound that reduces the influence of the direct sound [29]. In comparison to no masker, playing a pink noise masker doubles the auditory distance, cf. Fig. 7.12. Using the room noise as a masker by playing the target sound very softly further increases the distance and yields a perception that is detached from the IKO.

Fig. 7.12

Distance of auditory objects created by the IKO playing a transient click sound in dependence of maskers: no masker (M0), noise at different levels (M1, M2) and playback at very low level (room masker MR1 and MR2)

Fig. 7.13

Average perceived locations for each 500 ms step during front/back-movement (dark gray) and left/right-movement (light gray) at two listening positions, triangle indicates start and asterisk end of the trajectory

7.4.2 Moving Auditory Objects

The studies in [14, 15] extended the previous listening experiments towards simple time-varying beam directions, such as from the left to the right, front/back or circles. To report the perceived locations of the moving auditory objects, listeners used a touch screen that showed a floor plan of the room, including the listening position and the position of the IKO. They had to indicate the location of the auditory object’s trajectory every 500 ms. The perceived trajectories depend on the listening position, but they can always be recognized, cf. Fig. 7.13. The empirical knowledge was applied in the artistic study in [14] about body-space relations, composing sounds that are spatialized with different static directions and simple movements.

Fig. 7.14

Average perceived locations for each 500 ms step during circular movement of transient sound (dark gray) and stationary noise (light gray) without and with additional reflectors, triangle indicates start and asterisk end of the trajectory

For concerts, the artistic practice evolved to set the IKO up together with reflector baffles, cf. Fig. 7.14. A recent study in [30] investigated their effect on the perception of moving transient and stationary sounds. The baffles reduce the signal dependency by contributing additional reflection paths that contrast with the direct sound.

7.5 Practical Free-Software Examples

7.5.1 IEM Room Encoder and Directivity Shaper

The IEM Room Encoder VST plug-in, cf. Fig. 4.36, can not only be used to simulate the room reflections of an omnidirectional sound source based on the image-source method, but it also supports directional sound sources. As format, it employs Ambisonics with ACN ordering and adjustable normalization up to seventh order. Thus, it can use data from directivity measurements or even directional recordings done with a surrounding spherical microphone, e.g. to put real instrument recordings into the virtual room.

As an alternative, the IEM Directivity Shaper, cf. Fig. 7.15, provides simple means to generate a frequency-dependent directivity pattern from scratch and to apply it to a mono input signal. This is useful to generate the typical rotary speaker effect of a Leslie cabinet.

Fig. 7.15

IEM Directivity Shaper plug-in

7.5.2 IEM Cubes 5.1 Player and Surround with Depth

As shown in Fig. 7.5, a pair of loudspeaker cubes can create a stable auditory event in between them to replace an actual center loudspeaker. In order to play back an entire 5.1 production, the IEM cubes 5.1 Player plug-in extends this approach by two additional beams to the side walls for the surround channels, cf. Fig. 7.16. The plug-in provides a control of the shape, direction, and level of all beams, as well as a delay compensation for the reflection paths.

Fig. 7.16

IEM cubes 5.1 Player, cubes Surround Decoder and Distance Encoder plug-ins

Surround sound with depth can be realized with a quadraphonic setup of four loudspeaker cubes and a combination of the cubes Surround Decoder and multiple Distance Encoder plug-ins, cf. Fig. 7.16. For each source, the Distance Encoder controls position and distance, i.e. the blending between the two layers. The output of the plug-in is a 10-channel audio stream including 7 channels for third-order (inner layer) and 3 for first-order 2D Ambisonics (outer depth layer). The cubes Surround Decoder plug-in decodes the 10-channel audio stream and distributes the signals to the 16 drivers of four loudspeaker cubes. For each loudspeaker cube, the directions to excite direct and reflected sound of the inner layer and the diffuse sound of the depth layer can be adjusted in order to adapt to the playback environment. Additionally, the directivity patterns for direct, reflected, and diffuse sound beams can be controlled, as well as a delay to compensate for the longer propagation paths of the reflected sound. The plug-ins are available under https://git.iem.at/audioplugins/CubeSpeakerPlugins.

7.5.3 IKO

Spatialization using the IKO can use a similar infrastructure of plug-ins as surrounding loudspeaker arrays. Ambisonic encoder plug-ins, such as the ambix_encoder or the IEM StereoEncoder or MultiEncoder, create the third-order Ambisonic signals that are subsequently fed to a decoder. Decoding to the IKO requires the processing steps as shown in Fig. 7.7: radiation control filters in the spherical harmonic domain, decoding from spherical harmonics to transducer signals, as well as crosstalk cancellation and equalization of the transducers. This processing can be summarized in a 16 (spherical harmonics up to third order) \(\times \) 20 (transducers) filter matrix. Convolution can be done efficiently using the mcfx_convolver plug-in. Filter presets for the IKO can be found under http://phaidra.kug.ac.at/o:79235.
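
The channel counts quoted in Sects. 7.5.2 and 7.5.3 follow from the usual Ambisonic channel formulas, \((\mathrm {N}+1)^2\) in three dimensions and \(2\mathrm {N}+1\) in two dimensions. The short sketch below reproduces them; the function names are hypothetical.

```python
def channels_3d(order):
    """Number of spherical harmonics up to the given order."""
    return (order + 1) ** 2

def channels_2d(order):
    """Number of circular harmonics up to the given order."""
    return 2 * order + 1

iko_inputs = channels_3d(3)                      # third-order input to the IKO
iko_filters = iko_inputs * 20                    # filter matrix for 20 transducers
cube_stream = channels_2d(3) + channels_2d(1)    # inner layer plus depth layer
cube_drivers = 4 * 4                             # four cubes with four drivers each

print(iko_inputs, iko_filters, cube_stream, cube_drivers)
```

The \(16\times 20\) filter matrix for the IKO therefore comprises 320 individual filters, which is the workload handed to the multichannel convolver.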