Improving the conditioning of the optimization criterion in acoustic multi-channel equalization using shorter reshaping filters
Abstract
In acoustic multi-channel equalization techniques, such as complete multi-channel equalization based on the multiple-input/output inverse theorem (MINT), relaxed multi-channel least-squares (RMCLS), and partial multi-channel equalization based on MINT (PMINT), the length of the reshaping filters is generally chosen such that perfect dereverberation can be achieved for perfectly estimated room impulse responses (RIRs). However, since in practice the available RIRs typically differ from the true RIRs, this reshaping filter length may not be optimal. This paper provides a mathematical analysis of the robustness increase of equalization techniques against RIR perturbations when using a shorter reshaping filter length than conventionally used. Based on the condition number of the (weighted) convolution matrix of the RIRs, a mathematical relationship between the reshaping filter length and the robustness against RIR perturbations is established. It is shown that shorter reshaping filters than conventionally used yield a smaller condition number, i.e., a higher robustness against RIR perturbations. In addition, we propose an automatic non-intrusive procedure for determining the reshaping filter length based on the L-curve. Simulation results confirm that using a shorter reshaping filter length than conventionally used yields a significant increase in robustness against RIR perturbations for MINT, RMCLS, and PMINT. Furthermore, it is shown that PMINT using an optimal intrusively determined reshaping filter length outperforms all other considered techniques. Finally, it is shown that the automatic non-intrusively determined reshaping filter length in PMINT yields a similar performance as the optimal intrusively determined reshaping filter length.
Keywords
Speech dereverberation, Condition number, Reshaping filter length, L-curve
1 Introduction
The microphone signals recorded in many hands-free speech communication applications, such as teleconferencing, voice-controlled systems, or hearing aids, do not only contain the desired speech signal but also attenuated and delayed copies due to reverberation. While early reverberation may be desirable [1, 2, 3], late reverberation may degrade the perceived speech quality and intelligibility [4, 5, 6] as well as the performance of automatic speech recognition systems [7, 8]. In order to mitigate these detrimental effects of reverberation, several single-channel and multi-channel dereverberation techniques have been proposed [9], with multi-channel techniques being generally preferred since they are able to exploit both the spectro-temporal and the spatial characteristics of the received microphone signals. Existing multi-channel dereverberation techniques can be broadly classified into spectro-temporal enhancement techniques [10, 11, 12, 13, 14], probabilistic modeling-based techniques [15, 16, 17, 18], and acoustic multi-channel equalization techniques [19, 20, 21, 22, 23, 24, 25, 26]. Acoustic multi-channel equalization techniques aim to reshape the available room impulse responses (RIRs) between the speaker and the microphone array. Since in theory they can achieve perfect dereverberation [19], they represent an attractive approach to speech dereverberation.
A well-known complete multi-channel equalization technique aiming at acoustic system inversion is the multiple-input/output inverse theorem (MINT)-based technique [19], which however suffers from drawbacks in practice. Since the available RIRs typically differ from the true RIRs due to fluctuations (e.g., temperature or position variations [27]) or due to the sensitivity of blind system identification (BSI) and supervised system identification (SSI) methods to near-common zeros or interfering noise [28, 29, 30], MINT generally fails to invert the true RIRs, possibly leading to severe distortions in the output signal [22, 23, 24, 26]. In order to increase the robustness against RIR perturbations, partial multi-channel equalization techniques, such as relaxed multi-channel least-squares (RMCLS) [23] and partial multi-channel equalization based on MINT (PMINT) [24], have been proposed. Since early reflections tend to improve speech intelligibility [1, 2, 3] and late reflections are the major cause of speech intelligibility degradation [4, 5, 6], the objective of partial equalization techniques is to shorten the overall impulse response by suppressing only the late reflections. While RMCLS imposes no constraints on the remaining early reflections, PMINT has been shown to be more perceptually advantageous since it also aims to control the remaining early reflections. Although partial equalization techniques can be significantly more robust than MINT, their performance still remains rather susceptible to RIR perturbations [23, 24, 26]. As a result, several methods have been proposed to further increase the robustness against RIR perturbations. In [22, 24], it has been proposed to incorporate regularization, such that the distortion energy due to RIR perturbations is decreased. In [26], it has been proposed to use a signal-dependent penalty function to promote sparsity in the output signal and reduce artifacts generated by non-robust techniques. 
In [31, 32], it has been proposed to relax the constraints on the filter design by constructing approximate reshaping filters in the subband domain. In [33], it has been proposed to relax the constraints on the filter design by using a shorter reshaping filter length than conventionally used. The objective of this paper is to provide a mathematical analysis of the robustness increase when using a shorter reshaping filter length as well as to propose an automatic non-intrusive procedure for selecting an optimal shorter reshaping filter length.
The length of the reshaping filters in MINT, RMCLS, and PMINT is conventionally chosen such that perfect dereverberation can be achieved for perfectly estimated RIRs. As already mentioned, since in practice the available RIRs typically differ from the true RIRs, this choice of the reshaping filter length yields a high sensitivity to RIR perturbations. In [33], it has been analytically shown that decreasing the reshaping filter length increases the robustness for MINT and PMINT only if the multi-channel convolution matrix of the RIRs is a square matrix. In this paper, it is analytically shown that decreasing the reshaping filter length increases the robustness of MINT, RMCLS, and PMINT independently of the dimension of the (weighted) multi-channel convolution matrix of the RIRs. A mathematical relationship between the reshaping filter length and the condition number of the (weighted) multi-channel convolution matrix of the available RIRs, hence, the sensitivity of equalization techniques to RIR perturbations, is derived. We show that shorter reshaping filters than conventionally used yield a smaller condition number, i.e., a higher robustness against RIR perturbations.
In general, the reshaping filter length yielding optimal performance can only be determined intrusively (i.e., using a clean reference signal), obviously limiting the practical applicability. Hence, we also propose and investigate an automatic non-intrusive selection procedure for the reshaping filter length based on the L-curve [34, 35, 36].
Simulation results for several acoustic systems and RIR perturbations show by means of instrumental performance measures that using shorter reshaping filters in MINT, RMCLS, and PMINT significantly increases the robustness against RIR perturbations. In addition, it is demonstrated that PMINT using the optimal intrusively determined reshaping filter length outperforms the other considered equalization techniques, yielding a larger reverberant energy suppression and perceptual speech quality improvement. Furthermore, it is shown that the non-intrusively determined reshaping filter length yields a nearly optimal performance for PMINT.
The paper is organized as follows. In Section 2, the considered acoustic configuration and the notation used are introduced. In Section 3, state-of-the-art acoustic multi-channel equalization techniques, i.e., MINT, RMCLS, and PMINT, are briefly reviewed. In Section 4, the sensitivity of these equalization techniques to RIR perturbations is evaluated by means of the condition number of the (weighted) convolution matrix, and analytical insights into increasing the robustness by decreasing the reshaping filter length are provided. In Section 5, the automatic non-intrusive procedure for determining the reshaping filter length is discussed. Using instrumental performance measures, the dereverberation performance of all considered techniques is compared in Section 6.
2 Configuration and notation
where ∗ denotes convolution, s(n) is the clean speech signal, h_{ m }(n) is the RIR between the speech source and the m-th microphone, x_{ m }(n) is the reverberant speech component, and v_{ m }(n) is the noise component. Since acoustic multi-channel equalization techniques generally design reshaping filters without taking the additive noise into account, in the following it is assumed that v_{ m }(n)=0, and hence, y_{ m }(n)=x_{ m }(n).
The reshaping filter g can then be constructed based on different design objectives for the EIR c.
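As an illustration of how the EIR c follows from the RIRs and the reshaping filter g, the following minimal sketch builds a multi-channel convolution matrix and checks that multiplying it with the stacked filter equals filtering each channel and summing the results. The helper names `convolution_matrix` and `multichannel_matrix`, the random stand-in RIRs, and the toy sizes are assumptions made for illustration only and are not taken from the paper.

```python
import numpy as np

def convolution_matrix(h, L_g):
    """Return the (len(h) + L_g - 1) x L_g convolution matrix of h."""
    L_h = len(h)
    H = np.zeros((L_h + L_g - 1, L_g))
    for j in range(L_g):
        H[j:j + L_h, j] = h  # column j holds h delayed by j taps
    return H

def multichannel_matrix(rirs, L_g):
    """Place the per-channel convolution matrices side by side."""
    return np.hstack([convolution_matrix(h, L_g) for h in rirs])

rng = np.random.default_rng(0)
M, L_h, L_g = 3, 8, 4                   # toy sizes (assumption, not from the paper)
rirs = rng.standard_normal((M, L_h))    # stand-in RIRs h_m(n)
g = rng.standard_normal(M * L_g)        # stacked reshaping filters

H = multichannel_matrix(rirs, L_g)
c = H @ g                               # EIR of length L_h + L_g - 1

# The same EIR obtained by filtering each channel and summing the results.
c_direct = sum(np.convolve(rirs[m], g[m * L_g:(m + 1) * L_g]) for m in range(M))
print(H.shape, np.allclose(c, c_direct))
```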
3 Acoustic multi-channel equalization
Table 1 Definition of the target EIR c_{d} and weighting matrix W for MINT, RMCLS, and PMINT
Technique | Target EIR c_{d} | Weighting matrix W |
---|---|---|
MINT | \(\big [\underbrace {0 \; \ldots \; 0}_{\tau } \; 1 \; 0 \; \ldots \; 0 \big ]^{T}\) | I |
RMCLS | \(\big [\underbrace {0 \; \ldots \; 0}_{\tau } \; 1 \; 0 \; \ldots \; 0 \big ]^{T}\) | \({\text {diag}}\big \{\big [\underbrace {1 \; \ldots \; 1}_{\tau } \; \underbrace {1 \; 0 \; \ldots \; 0}_{L_{d}} \; 1 \; \ldots 1\big ]^{T}\big \}\) |
PMINT | \(\big [\underbrace {0\ldots 0}_{\tau }\underbrace {\hat {h}_{p}(0) \ldots \hat {h}_{p}(L_{d}-1)}_{L_{d}} 0 \ldots 0\big ]^{T}\) | I |
with W an L_{ c }×L_{ c }–dimensional diagonal weighting matrix. The definition of the weighting matrix W for MINT, RMCLS, and PMINT is presented in Table 1, where I denotes the L_{ c }×L_{ c }–dimensional identity matrix. Based on these definitions of W and c_{d}, it can be observed that on the one hand, MINT and PMINT do not use a weighting matrix and constrain all taps of the EIR (i.e., W=I), while on the other hand, RMCLS uses a weighting matrix and does not constrain all taps of the EIR (i.e., \(\mathbf {W} = {\text {diag}}\big \{\big [\underbrace {1 \; \ldots \; 1}_{\tau } \; \underbrace {1 \; 0 \; \ldots \; 0}_{L_{d}} \; 1 \; \ldots 1\big ]^{T}\big \}\)). It has been experimentally validated in [23, 24, 26] that by constraining all taps of the EIR, MINT and PMINT may result in a good perceptual speech quality but a high sensitivity to RIR perturbations, whereas by not constraining all taps of the EIR, RMCLS may result in a lower sensitivity to RIR perturbations but a decreased perceptual speech quality.
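The definitions in Table 1 can be sketched in a few lines of code. The function name `design_targets` and the toy values of L_c, τ, and L_d are hypothetical and chosen only to make the structure of c_{d} and W visible; the vector `h_p` stands in for the estimated RIR \(\hat {h}_{p}\).

```python
import numpy as np

def design_targets(L_c, tau, L_d, h_p):
    """Return {technique: (c_d, W)} following the definitions in Table 1."""
    mint = np.zeros(L_c)
    mint[tau] = 1.0                      # delayed impulse
    pmint = np.zeros(L_c)
    pmint[tau:tau + L_d] = h_p[:L_d]     # direct path and early reflections
    w = np.ones(L_c)
    w[tau + 1:tau + L_d] = 0.0           # RMCLS leaves these taps unconstrained
    identity = np.eye(L_c)
    return {
        "MINT": (mint, identity),
        "RMCLS": (mint, np.diag(w)),
        "PMINT": (pmint, identity),
    }

# Toy values (assumptions): L_c = 12, tau = 2, L_d = 4.
targets = design_targets(L_c=12, tau=2, L_d=4, h_p=np.arange(1.0, 9.0))
for name, (c_d, W) in targets.items():
    print(name, c_d, np.diag(W))
```

Note that MINT and RMCLS share the same target and differ only in W, whereas MINT and PMINT share W=I and differ only in the target.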
where {·}^{+} denotes the matrix pseudo-inverse. When the true RIRs are available, i.e., \(\hat {\mathbf {H}} = \mathbf {H}\), the reshaping filter of length L_{ g } according to (11) yields perfect dereverberation, i.e., WHg_{LS}=Wc_{d}. However, in the presence of RIR perturbations, i.e., \(\hat {\mathbf {H}} \neq \mathbf {H}\), this filter typically fails to achieve perfect dereverberation, i.e., WHg_{LS}≠Wc_{d}, possibly even causing severe distortions in the output signal [24, 26]. The sensitivity of the reshaping filter to RIR perturbations can be evaluated by analyzing the condition number of the matrix \(\mathbf {W}\hat {\mathbf {H}}\).
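The effect described above can be reproduced with a small numerical sketch. It assumes that the least-squares filter has the form g = (WĤ)^{+}Wc_{d}, uses random vectors as stand-ins for RIRs, and models the perturbation as small additive Gaussian noise; none of these choices is taken from the paper's simulation setup, and the helper names are hypothetical.

```python
import numpy as np

def multichannel_matrix(rirs, L_g):
    """Stack the (L_h + L_g - 1) x L_g convolution matrices of all channels."""
    L_h = rirs.shape[1]
    blocks = []
    for h in rirs:
        H_m = np.zeros((L_h + L_g - 1, L_g))
        for j in range(L_g):
            H_m[j:j + L_h, j] = h
        blocks.append(H_m)
    return np.hstack(blocks)

def ls_filter(H_hat, c_d, W=None):
    """Weighted least-squares filter, assumed form g = (W H_hat)^+ W c_d."""
    if W is None:
        W = np.eye(H_hat.shape[0])
    return np.linalg.pinv(W @ H_hat) @ (W @ c_d)

rng = np.random.default_rng(1)
M, L_h, tau = 4, 17, 2                     # toy sizes (assumption)
L_g = int(np.ceil((L_h - 1) / (M - 1)))    # conventional length, here 6
rirs = rng.standard_normal((M, L_h))       # stand-in "true" RIRs
rirs_hat = rirs + 0.01 * rng.standard_normal((M, L_h))  # perturbed estimates

H = multichannel_matrix(rirs, L_g)
H_hat = multichannel_matrix(rirs_hat, L_g)
c_d = np.zeros(H.shape[0])
c_d[tau] = 1.0                             # MINT target, W = I

g_true = ls_filter(H, c_d)                 # designed from the true RIRs
g_pert = ls_filter(H_hat, c_d)             # designed from the perturbed RIRs

err_true = np.sum((H @ g_true - c_d) ** 2) # close to zero
err_pert = np.sum((H @ g_pert - c_d) ** 2) # evaluated on the true RIRs
print(f"true RIRs: {err_true:.2e}, perturbed RIRs: {err_pert:.2e}")
```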
4 Robust acoustic multi-channel equalization
In this section, the Wedin theorem [37] relating the condition number of the matrix being inverted to the sensitivity of the solution to perturbations is briefly reviewed. In addition, it is analytically shown that using shorter reshaping filters than conventionally used decreases the condition number of the matrix \(\mathbf {W}\hat {\mathbf {H}}\), hence increasing the robustness against RIR perturbations.
where it is assumed that χ_{ A }ξ<1.
The relation in (15) shows that a large condition number χ_{ A } can result (and typically does) in a large deviation between the true and the perturbed solution [37,38].
Table 2 Notation for different reshaping filter lengths and the corresponding matrices
Variable | Denotes |
---|---|
\(L^{\mathrm {t}}_{g} = \left \lceil {\frac {L_{h}-1}{M-1}} \right \rceil \) | Reshaping filter length conventionally used in acoustic multi-channel equalization techniques |
\(\mathbf {W}_{\mathrm {t}}\hat {\mathbf {H}}_{\mathrm {t}}\) | Matrix when using the reshaping filter length \(L^{\mathrm {t}}_{g}\) |
\(p_{\mathrm {t}} = L_{h} + L^{\mathrm {t}}_{g} - 1\) | Number of rows in \(\mathbf {W}_{\mathrm {t}}\hat {\mathbf {H}}_{\mathrm {t}}\) |
\(q_{\mathrm {t}} = M L^{\mathrm {t}}_{g} \geq p_{\mathrm {t}}\) | Number of columns in \(\mathbf {W}_{\mathrm {t}}\hat {\mathbf {H}}_{\mathrm {t}}\) |
\(r_{\mathrm {t}} \leq p_{\mathrm {t}}\) | Rank of \(\mathbf {W}_{\mathrm {t}}\hat {\mathbf {H}}_{\mathrm {t}}\) |
\(L^{\mathrm {s}}_{g} < L^{\mathrm {t}}_{g}\) | Reshaping filter length smaller than \(L^{\mathrm {t}}_{g}\) |
\(\mathbf {W}_{\mathrm {s}}\hat {\mathbf {H}}_{\mathrm {s}}\) | Matrix when using the reshaping filter length \(L^{\mathrm {s}}_{g}\) |
\(p_{\mathrm {s}} = L_{h} + L^{\mathrm {s}}_{g} - 1\) | Number of rows in \(\mathbf {W}_{\mathrm {s}}\hat {\mathbf {H}}_{\mathrm {s}}\) |
\(q_{\mathrm {s}} = M L^{\mathrm {s}}_{g} < p_{\mathrm {s}}\) | Number of columns in \(\mathbf {W}_{\mathrm {s}}\hat {\mathbf {H}}_{\mathrm {s}}\) |
\(r_{\mathrm {s}} = q_{\mathrm {s}}\) | Rank of \(\mathbf {W}_{\mathrm {s}}\hat {\mathbf {H}}_{\mathrm {s}}\) |
we consider the following interlacing inequalities between the singular values of a matrix and its sub-matrices [39].
for i=1, …, min{u−l,v−l}.
Hence, using a shorter reshaping filter than conventionally used in equalization techniques can result (and based on simulations, it always does) in a lower condition number of the matrix being inverted.
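The relationship between the reshaping filter length and the condition number can be explored numerically. The sketch below assumes W=I (as in MINT and PMINT), uses random vectors as stand-in RIRs, and picks toy sizes; the helper name is hypothetical. Since the matrix for a shorter filter is a sub-matrix of the matrix for a longer filter, its largest singular value cannot increase. The decrease of the condition number itself is what is typically observed, not something this toy example proves.

```python
import numpy as np

def multichannel_matrix(rirs, L_g):
    """Stack the (L_h + L_g - 1) x L_g convolution matrices of all channels."""
    L_h = rirs.shape[1]
    blocks = []
    for h in rirs:
        H_m = np.zeros((L_h + L_g - 1, L_g))
        for j in range(L_g):
            H_m[j:j + L_h, j] = h
        blocks.append(H_m)
    return np.hstack(blocks)

rng = np.random.default_rng(2)
M, L_h = 4, 32                               # toy sizes (assumption)
L_g_t = int(np.ceil((L_h - 1) / (M - 1)))    # conventional length, here 11
rirs_hat = rng.standard_normal((M, L_h))     # stand-in estimated RIRs

results = {}
for L_g in (L_g_t, 8, 5):
    s = np.linalg.svd(multichannel_matrix(rirs_hat, L_g), compute_uv=False)
    # Largest and smallest singular value, and their ratio (full rank assumed).
    results[L_g] = (s[0], s[-1], s[0] / s[-1])
    print(f"L_g = {L_g:2d}: sigma_max = {s[0]:.2f}, "
          f"sigma_min = {s[-1]:.2e}, condition number = {s[0] / s[-1]:.2e}")
```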
Table 3 Condition number of the matrix \(\mathbf {W}_{\mathrm {t}}\hat {\mathbf {H}}_{\mathrm {t}}\) (\(L^{\mathrm {t}}_{g} = 1947\)) and of two sub-matrices \(\mathbf {W}_{\mathrm {s}}\hat {\mathbf {H}}_{\mathrm {s}}\) (\(L^{\mathrm {s}}_{g} = 1000\) and \(L^{\mathrm {s}}_{g} = 300\)) for PMINT
Filter length | Condition number |
---|---|
\(L^{\mathrm {t}}_{g} = 1947\) | \(\chi _{\mathbf {W}_{\mathrm {t}}\hat {\mathbf {H}}_{\mathrm {t}}} = 1.65 \times 10^{7}\) |
\(L^{\mathrm {s}}_{g} = 1000\) | \(\chi _{\mathbf {W}_{\mathrm {s}}\hat {\mathbf {H}}_{\mathrm {s}}} = 4.91 \times 10^{3}\) |
\(L^{\mathrm {s}}_{g} = 300\) | \(\chi _{\mathbf {W}_{\mathrm {s}}\hat {\mathbf {H}}_{\mathrm {s}}} = 6.23 \times 10^{2}\) |
In summary, decreasing the reshaping filter length in acoustic multi-channel equalization techniques decreases the condition number of the matrix being inverted, increasing the robustness against RIR perturbations. However, decreasing the reshaping filter length also reduces the equalization performance with respect to the true RIRs, resulting in a trade-off between equalization performance for perfectly estimated RIRs and robustness in the presence of RIR perturbations. Using a shorter reshaping filter is not only desirable to increase the robustness against RIR perturbations, but also because of the lower computational complexity of the reshaping filter design.
5 Automatic non-intrusive reshaping filter length
The optimal reshaping filter length \(L^{\text {opt}}_{g}\) yielding the highest dereverberation performance obviously depends on the acoustic system and the RIR perturbation level. In simulations, \(L^{\text {opt}}_{g}\) can be intrusively determined by exploiting a clean reference signal. Reshaping filters for several reshaping filter lengths can be computed and applied to the received microphone signals such that different output signals are generated. The optimal reshaping filter length \(L^{\text {opt}}_{g}\) can then be selected by comparing the different output signals to the clean reference signal. Since one typically does not have access to the clean reference signal, an automatic non-intrusive procedure is required in practice.
Motivated by the simplicity and the applicability of the L-curve to automatically determine a regularization parameter in regularized (weighted) least-squares solutions [24,34,35], in this section, we propose to use the L-curve to automatically determine the reshaping filter length \(L^{\text {auto}}_{g}\) in acoustic multi-channel equalization techniques.
Using a shorter reshaping filter introduces a trade-off between the condition number \(\chi _{\mathbf {W}\hat {\mathbf {H}}}\) and the (weighted) least-squares error \(\|\mathbf {W}(\hat {\mathbf {H}}\mathbf {g} - \mathbf {c}_{\mathrm {d}})\|^{2}_{2}\). An appropriate filter length should incorporate knowledge about \(\chi _{\mathbf {W}\hat {\mathbf {H}}}\) and \(\|\mathbf {W}(\hat {\mathbf {H}}\mathbf {g} - \mathbf {c}_{\mathrm {d}})\|^{2}_{2}\), such that preferably both quantities are kept as small as possible. Due to the trade-off between these quantities, the parametric plot of the condition number versus the (weighted) least-squares error for several reshaping filter lengths has an L-shape. The corner of the L-curve, i.e., the point of maximum curvature, is located where the filter changes from being dominated by a large condition number to being dominated by a large (weighted) least-squares error. Hence, we propose to automatically select the reshaping filter length \(L^{\text {auto}}_{g}\) as the filter length corresponding to the corner of the parametric plot of the condition number \(\chi _{\mathbf {W}\hat {\mathbf {H}}}\) versus the (weighted) least-squares error \(\|\mathbf {W}(\hat {\mathbf {H}}\mathbf {g} - \mathbf {c}_{\mathrm {d}})\|^{2}_{2}\).
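To make the corner selection concrete, the following sketch locates the corner of a sampled L-curve. It deliberately uses a simple stand-in heuristic, namely the point with the largest distance from the chord joining the two end points in normalized log-log coordinates, and not the triangle method [36] used later in the paper. The function name `l_curve_corner` and the synthetic values are assumptions made for illustration.

```python
import numpy as np

def l_curve_corner(cond, err):
    """Index of the sample farthest from the chord joining the curve's end points."""
    pts = np.column_stack([np.log10(cond), np.log10(err)])
    span = pts.max(axis=0) - pts.min(axis=0)
    span[span == 0] = 1.0                     # guard against a degenerate axis
    pts = (pts - pts.min(axis=0)) / span      # normalize both axes to [0, 1]
    chord = pts[-1] - pts[0]
    rel = pts - pts[0]
    # Perpendicular distance of every sample to the chord (2-D cross product).
    dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / np.linalg.norm(chord)
    return int(np.argmax(dist))

# Synthetic L-shaped data (not from the paper): longer filters have a larger
# condition number and a smaller least-squares error.
lengths = np.array([300, 600, 900, 1200, 1500, 1800])
cond = 10.0 ** np.array([2.0, 2.1, 2.2, 2.4, 5.0, 7.0])
err = 10.0 ** np.array([0.0, -1.0, -2.0, -3.0, -3.1, -3.2])

corner = l_curve_corner(cond, err)
print("selected filter length:", lengths[corner])
```

For these synthetic values the selected length is 1200, i.e., the sample at which the curve turns from the error-dominated branch to the condition-number-dominated branch.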
6 Simulation results and discussion
In this section, we investigate the influence of the reshaping filter length on the dereverberation performance of all considered acoustic multi-channel equalization techniques. In Section 6.1, the considered acoustic systems, instrumental performance measures, and algorithmic settings are introduced. In Section 6.2, the increase in robustness when using shorter reshaping filter lengths is investigated. In Section 6.3, the performance of all considered equalization techniques using the intrusively determined reshaping filter length \(L^{\text {opt}}_{g}\) is compared for several acoustic systems and RIR perturbation levels. In Section 6.4, the performance of PMINT using the automatic non-intrusively determined reshaping filter length \(L^{\text {auto}}_{g}\) is investigated.
6.1 Acoustic systems, instrumental performance measures, and algorithmic settings
Table 4 Characteristics of the considered measured acoustic systems
Acoustic system | T_{60} [ms] | L_{h} | iDRR [dB] |
---|---|---|---|
S_{1} | 730 | 5840 | −2.79 |
S_{2} | 610 | 4880 | −0.87 |
S_{3} | 360 | 2880 | 1.43 |
with −33 dB a moderate perturbation level and −15 dB a larger perturbation level. It should be noted that the NPMs in (25) represent realistic NPMs achieved by state-of-the-art BSI methods (for relatively short RIRs in the order of 300−500 taps) [30].
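For readers who wish to quantify a perturbation level, the sketch below computes the normalized projection misalignment in dB using its commonly used definition, i.e., the norm of the component of h orthogonal to the estimate, divided by the norm of h. Whether this matches the exact definition used for the NPMs in (25) is an assumption, and the function name `npm_db` is hypothetical.

```python
import numpy as np

def npm_db(h, h_hat):
    """Normalized projection misalignment between h and its estimate, in dB."""
    h = np.asarray(h, dtype=float)
    h_hat = np.asarray(h_hat, dtype=float)
    # Project h onto h_hat so that an arbitrary scaling of the estimate is ignored.
    projection = (h @ h_hat) / (h_hat @ h_hat) * h_hat
    return 20.0 * np.log10(np.linalg.norm(h - projection) / np.linalg.norm(h))

rng = np.random.default_rng(3)
h = rng.standard_normal(400)                   # stand-in RIR (assumption)
h_hat = h + 0.03 * rng.standard_normal(400)    # perturbed estimate
print(f"NPM = {npm_db(h, h_hat):.1f} dB")

# Small hand-checkable case: h = [1, 0], h_hat = [1, 1].
npm_small = npm_db([1.0, 0.0], [1.0, 1.0])
```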
and the EDC of the RIR h_{1}(n) is computed similarly. The perceptual speech quality is evaluated using the frequency-weighted segmental signal-to-noise-ratio (fSNR) and the cepstral distance (CD) [43]. In [44], it has been shown that measures such as fSNR and CD can exhibit a high correlation with subjective listening tests when evaluating the overall quality and the perceived amount of reverberation for a wide range of state-of-the-art dereverberation (and noise reduction) techniques. These signal-based measures are intrusive measures, generating a similarity score between a test signal and a reference signal. The reference signal employed here is obtained by convolving the clean speech signal with the direct path and early reflections (considered to be 10 ms long) of h_{1}(n). The improvement in fSNR, i.e., ΔfSNR, is computed as the difference between the fSNR of the output signal z(n) and the fSNR of the first microphone signal x_{1}(n). Similarly, the improvement in CD, i.e., ΔCD, is computed as the difference between the CD of the output signal z(n) and the CD of the first microphone signal x_{1}(n). Note that a positive ΔfSNR and a negative ΔCD indicate a performance improvement.
Algorithmic settings. For the acoustic systems described in Table 4 and for all considered equalization techniques, the conventionally used filter length is \(L^{\mathrm {t}}_{g} =\left \lceil {\frac {L_{h}-1}{M-1}}\right \rceil \), i.e., \(L^{\mathrm {t}}_{g} = 1947\) for system S_{1}, \(L^{\mathrm {t}}_{g} = 1627\) for system S_{2}, and \(L^{\mathrm {t}}_{g} = 960\) for system S_{3}. The delay is set to τ=90 and the length of the direct path and early reflections is set to L_{ d }=0.01×f_{ s }, corresponding to 10 ms (cf. Table 1). The target EIR c_{d} for PMINT is constructed using the first RIR, i.e., p=1 (cf. Table 1).
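The filter lengths quoted above can be checked with a short calculation. The number of microphones M=4 is not stated in this section and is inferred here from the reported values of \(L^{\mathrm {t}}_{g}\); similarly, the sampling frequency f_{s}=8 kHz is inferred from the RIR lengths and reverberation times in Table 4. Both should be read as assumptions.

```python
import math

M = 4        # inferred from the reported filter lengths (assumption)
f_s = 8000   # Hz, inferred from L_h = T60 * f_s in Table 4 (assumption)

rir_lengths = {"S1": 5840, "S2": 4880, "S3": 2880}   # L_h from Table 4

results = {}
for name, L_h in rir_lengths.items():
    L_g_t = math.ceil((L_h - 1) / (M - 1))   # conventional filter length
    p_t = L_h + L_g_t - 1                    # rows of the convolution matrix
    q_t = M * L_g_t                          # columns of the convolution matrix
    results[name] = (L_g_t, p_t, q_t)
    print(f"{name}: L_g^t = {L_g_t}, matrix size {p_t} x {q_t}")

L_d = round(0.01 * f_s)                      # 10 ms of direct path and early reflections
print("L_d =", L_d)
```

In all three cases q_{t}≥p_{t}, i.e., the conventionally sized matrix has at least as many columns as rows.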
For each acoustic system, each NPM, and each equalization technique, the optimal filter length \(L^{\text {opt}}_{g}\) is selected from (28) as the filter length yielding the lowest CD. It should be noted that using the CD for determining the optimal reshaping filter length is an intrusive procedure which cannot be applied in practice, since knowledge of the clean reference signal is required. In Section 6.4, the performance when using the automatic non-intrusive procedure for selecting the reshaping filter length is investigated.
6.2 Increasing robustness using shorter reshaping filters
Table 5 Optimal intrusively determined reshaping filter length \(L^{\text {opt}}_{g}\) for MINT, RMCLS, and PMINT for acoustic system S_{1} and all considered NPMs
NPM [dB] | MINT | RMCLS | PMINT |
---|---|---|---|
−33 | 1140 | 1200 | 1170 |
−30 | 1200 | 1200 | 1230 |
−27 | 930 | 1140 | 1200 |
−24 | 1050 | 1020 | 1050 |
−21 | 870 | 840 | 900 |
−18 | 780 | 780 | 900 |
−15 | 510 | 660 | 510 |
In summary, as expected from the theoretical analysis in Section 4, these simulation results demonstrate that using an optimal intrusively determined shorter reshaping filter length than conventionally used in MINT is advantageous to increase the robustness against RIR perturbations. However, since acoustic system inversion using MINT is very sensitive to RIR perturbations, these results indicate that even a shorter reshaping filter is not sufficient to make MINT robust enough against RIR perturbations.
In summary, as expected from the theoretical analysis in Section 4, these simulation results demonstrate that using an optimal intrusively determined shorter reshaping filter length than conventionally used in RMCLS is advantageous and increases the robustness against RIR perturbations.
In summary, as expected from the theoretical analysis in Section 4, these simulation results demonstrate that using an optimal intrusively determined shorter reshaping filter length than conventionally used in PMINT results in a significant increase in robustness against RIR perturbations, both in terms of reverberant energy suppression and perceptual speech quality improvement.
6.3 Performance of equalization techniques when using the optimal intrusive reshaping filter length
In the previous section, it was shown that using a shorter reshaping filter than conventionally used increases the robustness of all considered equalization techniques against RIR perturbations. In this section, the performance of MINT, RMCLS, and PMINT using the optimal intrusively determined reshaping filter length \(L^{\text {opt}}_{g}\) is extensively compared for all acoustic systems in Table 4 and all NPMs in (25). The performance of the different techniques is evaluated in terms of ΔDRR, ΔfSNR, and ΔCD, where the presented performance measures are averaged over all considered NPMs.
Table 6 presents the obtained ΔDRR, ΔfSNR, and ΔCD values for all considered techniques^{1}. First, it can be observed that MINT using \(L^{\text {opt}}_{g}\) results in the lowest performance in terms of all performance measures, often worsening the perceptual speech quality in comparison to the unprocessed microphone signal x_{1}(n). Since MINT is very sensitive to RIR perturbations (cf. Fig. 5), the robustness increase that can be obtained by using a shorter reshaping filter length is also limited. Second, it can be observed that RMCLS and PMINT using \(L^{\text {opt}}_{g}\) result in a high reverberant energy suppression in terms of ΔDRR, with PMINT outperforming RMCLS for systems S_{2} and S_{3} whereas a similar performance is obtained for system S_{1}. Finally, it can be observed that for all considered acoustic systems, PMINT using the reshaping filter length \(L^{\text {opt}}_{g}\) yields the highest perceptual speech quality improvement, outperforming RMCLS in terms of ΔfSNR and ΔCD. While PMINT always improves the perceptual speech quality in comparison to the unprocessed microphone signal x_{1}(n), RMCLS sometimes fails to yield an improvement, as indicated by the negative ΔfSNR for systems S_{2} and S_{3}. The advantage of PMINT lies in its control of the early reflections in the EIR, hence better preserving the perceptual speech quality of the output signal.
In summary, based on instrumental measures, it can be said that PMINT using the optimal intrusively determined reshaping filter length \(L^{\text {opt}}_{g}\) is a robust and perceptually advantageous equalization technique, yielding a high reverberant energy suppression and outperforming all other considered equalization techniques in terms of perceptual speech quality. Informal listening tests further support this conclusion.
6.4 Performance of PMINT when using the automatic non-intrusive reshaping filter length
In this section, we investigate the performance of PMINT when using the automatic non-intrusively determined reshaping filter length \(L^{\text {auto}}_{g}\) (cf. Section 5) instead of the optimal intrusively determined reshaping filter length \(L^{\text {opt}}_{g}\). For completeness, the obtained values of \(L^{\text {auto}}_{g}\) are also compared to the values of \(L^{\text {opt}}_{g}\). As in Section 6.3, we consider all acoustic systems in Table 4 and all NPMs in (25). In order to generate the parametric L-curve, the matrix \(\mathbf {W}\hat {\mathbf {H}}\) is constructed for all reshaping filter lengths in (28), the PMINT reshaping filter is computed, and the quantities \(\chi _{\mathbf {W}\hat {\mathbf {H}}}\) and \(\|\mathbf {W}(\hat {\mathbf {H}}\mathbf {g}-\mathbf {c}_{\mathrm {d}}) \|^{2}_{2}\) are calculated. Using the triangle method [36], the automatic reshaping filter length \(L^{\text {auto}}_{g}\) corresponding to the point of maximum curvature of the L-curve is determined. The performance of PMINT using \(L^{\text {auto}}_{g}\) is evaluated in terms of ΔDRR, ΔfSNR, and ΔCD, where the presented performance measures are averaged over all considered NPMs.
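The generation of the L-curve samples can be sketched as follows. The sketch assumes W=I, the PMINT target from Table 1 built from the first estimated RIR, random vectors as stand-in RIRs, and a toy set of candidate filter lengths instead of the set in (28). The function names are hypothetical, and the corner detection step is omitted.

```python
import numpy as np

def multichannel_matrix(rirs, L_g):
    """Stack the (L_h + L_g - 1) x L_g convolution matrices of all channels."""
    L_h = rirs.shape[1]
    blocks = []
    for h in rirs:
        H_m = np.zeros((L_h + L_g - 1, L_g))
        for j in range(L_g):
            H_m[j:j + L_h, j] = h
        blocks.append(H_m)
    return np.hstack(blocks)

def pmint_l_curve(rirs_hat, lengths, tau, L_d, p=0):
    """Return (L_g, condition number, least-squares error) for each candidate length."""
    samples = []
    for L_g in lengths:
        H_hat = multichannel_matrix(rirs_hat, L_g)
        c_d = np.zeros(H_hat.shape[0])
        c_d[tau:tau + L_d] = rirs_hat[p][:L_d]    # PMINT target, W = I
        g = np.linalg.pinv(H_hat) @ c_d           # least-squares reshaping filter
        err = float(np.sum((H_hat @ g - c_d) ** 2))
        chi = float(np.linalg.cond(H_hat))        # ratio of extreme singular values
        samples.append((L_g, chi, err))
    return samples

rng = np.random.default_rng(4)
M, L_h = 4, 32                                    # toy sizes (assumption)
rirs_hat = rng.standard_normal((M, L_h))          # stand-in estimated RIRs
lengths = [3, 5, 7, 9, 11]                        # 11 is the conventional length here

samples = pmint_l_curve(rirs_hat, lengths, tau=2, L_d=4)
for L_g, chi, err in samples:
    print(f"L_g = {L_g:2d}: condition number = {chi:.2e}, LS error = {err:.2e}")
```

Since a shorter filter is a special case of a longer one with trailing zeros, the least-squares error cannot increase with the filter length. At the conventional length the error is numerically zero, so it may need to be floored before taking logarithms for the L-curve.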
Table 7 presents the values of \(L^{\text {opt}}_{g}\) and \(L^{\text {auto}}_{g}\) for the acoustic system S_{1} and all considered NPMs^{2}. It can be observed that for low NPMs, the non-intrusively determined reshaping filter length is very similar to the optimal intrusively determined one. As the NPM increases beyond −21 dB, the reshaping filter length obtained using the proposed non-intrusive procedure is larger than the optimal intrusively determined one.
Table 8 Average performance of PMINT using the automatic non-intrusively determined reshaping filter length \(L^{\text {auto}}_{g}\)
Technique | S_{1} ΔDRR [dB] | S_{1} ΔfSNR [dB] | S_{1} ΔCD [dB] | S_{2} ΔDRR [dB] | S_{2} ΔfSNR [dB] | S_{2} ΔCD [dB] | S_{3} ΔDRR [dB] | S_{3} ΔfSNR [dB] | S_{3} ΔCD [dB] |
---|---|---|---|---|---|---|---|---|---|
\(L^{\text {auto}}_{g}\)-PMINT | 6.68 | 7.90 | −1.68 | 3.41 | 1.62 | −0.33 | 1.97 | 0.14 | −0.29 |
In summary, the presented results show that the automatic non-intrusively determined reshaping filter length in PMINT yields a high performance in the presence of RIR perturbations, making PMINT when using this shorter reshaping filter length a robust and perceptually advantageous acoustic multi-channel equalization technique.
7 Conclusions
In this paper, we have analyzed the use of a shorter reshaping filter length than conventionally used in order to increase the robustness of acoustic multi-channel equalization techniques. We have analytically shown that using a shorter reshaping filter decreases the condition number of the (weighted) convolution matrix, increasing as a result the robustness against RIR perturbations. In addition, we have proposed to automatically determine the reshaping filter length as the point of maximum curvature of the parametric plot of the condition number versus the (weighted) least-squares error, such that both quantities are simultaneously kept small. Using instrumental performance measures, it has been shown that using shorter reshaping filters indeed increases the robustness of MINT, RMCLS, and PMINT against RIR perturbations. In addition, it has been shown that PMINT using the optimal intrusively determined reshaping filter length outperforms MINT and RMCLS. Finally, it has been shown that the automatic non-intrusive procedure for selecting the reshaping filter length in PMINT yields a nearly optimal performance, confirming the practical applicability of using shorter reshaping filters in acoustic multi-channel equalization.
8 Appendix A
Footnotes
- 1. It should be noted that the performance measures presented for system S_{1} are an average of the results already presented in Section 6.2.
Table 6 Average performance of MINT, RMCLS, and PMINT using the optimal intrusively determined reshaping filter length \(L^{\text {opt}}_{g}\)
Technique | S_{1} ΔDRR [dB] | S_{1} ΔfSNR [dB] | S_{1} ΔCD [dB] | S_{2} ΔDRR [dB] | S_{2} ΔfSNR [dB] | S_{2} ΔCD [dB] | S_{3} ΔDRR [dB] | S_{3} ΔfSNR [dB] | S_{3} ΔCD [dB] |
---|---|---|---|---|---|---|---|---|---|
\(L^{\text {opt}}_{g}\)-MINT | 4.41 | −0.55 | −1.31 | 1.66 | −2.07 | 0.07 | 1.20 | −3.89 | −0.22 |
\(L^{\text {opt}}_{g}\)-RMCLS | 6.75 | 3.53 | −1.77 | 1.76 | −0.81 | −0.39 | 1.31 | −0.61 | −0.52 |
\(L^{\text {opt}}_{g}\)-PMINT | 6.98 | 8.65 | −1.78 | 4.42 | 2.58 | −0.66 | 2.40 | 1.88 | −0.57 |
- 2. It should be noted that the presented \(L^{\text {opt}}_{g}\) values are the same as the ones presented in Table 5.
Table 7 Intrusively and non-intrusively determined reshaping filter lengths for PMINT for acoustic system S_{1} and all considered NPMs
NPM [dB] | −33 | −30 | −27 | −24 | −21 | −18 | −15 |
---|---|---|---|---|---|---|---|
\(L^{\text {opt}}_{g}\) | 1170 | 1230 | 1200 | 1050 | 900 | 900 | 510 |
\(L^{\text {auto}}_{g}\) | 1200 | 1170 | 1170 | 1230 | 1230 | 1170 | 1170 |
Notes
Funding
This work was supported in part by the Cluster of Excellence 1077 “Hearing4All,” funded by the German Research Foundation (DFG) and the joint Lower Saxony-Israeli Project ATHENA, funded by the State of Lower Saxony.
Authors’ contributions
The contribution of the first author consists in developing the main algorithmic idea, deriving the mathematical analysis, performing simulations, analyzing the simulation results, and drafting the article. The contribution of the second author consists in critically discussing the mathematical analysis and the simulation results and in proofreading and revising the article. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
References
- 1. JS Bradley, H Sato, M Picard, On the importance of early reflections for speech in rooms. J. Acoust. Soc. Am. 113(6), 3233–3244 (2003).
- 2. I Arweiler, JM Buchholz, The influence of spectral characteristics of early reflections on speech intelligibility. J. Acoust. Soc. Am. 130(2), 996–1005 (2011).
- 3. A Warzybok, J Rennies, T Brand, S Doclo, B Kollmeier, Effects of spatial and temporal integration of a single early reflection on speech intelligibility. J. Acoust. Soc. Am. 133(1), 269–282 (2013).
- 4. R Beutelmann, T Brand, Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 120(1), 331–342 (2006).
- 5. S Goetze, E Albertin, J Rennies, EAP Habets, K-D Kammeyer, in Proc. AES International Conference on Sound Quality Evaluation. Speech quality assessment for listening-room compensation (Pitea, 2010), pp. 11–20.
- 6. A Warzybok, I Kodrasi, JO Jungmann, EAP Habets, T Gerkmann, A Mertins, S Doclo, B Kollmeier, S Goetze, in Proc. International Workshop on Acoustic Echo and Noise Control. Subjective speech quality and speech intelligibility evaluation of single-channel dereverberation algorithms (Antibes, 2014), pp. 333–337.
- 7. T Yoshioka, A Sehr, M Delcroix, K Kinoshita, R Maas, T Nakatani, W Kellermann, Making machines understand us in reverberant rooms: robustness against reverberation for automatic speech recognition. IEEE Signal Proc. Mag. 29(6), 114–126 (2012).
- 8. F Xiong, BT Meyer, N Moritz, R Rehr, J Anemüller, T Gerkmann, S Doclo, S Goetze, Front-end technologies for robust ASR in reverberant environments–spectral enhancement-based dereverberation and auditory modulation filterbank features. EURASIP J. Adv. Signal Process. 2015(1), 1–18 (2015).
- 9. PA Naylor, ND Gaubitch (eds.), Speech Dereverberation (Springer, London, 2010).
- 10. K Lebart, JM Boucher, A new method based on spectral subtraction for speech dereverberation. Acta Acustica 87(3), 359–366 (2001).
- 11. EAP Habets, S Gannot, I Cohen, Late reverberant spectral variance estimation based on a statistical model. IEEE Signal Process. Lett. 16(9), 770–774 (2009).
- 12. A Kuklasiński, S Doclo, SH Jensen, J Jensen, Maximum likelihood PSD estimation for speech enhancement in reverberation and noise. IEEE/ACM Trans. Audio Speech Lang. Process. 24(9), 1595–1608 (2016).
- 13. I Kodrasi, S Doclo, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing. Late reverberant power spectral density estimation based on an eigenvalue decomposition (New Orleans, 2017), pp. 611–615.
- 14. I Kodrasi, S Doclo, in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. Multi-channel late reverberation power spectral density estimation based on nuclear norm minimization (New York, 2017). (accepted for publication).
- 15. T Nakatani, T Yoshioka, K Kinoshita, M Miyoshi, B-H Juang, Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Trans. Audio Speech Lang. Process. 18(7), 1717–1731 (2010).
- 16. D Schmid, G Enzner, S Malik, D Kolossa, R Martin, Variational Bayesian inference for multichannel dereverberation and noise reduction. IEEE/ACM Trans. Audio Speech Lang. Process. 22(8), 1320–1335 (2014).
- 17. B Schwartz, S Gannot, EAP Habets, Online speech dereverberation using Kalman filter and EM algorithm. IEEE/ACM Trans. Audio Speech Lang. Process. 23(2), 394–406 (2015).
- 18. A Jukić, T Van Waterschoot, T Gerkmann, S Doclo, Multi-channel linear prediction-based speech dereverberation with sparse priors. IEEE/ACM Trans. Audio Speech Lang. Process. 23(9), 1509–1520 (2015).
- 19. M Miyoshi, Y Kaneda, Inverse filtering of room acoustics. IEEE Trans. Acoust. Speech Signal Process. 36(2), 145–152 (1988).
- 20. M Kallinger, A Mertins, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing. Multi-channel room impulse response shaping - a study (Toulouse, 2006), pp. 101–104.
- 21. JO Jungmann, R Mazur, M Kallinger, M Tiemin, A Mertins, Combined acoustic MIMO channel crosstalk cancellation and room impulse response reshaping. IEEE Trans. Audio Speech Lang. Process. 20(6), 1829–1842 (2012).
- 22. T Hikichi, M Delcroix, M Miyoshi, Inverse filtering for speech dereverberation less sensitive to noise and room transfer function fluctuations. EURASIP J. Adv. Signal Process. 2007 (2007).
- 23. F Lim, W Zhang, EAP Habets, PA Naylor, Robust multichannel dereverberation using relaxed multichannel least squares. IEEE/ACM Trans. Audio Speech Lang. Process. 22(9), 1379–1390 (2014).
- 24. I Kodrasi, S Goetze, S Doclo, Regularization for partial multichannel equalization for speech dereverberation. IEEE Trans. Audio Speech Lang. Process. 21(9), 1879–1890 (2013).
- 25. RS Rashobh, AWH Khong, D Liu, Multichannel equalization in the KLT and frequency domains with application to speech dereverberation. IEEE/ACM Trans. Audio Speech Lang. Process. 22(3), 634–646 (2014).
- 26. I Kodrasi, S Doclo, Signal-dependent penalty functions for robust acoustic multi-channel equalization. IEEE Trans. Audio Speech Lang. Process. 25(7), 1512–1525 (2017).
- 27. BD Radlovic, RC Williamson, RA Kennedy, Equalization in an acoustic reverberant environment: robustness results. IEEE Trans. Speech Audio Process. 8(3), 311–319 (2000).
- 28. AWH Khong, L Xiang, PA Naylor, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing. Algorithms for identifying clusters of near-common zeros in multichannel blind system identification and equalization (Las Vegas, 2008), pp. 229–232.
- 29. MA Haque, T Hasan, Noise robust multichannel frequency-domain LMS algorithms for blind channel identification. IEEE Signal Process. Lett. 15, 305–308 (2008).
- 30. M Hu, ND Gaubitch, PA Naylor, DB Ward, in Proc. European Signal Processing Conference. Noise robust blind system identification algorithms based on a Rayleigh quotient cost function (Nice, 2015).
- 31. ND Gaubitch, PA Naylor, Equalization of multichannel acoustic systems in oversampled subbands. IEEE Trans. Audio Speech Lang. Process. 17(6), 1061–1070 (2009).
- 32. F Lim, PA Naylor, in Proc. European Signal Processing Conference. Robust speech dereverberation using subband multichannel least squares with variable relaxation (Marrakech, 2013).
- 33. I Kodrasi, S Doclo, in Proc. European Signal Processing Conference. The effect of inverse filter length on the robustness of acoustic multichannel equalization (Bucharest, 2012).
- 34. PC Hansen, Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 34(4), 561–580 (1992).
- 35. PC Hansen, DP O’Leary, The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14(6), 1487–1503 (1993).
- 36. JL Castellanos, S Gómez, V Guerra, The triangle method for finding the corner of the L-curve. Appl. Numer. Math. 43(4), 359–373 (2002).
- 37. P Wedin, Perturbation theory for pseudo-inverses. BIT Numer. Math. 13(2), 217–232 (1973).
- 38. G Golub, C Van Loan, Matrix Computations (The Johns Hopkins University Press, Baltimore, 1996).
- 39. RA Horn, CR Johnson, Topics in matrix analysis (Cambridge University Press, Cambridge, 1999).
- 40. A Farina, in Proc. AES Convention. Simultaneous measurement of impulse response and distortion with a swept-sine technique (Pitea, 2000), pp. 18–22.
- 41. M Nilsson, SD Soli, A Sullivan, Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J. Acoust. Soc. Am. 95(2), 1085–1099 (1994).
- 42. W Zhang, PA Naylor, An algorithm to generate representations of system identification errors. Res. Lett. Signal Process. 2008 (2008).
- 43. Y Hu, PC Loizou, Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 16(1), 229–238 (2008).
- 44. K Kinoshita, M Delcroix, S Gannot, EAP Habets, R Haeb-Umbach, W Kellermann, V Leutnant, R Maas, T Nakatani, B Raj, A Sehr, T Yoshioka, A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research. EURASIP J. Adv. Signal Process. 2016(1), 1–19 (2016).
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.