# Non-unitary matrix joint diagonalization for complex independent vector analysis

## Abstract

Independent vector analysis (IVA) is a special form of independent component analysis (ICA) that has demonstrated strong performance in solving convolutive blind source separation (BSS) problems in the frequency domain. Most IVA algorithms are based on optimizing certain contrast functions, where the main difficulty lies in finding a reliable and fast estimate of the unknown distribution of the sources. Despite the rich availability of efficient tensorial approaches to the standard ICA problem, these methods have not been explored much for IVA. In this article, we propose a matrix joint diagonalization approach to solve the complex IVA problem. The new factorization neither relies on a whitening process, nor does it require an estimate of the joint probability distribution of the dependent signal groups. The latter is in contrast to most IVA approaches to date. The underlying geometry of the problem is investigated together with a critical point analysis of the resulting cost function. A conjugate gradient algorithm on the appropriate manifold setting is developed.

### Keywords

Conjugate Gradient, Independent Component Analysis, Conjugate Gradient Method, Parallel Transport

## 1 Introduction

Independent component analysis (ICA) is a standard statistical tool for solving the blind source separation (BSS) problem. BSS aims to recover source signals from the observed mixtures, without knowing either the distribution of the sources or the mixing process. Application of the standard ICA model is often limited, since it requires mutual statistical independence between all individual components. However, in many applications, there exist groups of signals of interest, where components from different groups are *mutually statistically independent* indeed, but where *mutual statistical dependence* occurs between components in the same group. Such problems can be tackled by a technique now referred to as multidimensional independent component analysis (MICA) [1], or independent subspace analysis (ISA) [2].

A special form of ISA arises in solving the BSS problem with convolutive mixtures [3]. After transferring the convolutive observations into the frequency domain via short-time Fourier transforms, the convolutive BSS problem results in a collection of instantaneous complex BSS problems, one in each frequency bin. After solving the subproblems individually, the final stage faces the challenge of aligning all statistically dependent components from different groups, which is referred to as the *permutation problem*. To avoid this problem, an approach named independent vector analysis (IVA) has been proposed in [4]. Besides its application to convolutive BSS problems, IVA has also recently been applied to analyze multivariate Gaussian models, cf. [5, 6]. In the current literature, the majority of IVA algorithms are based on optimizing certain contrast functions, cf. [5, 7, 8, 9]. The main difficulty of these contrast function based approaches lies in estimating the unknown distribution of the sources, which usually requires a large number of observations [10].

On the other hand, tensorial approaches are efficient and richly available for solving both the ICA and ISA problems. In particular, joint block diagonalization approaches have been shown to be effective methods for solving the ISA problem, cf. [11, 12], and are inherently applicable to IVA. However, such general joint block diagonalization approaches do not take the intrinsic structure of the IVA problem into account. A recent study [13] proposes a joint diagonalization approach based on cross cumulant matrices to solve the complex IVA problem. More recently, the present authors have developed a similar approach of jointly diagonalizing both cross covariance and cross pseudo covariance matrices, cf. [14]. In this article, we extend the previous study in [14], and adapt the so-called complex oblique projective (COP) manifold, which has proven to be an appropriate setting for the standard instantaneous complex ICA problem [15], to the current scenario. Finally, an efficient conjugate gradient (CG) based IVA algorithm is proposed, and numerical experiments are provided to demonstrate the convergence properties of the proposed CG algorithm, and to compare its performance with two recently developed IVA algorithms in terms of separation quality.

## 2 Notations

Throughout the article, (·)^{⊤} denotes the matrix transpose, (·)^{H} the Hermitian transpose, $\overline{(\cdot)}$ the entry-wise complex conjugate of a matrix, and *Gl*(*m*) the set of all *m*×*m* invertible complex matrices. The Frobenius norm of a matrix $A\in {\mathbb{C}}^{m\times n}$ is denoted by $\parallel A{\parallel}_{F}:=\sqrt{\text{tr}\left(A{A}^{\mathsf{H}}\right)}$, where tr(·) is the trace of a square matrix. Given a square matrix $Z\in {\mathbb{C}}^{m\times m}$, ddiag(*Z*) forms a diagonal matrix whose diagonal entries are those of *Z*, and off(*Z*) generates a matrix by setting all diagonal entries of *Z* to zero, i.e. off(*Z*) := *Z* − ddiag(*Z*).

In this study, we consider an *m*-dimensional complex signal $s(t)=[s_{1}(t),\dots,s_{m}(t)]^{\top}\in\mathbb{C}^{m}$ as an *m*-dimensional complex stochastic process indexed by the variable *t*. The empirical expectation of a random variable *s* is denoted by $\mathbb{E}[s(t)]=\frac{1}{T}\sum_{t=1}^{T}s(t)$, where *T* is the number of samples. As usual for the standard ICA model, we assume without loss of generality that $\mathbb{E}[s(t)]=0$. The empirical covariance and pseudo-covariance matrices of a complex signal *s*(*t*) are denoted by $\text{cov}(s(t)):=\mathbb{E}[s(t)s(t)^{\mathsf{H}}]$ and $\text{pcov}(s(t)):=\mathbb{E}[s(t)s(t)^{\top}]$, respectively.
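The notation above maps directly onto a few array operations. The following is a minimal NumPy sketch, assuming the *T* samples of a signal are stored as the columns of an *m*×*T* array; the function names are illustrative and not part of the original article.

```python
import numpy as np

def ddiag(Z):
    """Diagonal matrix carrying the diagonal entries of Z."""
    return np.diag(np.diag(Z))

def off(Z):
    """Z with all diagonal entries set to zero, i.e. Z - ddiag(Z)."""
    return Z - ddiag(Z)

def cov(S):
    """Empirical covariance E[s s^H] of zero-mean samples (columns of S)."""
    return S @ S.conj().T / S.shape[1]

def pcov(S):
    """Empirical pseudo-covariance E[s s^T] of zero-mean samples."""
    return S @ S.T / S.shape[1]

# Synthetic complex samples, used only to exercise the helpers.
rng = np.random.default_rng(0)
S = rng.standard_normal((3, 1000)) + 1j * rng.standard_normal((3, 1000))
S = S - S.mean(axis=1, keepdims=True)  # enforce the zero-mean assumption
```

By construction, `cov(S)` is Hermitian and `pcov(S)` is complex symmetric, which matches the two kinds of matrices distinguished in the text.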

## 3 Problem description

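Let ${w}_{i}(t,f)\in \mathbb{C}$ and ${s}_{i}(t,f)\in \mathbb{C}$ denote the coefficient at the center frequency *f* of the *i*th observation $w_i(t)$ and the *i*th source signal $s_i(t)$, respectively. Then, for a given pair (*t*, *f*), the Fourier coefficients of the observations and the sources obey the equality

$$w(t,f) = A(f)\,s(t,f),$$

with $w(t,f) := [w_1(t,f),\dots,w_m(t,f)]^{\top}$, $s(t,f) := [s_1(t,f),\dots,s_m(t,f)]^{\top}$, and a frequency dependent mixing matrix, here written $A(f)\in Gl(m)$. For a fixed frequency *f*, we get a standard instantaneous complex BSS problem as

$$W(f) = A(f)\,S(f), \tag{2}$$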

where $W(f):=[w(1,f),\dots,w(T,f)]\in\mathbb{C}^{m\times T}$ and $S(f):=[s(1,f),\dots,s(T,f)]\in\mathbb{C}^{m\times T}$, with *T* being the number of chosen time frames. One popular approach to solve the convolutive BSS problem is to solve the individual instantaneous BSS problem (2) at each frequency, and then assemble the results from all frequencies to reconstruct the estimated signals in the time domain [18].

Denote the *i*th row of *S*(*f*) by $s_i(f)=[s_i(1,f),\dots,s_i(T,f)]\in\mathbb{C}^{1\times T}$ for *i* = 1,…,*m*. Following the assumption of statistical independence between the sources, the complex valued signals $s_i(f)$ and $s_j(f)$ are *statistically independent* for *i* ≠ *j*. In contrast, we assume that for a pair of frequencies $(f_p, f_q)$ with $f_p \neq f_q$, the complex signals $s_i(f_p)$ and $s_i(f_q)$ are *statistically dependent* for a given source. The development of IVA is inspired by this cross frequency structure. It aims to find a set of demixing matrices $\{X_f\}\subset Gl(m)$ via

$$Y(f) = X_f^{\mathsf{H}}\,W(f),$$

where the *i*th row of $Y(f)$ is the estimated signal $y_i(f)$, such that

- (1) all sub-ICA problems are solved, and
- (2) the statistical alignment between groups is restored, i.e. the estimated *i*th signals $\{y_i(f)\}$ are mutually statistically dependent.

Besides the cross covariance $\text{cov}(S(f_i),S(f_j)) := \mathbb{E}[s(t,f_i)\,s(t,f_j)^{\mathsf{H}}]$, the *pseudo cross covariance*, defined as

$$\text{pcov}(S(f_i),S(f_j)) := \mathbb{E}[s(t,f_i)\,s(t,f_j)^{\top}],$$

also allows one to gain additional information about the second-order statistics of the involved signals. In this study, we assume that cross covariances between sources in all groups do not vanish. The assumption of statistical independence between the source signals implies that the cross covariance matrix $\text{cov}(S(f_i),S(f_j))$ and the pseudo cross covariance matrix $\text{pcov}(S(f_i),S(f_j))$ are diagonal for all pairs (*i*, *j*). With a further assumption on the sources being non-stationary, which has been exploited in [19], we arrive at a problem of jointly diagonalizing two sets of cross covariance and pseudo cross covariance matrices at different time instances.

Given an IVA problem with *k* subproblems, we consider the cross covariance and pseudo cross covariance matrices at *n* time instances, i.e. for all *i*, *j* = 1,…,*k* and *t* = 1,…,*n*, a set of matrices $\{C_{ij}^{(t)}\}_{i<j}$ and a set of complex symmetric matrices $\{R_{ij}^{(t)}\}_{i<j}$, which are constructed as the cross covariance and the pseudo cross covariance, respectively, of the observations $W(f_i)$ and $W(f_j)$ at the *t*th time instance. The task is to find a set of matrices $\{X_i\}_{i=1,\dots,k}\subset Gl(m)$ such that

$$X_i^{\mathsf{H}}\,C_{ij}^{(t)}\,X_j \quad\text{and}\quad X_i^{\mathsf{H}}\,R_{ij}^{(t)}\,\overline{X_j}, \tag{7}$$

for all *i* < *j* and *t* = 1,…,*n*, are simultaneously, or approximately simultaneously, diagonalized. In this study, we consider the noise free IVA problem as defined in (2), and neglect the cross covariance matrix estimation errors due to the finite sample size effect. In other words, we assume that both sets of $C_{ij}^{(t)}$'s and $R_{ij}^{(t)}$'s are jointly diagonalizable.
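To illustrate the joint diagonalization problem in the noise free case, the following NumPy sketch builds cross covariance matrices of the form $A_i\,\Omega\,A_j^{\mathsf{H}}$ with diagonal $\Omega$ and checks that the choice $X_i = A_i^{-\mathsf{H}}$ diagonalizes all of them. The random mixing matrices, the problem sizes, and the helper names `crandn` and `off_norm` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 3, 3, 2  # sources, subproblems, time instances (illustrative sizes)

def crandn(*shape):
    """Complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def off_norm(Z):
    """Squared Frobenius norm of the off-diagonal part of Z."""
    return np.linalg.norm(Z - np.diag(np.diag(Z)), 'fro') ** 2

A = [crandn(m, m) for _ in range(k)]            # mixing matrices (generically invertible)
X = [np.linalg.inv(Ai).conj().T for Ai in A]    # candidate demixing matrices A_i^{-H}

# Cross covariance matrices C_ij^(t) = A_i Omega_ij^(t) A_j^H for all i < j.
C = {(i, j): [A[i] @ np.diag(crandn(m)) @ A[j].conj().T for _ in range(n)]
     for i in range(k) for j in range(i + 1, k)}

# Total off-norm of X_i^H C_ij^(t) X_j over all pairs and time instances.
residual = sum(off_norm(X[i].conj().T @ Ct @ X[j])
               for (i, j), mats in C.items() for Ct in mats)
```

Up to rounding errors, `residual` is zero, since $X_i^{\mathsf{H}} A_i \Omega A_j^{\mathsf{H}} X_j = \Omega$ for this choice of $X_i$.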

Note that the above problem is similar to the simultaneous SVD formulation proposed in [20], where only the situation with two transform matrices is studied, i.e. *k* = 2. In contrast, our current setting deals with multiple transform matrices $\{X_i\}_{i=1,\dots,k}$, which are not restricted to be unitary. Finally, instead of considering second order cross covariance matrices, our approach can be generalized to higher order cross cumulants. We refer to [17] for further details.

## 4 Diagonality measure and the COP manifold

Our cost function to tackle problem (7) originates from the popular off-norm function that measures the squared Frobenius norm of the off-diagonal entries of the involved matrices. We develop an appropriate mathematical setting on the subsequently defined *complex oblique projective* (COP) manifold to provide its critical point analysis.

### 4.1 Derivation of the cost function

For legibility, from now on we only consider the problem of simultaneously diagonalizing the cross covariance matrices, i.e. the first condition in (7). The additional requirement of simultaneously diagonalizing the pseudo cross covariance matrices can be incorporated into our setting in a straightforward way and is not discussed further here.

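Let $X_1^{\ast},\dots,X_k^{\ast}$ denote joint diagonalizers of the matrices $C_{ij}^{(t)}$. A natural diagonality measure is the off-norm function, stated here up to a positive normalization constant,

$$g(X_1,\dots,X_k) := \sum_{t=1}^{n}\sum_{i<j}\left\|\text{off}\left(X_i^{\mathsf{H}}C_{ij}^{(t)}X_j\right)\right\|_F^2. \tag{8}$$

The joint diagonalizers are global minimizers of *g*, that is $g({X}_{1}^{\ast},\dots ,{X}_{k}^{\ast})=0$. It is clear that a minimization approach without further constraints on the $X_i$ would drive all diagonalizers to zero. In order to avoid such trivial solutions and to regularize the minimization problem, the authors in [21] propose to restrict all columns of the transform matrices to have unit norm. This set is known as the *oblique manifold*, which has been shown to be an appropriate setting for matrix diagonalization, cf. [22]. Its complex counterpart is the so-called *complex oblique manifold*

$$Ob(m) := \{X\in Gl(m) \mid \text{ddiag}(X^{\mathsf{H}}X) = I_m\},$$

where $I_m$ denotes the *m*×*m* identity matrix. We denote by $Ob^k(m)$ the product manifold of *k* copies of $Ob(m)$. The restriction of the off-norm cost function (8) is denoted by

$$g_1 : Ob^k(m)\to\mathbb{R},\quad g_1(X_1,\dots,X_k) := \sum_{t=1}^{n}\sum_{i<j}\left\|\text{off}\left(X_i^{\mathsf{H}}C_{ij}^{(t)}X_j\right)\right\|_F^2. \tag{10}$$

We denote the *p*th column of $X_i$ by $x_{ip}$. It is obvious that the function $g_1$ is invariant with respect to the phase of each column $x_{ip}$, which reflects the well-known scaling ambiguity of complex ICA problems. By a further calculation, $g_1$ has the form

$$g_1(X_1,\dots,X_k) = \sum_{t=1}^{n}\sum_{i<j}\sum_{p\neq q}\text{tr}\left(C_{ij}^{(t)}\,x_{jq}x_{jq}^{\mathsf{H}}\,C_{ij}^{(t)\mathsf{H}}\,x_{ip}x_{ip}^{\mathsf{H}}\right).$$

To remove the phase ambiguity of the columns $x_{ip}$, in this study we employ an elegant mathematical setting for the problem. Recall the fact that each ${x}_{ip}{x}_{ip}^{\mathsf{H}}$ defines a Hermitian rank-one projector, the set of which identifies the (*m* − 1)-dimensional *complex projective space* ${\mathbb{C}\mathbb{P}}^{m-1}$, i.e.

$$\mathbb{CP}^{m-1} := \{P\in\mathbb{C}^{m\times m} \mid P^{\mathsf{H}}=P,\ P^2=P,\ \text{tr}(P)=1\}.$$

Since the columns of each $X_i$ form a complex basis (i.e. invertibility of $X_i$), we naturally arrive at the following set, which we refer to as the *complex oblique projective (COP) manifold*,

$$\mathcal{Q}(m) := \Big\{(P_1,\dots,P_m) \;\Big|\; P_p\in\mathbb{CP}^{m-1},\ \det\Big(\sum_{p=1}^{m}P_p\Big)\neq 0\Big\}.$$

The function $g_1$ now induces the following function $g_2$ on the COP manifold. Namely, if ${\mathcal{Q}}^{k}(m)$ denotes the *k*-times product of $\mathcal{Q}(m)$, $g_2$ is given by

$$g_2 : \mathcal{Q}^k(m)\to\mathbb{R},\quad g_2(\mathbf{P}_1,\dots,\mathbf{P}_k) := \sum_{t=1}^{n}\sum_{i<j}\sum_{p\neq q}\text{tr}\left(C_{ij}^{(t)}\,P_{jq}\,C_{ij}^{(t)\mathsf{H}}\,P_{ip}\right), \tag{14}$$

with $\mathbf{P}_i := (P_{i1},\dots,P_{im})$.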
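The passage from column vectors to rank-one projectors can be checked numerically. The sketch below, written for a single matrix *C* and one pair of transform matrices, compares the off-norm of $X_i^{\mathsf{H}} C X_j$ with the trace expression in the projectors, and confirms that the projectors do not change under column-wise phase shifts. The function names and the unit normalization of the cost are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 3

def crandn(*shape):
    """Complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def unit_columns(X):
    """Scale every column to unit norm (a point on the complex oblique manifold)."""
    return X / np.linalg.norm(X, axis=0)

def projectors(X):
    """Stack of rank-one projectors P_p = x_p x_p^H, shape (m, m, m)."""
    return np.einsum('ap,bp->pab', X, X.conj())

def g1_term(Xi, Xj, C):
    """Off-norm ||off(X_i^H C X_j)||_F^2 for a single matrix C."""
    Z = Xi.conj().T @ C @ Xj
    return np.sum(np.abs(Z) ** 2) - np.sum(np.abs(np.diag(Z)) ** 2)

def g2_term(Pi, Pj, C):
    """Same quantity from projectors: sum over p != q of tr(C P_jq C^H P_ip)."""
    total = 0.0
    for p in range(Pi.shape[0]):
        for q in range(Pj.shape[0]):
            if p != q:
                total += np.trace(C @ Pj[q] @ C.conj().T @ Pi[p]).real
    return total

Xi, Xj, C = unit_columns(crandn(m, m)), unit_columns(crandn(m, m)), crandn(m, m)
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, m))  # arbitrary column phases
```

Here `Xi * phases` multiplies the *p*th column by the *p*th phase factor, which leaves every projector $x_{ip}x_{ip}^{\mathsf{H}}$ unchanged.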

### 4.2 The geometry of the complex oblique projective manifold

We consider $\mathcal{Q}(m)$ as a subset of the *m*-fold product of Hermitian matrices. With this, $\mathcal{Q}(m)$ is an open and dense Riemannian submanifold of the *m*-times product of ${\mathbb{C}\mathbb{P}}^{m-1}$ with the standard metric, i.e.

$$\overline{\mathcal{Q}(m)} = \left(\mathbb{CP}^{m-1}\right)^{m},$$

where $\overline{\mathcal{Q}(m)}$ denotes the closure of $\mathcal{Q}(m)$. Accordingly, the tangent spaces, the geodesics, and the parallel transport for $\mathcal{Q}(m)$ and ${\left({\mathbb{C}\mathbb{P}}^{m-1}\right)}^{m}$ coincide locally and thus are easily derived from the geometry of ${\mathbb{C}\mathbb{P}}^{m-1}$. We refer to [24] for further discussions and details about ${\mathbb{C}\mathbb{P}}^{m-1}$.

Let $\mathfrak{u}(m) := \{\Omega\in\mathbb{C}^{m\times m} \mid \Omega^{\mathsf{H}} = -\Omega\}$ denote the set of skew-Hermitian matrices. The tangent space at *P* in ${\mathbb{C}\mathbb{P}}^{m-1}$ is given by

$$T_P\mathbb{CP}^{m-1} = \{[P,\Omega] \mid \Omega\in\mathfrak{u}(m)\},$$

where [*A*, *B*] := *AB* − *BA* is the matrix commutator. Then, the tangent space at $\mathbf{P}=({P}_{1},\dots ,{P}_{m})\in \mathcal{Q}(m)$ is simply the Cartesian product

$$T_{\mathbf{P}}\mathcal{Q}(m) \cong T_{P_1}\mathbb{CP}^{m-1}\times\dots\times T_{P_m}\mathbb{CP}^{m-1}.$$

The geodesic through $P\in\mathbb{CP}^{m-1}$ in direction $Z\in T_P\mathbb{CP}^{m-1}$ is

$$\gamma_{P,Z}(t) := e^{t[Z,P]}\,P\,e^{-t[Z,P]},$$

where $e^{(\cdot)}$ denotes the matrix exponential. Thus, the (local^{a}) geodesic through $\mathbf{P}\in \mathcal{Q}(m)$ in direction $\mathbf{Z}:=({Z}_{1},\dots ,{Z}_{m})\in {T}_{\mathbf{P}}\mathcal{Q}(m)$ is given by

$$\gamma_{\mathbf{P},\mathbf{Z}}(t) := \left(\gamma_{P_1,Z_1}(t),\dots,\gamma_{P_m,Z_m}(t)\right).$$

The parallel transport of $\mathbf{\Psi}:=(\Psi_1,\dots,\Psi_m)\in T_{\mathbf{P}}\mathcal{Q}(m)$ along the geodesic $\gamma_{\mathbf{P},\mathbf{Z}}(t)$ is

$$\tau_{\mathbf{P},\mathbf{Z}}(\mathbf{\Psi}) := \left(\tau_{P_1,Z_1}(\Psi_1),\dots,\tau_{P_m,Z_m}(\Psi_m)\right),$$

with $\tau_{P,Z}$ being the parallel transport of $\Psi \in {T}_{P}{\mathbb{C}\mathbb{P}}^{m-1}$ with respect to the Levi-Civita connection along the geodesic $\gamma_{P,Z}(t)$, i.e.

$$\tau_{P,Z}(\Psi) := e^{t[Z,P]}\,\Psi\,e^{-t[Z,P]}.$$

Finally, recall that the *natural* or *Riemannian* gradient of a function that is the restriction of some globally defined function to a submanifold is simply the orthogonal projection of the Euclidean gradient onto the corresponding tangent space. For the complex projective space, this projection is given by

$$\pi_P(A) := [P,[P,A]]$$

for a Hermitian matrix *A*.

It is easily seen that the operator $\pi_P$ is an orthogonal projector onto the tangent space ${T}_{P}{\mathbb{C}\mathbb{P}}^{m-1}$, i.e. that $\pi_P\circ\pi_P(A)=\pi_P(A)$ and that the null space of $\pi_P$ is orthogonal to its image. Here, ∘ denotes the composition of functions. The formulas for the tangent spaces, the geodesics, the parallel transport, and the projection onto the tangent spaces of ${\mathcal{Q}}^{k}(m)$ follow directly from the product manifold structure.
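The geometric ingredients on a single factor $\mathbb{CP}^{m-1}$ can be sketched in a few lines of NumPy. The sketch assumes the standard double commutator formulas for Hermitian rank-one projectors, namely the projection $[P,[P,A]]$ and the conjugation by $e^{t[Z,P]}$ for geodesics and parallel transport; the function names are illustrative. The matrix exponential of a skew-Hermitian matrix is computed via an eigendecomposition to avoid further dependencies.

```python
import numpy as np

def comm(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def tangent_proj(P, A):
    """Orthogonal projection of a Hermitian A onto the tangent space at P."""
    return comm(P, comm(P, A))

def expm_skew(K):
    """Matrix exponential of a skew-Hermitian K via the eigendecomposition of -iK."""
    w, V = np.linalg.eigh(-1j * K)
    return (V * np.exp(1j * w)) @ V.conj().T

def geodesic(P, Z, t):
    """Point e^{t[Z,P]} P e^{-t[Z,P]} on the geodesic through P in direction Z."""
    U = expm_skew(t * comm(Z, P))
    return U @ P @ U.conj().T

def transport(P, Z, Psi, t):
    """Parallel transport of the tangent vector Psi along the same geodesic."""
    U = expm_skew(t * comm(Z, P))
    return U @ Psi @ U.conj().T

rng = np.random.default_rng(3)
m = 4
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
x /= np.linalg.norm(x)
P = np.outer(x, x.conj())                 # rank-one Hermitian projector
B = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
A = B + B.conj().T                        # Hermitian ambient matrix
Z = tangent_proj(P, A)                    # tangent direction at P
Q = geodesic(P, Z, 0.7)                   # point reached after time t = 0.7
```

The point `Q` is again a Hermitian rank-one projector, and a transported tangent vector is tangent at `Q`, which is what a CG update on the manifold relies on.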

## 5 Critical point analysis of the cost function

In this section, we conduct a critical point analysis of the cost function $g_2$ on the product COP manifold. We show that the joint diagonalizers are a non-degenerate global minimum of $g_2$. This is an important fact, since in many cases the speed of convergence relies on the non-degeneracy of the minima. First of all, we present a lemma which originates from the derivation of the cost function from the off-norm function.

### Lemma 1

If $(X_1^{\ast},\dots,X_k^{\ast})\in Ob^k(m)$ is a global minimizer of the function $g_1$, as defined in (10), i.e. ${X}_{i}^{\ast \mathsf{H}}{C}_{ij}^{(t)}{X}_{j}^{\ast}={D}_{ij}^{(t)}=\text{diag}({d}_{ij1}^{(t)},\dots ,{d}_{ijm}^{(t)})$ being diagonal for all *t* = 1,…,*n* and *i*, *j* = 1,…,*k*, then the set of corresponding Hermitian projectors ${\mathcal{P}}^{\ast}=({\mathbf{P}}_{1}^{\ast},\dots ,{\mathbf{P}}_{k}^{\ast})\in {\mathcal{Q}}^{k}(m)$ with ${P}_{ip}^{\ast}:={x}_{ip}^{\ast}{x}_{ip}^{\ast \mathsf{H}}\in {\mathbb{C}\mathbb{P}}^{m-1}$ minimizes the cost function $g_2$, defined in (14).

*p*,

*q*)th entry is computed as

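The first derivative of $g_2$ at $\mathcal{P}\in {\mathcal{Q}}^{k}(m)$ in direction $\mathcal{Z}:=({\mathbf{Z}}_{1},\dots ,{\mathbf{Z}}_{k})\in {T}_{\mathcal{P}}{\mathcal{Q}}^{k}(m)$, with $\mathbf{Z}_i := (Z_{i1},\dots,Z_{im})$, is computed, up to the normalization constant of $g_2$, as

$$\mathrm{D}g_2(\mathcal{P})\mathcal{Z} = \sum_{t=1}^{n}\sum_{i<j}\sum_{p\neq q}\left(\text{tr}\left(C_{ij}^{(t)}P_{jq}C_{ij}^{(t)\mathsf{H}}Z_{ip}\right) + \text{tr}\left(C_{ij}^{(t)}Z_{jq}C_{ij}^{(t)\mathsf{H}}P_{ip}\right)\right). \tag{27}$$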

By recalling the structure of the tangent space of ${\mathcal{Q}}^{k}\left(m\right)$ and the result in Lemma 1, it is trivial to see that the first derivative of *g*_{2} vanishes at ${\mathcal{P}}^{\ast}$, which corresponds to the correct joint diagonalizers.

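In what follows, we characterize the Hessian of $g_2$ at the joint diagonalizers. To that end, we denote by $\mathbb{o}\mathbb{f}\mathbb{f}(m):=\{Z\in {\mathbb{C}}^{m\times m} \mid {z}_{ii}=0,\ i=1,\dots,m\}$ the set of matrices with zero diagonal. Let *Π* be the natural projection

$$\Pi : Ob(m)\to\mathcal{Q}(m),\quad X\mapsto\left(x_1x_1^{\mathsf{H}},\dots,x_mx_m^{\mathsf{H}}\right),$$

where $x_p$ denotes the *p*th column of *X*, and let $\mu_X$ be defined as

$$\mu_X : \mathbb{o}\mathbb{f}\mathbb{f}(m)\to Ob(m),\quad Z\mapsto X(I_m+Z)\,\text{diag}\left(\|X(e_1+z_1)\|^{-1},\dots,\|X(e_m+z_m)\|^{-1}\right),$$

where $I_m$ is the *m*×*m* identity matrix, $z_j$ denotes the *j*th column of *Z*, and $e_j$ denotes the *j*th standard basis vector. Note that $\mu_X$ defines a locally injective but not bijective mapping. The composition of *Π* and $\mu_X$, however, yields a local diffeomorphism. With the shorthand notation **P** := *Π*(*X*), the mapping

$$\Phi_{\mathbf{P}} := \Pi\circ\mu_X : \mathbb{o}\mathbb{f}\mathbb{f}(m)\to\mathcal{Q}(m)$$

is a local diffeomorphism around zero with $\Phi_{\mathbf{P}}(0)=\mathbf{P}$, and thus permits a local parameterization of ${\mathcal{Q}}^{k}(m)$ via

$$\Phi_{\mathcal{P}} : \left(\mathbb{o}\mathbb{f}\mathbb{f}(m)\right)^k\to\mathcal{Q}^k(m),\quad (\Theta_1,\dots,\Theta_k)\mapsto\left(\Phi_{\mathbf{P}_1}(\Theta_1),\dots,\Phi_{\mathbf{P}_k}(\Theta_k)\right).$$

Let $\mathcal{Z}:=T{\Phi}_{{\mathcal{P}}^{\ast}}(\Theta )\in {T}_{{\mathcal{P}}^{\ast}}{\mathcal{Q}}^{k}(m)$. Then, we can compute the Hessian of $g_2$ at the critical points ${\mathcal{P}}^{\ast}\in {\mathcal{Q}}^{k}(m)$, i.e. the symmetric bilinear form ${\mathsf{H}}_{{g}_{2}}({\mathcal{P}}^{\ast}):{T}_{{\mathcal{P}}^{\ast}}{\mathcal{Q}}^{k}(m)\times {T}_{{\mathcal{P}}^{\ast}}{\mathcal{Q}}^{k}(m)\to \mathbb{R}$, as a quadratic form in $\Theta$. This Hessian is positive definite *if and only if* all (2×2)-matrices of a family determined by the data are positive definite. Since this is a generic assumption on the data, we have the following result.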

### Theorem 1

Generically, the global minimizer of the cost function $g_2$ is non-degenerate.

## 6 A CG IVA algorithm

In this section, we introduce a general form of CG algorithms on matrix manifolds. After computing the Riemannian gradient of the cost function on the COP manifold, we develop a CG based IVA algorithm. CG methods on matrix manifolds are briefly reviewed here. They form the backbone of the algorithm for our optimization problem on the COP manifold and explain the use of the differential geometric concepts derived in the previous sections. For an in-depth introduction to optimization on matrix manifolds, we refer the interested reader to [23].

Let *M* be a submanifold of some Euclidean space with inner product 〈·,·〉 and let $f:M\to \mathbb{R}$ be smooth. The CG method is initialized by some $x_0\in M$ and the descent direction $H_0 := -\text{grad}\,f(x_0)$ given by the Riemannian gradient. If *f* is the restriction of a globally defined function $\hat{f}$ to *M*, the Riemannian gradient is just the orthogonal projection of the gradient of $\hat{f}$ to the tangent space, i.e.

$$\text{grad}\,f(x) = \pi_x\left(\nabla\hat{f}(x)\right),$$

where $\nabla \hat{f}\left(x\right)$ denotes the Euclidean gradient of $\hat{f}$, and *π*_{ x } is the orthogonal projection onto *T*_{ x }*M*. Subsequently, sweeps are iterated that consist of two steps, a *line search* in a given direction (i.e. along a geodesic in that direction) followed by an update of the *search direction*. Several different possibilities for these steps lead to different CG methods. Assume now that *x*_{ i }, *H*_{ i }, and *G*_{ i }:=grad *f*(*x*_{ i }) are given.

*Line search.* Given the geodesic $\gamma_i$ with $\gamma_i(0)=x_i$ and $\dot{\gamma}_i(0)=H_i$, the line search aims to find ${\lambda}_{i}\in \mathbb{R}$ that minimizes $t\mapsto (f\circ\gamma_i)(t)$. A generic approach for the step-size selection is a Riemannian adaptation of the backtracking line search and several modifications, cf. [23, 25]. Here, we present a closed form solution for the step-size selection that works particularly well for our problem due to the quadratic nature of our cost function, cf. [26]. It is based on the assumption that a one-dimensional Newton step along $f\circ\gamma_i$ yields a good approximation for its minimizer. Explicitly, we choose the step-size as

$$\lambda_i := -\frac{\frac{d}{dt}(f\circ\gamma_i)(t)\big|_{t=0}}{\left|\frac{d^2}{dt^2}(f\circ\gamma_i)(t)\big|_{t=0}\right|}, \tag{36}$$

and set $x_{i+1} := \gamma_i(\lambda_i)$.

The absolute value in the denominator is chosen for the following reason. While being an unaltered one-dimensional Newton step in a neighborhood of a minimum, the step size is the negative of a regular Newton step if $\frac{d^2}{dt^2}(f\circ\gamma_i)(t)\big|_{t=0}<0$, and thus yields non-attractiveness for critical points that are not minima.
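The effect of the absolute value can be seen on two one-dimensional toy functions. The sketch below assumes that the first and second derivatives of $f\circ\gamma$ at zero are already available as numbers `d1` and `d2`; the function name is illustrative, and a vanishing second derivative is not handled.

```python
def newton_step_size(d1, d2):
    """Step size -d1 / |d2| from the first and second derivative of f∘γ at 0."""
    return -d1 / abs(d2)

# Convex example: phi(t) = (t - 2)^2 has phi'(0) = -4 and phi''(0) = 2,
# so the step reaches the minimizer t = 2 exactly.
lam_convex = newton_step_size(-4.0, 2.0)

# Concave example: phi(t) = -(t - 2)^2 has phi'(0) = 4 and phi''(0) = -2.
# A plain Newton step -phi'/phi'' would give +2 and land on the maximizer t = 2;
# with the absolute value the step is -2, which decreases phi instead.
lam_concave = newton_step_size(4.0, -2.0)
```

In the concave case the resulting step points in the descent direction, which is the non-attractiveness property for critical points that are not minima.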

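*Search direction update.* Updating the search direction requires transporting $H_i$ and $G_i$, which are tangent at $x_i$, to the tangent space ${T}_{{x}_{i+1}}M$. This is done via parallel transport along the geodesic $\gamma_i$, which we denote by $\tau H_i$ and $\tau G_i$, respectively. The new search direction is then of the form $H_{i+1} := -G_{i+1} + \gamma_i\,\tau H_i$, where the choice of the scalar $\gamma_i\in\mathbb{R}$ distinguishes the different CG methods.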

Despite the good performance in applications, the convergence analysis of CG methods on smooth manifolds is still an open problem. Partial convergence results for CG methods on manifolds can be found in [27, 28], and a recent result in [29].

The first step towards a CG algorithm for minimizing $g_2$ is to compute its Riemannian gradient. Let us denote by $\hat{{g}_{2}}$ the continuation of $g_2$ to the embedding space ${\mathbb{C}}^{m\times m\times m\times k}$. Following the computation in Equation (27), we have the Euclidean gradient of $\hat{{g}_{2}}$ at $\mathcal{P}\in {\mathcal{Q}}^{k}(m)$, i.e. $\nabla \hat{{g}_{2}}(\mathcal{P}):=({\mathbf{J}}_{1},\dots ,{\mathbf{J}}_{k})$, for each element ${J}_{ip}\in {\mathbb{C}}^{m\times m}$, up to the normalization constant of $g_2$, as

$$J_{ip} = \sum_{t=1}^{n}\sum_{q\neq p}\left(\sum_{j>i}C_{ij}^{(t)}P_{jq}C_{ij}^{(t)\mathsf{H}} + \sum_{j<i}C_{ji}^{(t)\mathsf{H}}P_{jq}C_{ji}^{(t)}\right).$$

Projecting onto the tangent spaces yields the Riemannian gradient of $g_2$ at $\mathcal{P}\in {\mathcal{Q}}^{k}(m)$, i.e. $\text{grad}\,{g}_{2}(\mathcal{P}):=({\mathbf{G}}_{1},\dots ,{\mathbf{G}}_{k})\in {T}_{\mathcal{P}}{\mathcal{Q}}^{k}(m)$, for each element ${G}_{ip}\in {T}_{{P}_{ip}}{\mathbb{C}\mathbb{P}}^{m-1}$, as

$$G_{ip} = \pi_{P_{ip}}(J_{ip}) = [P_{ip},[P_{ip},J_{ip}]]. \tag{40}$$

The above formula for the Riemannian gradient now allows us to implement the geometric CG algorithm for minimizing the function $g_2$ as defined in (14) in a straightforward way. Pseudo code is provided in Algorithm 1.
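As a sanity check of the gradient computation, the following NumPy sketch evaluates a Euclidean gradient of the trace form of the cost and projects it with the double commutator. It assumes a cost of the form $\sum_{t}\sum_{i<j}\sum_{p\neq q}\text{tr}(C_{ij}^{(t)}P_{jq}C_{ij}^{(t)\mathsf{H}}P_{ip})$ with unit normalization, stores the projectors in an array of shape (*k*, *m*, *m*, *m*), and uses synthetic noise free data; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k, n = 3, 3, 2

def crandn(*shape):
    """Complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def projectors(X):
    """Rank-one projectors of the normalized columns of X, shape (m, m, m)."""
    Xn = X / np.linalg.norm(X, axis=0)
    return np.einsum('ap,bp->pab', Xn, Xn.conj())

def euclidean_grad(C, P):
    """Euclidean gradient J of sum over t, i<j, p!=q of tr(C P_jq C^H P_ip)."""
    J = np.zeros_like(P)
    S = P.sum(axis=1)                       # S[i] = sum over q of P_iq
    for (i, j), mats in C.items():
        for Ct in mats:
            for p in range(P.shape[1]):
                J[i, p] += Ct @ (S[j] - P[j, p]) @ Ct.conj().T
                J[j, p] += Ct.conj().T @ (S[i] - P[i, p]) @ Ct
    return J

def riemannian_grad(C, P):
    """Project each J_ip onto the tangent space at P_ip via [P, [P, J]]."""
    J = euclidean_grad(C, P)
    PJ = P @ J - J @ P                      # batched commutator [P, J]
    return P @ PJ - PJ @ P                  # batched commutator [P, [P, J]]

A = [crandn(m, m) for _ in range(k)]
C = {(i, j): [A[i] @ np.diag(crandn(m)) @ A[j].conj().T for _ in range(n)]
     for i in range(k) for j in range(i + 1, k)}

# Projectors of the exact joint diagonalizers A_i^{-H} and of a random point.
P_star = np.stack([projectors(np.linalg.inv(Ai).conj().T) for Ai in A])
P_rand = np.stack([projectors(crandn(m, m)) for _ in range(k)])
grad_at_solution = np.linalg.norm(riemannian_grad(C, P_star))
grad_at_random = np.linalg.norm(riemannian_grad(C, P_rand))
```

The Riemannian gradient vanishes, up to rounding, at the exact joint diagonalizers and is nonzero at a generic point, in line with the critical point analysis.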

## Algorithm 1 A CG IVA algorithm

**Input:** A set of matrices $\left\{{C}_{ij}^{(t)}\right\}\subset {\mathbb{C}}^{m\times m}$ for *i*, *j* = 1,…,*k* with *i* < *j* and *t* = 1,…,*n*;

**Step 1:** Generate an initial guess ${\mathcal{P}}^{(0)}=({\mathbf{P}}_{1}^{(0)},\dots ,{\mathbf{P}}_{k}^{(0)})\in {\mathcal{Q}}^{k}(m)$ and set *i* = 0;

**Step 2:** Compute ${\mathcal{G}}^{(0)}\leftarrow \mathrm{grad}\,{g}_{2}({\mathcal{P}}^{(0)})$ using Equation (40) and set ${\mathcal{H}}^{(0)}=({\mathbf{H}}_{1},\dots ,{\mathbf{H}}_{k})\leftarrow -{\mathcal{G}}^{(0)}$;

**Step 3:** Compute the step size $\lambda_i$ according to Equation (36);

**Step 4:** Update ${\mathcal{P}}^{(i+1)}\leftarrow \left({\gamma}_{{\mathbf{P}}_{1},{\mathbf{H}}_{1}}({\lambda}_{i}),\dots ,{\gamma}_{{\mathbf{P}}_{k},{\mathbf{H}}_{k}}({\lambda}_{i})\right)$, where ${\mathbf{P}}_{j}$ and ${\mathbf{H}}_{j}$ denote the components of ${\mathcal{P}}^{(i)}$ and ${\mathcal{H}}^{(i)}$;

**Step 5:** Compute ${\mathcal{G}}^{(i+1)}\leftarrow \mathrm{grad}\,{g}_{2}({\mathcal{P}}^{(i+1)})$ using Equation (40) and update ${\mathcal{H}}^{(i+1)}\leftarrow -{\mathcal{G}}^{(i+1)}+{\gamma}_{i}\,{\tau}_{{\mathcal{P}}^{(i)},{\mathcal{H}}^{(i)}}({\lambda}_{i})$, where ${\tau}_{{\mathcal{P}}^{(i)},{\mathcal{H}}^{(i)}}({\lambda}_{i})$ denotes the parallel transport of ${\mathcal{H}}^{(i)}$ along the geodesic evaluated at $\lambda_i$, and $\gamma_i$ is chosen according to Equation (38);

**Step 6:** If (*i* + 1) mod (2*km*(*m*−1)−1) = 0, set ${\mathcal{H}}^{(i+1)}\leftarrow -{\mathcal{G}}^{(i+1)}$;

**Step 7:** If $\|{\mathcal{G}}^{(i+1)}\|$ is small enough, stop. Otherwise, set *i* = *i* + 1 and go to Step 3;

## 7 Numerical experiments

In our experiment, we investigate the performance of our method in terms of both local convergence property and accuracy of estimating the joint diagonalizers.

### 7.1 Experiment one

In this experiment, the matrices to be jointly diagonalized are generated synthetically as

$$C_{ij}^{(t)} = A_i\,\Omega_{ij}^{(t)}A_j^{\mathsf{H}} + \epsilon N_{ij}^{H},\qquad R_{ij}^{(t)} = A_i\,\hat{\Omega}_{ij}^{(t)}A_j^{\top} + \epsilon N_{ij}^{S},$$

where the matrices $A_i\in Gl(m)$ are randomly picked, both real and imaginary parts of the diagonal entries of the diagonal matrices ${\Omega}_{ij}^{(t)}$ and ${\hat{\Omega}}_{ij}^{(t)}$ are drawn from a uniform distribution on the interval (0,10), the matrices ${N}_{ij}^{H}\in {\mathbb{C}}^{m\times m}$ and ${N}_{ij}^{S}\in {\mathbb{C}}^{m\times m}$ are a Hermitian and a complex symmetric matrix, respectively, whose real and imaginary parts are generated from a uniform distribution on the interval (−0.5,0.5), representing additive stationary noise, and $\epsilon\in \mathbb{R}$ is the noise level.

We set *m* = 3, *k* = 3, *n* = 3. First of all, we choose the noise level *ϵ* = 0. A typical local convergence curve of our proposed algorithm is shown in Figure 1. A tendency of superlinear convergence can be observed.

We then choose the noise levels *ϵ* ∈ {0.1, 0.5, 1.0}, and run 50 tests. The performance index is the averaged Amari error proposed in [30]. Generally, the smaller the Amari error, the better the separation. The quartile based boxplots of the averaged Amari errors of our proposed algorithm for the three noise levels are shown in Figure 2. The performance of our CG algorithm degrades correspondingly with increasing noise levels.
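For readers who want to reproduce such a performance index, the sketch below computes one common normalized variant of the Amari error for a square matrix *G*, for instance the product of an estimated demixing matrix and the true mixing matrix. The normalization by 2*m*(*m* − 1) is an assumption and may differ from the exact variant used in the experiments; averaging over the *k* subproblems is left to the caller.

```python
import numpy as np

def amari_error(G):
    """Normalized Amari-type index of a square matrix G (0 for scaled permutations)."""
    G = np.abs(np.asarray(G))
    m = G.shape[0]
    # Row-wise and column-wise sums after scaling by the largest modulus.
    rows = (G / G.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (G / G.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (rows.sum() + cols.sum()) / (2.0 * m * (m - 1))

# A scaled permutation matrix corresponds to perfect separation.
perfect = np.array([[0, 2j, 0], [0, 0, -1.5], [0.5, 0, 0]])
# A matrix with equal moduli everywhere corresponds to no separation at all.
mixed = np.ones((3, 3))
```

With this normalization the index lies between 0 (perfect separation up to scaling and permutation) and 1 (all entries of equal modulus).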

### 7.2 Experiment two

In this experiment, we compare our CG based IVA approach, referred to as *IVA-CG*, with two second-order statistics based IVA algorithms. We refer to one contrast optimization based IVA algorithm as *IVA-CO*, cf. [5, 6], and the other matrix joint diagonalization based approach as *IVA-JD*, cf. [13]. The task of this experiment is to separate two groups of complex valued signals. We take three real audio source signals with 480,000 samples, and apply the short time Fourier transform to the sources with the number of FFT points being 1,024. By doing so, we end up with a complex IVA problem with 513 groups of statistically dependent complex signals.
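The number of groups follows from the FFT length: a real-valued frame of 1,024 points has 1024/2 + 1 = 513 non-redundant frequency bins. A quick check with NumPy's real FFT, using a random frame as a stand-in for audio data:

```python
import numpy as np

n_fft = 1024
frame = np.random.default_rng(5).standard_normal(n_fft)  # one real-valued frame
spectrum = np.fft.rfft(frame, n=n_fft)                   # one-sided spectrum
n_bins = spectrum.shape[0]                               # n_fft // 2 + 1 frequency bins
```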

## 8 Conclusion

We propose a matrix joint diagonalization approach to solve the complex IVA problem, which relies neither on a pre-whitening step nor on the estimation of the unknown distribution of the sources. A mathematical setting is derived that allows a formulation without ambiguity on the set of unknown parameters, i.e. the dimension of the search space is maximally reduced. This leads in a natural way to a smooth manifold structure that we call the *complex oblique projective* manifold, due to its close relation to the oblique manifold, which consists of invertible matrices with normalized columns. We propose to solve the complex IVA problem by minimizing a cost function that is based on the well-known off-norm function for measuring joint diagonality. We show that our setting leads to a non-degenerate Hessian at the solution of the IVA problem. This is an important result for the design of minimization methods, since in many cases the speed of convergence relies on the non-degeneracy of the minima. We develop a geometric CG method for solving the IVA problem and conclude by providing some numerical experiments.

## Endnote

^{a}Note, that $\mathcal{Q}\left(m\right)$ is not a geodesically complete manifold.

## Notes

### Acknowledgements

This study was supported by the Cluster of Excellence *CoTeSys* (Cognition for Technical Systems), funded by the Deutsche Forschungsgemeinschaft (DFG).

### References

- 1. Cardoso JF: Multidimensional independent component analysis. In *Proceedings of the 23rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4*. (Seattle, WA, USA; 1998).
- 2. Hyvärinen A, Hoyer PO: Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. *Neural Comput* 2000, 12(7):1705-1720. 10.1162/089976600300015312
- 3. Araki S, Mukai R, Makino S, Nishikawa T, Saruwatari H: The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. *IEEE Trans. Speech Audio Process* 2003, 11(2):109-116. 10.1109/TSA.2003.809193
- 4. Lee I, Kim T, Lee TW: Independent vector analysis for convolutive blind speech separation. In *Blind Speech Separation, Signals and Communication Technology*. Edited by: Makino S, Lee TW, Sawada H. (Springer, Netherlands; 2007).
- 5. Anderson M, Li XL, Adalı T: Complex-valued independent vector analysis: application to multivariate Gaussian model. *Signal Process* 2012, 92(8):1821-1831. 10.1016/j.sigpro.2011.09.034
- 6. Anderson M, Adalı T, Li XL: Joint blind source separation with multivariate Gaussian model: algorithms and performance analysis. *IEEE Trans. Signal Process* 2012, 60(4):1672-1683.
- 7. Kim T: Real-time independent vector analysis for convolutive blind source separation. *IEEE Trans. Circ. Syst. I: Regular Papers* 2010, 57(7):1431-1438.
- 8. Hao J, Lee I, Lee TW, Sejnowski TJ: Independent vector analysis for source separation using a mixture of Gaussians prior. *Neural Comput* 2010, 22(6):1646-1673. 10.1162/neco.2010.11-08-906
- 9. Lee I, Jang GJ: Independent vector analysis based on overlapped cliques of variable width for frequency-domain blind signal separation. *EURASIP J. Adv. Signal Process* 2012, 113: 1-12.
- 10. Bermejo S: Finite sample effects in higher order statistics contrast functions for sequential blind source separation. *IEEE Signal Process. Lett* 2005, 12(6):481-484.
- 11. Ghennioui H, Fadaili EM, Thirion-Moreau N, Adib A, Moreau E: A nonunitary joint block diagonalization algorithm for blind separation of convolutive mixtures of sources. *IEEE Signal Process. Lett* 2007, 14(11):860-863.
- 12. Ghennioui H, Thirion-Moreau N, Moreau E, Aboutajdine D: Gradient-based joint block diagonalization algorithms: application to blind separation of FIR convolutive mixtures. *Signal Process* 2010, 90(6):1836-1849. 10.1016/j.sigpro.2009.12.002
- 13. Li XL, Adalı T, Anderson M: Joint blind source separation by generalized joint diagonalization of cumulant matrices. *Signal Process* 2011, 91(10):2314-2322. 10.1016/j.sigpro.2011.04.016
- 14. Shen H, Kleinsteuber M: A matrix joint diagonalization approach for complex independent vector analysis. In *Proceedings of the 10th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), vol. 7191 Lecture Notes in Computer Science*. Edited by: Theis F, Cichocki A, Yeredor A, Zibulevsky M. (Springer-Verlag, Berlin/Heidelberg; 2012).
- 15. Shen H, Kleinsteuber M: Complex blind source separation via simultaneous strong uncorrelating transform. In *Lecture Notes in Computer Science, Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation, vol. 6365*. (Springer-Verlag, Berlin/Heidelberg; 2010).
- 16. Smaragdis P: Blind separation of convolved mixtures in the frequency domain. *Neurocomputing* 1998, 22: 21-34. 10.1016/S0925-2312(98)00047-2
- 17. Comon P, Jutten C: *Handbook of Blind Source Separation: Independent Component Analysis and Applications*. Academic Press Inc, San Diego, USA; 2010.
- 18. Makino S, Lee TW, Sawada H: *Blind Speech Separation, Signals and Communication Technology*. Springer, Netherlands; 2007.
- 19. Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of non-stationary signals in the frequency domain. *Signal Process* 2009, 89: 819-830. 10.1016/j.sigpro.2008.10.024
- 20. Maehara T, Murota K: Simultaneous singular value decomposition. *Linear Alg. Appl* 2011, 435: 106-116. 10.1016/j.laa.2011.01.007
- 21. Absil PA, Gallivan KA: Joint diagonalization on the oblique manifold for independent component analysis. In *Proceedings of the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5*. (Toulouse, France; 2006).
- 22. Afsari B: Sensitivity analysis for the problem of matrix joint diagonalization. *SIAM J. Matrix Anal. Appl* 2008, 30(3):1148-1171. 10.1137/060655997
- 23. Absil PA, Mahony R, Sepulchre R: *Optimization Algorithms on Matrix Manifolds*. Princeton University Press, Princeton, NJ; 2008.
- 24. Helmke U, Hüper K, Trumpf J: Newton’s method on Graßmann manifolds. 2007.
- 25. Nocedal J, Wright SJ: *Numerical Optimization*. Springer, New York; 2006.
- 26. Kleinsteuber M, Hüper K: An intrinsic CG algorithm for computing dominant subspaces. In *Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4*. (Hawaii, USA; 2007).
- 27. Smith ST: Optimization techniques on Riemannian manifolds. In *Hamiltonian and Gradient Flows, Algorithms and Control, Fields Institute Communications, vol. 3*. Edited by: Bloch A. (American Mathematical Society, Providence, RI; 1994).
- 28. Gabay D: Minimizing a differentiable function over a differential manifold. *J. Optimiz. Theory Appl* 1982, 37(2):177-219. 10.1007/BF00934767
- 29. Ring W, Wirth B: Optimization methods on Riemannian manifolds and their application to shape space. *SIAM J. Optimiz* 2012, 22(2):596-627. 10.1137/11082885X
- 30. Amari SI, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. In *Advances in Neural Information Processing Systems (NIPS), vol. 8*. Edited by: Touretzky DS, Mozer MC, Hasselmo ME. (The MIT Press, Cambridge, MA, USA; 1996).

## Copyright information

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.