Non-unitary matrix joint diagonalization for complex independent vector analysis

Open Access
Research
Part of the following topical collections:
  1. Dependent Component Analysis

Abstract

Independent vector analysis (IVA) is a special form of independent component analysis (ICA) that has demonstrated strong performance in solving convolutive blind source separation (BSS) problems in the frequency domain. Most IVA algorithms are based on optimizing certain contrast functions, and the main difficulty of these approaches lies in finding a reliable and fast estimate of the unknown distribution of the sources. Despite the rich availability of efficient tensorial approaches to the standard ICA problem, these methods have not been explored extensively for IVA. In this article, we propose a matrix joint diagonalization approach to solve the complex IVA problem. The new factorization neither relies on a whitening process, nor does it require an estimate of the joint probability distribution of the dependent signal groups. The latter is in contrast to most IVA approaches to date. The underlying geometry of the problem is investigated together with a critical point analysis of the resulting cost function. A conjugate gradient algorithm on the appropriate manifold setting is developed.

Keywords

Conjugate gradient; Independent component analysis; Conjugate gradient method; Parallel transport

1 Introduction

Independent component analysis (ICA) is a standard statistical tool for solving the blind source separation (BSS) problem. BSS aims to recover source signals from the observed mixtures, without knowing either the distribution of the sources or the mixing process. Application of the standard ICA model is often limited, since it requires mutual statistical independence between all individual components. However, in many applications, there exist groups of signals of interest, where components from different groups are mutually statistically independent indeed, but where mutual statistical dependence occurs between components in the same group. Such problems can be tackled by a technique now referred to as multidimensional independent component analysis (MICA) [1], or independent subspace analysis (ISA) [2].

A special form of ISA arises in solving the BSS problem with convolutive mixtures [3]. After transferring the convolutive observations into the frequency domain via short-time Fourier transforms, the convolutive BSS problem results in a collection of instantaneous complex BSS problems, one in each frequency bin. After solving the subproblems individually, the final stage faces the challenge of aligning all statistically dependent components from different groups, which is referred to as the permutation problem. To avoid this problem, a new approach named independent vector analysis (IVA) has been proposed in [4]. Besides its application to the convolutive BSS problem, IVA has also recently been applied to analyze multivariate Gaussian models, cf. [5, 6]. In the current literature, the majority of IVA algorithms are based on optimizing certain contrast functions, cf. [5, 7, 8, 9]. The main difficulty of these contrast function based approaches lies in estimating the unknown distribution of the sources, which usually requires a large number of observations [10].

On the other hand, tensorial approaches are efficient and richly available to solve both the ICA and ISA problems. In particular, joint block diagonalization approaches have been shown to be effective methods for solving the ISA problem, cf. [11, 12], and are inherently applicable to IVA. However, such general joint block diagonalization approaches do not take the intrinsic structure of the IVA problem into account. A recent study [13] proposes a joint diagonalization approach of cross cumulant matrices to solve the complex IVA problem. More recently, the present authors have developed a similar approach of jointly diagonalizing both cross covariance and cross pseudo covariance matrices, cf. [14]. In this article, we extend the previous study in [14], and adapt the so-called complex oblique projective (COP) manifold, which has proven to be an appropriate setting for the standard instantaneous complex ICA problem [15], to the current scenario. Finally, an efficient conjugate gradient (CG) based IVA algorithm is proposed, and numerical experiments are provided to demonstrate the convergence properties of the proposed CG algorithm, and to compare its performance with two recently developed IVA algorithms in terms of separation quality.

2 Notations

Throughout the article, $(\cdot)^{\top}$ denotes the matrix transpose, $(\cdot)^{H}$ the Hermitian transpose, $\overline{(\cdot)}$ the entry-wise complex conjugate of a matrix, and $Gl(m)$ the set of all $m \times m$ invertible complex matrices. The Frobenius norm of a matrix $A \in \mathbb{C}^{m \times n}$ is denoted by $\|A\|_F := \sqrt{\operatorname{tr}(A A^{H})}$, where $\operatorname{tr}(\cdot)$ is the trace of a square matrix. Given a square matrix $Z \in \mathbb{C}^{m \times m}$, $\operatorname{ddiag}(Z)$ forms a diagonal matrix whose diagonal entries are those of $Z$, and $\operatorname{off}(Z)$ generates a matrix by setting all diagonal entries of $Z$ to zero, i.e. $\operatorname{off}(Z) := Z - \operatorname{ddiag}(Z)$.

In this study, we consider an $m$-dimensional complex signal $s(t) = [s_1(t), \dots, s_m(t)]^{\top} \in \mathbb{C}^m$ as an $m$-dimensional complex stochastic process indexed by the variable $t$. The empirical expectation of a signal $s(t)$ is denoted by $E[s(t)] = \frac{1}{T} \sum_{t=1}^{T} s(t)$, where $T$ is the number of samples. As usual for the standard ICA model, we assume without loss of generality that $E[s(t)] = 0$. The empirical covariance and pseudo-covariance matrices of a complex signal $s(t)$ are referred to as $\operatorname{cov}(s(t)) := E[s(t) s(t)^{H}]$ and $\operatorname{pcov}(s(t)) := E[s(t) s(t)^{\top}]$, respectively.
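The notation above can be made concrete with a minimal NumPy sketch. The function names `ddiag`, `off`, `cov` and `pcov` mirror the symbols of this section; storing the $T$ samples as columns of an array and drawing them from a complex Gaussian are assumptions made only for illustration.

```python
import numpy as np

def ddiag(Z):
    """Diagonal matrix holding the diagonal entries of the square matrix Z."""
    return np.diag(np.diag(Z))

def off(Z):
    """Z with its diagonal set to zero, i.e. off(Z) = Z - ddiag(Z)."""
    return Z - ddiag(Z)

def cov(S):
    """Empirical covariance E[s s^H]; the columns of S are zero-mean samples."""
    return S @ S.conj().T / S.shape[1]

def pcov(S):
    """Empirical pseudo-covariance E[s s^T] (plain transpose, no conjugation)."""
    return S @ S.T / S.shape[1]

rng = np.random.default_rng(0)
m, T = 3, 1000
# Illustrative data only: complex Gaussian samples, centred so that E[s(t)] = 0.
S = rng.standard_normal((m, T)) + 1j * rng.standard_normal((m, T))
S = S - S.mean(axis=1, keepdims=True)

C, R = cov(S), pcov(S)
frobenius = np.sqrt(np.trace(C @ C.conj().T).real)   # ||C||_F via the trace formula
```

By construction, `cov` returns a Hermitian matrix while `pcov` returns a complex symmetric one, which is the structural difference between the two second-order statistics used later.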

3 Problem description

It is known that convolutive BSS problems can be transformed into the frequency domain, and can be solved as instantaneous complex BSS problems for every frequency simultaneously, when the demixing filter is sufficiently longer than the mixing filter, cf. [16, 17]. In this study, we consider the spectral time-frequency representation of a signal in terms of a short-time Fourier transformation that is centered at time $t$. Let $w_i(t,f) \in \mathbb{C}$ and $s_i(t,f) \in \mathbb{C}$ denote the coefficient at the center frequency $f$ of the $i$th observation $w_i(t)$ and the $i$th source signal $s_i(t)$, respectively. Then, for a given pair $(t,f)$, the Fourier coefficients of the observations and the sources obey the equality
$$w(t,f) = A_f\, s(t,f),$$
(1)
where $w(t,f) := [w_1(t,f), \dots, w_m(t,f)]^{\top} \in \mathbb{C}^m$, $s(t,f) := [s_1(t,f), \dots, s_m(t,f)]^{\top} \in \mathbb{C}^m$, and $A_f \in \mathbb{C}^{m \times m}$ serves as a complex mixing matrix. More compactly, for a fixed frequency $f$, we get a standard instantaneous complex BSS problem as
$$W(f) = A_f\, S(f),$$
(2)

where $W(f) := [w(1,f), \dots, w(T,f)] \in \mathbb{C}^{m \times T}$ and $S(f) := [s(1,f), \dots, s(T,f)] \in \mathbb{C}^{m \times T}$, with $T$ being the number of chosen time frames. One popular approach to solving the convolutive BSS problem is to solve the individual instantaneous BSS problem (2) at each frequency, and then assemble the results from each frequency to reconstruct the estimated signal in the time domain [18].

Let us denote the rows of $S(f)$ by $s_i(f) = [s_i(1,f), \dots, s_i(T,f)] \in \mathbb{C}^{1 \times T}$ for $i = 1, \dots, m$. Following the assumption of statistical independence between the sources, the complex-valued signals $s_i(f)$ and $s_j(f)$ are statistically independent for $i \neq j$. In contrast, we assume that for a pair of frequencies $(f_p, f_q)$ with $f_p \neq f_q$, the complex signals $s_i(f_p)$ and $s_i(f_q)$ are statistically dependent for a given source. The development of IVA is inspired by this cross-frequency structure. It aims to find a set of demixing matrices $\{X_f\} \subset Gl(m)$ via
$$Y(f) = X_f^{H}\, W(f),$$
(3)
such that
  1. all sub-ICA problems are solved, and
  2. the statistical alignment between groups is restored, i.e. the estimated $i$th signals $\{y_i(f)\}$ are mutually statistically dependent.

The main idea for our approach is to exploit the cross covariance matrices between groups of observations defined as
$$\operatorname{cov}(W(f_i), W(f_j)) := \frac{1}{T} \sum_{t=1}^{T} w(t,f_i)\, w(t,f_j)^{H} = A(f_i) \underbrace{\left( \frac{1}{T} \sum_{t=1}^{T} s(t,f_i)\, s(t,f_j)^{H} \right)}_{=:\, \operatorname{cov}(S(f_i), S(f_j))} A(f_j)^{H}.$$
(4)
Similarly, the so-called pseudo cross covariance, defined as
$$\operatorname{pcov}(W(f_i), W(f_j)) := \frac{1}{T} \sum_{t=1}^{T} w(t,f_i)\, w(t,f_j)^{\top} = A(f_i)\, \operatorname{pcov}(S(f_i), S(f_j))\, A(f_j)^{\top},$$
(5)

also provides additional information about the second-order statistics of the involved signals. In this study, we assume that cross covariances between sources in all groups do not vanish. The assumption of statistical independence between the source signals implies that the cross covariance matrix $\operatorname{cov}(S(f_i), S(f_j))$ and the pseudo cross covariance matrix $\operatorname{pcov}(S(f_i), S(f_j))$ are diagonal for all pairs $(i,j)$. With a further assumption on the sources being non-stationary, which has been exploited in [19], we arrive at a problem of jointly diagonalizing two sets of cross covariance and pseudo cross covariance matrices at different time instances.
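The two factorizations of the cross covariance and the pseudo cross covariance are purely algebraic and can be checked numerically. The following is a small sketch under assumed data: the complex Gaussian sources, the way the second bin is made dependent on the first, and the helper names `cross_cov` and `cross_pcov` are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 3, 500

def crandn(*shape):
    """Complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def cross_cov(U, V):
    """(1/T) sum_t u(t) v(t)^H for samples stored as columns."""
    return U @ V.conj().T / U.shape[1]

def cross_pcov(U, V):
    """(1/T) sum_t u(t) v(t)^T (no conjugation)."""
    return U @ V.T / U.shape[1]

# Sources in two frequency bins; row p of S_j depends only on row p of S_i.
S_i = crandn(m, T)
S_j = 0.8 * S_i + 0.2 * crandn(m, T)
A_i, A_j = crandn(m, m), crandn(m, m)      # one mixing matrix per bin
W_i, W_j = A_i @ S_i, A_j @ S_j

lhs_cov = cross_cov(W_i, W_j)
rhs_cov = A_i @ cross_cov(S_i, S_j) @ A_j.conj().T
lhs_pcov = cross_pcov(W_i, W_j)
rhs_pcov = A_i @ cross_pcov(S_i, S_j) @ A_j.T
```

Both identities hold exactly for any sample size. The inner source matrices, however, are only approximately diagonal for finite $T$, which is the estimation error that the noise-free analysis in this article neglects.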

To summarize, we are interested in solving the following problem. For a complex IVA problem with $k$ subproblems, we consider the cross covariance and pseudo cross covariance matrices at $n$ time instances, i.e. for all $i,j = 1, \dots, k$ and $t = 1, \dots, n$, a set of matrices $\{C_{ij}(t)\}_{i<j}$ and a set of complex symmetric matrices $\{R_{ij}(t)\}_{i<j}$, which are constructed by
$$C_{ij}(t) = A_i\, \Omega_{ij}(t)\, A_j^{H} \quad \text{and} \quad R_{ij}(t) = A_i\, \tilde{\Omega}_{ij}(t)\, A_j^{\top},$$
(6)
where $\Omega_{ij}(t), \tilde{\Omega}_{ij}(t) \in \mathbb{C}^{m \times m}$ are diagonal. The task is to find a set of matrices $\{X_i\}_{i=1}^{k} \subset Gl(m)$ such that
$$X_i^{H}\, C_{ij}(t)\, X_j \quad \text{and} \quad X_i^{H}\, R_{ij}(t)\, \overline{X}_j,$$
(7)

for all $i < j$ and $t = 1, \dots, n$, are simultaneously, or approximately simultaneously, diagonalized. In this study, we consider the noise-free IVA problem as defined in (2), and neglect the cross covariance matrix estimation errors due to the finite sample size effect. In other words, we assume that both sets of $C_{ij}(t)$'s and $R_{ij}(t)$'s are jointly diagonalizable.

Note that the above problem is similar to the simultaneous SVD formulation proposed in [20], where only the situation with two transform matrices is studied, i.e. $k = 2$. In contrast, our current setting deals with the case of multiple transform matrices $\{X_i\}_{i=1,\dots,k}$, which are not restricted to be unitary. Finally, instead of considering second-order cross covariance matrices, our approach can be generalized to higher-order cross cumulants. We refer to [17] for further details.

4 Diagonality measure and the COP manifold

Our cost function to tackle problem (7) originates from the popular off-norm function that measures the squared Frobenius norm of the off-diagonal entries of the involved matrices. We develop an appropriate mathematical setting on the subsequently defined complex oblique projective (COP) manifold to provide its critical point analysis.

4.1 Derivation of the cost function

For legibility reasons, from now on, we only consider the problem of simultaneously diagonalizing the covariance matrices, i.e. the first condition in (7). The combination with the additional requirement that also the pseudo cross covariance matrices may be used for estimating the demixing matrix is straightforwardly adapted to our setting and not further discussed here.

Let us define the off-norm function as
$$g : (Gl(m))^{k} \to \mathbb{R}, \quad g(X_1, \dots, X_k) := \sum_{i<j}^{k} \sum_{t=1}^{n} \frac{1}{2} \left\| \operatorname{off}\!\left( X_i^{H} C_{ij}(t) X_j \right) \right\|_F^2 .$$
(8)
Due to the noise-free assumption and since we neglect finite sample size effects, the set of joint diagonalizers $(X_1, \dots, X_k)$ of the $C_{ij}(t)$ in Equation (7) is a global minimum of $g$, that is, $g(X_1, \dots, X_k) = 0$. It is clear that a minimization approach without further constraints on the $X_i$ would drive all diagonalizers to zero. In order to avoid such trivial solutions and to regularize the minimization problem, the authors in [21] propose to restrict all columns of the transform matrices to have unit norm. This set is known as the oblique manifold, which has been shown to be an appropriate setting for matrix diagonalization, cf. [22]. Its complex counterpart is the so-called complex oblique manifold
$$Ob(m) := \left\{ X \in Gl(m) \;\middle|\; \operatorname{ddiag}(X^{H} X) = I_m \right\}$$
(9)
and we denote by $Ob^{k}(m)$ the product manifold of $k$ copies of $Ob(m)$. The restriction of the off-norm cost function (8) is denoted by
$$g_1 : Ob^{k}(m) \to \mathbb{R}, \quad g_1(X_1, \dots, X_k) := \sum_{i<j}^{k} \sum_{t=1}^{n} \frac{1}{2} \left\| \operatorname{off}\!\left( X_i^{H} C_{ij}(t) X_j \right) \right\|_F^2 .$$
(10)
Now denote the $p$th column of $X_i$ by $x_{ip}$. It is obvious that the function $g_1$ is invariant with respect to a phase change of each column $x_{ip}$, which reflects the well-known scaling ambiguity of complex ICA problems. By a further calculation, $g_1$ has the form
$$\begin{aligned} g_1(X_1, \dots, X_k) &= \frac{1}{2} \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \left| x_{ip}^{H} C_{ij}(t) x_{jq} \right|^2 \\ &= \frac{1}{2} \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \left( x_{ip}^{H} C_{ij}(t) x_{jq} \right) \left( x_{ip}^{H} C_{ij}(t) x_{jq} \right)^{H} \\ &= \frac{1}{2} \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \operatorname{tr}\!\left( x_{ip} x_{ip}^{H}\, C_{ij}(t)\, x_{jq} x_{jq}^{H}\, C_{ij}(t)^{H} \right). \end{aligned}$$
(11)
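The behaviour of $g_1$ described above can be illustrated with a short sketch. It assumes synthetic, exactly jointly diagonalizable matrices $C_{ij}(t) = A_i \Omega_{ij}(t) A_j^{H}$ with complex Gaussian $A_i$; the dictionary layout `C[(i, j)]` and all helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 3, 3, 4

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def normalize_columns(X):
    """Scale every column to unit Euclidean norm, so that ddiag(X^H X) = I_m."""
    return X / np.linalg.norm(X, axis=0, keepdims=True)

def off(Z):
    return Z - np.diag(np.diag(Z))

def g1(X, C):
    """Off-norm cost (10); C[(i, j)] lists the n matrices C_ij(t) for i < j."""
    return sum(0.5 * np.linalg.norm(off(X[i].conj().T @ Cij @ X[j]), 'fro') ** 2
               for (i, j), mats in C.items() for Cij in mats)

A = [crandn(m, m) for _ in range(k)]
C = {(i, j): [A[i] @ np.diag(crandn(m)) @ A[j].conj().T for _ in range(n)]
     for i in range(k) for j in range(i + 1, k)}

# X_i = A_i^{-H} with normalized columns lies in Ob(m) and diagonalizes all C_ij(t).
X_star = [normalize_columns(np.linalg.inv(Ai).conj().T) for Ai in A]
X_rand = [normalize_columns(crandn(m, m)) for _ in range(k)]

# Multiplying each column by a unit-modulus phase leaves g1 unchanged.
X_phase = [Xi * np.exp(1j * rng.uniform(0, 2 * np.pi, m)) for Xi in X_rand]

print(g1(X_star, C), g1(X_rand, C), g1(X_phase, C))
```

The first value is zero up to rounding, and the last two coincide. The second observation is the phase ambiguity that motivates passing from the columns $x_{ip}$ to the projectors $x_{ip} x_{ip}^{H}$.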
Instead of fixing a phase for each $x_{ip}$, in this study we employ an elegant mathematical setting for the problem. Recall that each $x_{ip} x_{ip}^{H}$ defines a Hermitian rank-one projector, the set of which is identified with the $(m-1)$-dimensional complex projective space $\mathbb{CP}^{m-1}$, i.e.
$$\mathbb{CP}^{m-1} = \left\{ P \in \mathbb{C}^{m \times m} \;\middle|\; P^{H} = P,\; P^2 = P,\; \operatorname{tr}(P) = 1 \right\}.$$
(12)
By doing so for each column and by maintaining the fact that the columns of $X_i$ form a complex basis (i.e. invertibility of $X_i$), we naturally arrive at the following set, which we refer to as the complex oblique projective (COP) manifold,
$$Q(m) := \left\{ (P_1, \dots, P_m) \;\middle|\; P_i \in \mathbb{CP}^{m-1},\; \det\!\left( \sum_{i=1}^{m} P_i \right) > 0 \right\}.$$
(13)
The off-norm cost function $g_1$ now induces the following function $g_2$ on the COP manifold. Namely, if $Q^{k}(m)$ denotes the $k$-fold product of $Q(m)$, $g_2$ is given by
$$g_2 : Q^{k}(m) \to \mathbb{R}, \quad g_2(P_1, \dots, P_k) := \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \operatorname{tr}\!\left( P_{ip}\, C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} \right),$$
(14)

with $P_i := (P_{i1}, \dots, P_{im})$.
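The passage from columns to projectors can be sketched as follows. The sketch assumes that (14) carries no factor $\frac{1}{2}$, as printed, so that $g_2$ evaluated at the projectors equals $2 g_1$; the random test data and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k, n = 3, 2, 2

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def projectors(X):
    """Map X in Ob(m) to (x_1 x_1^H, ..., x_m x_m^H), one projector per column."""
    return [np.outer(X[:, p], X[:, p].conj()) for p in range(X.shape[1])]

def g1(X, C):
    """Off-norm cost (10) evaluated on unit-norm-column matrices."""
    total = 0.0
    for (i, j), mats in C.items():
        for Cij in mats:
            M = X[i].conj().T @ Cij @ X[j]
            total += 0.5 * np.sum(np.abs(M - np.diag(np.diag(M))) ** 2)
    return total

def g2(P, C):
    """Cost (14) as printed: sum over i<j, t, p != q of tr(P_ip C P_jq C^H)."""
    total = 0.0
    for (i, j), mats in C.items():
        for Cij in mats:
            for p in range(m):
                for q in range(m):
                    if p != q:
                        total += np.trace(P[i][p] @ Cij @ P[j][q] @ Cij.conj().T).real
    return total

X = [crandn(m, m) for _ in range(k)]
X = [Xi / np.linalg.norm(Xi, axis=0, keepdims=True) for Xi in X]
C = {(0, 1): [crandn(m, m) for _ in range(n)]}
P = [projectors(Xi) for Xi in X]
```

Each `P[i][p]` is Hermitian, idempotent and of trace one, and `sum(P[i])` equals $X_i X_i^{H}$, whose determinant is positive exactly when $X_i$ is invertible. This is the condition appearing in the definition (13).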

4.2 The geometry of the complex oblique projective manifold

In this section, we recall some basic facts and concepts that are necessary for developing a Riemannian CG algorithm on the COP manifold, cf. [23]. In particular, we require a formula for the parallel transport and the geodesics of the COP manifold. We endow $Q(m)$ with the standard Riemannian metric
$$\left\langle (A_1, \dots, A_m), (B_1, \dots, B_m) \right\rangle := \sum_{i} \operatorname{tr}(A_i B_i),$$
(15)
inherited from the Euclidean metric of the $m$-fold product of Hermitian matrices. With this, $Q(m)$ is an open and dense Riemannian submanifold of the $m$-fold product of $\mathbb{CP}^{m-1}$ with the standard metric, i.e.
$$\overline{Q(m)} =: \left( \mathbb{CP}^{m-1} \right)^{m},$$
(16)

where $\overline{Q(m)}$ denotes the closure of $Q(m)$. Accordingly, the tangent spaces, the geodesics, and the parallel transport for $Q(m)$ and $(\mathbb{CP}^{m-1})^{m}$ coincide locally and thus are easily derived from the geometry of $\mathbb{CP}^{m-1}$. We refer to [24] for further discussions and details about $\mathbb{CP}^{m-1}$.

Let us denote by
$$\mathfrak{u}(m) := \left\{ \Omega \in \mathbb{C}^{m \times m} \;\middle|\; \Omega = -\Omega^{H} \right\}$$
(17)
the set of skew-Hermitian matrices. The tangent space at $P$ in $\mathbb{CP}^{m-1}$ is given by
$$T_P \mathbb{CP}^{m-1} = \left\{ [P, \Omega] \;\middle|\; \Omega \in \mathfrak{u}(m) \right\}$$
(18)
where $[A,B] := AB - BA$ is the matrix commutator. Then, the tangent space at $P = (P_1, \dots, P_m) \in Q(m)$ is simply the Cartesian product
$$T_P Q(m) \cong T_{P_1} \mathbb{CP}^{m-1} \times \dots \times T_{P_m} \mathbb{CP}^{m-1}.$$
(19)
With the above metric, the geodesics through $P \in \mathbb{CP}^{m-1}$ in direction $Z \in T_P \mathbb{CP}^{m-1}$ are given by
$$\gamma_{P,Z} : \mathbb{R} \to \mathbb{CP}^{m-1}, \quad \gamma_{P,Z}(t) := e^{t[Z,P]}\, P\, e^{-t[Z,P]},$$
(20)
where $e^{(\cdot)}$ denotes the matrix exponential. Thus, the (local; see the Endnote) geodesic through $P \in Q(m)$ in direction $Z := (Z_1, \dots, Z_m) \in T_P Q(m)$ is given by
$$\gamma_{P,Z}(t) = \left( \gamma_{P_1,Z_1}(t), \dots, \gamma_{P_m,Z_m}(t) \right).$$
(21)
The parallel transport of $\Psi := (\Psi_1, \dots, \Psi_m) \in T_P Q(m)$ with respect to the Levi-Civita connection along the geodesic $\gamma_{P,Z}(t)$ is
$$\tau_{P,Z}(\Psi) := \left( \tau_{P_1,Z_1}(\Psi_1), \dots, \tau_{P_m,Z_m}(\Psi_m) \right),$$
(22)
where, for a single factor, $\tau_{P,Z}$ denotes the parallel transport of $\Psi \in T_P \mathbb{CP}^{m-1}$ with respect to the Levi-Civita connection along the geodesic $\gamma_{P,Z}(t)$ in $\mathbb{CP}^{m-1}$, i.e.
$$\tau_{P,Z}(\Psi) = e^{[Z,P]}\, \Psi\, e^{-[Z,P]}.$$
(23)
The natural or Riemannian gradient of a function that is the restriction of some globally defined function to a sub-manifold is simply the orthogonal projection of the Euclidean gradient onto the corresponding tangent space. For the complex projective space, this projection is given by
$$\pi_P : \mathbb{C}^{m \times m} \to T_P \mathbb{CP}^{m-1}, \quad A \mapsto \left[ P, \left[ P, \tfrac{1}{2}(A + A^{H}) \right] \right].$$
(24)

It is easily seen that the operator $\pi_P$ is an orthogonal projector onto the tangent space $T_P \mathbb{CP}^{m-1}$, i.e. that $\pi_P \circ \pi_P(A) = \pi_P(A)$ and that the null space of $\pi_P$ is orthogonal to its image. Here, $\circ$ denotes the composition of functions. The formulas for the tangent spaces, the geodesics, the parallel transport, and the projection onto the tangent spaces of $Q^{k}(m)$ follow directly from the product manifold structure.
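The geometric ingredients of a single factor $\mathbb{CP}^{m-1}$ translate almost literally into code. The sketch below is one possible realization: the matrix exponential of the skew-Hermitian matrix $[Z,P]$ is computed through a Hermitian eigendecomposition, which is an implementation choice of this example and not prescribed by the article. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def comm(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def proj_tangent(P, A):
    """pi_P(A) = [P, [P, (A + A^H)/2]], Equation (24)."""
    return comm(P, comm(P, 0.5 * (A + A.conj().T)))

def expm_skew(M, t=1.0):
    """exp(t M) for skew-Hermitian M, using that 1j*M is Hermitian."""
    w, V = np.linalg.eigh(1j * M)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

def geodesic(P, Z, t):
    """gamma_{P,Z}(t) = e^{t[Z,P]} P e^{-t[Z,P]}, Equation (20)."""
    U = expm_skew(comm(Z, P), t)
    return U @ P @ U.conj().T

def transport(P, Z, Psi):
    """tau_{P,Z}(Psi) = e^{[Z,P]} Psi e^{-[Z,P]}, Equation (23)."""
    U = expm_skew(comm(Z, P))
    return U @ Psi @ U.conj().T

x = crandn(m)
x /= np.linalg.norm(x)
P = np.outer(x, x.conj())                  # a point of CP^{m-1}
Z = proj_tangent(P, crandn(m, m))          # two tangent vectors at P
Psi = proj_tangent(P, crandn(m, m))

Q_half = geodesic(P, Z, 0.5)               # a point further along the geodesic
Q_one = geodesic(P, Z, 1.0)                # end point used by the transport
Psi_moved = transport(P, Z, Psi)           # tangent vector at Q_one
```

In this sketch the geodesic stays inside $\mathbb{CP}^{m-1}$, its velocity at $t = 0$ is $Z$, and the transported vector is tangent at $\gamma_{P,Z}(1)$ with unchanged norm. The same functions applied factor by factor give the corresponding operations on $Q^{k}(m)$.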

5 Critical point analysis of the cost function

In this section, we conduct a critical point analysis of the cost function $g_2$ on the product COP manifold. We show that the joint diagonalizers are a non-degenerate global minimum of $g_2$. This is an important fact, since in many cases the speed of convergence relies on the non-degeneracy of the minima. First of all, we present a lemma which originates from the derivation of the cost from the off-norm function.

Lemma 1

Let us assume that all $C_{ij}(t)$'s are jointly diagonalizable. If $(X_1, \dots, X_k) \in Ob^{k}(m)$ minimizes the cost function $g_1$, as defined in (10), i.e. $X_i^{H} C_{ij}(t) X_j = D_{ij}(t) = \operatorname{diag}(d_{ij1}(t), \dots, d_{ijm}(t))$ is diagonal for all $t = 1, \dots, n$ and $i,j = 1, \dots, k$, then the set of corresponding Hermitian projectors $P = (P_1, \dots, P_k) \in Q^{k}(m)$ with $P_{ip} := x_{ip} x_{ip}^{H} \in \mathbb{CP}^{m-1}$ minimizes the cost function $g_2$, defined in (14), and
$$P_{ip}\, C_{ij}(t)\, P_{jq} = \begin{cases} 0, & \text{for } p \neq q, \\ d_{ijp}(t)\, x_{ip} x_{jp}^{H}, & \text{for } p = q. \end{cases}$$
(25)
The above lemma follows directly from the condition of $X_i^{H} C_{ij}(t) X_j$ being diagonal, i.e. its $(p,q)$th entry is computed as
$$x_{ip}^{H}\, C_{ij}(t)\, x_{jq} = \begin{cases} 0, & \text{for } p \neq q, \\ d_{ijp}(t), & \text{for } p = q. \end{cases}$$
(26)
Now, let $P := (P_1, \dots, P_k) \in Q^{k}(m)$ be arbitrary. We compute the first derivative of $g_2$ at $P \in Q^{k}(m)$ in direction $Z := (Z_1, \dots, Z_k) \in T_P Q^{k}(m)$ as
$$D g_2(P) Z = \sum_{i<j}^{k} \sum_{p \neq q}^{m} \sum_{t=1}^{n} \left[ \operatorname{tr}\!\left( Z_{ip}\, C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} \right) + \operatorname{tr}\!\left( P_{ip}\, C_{ij}(t)\, Z_{jq}\, C_{ij}(t)^{H} \right) \right].$$
(27)

By recalling the structure of the tangent space of $Q^{k}(m)$ and the result in Lemma 1, it is easy to see that the first derivative of $g_2$ vanishes at any $P$ that corresponds to the correct joint diagonalizers.

The remainder of this section addresses the characterization of the Hessian of $g_2$ at the joint diagonalizers. To that end, we denote by $\mathit{off}(m) := \{ Z \in \mathbb{C}^{m \times m} \mid z_{ii} = 0,\ i = 1, \dots, m \}$ the set of matrices with zero diagonal. Let $\Pi$ be the natural projection
$$\Pi : Ob(m) \to Q(m), \quad \Pi(X) = \left( x_1 x_1^{H}, \dots, x_m x_m^{H} \right)$$
(28)
and let $\mu_X$ be defined as
$$\mu_X : \mathit{off}(m) \to Ob(m), \quad Z \mapsto X (I_m + Z)\, \operatorname{diag}\!\left( \frac{1}{\| X(e_1 + z_1) \|}, \dots, \frac{1}{\| X(e_m + z_m) \|} \right),$$
(29)
where $e_j$ denotes the $j$th standard basis vector and $z_j$ the $j$th column of $Z$. Note that $\mu_X$ defines a locally injective but not bijective mapping. The composition of $\Pi$ and $\mu_X$, however, yields a local diffeomorphism. With the shorthand notation $P := \Pi(X)$, the mapping
$$\phi_P : \mathit{off}(m) \to Q(m), \quad Z \mapsto \Pi \circ \mu_X(Z)$$
(30)
is a local parametrization around $P$ and thus permits a local parametrization of $Q^{k}(m)$ via
$$\Phi_P : \mathit{off}^{k}(m) \to Q^{k}(m), \quad (Z_1, \dots, Z_k) \mapsto \left( \phi_{P_1}(Z_1), \dots, \phi_{P_k}(Z_k) \right),$$
(31)
with $\Phi_P(0) = P := (P_1, \dots, P_k)$. The associated tangent map $T\Phi_P$ is given as
$$T\Phi_P : T_0\, \mathit{off}^{k}(m) \cong \mathit{off}^{k}(m) \to T_P Q^{k}(m), \quad (\Theta_1, \dots, \Theta_k) \mapsto \left( x_{11}\, \xi(x_{11})\, \theta_{11}^{H} + \theta_{11}\, \xi(x_{11})\, x_{11}^{H}, \dots, x_{km}\, \xi(x_{km})\, \theta_{km}^{H} + \theta_{km}\, \xi(x_{km})\, x_{km}^{H} \right),$$
(32)
where $\xi(x_{ip}) := I_m - x_{ip} x_{ip}^{H}$ is the orthogonal projection onto the orthogonal complement of $x_{ip}$. Let $Z := T\Phi_P(\Theta) \in T_P Q^{k}(m)$. Then, we can compute the Hessian of $g_2$ at the critical points $P \in Q^{k}(m)$, i.e. the symmetric bilinear form $H_{g_2}(P) : T_P Q^{k}(m) \times T_P Q^{k}(m) \to \mathbb{R}$, via
$$\begin{aligned} H_{g_2}(P)(Z, Z) &= H_{g_2}(P)\left( T\Phi_P(\Theta), T\Phi_P(\Theta) \right) = \frac{d^2}{dt^2} (g_2 \circ \Phi_P)(t\Theta) \Big|_{t=0} \\ &= \sum_{i<j}^{k} \sum_{p \neq q}^{m} \sum_{t=1}^{n} 2 \operatorname{tr}\!\left( Z_{ip}\, C_{ij}(t)\, Z_{jq}\, C_{ij}(t)^{H} \right) \\ &= \sum_{i<j}^{k} \sum_{p \neq q}^{m} \sum_{t=1}^{n} 2 \left| d_{ijp}(t)\, \theta_{pq}^{(i)} + d_{ijq}(t)\, \theta_{qp}^{(j)} \right|^2 . \end{aligned}$$
(33)
The last equality holds by following the results in Lemma 1, i.e. $X_i^{H} C_{ij}(t) X_j = D_{ij}(t)$, which is equivalent to $C_{ij}(t) = X_i^{-H} D_{ij}(t) X_j^{-1}$, and the fact that $Z_{ip} = (I_m - P_{i1}) X_i \theta_{ip}$. It can easily be seen that the Hessian form (33) is positive definite if and only if all $(2 \times 2)$-matrices
$$\sum_{t=1}^{n} \begin{bmatrix} |d_{ijp}(t)|^2 & d_{ijq}(t)\, \overline{d_{ijp}(t)} \\ d_{ijp}(t)\, \overline{d_{ijq}(t)} & |d_{ijq}(t)|^2 \end{bmatrix}$$
(34)

are positive definite. Since this is a generic assumption on the data, we have the following result.

Theorem 1

Generically, the global minimizer of the cost function $g_2$ is non-degenerate.
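The genericity condition on the $(2 \times 2)$-matrices (34) can be explored numerically. Each summand is the rank-one matrix $v v^{H}$ with $v = (\overline{d_{ijp}(t)}, \overline{d_{ijq}(t)})^{\top}$, so the sum is positive definite exactly when these vectors span $\mathbb{C}^2$ as $t$ varies. The sketch below assumes diagonal entries drawn uniformly as in the experiments; the helper names and the relative tolerance are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def hessian_block(d_p, d_q):
    """Sum over t of the 2x2 matrices in (34); d_p[t] = d_ijp(t), d_q[t] = d_ijq(t)."""
    V = np.stack([d_p.conj(), d_q.conj()])   # column t is (conj d_p(t), conj d_q(t))
    return V @ V.conj().T

def is_positive_definite(M, tol=1e-10):
    """Check that the smallest eigenvalue is clearly positive relative to the largest."""
    w = np.linalg.eigvalsh(M)
    return w.min() > tol * w.max()

def draw(n):
    """n complex numbers with real and imaginary parts uniform on (0, 10)."""
    return rng.uniform(0, 10, n) + 1j * rng.uniform(0, 10, n)

single = hessian_block(draw(1), draw(1))      # n = 1: a single rank-one term
generic = hessian_block(draw(3), draw(3))     # n = 3 with generic data
d = draw(3)
collinear = hessian_block(d, (2 - 1j) * d)    # d_ijq(t) = c * d_ijp(t) for all t

print(is_positive_definite(single),
      is_positive_definite(generic),
      is_positive_definite(collinear))
```

In this sketch, a single time instance or proportional diagonal sequences give a singular matrix, whereas generic data with $n \geq 2$ give a positive definite one. This is in line with the use of several time instances in the problem formulation.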

6 A CG IVA algorithm

In this section, we introduce a general form of CG algorithms on matrix manifolds. After computing the Riemannian gradient of the cost function on the COP manifold, we develop a CG based IVA algorithm. The CG methods on matrix manifolds are briefly reviewed here. They form the backbone of the algorithm for our optimization problem on the COP manifold and explain the use of the differential geometric concepts derived in the previous sections. For an in-depth introduction to optimization on matrix manifolds, we refer the interested reader to [23].

Let $M$ be a submanifold of some Euclidean space with inner product $\langle \cdot, \cdot \rangle$ and let $f : M \to \mathbb{R}$ be smooth. The CG method is initialized by some $x_0 \in M$ and the descent direction $H_0 := -\operatorname{grad} f(x_0)$ given by the Riemannian gradient. If $f$ is the restriction of a globally defined function $\hat{f}$ to $M$, the Riemannian gradient is just the orthogonal projection of the gradient of $\hat{f}$ onto the tangent space, i.e.
$$\operatorname{grad} f(x) = \pi_x\!\left( \nabla \hat{f}(x) \right),$$
(35)

where $\nabla \hat{f}(x)$ denotes the Euclidean gradient of $\hat{f}$, and $\pi_x$ is the orthogonal projection onto $T_x M$. Subsequently, sweeps are iterated that consist of two steps: a line search in a given direction (i.e. along a geodesic in that direction), followed by an update of the search direction. Several different possibilities for these steps lead to different CG methods. Assume now that $x_i$, $H_i$, and $G_i := \operatorname{grad} f(x_i)$ are given.

Given a geodesic $\gamma_i$ with $\gamma_i(0) = x_i$ and $\dot{\gamma}_i(0) = H_i$, the line search aims to find $\lambda_i \in \mathbb{R}$ that minimizes $f \circ \gamma_i : \mathbb{R} \to \mathbb{R}$. A generic approach for the step-size selection is a Riemannian adaptation of the backtracking line search and several modifications, cf. [23, 25]. Here, we present a closed form solution for the step-size selection that works particularly well for our problem due to the quadratic nature of our cost function, cf. [26]. It is based on the assumption that a one-dimensional Newton step along $f \circ \gamma$ yields a good approximation of its minimizer. Explicitly, we choose the step-size as
$$\lambda_i := - \frac{\frac{d}{d\lambda} (f \circ \gamma)(\lambda) \big|_{\lambda=0}}{\left| \frac{d^2}{d\lambda^2} (f \circ \gamma)(\lambda) \big|_{\lambda=0} \right|} .$$
(36)

The absolute value in the denominator is chosen for the following reason. In a neighborhood of a minimum, where the second derivative is positive, the step is an unaltered one-dimensional Newton step. If $\frac{d^2}{d\lambda^2} (f \circ \gamma)(\lambda) \big|_{\lambda=0} < 0$, the step size is the negative of a regular Newton step, so that critical points that are not minima become non-attractive.
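The effect of the absolute value can be seen on two scalar test functions. The sketch below approximates the two derivatives by central finite differences, which is an assumption of this example (the article does not specify how the derivatives are evaluated); `newton_step_size`, `convex` and `concave` are illustrative names.

```python
def newton_step_size(phi, h=1e-4):
    """Step size -phi'(0) / |phi''(0)| for phi(lam) = f(gamma(lam)).

    Both derivatives are approximated by central finite differences.
    """
    d1 = (phi(h) - phi(-h)) / (2 * h)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2
    return -d1 / abs(d2)

def convex(lam):
    """Quadratic with its minimum at lam = 2."""
    return (lam - 2.0) ** 2 + 1.0

def concave(lam):
    """Quadratic with its maximum at lam = 2."""
    return -(lam - 2.0) ** 2

step_convex = newton_step_size(convex)     # plain Newton step, lands on the minimum
step_concave = newton_step_size(concave)   # sign flipped, moves away from the maximum
```

For the convex function the step is the plain Newton step to the minimizer. For the concave function a plain Newton step would jump to the maximizer, whereas the modified step points in the opposite direction and decreases the function value.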

In order to compute the new search direction $H_{i+1} \in T_{x_{i+1}} M$, we need to transport $H_i$ and $G_i$, which are tangent to $M$ at $x_i$, to the tangent space $T_{x_{i+1}} M$. This is done via parallel transport along the geodesic $\gamma$, which we denote by
$$\tau : T_{x_i} M \to T_{x_{i+1}} M .$$
(37)
The updated search direction is now chosen according to a Riemannian adaptation of the Hestenes-Stiefel, the Polak-Ribière, or the Fletcher-Reeves update. Here, we choose a different formulation that performs slightly better in our situation than the aforementioned ones, namely
$$\gamma_i := \frac{\left\langle G_{i+1},\, G_{i+1} - \tau G_i \right\rangle}{\left\langle H_i,\, G_i \right\rangle} .$$
(38)

Despite the good performance in applications, the convergence analysis of CG methods on smooth manifolds is still an open problem. Partial convergence results for CG methods on manifolds can be found in [27, 28], and a recent result in [29].

As is clear from the above, the first step towards formulating a CG algorithm for minimizing the cost function $g_2$ is to compute its Riemannian gradient. Let us denote by $\hat{g}_2$ the extension of $g_2$ to the embedding space $\mathbb{C}^{m \times m \times m \times k}$. Following the computation in Equation (27), the Euclidean gradient of $\hat{g}_2$ at $P \in Q^{k}(m)$, i.e. $\nabla \hat{g}_2(P) := (J_1, \dots, J_k)$, has the elements $J_{ip} \in \mathbb{C}^{m \times m}$ given by
$$J_{ip} = \sum_{j>i}^{k} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} + \sum_{j<i}^{k} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ji}(t)^{H}\, P_{jq}\, C_{ji}(t) .$$
(39)
By projecting it onto the tangent space $T_P Q^{k}(m)$, we get the Riemannian gradient of $g_2$ at $P \in Q^{k}(m)$, i.e. $\operatorname{grad} g_2(P) := (G_1, \dots, G_k) \in T_P Q^{k}(m)$, with the elements $G_{ip} \in T_{P_{ip}} \mathbb{CP}^{m-1}$ given by
$$G_{ip} = \left[ P_{ip}, \left[ P_{ip},\ \sum_{j>i}^{k} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} + \sum_{j<i}^{k} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ji}(t)^{H}\, P_{jq}\, C_{ji}(t) \right] \right] .$$
(40)

The above formula for the Riemannian gradient allows us to implement the geometric CG algorithm for minimizing the function $g_2$ as defined in (14) in a straightforward way. Pseudocode is provided in Algorithm 1.

Algorithm 1 A CG IVA algorithm

Input: A set of matrices $\{C_{ij}(t)\} \subset \mathbb{C}^{m \times m}$ for $i,j = 1, \dots, k$ and $t = 1, \dots, n$;

Step 1: Generate an initial guess $P^{(0)} = [P_1^{(0)}, \dots, P_k^{(0)}] \in Q^{k}(m)$ and set $i = 1$;

Step 2: Compute $G^{(1)} = -H^{(1)} = -[H_1, \dots, H_k] \leftarrow \operatorname{grad} g_2(P^{(0)})$ using Equation (40);

Step 3: Set $i = i + 1$;

Step 4: Update $P^{(i+1)} \leftarrow \left( \gamma_{P_1,H_1}(\lambda_i), \dots, \gamma_{P_k,H_k}(\lambda_i) \right)$, where $\lambda_i$ is computed by Equation (36);

Step 5: Update $H^{(i+1)} \leftarrow -G^{(i+1)} + \gamma_i\, \tau_{P^{(i)},H_i}(\lambda_i)$, where
$$G^{(i+1)} = \operatorname{grad} g_2(P^{(i)}),$$
(41)

and $\gamma_i$ is chosen according to Equation (38);

Step 6: If $i \bmod (2km(m-1) - 1) = 0$, set $H^{(i+1)} \leftarrow -G^{(i+1)}$;

Step 7: If $\| G^{(i+1)} \|$ is small enough, stop. Otherwise, go to Step 3;
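The Riemannian gradient used in Steps 2 and 5 can be checked against a finite difference of $g_2$ along a geodesic. The sketch below pairs each $P_{ip}$ with the projectors $P_{jq}$ of the other groups, as in the derivative (27), and uses random matrices $C_{ij}(t)$. The data layout, the eigendecomposition-based exponential and all function names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(6)
m, k, n = 3, 3, 2

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def comm(A, B):
    return A @ B - B @ A

def tangent_proj(P, A):
    """pi_P(A) = [P, [P, (A + A^H)/2]], Equation (24)."""
    return comm(P, comm(P, 0.5 * (A + A.conj().T)))

def g2(P, C):
    """Cost (14); P[i][p] is the projector P_ip, C[(i, j)] lists C_ij(t) for i < j."""
    return sum(np.trace(P[i][p] @ Cij @ P[j][q] @ Cij.conj().T).real
               for (i, j), mats in C.items() for Cij in mats
               for p in range(m) for q in range(m) if p != q)

def euclid_grad(P, C, i, p):
    """Derivative of g2 with respect to P_ip, all other projectors held fixed."""
    J = np.zeros((m, m), dtype=complex)
    for (a, b), mats in C.items():
        for Cab in mats:
            for q in range(m):
                if q == p:
                    continue
                if a == i:      # pair (i, j) with j = b > i
                    J += Cab @ P[b][q] @ Cab.conj().T
                elif b == i:    # pair (j, i) with j = a < i
                    J += Cab.conj().T @ P[a][q] @ Cab
    return J

def riem_grad(P, C):
    """Project every Euclidean gradient block onto the tangent space at P_ip."""
    return [[tangent_proj(P[i][p], euclid_grad(P, C, i, p)) for p in range(m)]
            for i in range(k)]

def geodesic(P, Z, t):
    """e^{t[Z,P]} P e^{-t[Z,P]}; the exponential uses that 1j*[Z,P] is Hermitian."""
    w, V = np.linalg.eigh(1j * comm(Z, P))
    U = (V * np.exp(-1j * t * w)) @ V.conj().T
    return U @ P @ U.conj().T

def move(P, Z, t):
    return [[geodesic(P[i][p], Z[i][p], t) for p in range(m)] for i in range(k)]

def random_point():
    X = crandn(m, m)
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    return [np.outer(X[:, p], X[:, p].conj()) for p in range(m)]

C = {(i, j): [crandn(m, m) for _ in range(n)] for i in range(k) for j in range(i + 1, k)}
P = [random_point() for _ in range(k)]
Z = [[tangent_proj(P[i][p], crandn(m, m)) for p in range(m)] for i in range(k)]
G = riem_grad(P, C)

h = 1e-5
slope_fd = (g2(move(P, Z, h), C) - g2(move(P, Z, -h), C)) / (2 * h)
slope_grad = sum(np.trace(G[i][p] @ Z[i][p]).real for i in range(k) for p in range(m))
print(slope_fd, slope_grad)
```

The two printed numbers agree up to the finite-difference error, which confirms that the projected gradient represents the directional derivative with respect to the metric (15). A full CG loop would additionally need the step size (36) and the update coefficient (38), which are not part of this sketch.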

7 Numerical experiments

In our experiment, we investigate the performance of our method in terms of both local convergence property and accuracy of estimating the joint diagonalizers.

7.1 Experiment one

The first task of our experiment is to jointly diagonalize two sets of complex matrices, $\{C_{ij}(t)\}_{i<j}$ and $\{R_{ij}(t)\}_{i<j}$, which are constructed by
$$C_{ij}(t) = A_i\, \Omega_{ij}(t)\, A_j^{H} + \epsilon N_{ij}^{\mathrm{H}} \quad \text{and} \quad R_{ij}(t) = A_i\, \hat{\Omega}_{ij}(t)\, A_j^{\top} + \epsilon N_{ij}^{\mathrm{S}},$$
(42)

where the matrices $A_i \in Gl(m)$ are randomly picked, both the real and imaginary parts of the diagonal entries of $\Omega_{ij}(t)$ and $\hat{\Omega}_{ij}(t)$ are drawn from a uniform distribution on the interval $(0, 10)$, the matrices $N_{ij}^{\mathrm{H}} \in \mathbb{C}^{m \times m}$ and $N_{ij}^{\mathrm{S}} \in \mathbb{C}^{m \times m}$ are a Hermitian and a complex symmetric matrix, respectively, whose real and imaginary parts are generated from a uniform distribution on the interval $(-0.5, 0.5)$, representing additive stationary noise, and $\epsilon \in \mathbb{R}$ is the noise level.
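A possible way to generate such test data is sketched below. Several details are assumptions of this example and not stated in the article: the mixing matrices are drawn from a complex Gaussian, the noise matrices are built by mirroring the upper triangle, the diagonal of the Hermitian noise is kept real, and one noise matrix per pair $(i,j)$ is shared across all $t$ to reflect stationarity.

```python
import numpy as np

rng = np.random.default_rng(7)
m, k, n, eps = 3, 3, 3, 0.1

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def cuniform(low, high, *shape):
    """Complex array with real and imaginary parts uniform on (low, high)."""
    return rng.uniform(low, high, shape) + 1j * rng.uniform(low, high, shape)

def hermitian_noise():
    """Hermitian matrix built from the strict upper triangle and a real diagonal."""
    M = cuniform(-0.5, 0.5, m, m)
    U = np.triu(M, 1)
    return U + U.conj().T + np.diag(np.diag(M).real)

def symmetric_noise():
    """Complex symmetric matrix built by mirroring the strict upper triangle."""
    M = cuniform(-0.5, 0.5, m, m)
    U = np.triu(M, 1)
    return U + U.T + np.diag(np.diag(M))

A = [crandn(m, m) for _ in range(k)]                  # assumed distribution of A_i
pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
N_H = {ij: hermitian_noise() for ij in pairs}         # one noise matrix per pair
N_S = {ij: symmetric_noise() for ij in pairs}

C = {(i, j): [A[i] @ np.diag(cuniform(0, 10, m)) @ A[j].conj().T + eps * N_H[i, j]
              for _ in range(n)] for (i, j) in pairs}
R = {(i, j): [A[i] @ np.diag(cuniform(0, 10, m)) @ A[j].T + eps * N_S[i, j]
              for _ in range(n)] for (i, j) in pairs}
```

Setting `eps = 0` reproduces the exactly jointly diagonalizable case used for the convergence plot, while larger values correspond to the noisy settings of the accuracy experiment.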

In our experiments, we set m=3, k=3, n=3. First of all, we choose the noise level ϵ=0. A typical local convergence curve of our proposed algorithm is shown in Figure 1. A tendency of superlinear convergence can be observed.
Figure 1

Convergence behavior of the proposed CG algorithm.

In order to investigate the performance of the proposed algorithm in terms of estimation accuracy, we set $\epsilon \in \{0.1, 0.5, 1.0\}$ and run 50 tests. The performance index is chosen to be the averaged Amari error, proposed in [30]. Generally, the smaller the Amari error, the better the separation. The quartile-based boxplot of the averaged Amari errors of our proposed algorithm for the three noise levels is shown in Figure 2. As expected, the performance of our CG algorithm degrades as the noise level increases.
Figure 2

Performance of the proposed CG algorithm.

7.2 Experiment two

In this experiment, we compare our CG based IVA approach, referred to as IVA-CG, with two second-order statistics based IVA algorithms. We refer to one contrast optimization based IVA algorithm as IVA-CO, cf. [5, 6], and the other matrix joint diagonalization based approach as IVA-JD, cf. [13]. The task of this experiment is to separate two groups of complex valued signals. We take three real audio source signals with 480,000 samples, and apply the short time Fourier transform to the sources with the number of FFT points being 1,024. By doing so, we end up with a complex IVA problem with 513 groups of statistically dependent complex signals.
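The number of 513 groups corresponds to the one-sided spectrum of a real-valued frame of 1,024 samples. The short sketch below only illustrates this count; the zero-valued frame is a placeholder, and windowing and overlap of the short-time Fourier transform are not modeled here.

```python
import numpy as np

n_fft = 1024
frame = np.zeros(n_fft)                 # placeholder for one real-valued audio frame
n_bins = np.fft.rfft(frame).shape[0]    # non-negative frequencies: n_fft // 2 + 1
print(n_bins)
```

Each of these frequency bins contributes one instantaneous complex mixing problem of the form (2).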

For a practical implementation of our method, note that computing and jointly diagonalizing all possible cross covariance and pseudo covariance matrices between the 513 groups is prohibitively expensive. We overcome this issue by randomly selecting only two neighboring frequency bins at a time. The sources from each frequency bin are mixed independently by multiplication with a mixing matrix whose entries are drawn from a normal distribution. We run the experiment 100 times, and show boxplots of the averaged Amari errors of the three studied algorithms in Figure 3. The figure clearly shows that our proposed IVA-CG algorithm consistently outperforms the other two.
Figure 3. Comparison of separation performance.
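The second-order statistics used for a pair of neighboring bins can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `cross_statistics` is hypothetical, random complex data stands in for the STFT coefficients, the mixing matrices are taken to be complex Gaussian (the text only says "normal distribution"), and plain sample averages over all frames are used as estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_statistics(X_i, X_j):
    """Sample cross covariance E[x_i x_j^H] and pseudo covariance E[x_i x_j^T].

    X_i, X_j: arrays of shape (m, T) holding T observations of two signal groups.
    """
    X_i = X_i - X_i.mean(axis=1, keepdims=True)
    X_j = X_j - X_j.mean(axis=1, keepdims=True)
    T = X_i.shape[1]
    C = X_i @ X_j.conj().T / T      # cross covariance matrix
    R = X_i @ X_j.T / T             # pseudo cross covariance matrix
    return C, R

# Stand-in for the STFT coefficients: m sources, n_bins frequency bins, T frames.
m, n_bins, T = 3, 513, 200
S = rng.standard_normal((m, n_bins, T)) + 1j * rng.standard_normal((m, n_bins, T))

# One independent mixing matrix per frequency bin (assumed complex Gaussian).
A = rng.standard_normal((n_bins, m, m)) + 1j * rng.standard_normal((n_bins, m, m))
X = np.einsum('fab,bft->aft', A, S)          # X[:, f, :] = A[f] @ S[:, f, :]

# Randomly pick one pair of neighboring bins (f, f + 1).
f = int(rng.integers(0, n_bins - 1))
C, R = cross_statistics(X[:, f, :], X[:, f + 1, :])
```

Working with one randomly chosen neighboring pair at a time keeps the number of matrices to diagonalize small, at the price of ignoring the dependence between bins that are further apart.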

8 Conclusion

We propose a matrix joint diagonalization approach to solve the complex IVA problem that relies neither on a pre-whitening step nor on the estimation of the unknown distribution of the sources. A mathematical setting is derived that allows a formulation without ambiguity on the set of unknown parameters, i.e., the dimension of the search space is maximally reduced. This leads in a natural way to a smooth manifold structure that we call the complex oblique projective manifold, due to its close relation to the oblique manifold, which consists of invertible matrices with normalized columns. We propose to solve the complex IVA problem by minimizing a cost function that is based on the well-known off-norm function for measuring joint diagonality. We show that our setting leads to a non-degenerate Hessian at the solution of the IVA problem. This is an important result for the design of minimization methods, since in many cases the speed of convergence relies on the non-degeneracy of the minima. We develop a geometric CG method for solving the IVA problem and conclude by providing some numerical experiments.

Endnote

a. Note that $Q(m)$ is not a geodesically complete manifold.

Notes

Acknowledgements

This study was supported by the Cluster of Excellence CoTeSys (Cognition for Technical Systems), funded by the Deutsche Forschungsgemeinschaft (DFG).

References

  1. Cardoso JF: Multidimensional independent component analysis. In Proceedings of the 23rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Seattle, WA, USA; 1998).
  2. Hyvärinen A, Hoyer PO: Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 2000, 12(7):1705-1720. doi:10.1162/089976600300015312
  3. Araki S, Mukai R, Makino S, Nishikawa T, Saruwatari H: The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Trans. Speech Audio Process 2003, 11(2):109-116. doi:10.1109/TSA.2003.809193
  4. Lee I, Kim T, Lee TW: Independent vector analysis for convolutive blind speech separation. In Blind Speech Separation, Signals and Communication Technology. Edited by: Makino S, Lee TW, Sawada H. (Springer, Netherlands; 2007).
  5. Anderson M, Li XL, Adalı T: Complex-valued independent vector analysis: application to multivariate Gaussian model. Signal Process 2012, 92(8):1821-1831. doi:10.1016/j.sigpro.2011.09.034
  6. Anderson M, Adalı T, Li XL: Joint blind source separation with multivariate Gaussian model: algorithms and performance analysis. IEEE Trans. Signal Process 2012, 60(4):1672-1683.
  7. Kim T: Real-time independent vector analysis for convolutive blind source separation. IEEE Trans. Circ. Syst. I: Regular Papers 2010, 57(7):1431-1438.
  8. Hao J, Lee I, Lee TW, Sejnowski TJ: Independent vector analysis for source separation using a mixture of Gaussians prior. Neural Comput 2010, 22(6):1646-1673. doi:10.1162/neco.2010.11-08-906
  9. Lee I, Jang GJ: Independent vector analysis based on overlapped cliques of variable width for frequency-domain blind signal separation. EURASIP J. Adv. Signal Process 2012, 113: 1-12.
  10. Bermejo S: Finite sample effects in higher order statistics contrast functions for sequential blind source separation. IEEE Signal Process. Lett 2005, 12(6):481-484.
  11. Ghennioui H, Fadaili EM, Thirion-Moreau N, Adib A, Moreau E: A nonunitary joint block diagonalization algorithm for blind separation of convolutive mixtures of sources. IEEE Signal Process. Lett 2007, 14(11):860-863.
  12. Ghennioui H, Thirion-Moreau N, Moreau E, Aboutajdine D: Gradient-based joint block diagonalization algorithms: application to blind separation of FIR convolutive mixtures. Signal Process 2010, 90(6):1836-1849. doi:10.1016/j.sigpro.2009.12.002
  13. Li XL, Adalı T, Anderson M: Joint blind source separation by generalized joint diagonalization of cumulant matrices. Signal Process 2011, 91(10):2314-2322. doi:10.1016/j.sigpro.2011.04.016
  14. Shen H, Kleinsteuber M: A matrix joint diagonalization approach for complex independent vector analysis. In Proceedings of the 10th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), vol. 7191 Lecture Notes in Computer Science. Edited by: Theis F, Cichocki A, Yeredor A, Zibulevsky M. (Springer-Verlag, Berlin/Heidelberg; 2012).
  15. Shen H, Kleinsteuber M: Complex blind source separation via simultaneous strong uncorrelating transform. In Lecture Notes in Computer Science, Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation, vol. 6365. (Springer-Verlag, Berlin/Heidelberg; 2010).
  16. Smaragdis P: Blind separation of convolved mixtures in the frequency domain. Neurocomputing 1998, 22: 21-34. doi:10.1016/S0925-2312(98)00047-2
  17. Comon P, Jutten C: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press Inc, San Diego, USA; 2010.
  18. Makino S, Lee TW, Sawada H: Blind Speech Separation, Signals and Communication Technology. Springer, Netherlands; 2007.
  19. Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of non-stationary signals in the frequency domain. Signal Process 2009, 89: 819-830. doi:10.1016/j.sigpro.2008.10.024
  20. Maehara T, Murota K: Simultaneous singular value decomposition. Linear Alg. Appl 2011, 435: 106-116. doi:10.1016/j.laa.2011.01.007
  21. Absil PA, Gallivan KA: Joint diagonalization on the oblique manifold for independent component analysis. In Proceedings of the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5. (Toulouse, France; 2006).
  22. Afsari B: Sensitivity analysis for the problem of matrix joint diagonalization. SIAM J. Matrix Anal. Appl 2008, 30(3):1148-1171. doi:10.1137/060655997
  23. Absil PA, Mahony R, Sepulchre R: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ; 2008.
  24. Helmke U, Hüper K, Trumpf J: Newton’s method on Graßmann manifolds. 2007.
  25. Nocedal J, Wright SJ: Numerical Optimization. Springer, New York; 2006.
  26. Kleinsteuber M, Hüper K: An intrinsic CG algorithm for computing dominant subspaces. In Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Hawaii, USA; 2007).
  27. Smith ST: Optimization techniques on Riemannian manifolds. In Hamiltonian and Gradient Flows, Algorithms and Control, Fields Institute Communications, vol. 3. Edited by: Bloch A. (American Mathematical Society, Providence, RI; 1994).
  28. Gabay D: Minimizing a differentiable function over a differential manifold. J. Optimiz. Theory Appl 1982, 37(2):177-219. doi:10.1007/BF00934767
  29. Ring W, Wirth B: Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optimiz 2012, 22(2):596-627. doi:10.1137/11082885X
  30. Amari SI, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems (NIPS), vol. 8. Edited by: Touretzky DS, Mozer MC, Hasselmo ME. (The MIT Press, Cambridge, MA, USA; 1996).

Copyright information

© Shen and Kleinsteuber; licensee Springer. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Authors and Affiliations

  1. The Department of Electrical Engineering and Information Technology, Technische Universität München, München, Germany
