# Exploiting periodicity to extract the atrial activity in atrial arrhythmias

## Abstract

Atrial fibrillation is one of the most common arrhythmias in the elderly. During an atrial fibrillation episode, the atrial and ventricular activities are decoupled, and very rapid and irregular waves replace the usual atrial *P*-wave of a normal sinus rhythm electrocardiogram (ECG). The estimation of these wavelets is essential for clinical analysis. We propose a new approach to this problem focused on the quasiperiodicity of these wavelets. Atrial activity is characterized by a main atrial rhythm in the interval 3-12 Hz. This allows us to formulate the problem either as the separation of the original sources from their instantaneous linear mixture recorded in the ECG, or as the extraction of only the atrial component by exploiting the quasiperiodicity of the atrial signal. This methodology requires a prior estimate of the main atrial period. We present two algorithms that separate and extract the atrial rhythm starting from a prior estimate of the main atrial frequency. The first one is an algebraic method based on the maximization of a cost function that measures periodicity. The second one is an adaptive algorithm that exploits the decorrelation of the atrial and other signals by diagonalizing the correlation matrices at lags that are multiples of the atrial activity period. The algorithms were successfully applied to synthetic and real data. In simulated ECGs, the average correlation index obtained was 0.811 and 0.847, respectively. In real ECGs, the accuracy of the results was validated using spectral and temporal parameters. The average peak frequency and spectral concentration obtained were 5.550 and 5.554 Hz and 56.3 and 54.4%, respectively, and the kurtosis was 0.266 and 0.695. For validation purposes, we compared the proposed algorithms with established methods, obtaining better results for both simulated and real recordings.

## Keywords

Source separation; Electrocardiogram; Atrial fibrillation; Periodic component analysis; Second-order statistics

## 1 Introduction

In biomedical signal processing, data are recorded with the most appropriate technology in order to optimize the study and analysis of a clinically interesting application. Depending on the different nature of the underlying physics and the corresponding signals, diverse information is obtained such as electrical and magnetic fields, electromagnetic radiation (visible, X-ray), chemical concentrations or acoustic signals just to name some of the most popular. In many of these different applications, for example, the ones based on biopotentials, such as electro- and magnetoencephalogram, electromyogram or electrocardiogram (ECG), it is usual to consider the observations as a linear combination of different kinds of biological signals, in addition to some artifacts and noise due to the recording system. This is the case of atrial tachyarrhythmias, such as atrial fibrillation (AF) or atrial flutter (AFL), where the atrial and the ventricular activity can be considered as signals generated by independent bioelectric sources mixed in the ECG together with other ancillary sources [1].

AF is the most common arrhythmia encountered in clinical practice, and its study continues to receive considerable research interest. According to statistics, AF affects 0.4% of the general population, but the probability of developing it rises with age: it is less than 1% for people under 60 years of age and greater than 6% for those over 80 years [2]. The diagnosis and treatment of these arrhythmias can be enriched by the information provided by the electrical signal generated in the atria (*f*-waves) [3]. Frequency [4] and time-frequency analysis [5] of these *f*-waves can be used to identify the underlying AF mechanisms and to predict therapy efficacy. In particular, the fibrillatory rate is of primary importance in AF spontaneous behavior [6], response to therapy [7] or cardioversion [8]. The atrial fibrillatory frequency (or rate) can be reliably assessed from the surface ECG using digital signal processing, first by extracting the atrial signal and then by carrying out a spectral analysis.

There are two main methodologies to obtain the atrial signal. The first one is based on the cancellation of the QRST complexes. An established method for QRST cancellation consists of a spatiotemporal signal model that accounts for dynamic changes in QRS morphology caused, for example, by variations in the electrical axis of the heart [9]. The other approach decomposes the ECG as a linear combination of different source signals [10]; in this case, the task can be considered a blind source separation (BSS) problem, where the source vector includes the atrial, ventricular and ancillary sources and the mixture is the ECG recording. This problem has previously been solved using independent component analysis (ICA) [1, 11]. ICA methods are blind, that is, they impose no constraint on the mixture other than the statistical independence of the sources. In addition, ICA algorithms based on higher-order statistics require the signals to be non-Gaussian, with the possible exception of one component. When these restrictions are not satisfied, BSS can still be carried out using only second-order statistics; in this case, the restriction is that the sources must have different spectra, which allows the separation of more than one Gaussian component.

Regardless of whether second- or higher-order statistics are used, BSS methods usually assume that the available information about the problem is minimal: perhaps the number of components (the dimension of the problem), the type of mixture (linear or nonlinear, with or without additive noise, instantaneous or convolutive, real or complex), or some restrictions to fix the inherent indeterminacies of sign, amplitude and order of the recovered sources. However, it is more realistic to consider that we have some prior information about the nature of the signals and the way they are mixed before the multidimensional recording is obtained.

One of the most common types of prior information in many ECG applications is that the biopotentials behave periodically. For example, in cardiology, we can assume the periodicity of the heartbeat when recording a healthy ECG. Obviously, whether this assumption holds depends on the disease under study; although exact periodicity can be a very restrictive assumption, a quasiperiodic behavior can still be appropriate. In any case, the most important point is that this fact is known in advance, since the clinical study of the disease is usually carried out before the signal processing analysis. This is the kind of knowledge that classical BSS methods ignore, which prevents tailoring them *ad hoc* to exploit all the available information about the problem under consideration.

We present here a new approach to estimate the atrial rhythm in atrial tachyarrhythmias based on the quasiperiodicity of the atrial waves. We exploit this knowledge in two directions. First, in the statement of the problem, as a separation or an extraction approach: the classical BSS separation approach, which tries to recover all the original signals from their linear mixtures, can be adapted to an extraction approach that estimates only one source, since we are only interested in the clinically significant quasiperiodic atrial signal. Second, we impose the quasiperiodicity feature in two different implementations, obtaining an algebraic solution to the problem and an adaptive algorithm to extract the atrial activity. The use of periodicity has two advantages. First, it reduces the computational cost and improves the reliability of the estimates, since only second-order statistics have to be estimated, avoiding the difficulties of obtaining good higher-order statistics estimates. Second, it allows the development of algorithms that recover signals by optimizing a cost function that measures, in one way or another, the distance of the estimated signal to a quasiperiodic signal. This relaxes the much stronger assumption of independence and allows the definition of new cost functions, or the proper selection of parameters such as the time lag of the covariance matrix in traditional second-order BSS methods. The drawback is that the main period of the atrial rhythm must be estimated beforehand.

## 2 Statement of the problem

### 2.1 Observation model

In a normal sinus rhythm (NSR), the ventricles generate the QRS complex (during ventricular depolarization) and the *T* wave (during ventricular repolarization). The atria generate the *P* wave (during atrial depolarization). The wave corresponding to the repolarization of the atria is thought to be masked by the higher-amplitude QRS complex. Figure 1a shows a typical NSR, indicating the different components of the ECG.

During an atrial fibrillation episode, this coordination between ventricles and atria disappears and they become decoupled [9]. In the surface ECG, atrial fibrillation is defined by the replacement of the regular *P* waves by a set of irregular and fast wavelets, usually referred to as *f*-waves. This is because, during atrial fibrillation, the atria beat chaotically and irregularly, out of coordination with the ventricles. When these *f*-waves are less irregular (resembling a sawtooth signal) and have a much lower rate (typically 240 waves per minute, compared with up to almost 600 in atrial fibrillation), the arrhythmia is called atrial flutter. Figure 1b,c shows the ECG recorded at lead V1 for typical atrial fibrillation and atrial flutter episodes, respectively, to illustrate visually the differences among healthy, atrial fibrillation and atrial flutter recordings.

The ECG recorded at time *t* can be represented as a linear combination of the decoupled atrial and ventricular sources and some other components, such as breathing, muscle movements or power line interference:

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) \tag{1}$$

where $\mathbf{x}\left(t\right)\in {\Re}^{12\times 1}$ is the electrical signal recorded at the standard 12 leads in an ECG recording, $\mathbf{A}\in {\Re}^{12\times M}$ is the unknown full column rank mixing matrix, and $\mathbf{s}\left(t\right)\in {\Re}^{M\times 1}$ is the source vector that assembles all the possible *M* sources involved in the ECG, including the interesting atrial component. Note that since the number of sources is usually less than 12, the problem is overdetermined (more mixtures than sources). Nevertheless, the dimensions of the problem are not reduced since the atrial signal is usually a low power component and the inclusion of up to 12 sources can be helpful in order to recover some novel source or a multidimensional subspace for some of them, for example, when the ventricular component is composed of several subcomponents defining a basis for the ventricular activity subspace due to the morphological changes of the ventricular signal in the surface ECG.

### 2.2 On the periodicity of the atrial activity

A normal ECG is a recurrent signal, that is, it has a highly structured morphology that is basically repeated in every beat. It means that classical averaging methods can be helpful in the analysis of ECGs of healthy patients just aligning in time the different heartbeats, for example, for the reduction of noise in the recordings. However, during an atrial arrhythmia, regular RR-period intervals disappear, since every beat becomes irregular in time and shape, being composed of very chaotic *f*-waves. In addition, the ventricular response also becomes irregular, with higher average rate (shorter RR intervals).

In atrial flutter, the more regular *f*-waves produce a spectrum with a prominent low-frequency peak and some harmonics; in atrial fibrillation, there is also a main atrial rhythm, but its characteristic frequency is higher and the power is not so well structured around harmonics, since the signal is more irregular than in flutter. Figure 2 shows the spectra of the atrial fibrillation and atrial flutter activities of Figure 1. As can be seen, both show a power spectral density concentrated around a main peak (narrow-band signal). In our case, the main atrial rhythms are 3.88 and 7.07 Hz for the flutter and fibrillation cases, respectively; the harmonics of the flutter case can also be observed in the figure. This atrial frequency band varies slightly depending on the authors, for example, 4-9 Hz [12, 13], 5-10 Hz [14], 3.5-9 Hz [11] or 3-12 Hz [15].

Note that even in the case of a patient with atrial fibrillation, the highly irregular *f*-waves can be considered regular in a short period of time, typically up to 2 s [5]. From a signal processing point of view, this fact implies that the atrial signal can be considered a quasiperiodic signal with a time-varying *f*-wave shape. On the other hand, for the case of atrial flutter, it is usually supposed that the waveform can be modeled by a simple stationary sawtooth signal. Anyway, the time structure of the atrial rhythm guarantees that the short time spectrum is defined by the Fourier transform of a quasiperiodic signal, that is, a fundamental frequency in addition to some harmonics in the bandwidth 2.5-25 Hz [5].

Consequently, the atrial source $s_A(t)$ that generates the *f*-waves approximately satisfies the periodicity condition:

$$s_A(t) \approx s_A(t + nP)$$

where *P* is the period, defined as the inverse of the main atrial rhythm, and *n* is any integer. Note that we assume that the signals **x**(*t*) are obtained by sampling the original analog signals with a sampling frequency much higher than the bandwidth of the atrial activity.

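Let $\mathbf{R}_s(\tau) = E[\mathbf{s}(t+\tau)\,\mathbf{s}(t)^T]$ denote the covariance matrix of the sources. At the lag equal to the period, the covariance matrix of the observations becomes:

$$\mathbf{R}_x(P) = \mathbf{A}\,\mathbf{R}_s(P)\,\mathbf{A}^T = \mathbf{A}\,\boldsymbol{\Lambda}(P)\,\mathbf{A}^T$$

where $\boldsymbol{\Lambda}(P)$ is diagonal because the sources are mutually uncorrelated, and its diagonal elements are the autocovariances of the sources, $\Lambda_i(P) = \rho_{s_i}(P) = E[s_i(t+P)\,s_i(t)]$.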

We do not require the sources to be statistically independent, only second-order independent (mutually uncorrelated). This second-order approach is robust against additive Gaussian noise, since there is no limit on the number of Gaussian sources that the algorithms can extract. Instead, the restriction is imposed on the spectra of the sources: they must be different, that is, the sources must have different autocovariance functions $\rho_{s_i}(\tau)$. This restriction is fulfilled, since the spectra of the ventricular and atrial activities overlap but are different [16]. Taking into account Equation 5, we can ensure that the covariance matrices at lags that are multiples of *P* will also be diagonal, with one entry remaining almost the same at all these lags, namely the one corresponding to the autocovariance of the atrial signal.
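As a quick numerical check of this property, the sketch below estimates source covariance matrices at lags 0, *P*, 2*P* and 3*P* for two hypothetical sources: an exactly periodic "atrial" signal and white noise. The helper `lagged_covariance`, the period of 36 samples and the signal lengths are illustrative assumptions. The atrial diagonal entry stays essentially constant across these lags, whereas the noise entry and the cross-terms stay close to zero at nonzero lags.

```python
import numpy as np


def lagged_covariance(x, lag):
    """Sample estimate of E[x(t + lag) x(t)^T] for x with shape (n_channels, n_samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    n = x.shape[1] - lag
    return x[:, lag:lag + n] @ x[:, :n].T / n


# Hypothetical sources: an exactly periodic "atrial" signal and white noise.
rng = np.random.default_rng(0)
P = 36                                   # assumed period in samples (about 5.6 Hz at 200 Hz)
t = np.arange(4000)
s_atrial = np.sin(2 * np.pi * t / P)
s_noise = rng.standard_normal(t.size)
s = np.vstack([s_atrial, s_noise])

# Source covariance matrices at lags 0, P, 2P and 3P.
covs = {q * P: lagged_covariance(s, q * P) for q in range(4)}
for lag, R in covs.items():
    print(lag, np.round(R, 2))
```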

## 3 Methods

### 3.1 Periodic component analysis of the electrocardiogram in atrial flutter and fibrillation episodes

The atrial signal $s_A(t)$ can be expressed as a linear combination of the recorded leads:

$$s_A(t) = \mathbf{w}^T\mathbf{x}(t)$$

Our aim is to obtain the $s_A(t)$ with a maximal periodic structure by estimating the recovering vector **w**. In mathematical terms, we use the following equation as a measure of periodicity [17]:

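$$p(P) = \frac{E\left\{\left[s_A(t+P) - s_A(t)\right]^2\right\}}{E\left\{s_A^2(t)\right\}}$$

where *P* is the period of interest, that is, the inverse of the fundamental frequency of the atrial rhythm. Note that $p(P)$ is 0 for a periodic signal with period *P*. This equation can be expressed in terms of the covariance matrix of the recorded ECG, $\mathbf{C}_x(\tau) = E\{\mathbf{x}(t+\tau)\,\mathbf{x}(t)^T\}$:

$$p(P) = \frac{\mathbf{w}^T\mathbf{A}_x(P)\,\mathbf{w}}{\mathbf{w}^T\mathbf{C}_x(0)\,\mathbf{w}} = 2\left[1 - \frac{\mathbf{w}^T\widehat{\mathbf{C}}_x(P)\,\mathbf{w}}{\mathbf{w}^T\mathbf{C}_x(0)\,\mathbf{w}}\right] \tag{8}$$

where $\mathbf{A}_x(P) = E\{[\mathbf{x}(t+P)-\mathbf{x}(t)][\mathbf{x}(t+P)-\mathbf{x}(t)]^T\} = 2\left[\mathbf{C}_x(0) - \widehat{\mathbf{C}}_x(P)\right]$ and $\widehat{\mathbf{C}}_x(P) = \left[\mathbf{C}_x(P) + \mathbf{C}_x(P)^T\right]/2$ is the symmetrized covariance matrix at lag *P*.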

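As stated in [17], the vector **w** minimizing Equation 8 is the generalized eigenvector corresponding to the smallest generalized eigenvalue of the matrix pair ($\mathbf{A}_x(P)$, $\mathbf{C}_x(0)$). Since $\mathbf{A}_x(P) = 2[\mathbf{C}_x(0) - \widehat{\mathbf{C}}_x(P)]$, with $\widehat{\mathbf{C}}_x(P) = [\mathbf{C}_x(P) + \mathbf{C}_x(P)^T]/2$, this is equivalently the eigenvector corresponding to the largest generalized eigenvalue of the pair ($\widehat{\mathbf{C}}_x(P)$, $\mathbf{C}_x(0)$). Collecting all the generalized eigenvectors in the eigenmatrix **U**, which simultaneously diagonalizes both matrices, we have $\mathbf{U}^T\widehat{\mathbf{C}}_x(P)\mathbf{U} = \mathbf{D}$ and $\mathbf{U}^T\mathbf{C}_x(0)\mathbf{U} = \mathbf{I}$, where **D** is the diagonal generalized eigenvalue matrix, with real eigenvalues sorted in descending order on its diagonal.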

The covariance matrix must be estimated at the pseudoperiod of the atrial signal; the next subsection explains how to obtain this information. Once the pair $\left({\widehat{\mathbf{C}}}_{x}\left(P\right),{\mathbf{C}}_{x}\left(0\right)\right)$ has been estimated and jointly diagonalized, the periodic components are **y**(*t*) = **U**^{T}**x**(*t*). The elements of **y**(*t*) are ordered according to their degree of periodicity with respect to *P*, so $y_1(t)$ is the estimated atrial signal, since it is the most periodic component with respect to the atrial frequency. In other words, $y_i(t)$ is less periodic in terms of *P* than $y_j(t)$ for *i* > *j*.
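A minimal NumPy/SciPy sketch of this generalized eigendecomposition is given below, assuming zero-mean, wide-sense stationary data, an integer period in samples and plain sample covariance estimates (the helpers `lagged_cov` and `pica` are hypothetical names). `scipy.linalg.eigh(a, b)` solves the symmetric-definite problem $\mathbf{a}\mathbf{u} = \lambda\mathbf{b}\mathbf{u}$ and returns eigenvalues in ascending order, so the order is reversed to put the most periodic component first. The synthetic mixture (a periodic source, irregular spikes and white noise on five leads) is invented for illustration.

```python
import numpy as np
from scipy.linalg import eigh


def lagged_cov(x, lag):
    """Sample E[x(t + lag) x(t)^T] for zero-mean data of shape (n_leads, n_samples)."""
    n = x.shape[1] - lag
    return x[:, lag:lag + n] @ x[:, :n].T / n


def pica(x, period):
    """Return components y = U^T x ordered from most to least periodic at lag `period`."""
    x = x - x.mean(axis=1, keepdims=True)
    c0 = lagged_cov(x, 0)
    cp = lagged_cov(x, period)
    cp_sym = 0.5 * (cp + cp.T)                 # symmetrized C_x(P)
    # Generalized symmetric eigenproblem cp_sym u = lambda c0 u, with U^T c0 U = I.
    eigvals, U = eigh(cp_sym, c0)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # descending: most periodic first
    U = U[:, order]
    return U.T @ x, eigvals[order], U


# Hypothetical data: 3 sources mixed into 5 leads plus weak sensor noise.
rng = np.random.default_rng(0)
P, n = 36, 4000
tt = np.arange(n)
atrial = np.sin(2 * np.pi * tt / P) + 0.5 * np.sin(4 * np.pi * tt / P)   # period P
positions = np.cumsum(rng.integers(150, 250, size=30))
spikes = np.zeros(n)
spikes[positions[positions < n]] = 5.0        # irregular "ventricular" spikes
noise = rng.standard_normal(n)
S = np.vstack([atrial, spikes, noise])
X = rng.standard_normal((5, 3)) @ S + 0.05 * rng.standard_normal((5, n))

y, lam, U = pica(X, P)
print(np.round(lam, 3))
```

In practice, the period in samples would come from the initial frequency estimate, for example `round(fs / f_p)` for a sampling rate `fs`.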

Compared with algorithms focused on extracting only one component, periodic component analysis makes it possible to determine the dimension of the atrial activity subspace by observing the first components in **y**(*t*). Compared with BSS methods, it extracts the atrial rhythm algebraically, with no postprocessing step to identify it among the ancillary signals and no prior whitening step to decorrelate the components, since we know that at least the first component $y_1(t)$ belongs to the atrial subspace. Recovering more components can be helpful when the atrial subspace is composed of more than one atrial signal with similar frequencies; in that case, instead of discarding all the components of **y**(*t*) except the first one, we could keep several.

If we are interested in a sequential algorithm instead of a batch solution such as periodic component analysis, we can exploit the fact that the vector **x**(*t*) in Equation 1 can be understood as a linear combination of the columns of matrix **A** rather than as a mixture of sources defined by the rows of **A**; that is, the contribution of the atrial component to the observation vector is defined by the corresponding column $\mathbf{a}_i$ of the mixing matrix **A**. Following this interpretation of Equation 1, one intuitive way to extract the *i*th source is to project **x**(*t*) onto the subspace of ${\Re}^{12\times 1}$ that is orthogonal (⊥) to all the columns of **A** except $\mathbf{a}_i$, that is, to $\{\mathbf{a}_1, \ldots, \mathbf{a}_{i-1}, \mathbf{a}_{i+1}, \ldots, \mathbf{a}_M\}$.

Therefore, the optimal vector **w** for extracting the atrial source can be obtained by forcing $s_A(t)$ to be uncorrelated with the residual components obtained with $\mathbf{E}_{\mathbf{w}^{\perp|\mathbf{t}}} = \mathbf{I} - \mathbf{t}\mathbf{w}^T/(\mathbf{w}^T\mathbf{t})$, the oblique projector onto $\mathbf{w}^{\perp}$ (the subspace orthogonal to **w**) along **t** (the direction of $\mathbf{a}_i$, column *i* of the mixing matrix **A**, when the atrial component is the *i*th source). For the case of 12 sources, the vector **w** is defined as $\mathbf{w} \perp \operatorname{span}\{\mathbf{a}_1, \ldots, \mathbf{a}_{i-1}, \mathbf{a}_{i+1}, \ldots, \mathbf{a}_{12}\}$.

where $d_0, d_1, \ldots, d_Q$ are *Q* + 1 unknown scalars and ||·|| denotes the Euclidean length of vectors. To avoid the trivial solution, the constraints ||**t**|| = 1 and ||[$d_0, d_1, \ldots, d_Q$]|| = 1 are imposed. One source is perfectly extracted if $\mathbf{R}_x(\tau)\mathbf{w} = d_\tau\mathbf{t}$ for all the lags considered, which holds when **t** is collinear with one column vector of **A** and **w** is orthogonal to the other *M* - 1 column vectors of the mixing matrix.
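The two geometric properties used above can be checked numerically. The sketch below uses a hypothetical helper `oblique_projector`, random vectors and a random full-column-rank mixing matrix, all assumptions for illustration. It verifies that $\mathbf{E}_{\mathbf{w}^{\perp|\mathbf{t}}}$ annihilates **t**, maps every vector into $\mathbf{w}^{\perp}$ and is idempotent, and that choosing **w** with $\mathbf{A}^T\mathbf{w} = \mathbf{e}_i$ (via the pseudoinverse) makes $\mathbf{R}_x(\tau)\mathbf{w}$ collinear with $\mathbf{a}_i$.

```python
import numpy as np


def oblique_projector(w, t):
    """E = I - t w^T / (w^T t): projects onto the hyperplane w-perp along direction t."""
    w = np.asarray(w, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.eye(w.size) - np.outer(t, w) / (w @ t)


rng = np.random.default_rng(1)
w_rand = rng.standard_normal(12)                 # hypothetical extracting vector
t_rand = rng.standard_normal(12)                 # hypothetical direction of a_i
E = oblique_projector(w_rand, t_rand)

# Collinearity check for R_x(tau) = A R_s(tau) A^T with diagonal R_s(tau).
n_leads, n_src, i = 12, 4, 0                     # hypothetical sizes; source i plays the atrial role
A = rng.standard_normal((n_leads, n_src))        # full column rank mixing matrix
Lam = np.diag(rng.uniform(0.5, 1.5, n_src))      # diagonal source covariance at some lag
R_x = A @ Lam @ A.T
w = np.linalg.pinv(A).T[:, i]                    # orthogonal to all columns except a_i (A^T w = e_i)
v = R_x @ w                                      # equals Lam[i, i] * a_i
t = v / np.linalg.norm(v)                        # unit-norm t, as in the constraint ||t|| = 1
print(np.allclose(E @ t_rand, 0), np.allclose(v, Lam[i, i] * A[:, i]))
```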

Considering *Q* + 1 covariance matrices $\mathbf{R}_x(\tau)$ at time lags that are multiples of the period of the main atrial rhythm, $\tau = 0, P, \ldots, QP$, the autocovariance of the atrial source is almost the same at all these lags, so the restriction ||[$d_0, d_1, \ldots, d_Q$]|| = 1 implies $d_0 = d_1 = \cdots = d_Q = \frac{1}{\sqrt{Q+1}}$. That is, the vector of unknown scalars $d_0, d_1, \ldots, d_Q$ is fixed, and the cost function must be maximized only with respect to the extracting vector. The final version of the algorithm (we omit details, see [18]) is:

Regardless of the algorithm we follow, the algebraic or sequential solution, both of them require an initial estimation of the period *P* as a parameter.
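To make the role of the fixed $d_\tau$ concrete, the following is a minimal alternating least-squares sketch, not the update rule of [18]. It assumes the cost $\sum_{q=0}^{Q} \|\mathbf{R}_x(qP)\mathbf{w} - d_q\mathbf{t}\|^2$ (the squared residual of the condition $\mathbf{R}_x(\tau)\mathbf{w} = d_\tau\mathbf{t}$), equal $d_q = 1/\sqrt{Q+1}$, ||**t**|| = 1, and a prior whitening step; the whitening, the random initialization and the names `lagged_cov` and `extract_periodic` are assumptions of this sketch. Each half-step solves its subproblem exactly, so the cost never increases.

```python
import numpy as np


def lagged_cov(x, lag):
    """Symmetrized sample E[x(t + lag) x(t)^T] for zero-mean x of shape (n_leads, n_samples)."""
    n = x.shape[1] - lag
    c = x[:, lag:lag + n] @ x[:, :n].T / n
    return 0.5 * (c + c.T)


def extract_periodic(x, period, n_lags=4, n_iter=50, seed=0):
    """ALS sketch: minimize sum_q ||R(qP) w - d t||^2 with d = 1/sqrt(Q+1) and ||t|| = 1."""
    x = x - x.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(lagged_cov(x, 0))
    z = (evecs / np.sqrt(evals)).T @ x          # whitening (assumption of this sketch)
    Rs = [lagged_cov(z, q * period) for q in range(n_lags)]   # lags 0, P, ..., QP
    d = 1.0 / np.sqrt(n_lags)                   # d_0 = ... = d_Q
    s_sum = sum(Rs)                             # sum_q R(qP)
    gram = sum(R @ R for R in Rs)               # sum_q R^T R (each R is symmetric)
    w = np.random.default_rng(seed).standard_normal(z.shape[0])
    costs = []
    for _ in range(n_iter):
        v = s_sum @ w
        t = v / np.linalg.norm(v)               # t-step: optimal unit vector for fixed w
        w = np.linalg.solve(gram, d * (s_sum @ t))   # w-step: least squares for fixed t
        costs.append(sum(np.sum((R @ w - d * t) ** 2) for R in Rs))
    return w @ z, np.array(costs)


# Hypothetical 3-source, 3-lead mixture with an exactly periodic "atrial" source.
rng = np.random.default_rng(0)
P, n = 36, 4000
tt = np.arange(n)
atrial = np.sin(2 * np.pi * tt / P) + 0.5 * np.sin(4 * np.pi * tt / P)
positions = np.cumsum(rng.integers(150, 250, size=30))
spikes = np.zeros(n)
spikes[positions[positions < n]] = 5.0
noise = rng.standard_normal(n)
X = rng.standard_normal((3, 3)) @ np.vstack([atrial, spikes, noise])
y, costs = extract_periodic(X, P)
```

After whitening, this loop behaves like a power iteration that favors the source whose autocovariance is constant over the lags 0, *P*, ..., *QP*, which is exactly the property that fixes $d_0 = \cdots = d_Q$.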

### 3.2 Estimation of the atrial rhythm period

An initial estimate of the atrial frequency must first be obtained. Although the amplitude of the ventricular signal (QRST complex) is much higher than that of the atrial one, the ventricular amplitude is very low during the *T*-*Q* intervals. From the lead with the largest atrial activity (AA), usually V1 [12], the main peak frequency is estimated using the Iterative Singular Spectrum Algorithm (ISSA) [15]. ISSA consists of two steps: first, it fills the gaps left in the ECG signal after removing the QRST intervals; second, it locates the dominant frequency as the largest peak in the interval [3, 12] Hz of the spectral estimate obtained with Welch's periodogram.
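The second ISSA step can be sketched with `scipy.signal.welch`. This minimal example assumes the QRST gaps have already been filled (the first step is not reproduced here) and reuses the Welch settings reported for the synthetic recordings (Hamming window, 50% overlap, 8,192-point FFT), capping the segment length at the signal length. The function name `dominant_atrial_frequency` and the test signal are hypothetical.

```python
import numpy as np
from scipy.signal import welch


def dominant_atrial_frequency(x, fs, band=(3.0, 12.0), nfft=8192):
    """Largest Welch-spectrum peak of a single lead inside `band` (Hz)."""
    nperseg = min(4096, x.size)                 # segment length capped by the signal length
    f, pxx = welch(x, fs=fs, window="hamming", nperseg=nperseg,
                   noverlap=nperseg // 2, nfft=max(nfft, nperseg))
    in_band = (f >= band[0]) & (f <= band[1])
    return f[in_band][np.argmax(pxx[in_band])]


# Hypothetical V1-like signal: a strong 1.2 Hz component outside the band plus 7.07 Hz f-waves.
fs = 200
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
v1 = (3.0 * np.sin(2 * np.pi * 1.2 * t) + np.sin(2 * np.pi * 7.07 * t)
      + 0.2 * rng.standard_normal(t.size))
fp = dominant_atrial_frequency(v1, fs)
print(round(fp, 2))
```

Restricting the search to the 3-12 Hz band is what keeps the larger low-frequency component from being reported as the atrial rhythm.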

To fill the gaps left after the QRST intervals are removed, SSA embeds the original V1 signal in a space of higher dimension *M* (here *M* denotes the embedding dimension, not the number of sources). The *M*-lag covariance matrix is computed as usual, and the singular value decomposition (SVD) of this *M* × *M* matrix is obtained. Discarding the dimensions associated with the smallest singular values (noise), SSA reconstructs the missing samples using the singular vectors as a basis. In this way, we obtain an approximation of the signal in the QRST intervals that, from a spectral point of view, is better than polynomial interpolation.
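The gap-filling idea can be sketched as an iterative SSA loop: fill the gaps with an initial guess, build the trajectory matrix, keep the leading singular components, map back to a time series by diagonal averaging and update only the missing samples. This is a minimal sketch, not the ISSA implementation of [15]; the window length, rank and number of iterations are hypothetical choices. It takes the SVD of the trajectory matrix, whose left singular vectors are the eigenvectors of the trajectory-based lag-covariance estimate. `np.lib.stride_tricks.sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def ssa_fill_gaps(x, gap_mask, window=40, rank=2, n_iter=100):
    """Iteratively fill samples where gap_mask is True using a rank-`rank` SSA model."""
    y = np.asarray(x, dtype=float).copy()
    y[gap_mask] = y[~gap_mask].mean()           # initial guess for the missing samples
    n = y.size
    k = n - window + 1
    for _ in range(n_iter):
        # Trajectory (Hankel) matrix: traj[i, j] = y[i + j], shape (window, k).
        traj = np.lib.stride_tricks.sliding_window_view(y, window).T
        u, s, vt = np.linalg.svd(traj, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]     # low-rank (signal) part
        # Diagonal averaging maps the low-rank matrix back to a time series.
        rec = np.zeros(n)
        counts = np.zeros(n)
        for i in range(window):
            rec[i:i + k] += approx[i]
            counts[i:i + k] += 1
        rec /= counts
        y[gap_mask] = rec[gap_mask]             # only the gap samples are updated
    return y


# Hypothetical test: a 6 Hz sinusoid sampled at 200 Hz with a 20-sample gap.
fs = 200
x_true = np.sin(2 * np.pi * 6 * np.arange(600) / fs)
mask = np.zeros(600, dtype=bool)
mask[300:320] = True
filled = ssa_fill_gaps(x_true, mask)
```

A pure sinusoid has a rank-2 trajectory matrix, which is why `rank=2` suffices here; a real V1 segment would need more components.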

## 4 Materials

### 4.1 Database

We use simulated and real ECG data to test the performance of the algorithms under controlled (synthetic ECG) and real conditions (real ECG). The simulated signals come from [11] (see Section 4.1 in [11] for details about the generation procedure); the most interesting property of these signals is that the different components correspond to the same patient and session (preserving the electrode positions), so that only the atrial component needs to be interpolated during the QRST intervals. The data were provided by the authors of [11] and consist of ten recordings, four labeled as "atrial flutter" (AFL) and six as "atrial fibrillation" (AF). The real database contains forty-eight recordings (ten AFL and thirty-eight AF) from a clinical database recorded at the Clinical University Hospital, Valencia, Spain. The ECGs were recorded with a commercial 12-lead system (Prucka Engineering Cardiolab system) and digitized at 1,000 samples per second with 16-bit resolution.

In our experiments, we used all the available leads over a 10-s segment for every patient. The signals were preprocessed to reduce baseline wander, high-frequency noise and power line interference before further processing. The recordings were filtered with an 8-coefficient highpass Chebyshev filter and a 3-coefficient lowpass Butterworth filter to select the 0.5-40 Hz bandwidth of interest. To reduce the computational load, the data were downsampled to 200 samples per second, with no significant change in the quality of the results.
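A sketch of this preprocessing chain with SciPy follows. The mapping from "8 coefficients" and "3 coefficients" to filter orders is ambiguous, so the orders (4 and 3), the 0.5 dB Chebyshev ripple and the use of zero-phase filtering are assumptions, and `preprocess_ecg` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, cheby1, decimate, sosfiltfilt


def preprocess_ecg(ecg, fs=1000, fs_out=200, hp_order=4, lp_order=3, ripple_db=0.5):
    """Band-limit an (n_samples, n_leads) ECG to 0.5-40 Hz and downsample to fs_out."""
    # Filter orders and ripple are assumptions; the text only gives coefficient counts.
    hp = cheby1(hp_order, ripple_db, 0.5, btype="highpass", fs=fs, output="sos")
    lp = butter(lp_order, 40.0, btype="lowpass", fs=fs, output="sos")
    y = sosfiltfilt(hp, ecg, axis=0)            # zero-phase baseline-wander removal
    y = sosfiltfilt(lp, y, axis=0)              # zero-phase high-frequency noise removal
    return decimate(y, fs // fs_out, axis=0, zero_phase=True)   # 1000 Hz -> 200 Hz


fs = 1000
t = np.arange(10 * fs) / fs
lead = 2.0 + np.sin(2 * np.pi * 6.0 * t)        # DC offset plus a 6 Hz component
ecg = np.tile(lead[:, None], (1, 12))           # 12 identical hypothetical leads
out = preprocess_ecg(ecg, fs=fs)
ref = np.sin(2 * np.pi * 6.0 * np.arange(out.shape[0]) / 200)
```

Second-order sections (`output="sos"`) are used because a 0.5 Hz highpass at 1,000 Hz has its poles very close to the unit circle, where the transfer-function form is numerically fragile.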

### 4.2 Performance measures

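For the synthetic recordings, the quality of the extraction is measured by the correlation index (*ρ*) between the true atrial signal $x_A(t)$ and the extracted one ${\widehat{x}}_{A}\left(t\right)$; for unit-variance signals, with ${m}_{{x}_{A}}$ and ${m}_{{\widehat{x}}_{A}}$ the means of the signals:

$$\rho = E\left\{\left(x_A(t) - m_{x_A}\right)\left(\widehat{x}_A(t) - m_{\widehat{x}_A}\right)\right\}$$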

where *Pa*(*f*) is the power spectrum of the extracted atrial signal ${\widehat{x}}_{A}\left(t\right)$ and *f*_{ p } is the fibrillatory frequency peak (main peak frequency in the 3-12 Hz band). A large SC is usually understood as a good extraction of the atrial *f*-waves because a more concentrated spectrum implies better cancellation of low- and high-frequency interferences due to breathing, QRST complexes or power line signal.
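A sketch of how SC could be computed from a Welch spectrum is shown below. The integration limits around $f_p$ (`rel_edges`, here 0.82 $f_p$ to 1.17 $f_p$) are placeholder values chosen for illustration, not limits taken from this text, and `spectral_concentration` is a hypothetical name. The example only illustrates the expected behavior: a clean narrow-band signal gives an SC close to 1, and broadband noise lowers it.

```python
import numpy as np
from scipy.signal import welch


def spectral_concentration(x, fs, band=(3.0, 12.0), rel_edges=(0.82, 1.17), nfft=8192):
    """Return (f_p, SC); rel_edges are placeholder limits around f_p."""
    nperseg = min(4096, x.size)
    f, pxx = welch(x, fs=fs, window="hamming", nperseg=nperseg,
                   noverlap=nperseg // 2, nfft=max(nfft, nperseg))
    in_band = (f >= band[0]) & (f <= band[1])
    f_p = f[in_band][np.argmax(pxx[in_band])]            # fibrillatory peak in 3-12 Hz
    near_peak = (f >= rel_edges[0] * f_p) & (f <= rel_edges[1] * f_p)
    return f_p, pxx[near_peak].sum() / pxx.sum()        # fraction of total power near f_p


fs = 200
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 6.0 * t)
noisy = clean + 2.0 * rng.standard_normal(t.size)
fp_clean, sc_clean = spectral_concentration(clean, fs)
_, sc_noisy = spectral_concentration(noisy, fs)
print(round(fp_clean, 2), round(sc_clean, 3), round(sc_noisy, 3))
```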

In the time domain, the results with the real recordings are validated using kurtosis [19]. Although the true kurtosis value of the atrial component is unknown, a large kurtosis is associated with residual QRST complexes and therefore indicates a poor extraction.
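The sensitivity of kurtosis to residual QRST complexes is easy to illustrate. `scipy.stats.kurtosis` returns the excess (Fisher) kurtosis by default, which is 0 for a Gaussian; we assume this is the convention behind the negative values in Table 2. The signals below are hypothetical: a clean sinusoidal "atrial" wave and a weak one with one uncanceled spike per second.

```python
import numpy as np
from scipy.stats import kurtosis

fs = 200
t = np.arange(10 * fs) / fs
atrial_like = np.sin(2 * np.pi * 6.0 * t)        # clean quasi-sinusoidal f-waves
residual_qrs = 0.1 * atrial_like
residual_qrs[::200] += 5.0                        # hypothetical uncanceled QRS spikes, one per second
k_sine = kurtosis(atrial_like)                    # excess kurtosis: 0 for a Gaussian
k_spiky = kurtosis(residual_qrs)
print(round(k_sine, 3), round(k_spiky, 1))
```

A pure sinusoid sampled over whole periods has an excess kurtosis of exactly -1.5, while a handful of large spikes pushes the value far above the 1.5 threshold used in Section 5.2.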

### 4.3 Statistical analysis

Parametric or nonparametric statistics were used depending on the distribution of the variables. First, the Jarque-Bera test was applied to assess the normality of the distributions, and the Levene test was used to assess their homoscedasticity. The data were then analyzed with ANOVA or the Kruskal-Wallis test, as appropriate. Statistical significance was assumed for *p* < 0.05.
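The test-selection procedure can be sketched with `scipy.stats`. The decision rule below (ANOVA only when every group passes Jarque-Bera and the groups pass Levene, otherwise Kruskal-Wallis) is our reading of the text, not a confirmed detail, and the group values are synthetic placeholders rather than data from the study.

```python
import numpy as np
from scipy import stats


def compare_groups(*groups, alpha=0.05):
    """Pick ANOVA or Kruskal-Wallis from normality and homoscedasticity checks."""
    normal = all(stats.jarque_bera(g).pvalue > alpha for g in groups)
    homoscedastic = stats.levene(*groups).pvalue > alpha
    if normal and homoscedastic:
        return "ANOVA", stats.f_oneway(*groups).pvalue
    return "Kruskal-Wallis", stats.kruskal(*groups).pvalue


rng = np.random.default_rng(0)
sc_method_a = rng.normal(0.55, 0.10, 38)     # hypothetical SC values, not data from the study
sc_method_b = rng.normal(0.40, 0.10, 38)
name, p = compare_groups(sc_method_a, sc_method_b)
print(name, p < 0.05)
```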

## 5 Results

The proposed algorithms were exhaustively tested with the synthetic and real recordings described in the previous section. We refer to them as periodic component analysis (piCA) and periodic sequential approximate diagonalization (pSAD). The prior information, that is, the initial period $\tilde{P}$, was estimated for each patient from lead V1 as the inverse of the initial estimate of the main peak frequency, $\tilde{P} = 1/\tilde{f}_p$. In addition, for comparison purposes, we report the results obtained by two established methods in the literature: spatiotemporal QRST cancellation (STC) [9] and spatiotemporal blind source separation (ST-BSS) [11].

### 5.1 Synthetic recordings

Table 1 shows the correlation (*ρ*) and peak frequency $\left({\widehat{f}}_{p}\right)$ values obtained by the algorithms (the two proposed and the two established ones). The mean true fibrillatory frequency is 3.739 Hz for the AFL recordings and 5.989 Hz for the AF recordings (recall that in atrial flutter the *f*-waves are slower and less irregular). The spectral analysis was carried out with the modified periodogram using the Welch-WOSA method, with a Hamming window of 4,096 points, 50% overlap between adjacent windowed sections and an 8,192-point fast Fourier transform (FFT).

Table 1. Correlation values (*ρ*) and peak frequency $\left({\widehat{f}}_{p}\right)$ obtained by the algorithms piCA, pSAD, STC and ST-BSS for the synthetic AFL and AF recordings.

| Measure | piCA | pSAD | STC | ST-BSS |
|---|---|---|---|---|
| **AFL** | | | | |
| *ρ* | 0.822 ± 0.116 | 0.884 ± 0.046 | 0.708 ± 0.080 | 0.792 ± 0.206 |
| $\widehat{f}_p$ (Hz) | 3.742 ± 0.126 | 3.647 ± 0.230 | 3.721 ± 0.230 | 4.155 ± 0.997 |
| **AF** | | | | |
| *ρ* | 0.804 ± 0.080 | 0.823 ± 0.078 | 0.709 ± 0.097 | 0.789 ± 0.072 |
| $\widehat{f}_p$ (Hz) | 5.981 ± 0.812 | 5.974 ± 0.813 | 5.927 ± 0.788 | 5.974 ± 0.814 |

The extraction with the proposed algorithms is very good, with correlations above 0.8 and a very accurate estimation of the fibrillatory frequency. Compared with the STC and ST-BSS methods, piCA and pSAD achieve higher correlation values, as can be observed in Table 1.

Figure 4 shows, for each of the four AFL and six AF recordings, the correlation (*ρ*) and the true ($f_p$) and estimated $\left({\widehat{f}}_{p}\right)$ main atrial rhythm or fibrillatory frequency peak. For the sake of simplicity, only the results of the two new algorithms are shown. The behavior of both algorithms is quite similar; only for patient 2 in the AFL case is the performance of pSAD clearly better than that of piCA.

We conclude that both algorithms perform very well on the synthetic signals and should now be tested with real recordings, with the drawback that objective error measures cannot be obtained, since there is no ground-truth atrial signal to compare with.

### 5.2 Real recordings

For real recordings, we cannot compute the correlation, since the true *f*-waves are not available. To assess the quality of the extraction, the usual error measures must be replaced by indirect measures. Here, SC and kurtosis are used to measure the performance of the algorithms in the frequency and time domains, respectively. In addition, we can still compute the atrial rate, that is, the main peak frequency, although again its accuracy cannot be measured in absolute terms. SC and ${\widehat{f}}_{p}$ were obtained from the power spectrum using the same estimation method as for the synthetic recordings.

We consider an extraction successful when the extracted signal has an SC value higher than 0.30 [15] and a kurtosis value lower than 1.5 [11]. Both thresholds were established heuristically in the literature; we confirmed them in our experiments by visually inspecting the estimated atrial signals that satisfy both restrictions simultaneously. Hence, comparing the atrial activities obtained for the same patient by the different methods is straightforward: the signal with the lowest kurtosis and the largest SC is the best estimate.

Table 2. Spectral concentration (SC), kurtosis and peak frequency $\left({\widehat{f}}_{p}\right)$ obtained by the algorithms piCA, pSAD, STC and ST-BSS for the real AFL and AF recordings.

| Measure | piCA | pSAD | STC | ST-BSS |
|---|---|---|---|---|
| **AFL** | | | | |
| SC | 0.687 ± 0.126 | 0.600 ± 0.151 | 0.378 ± 0.092 | 0.661 ± 0.134 |
| Kurtosis | -0.610 ± 0.350 | -0.007 ± 1.728 | 1.866 ± 1.260 | -0.543 ± 0.295 |
| $\widehat{f}_p$ (Hz) | 4.117 ± 0.783 | 4.114 ± 0.780 | 5.139 ± 1.455 | 4.444 ± 1.048 |
| **AF** | | | | |
| SC | 0.527 ± 0.114 | 0.529 ± 0.112 | 0.380 ± 0.133 | 0.438 ± 0.164 |
| Kurtosis | 0.497 ± 1.020 | 0.874 ± 2.134 | 7.886 ± 18.746 | 0.138 ± 0.563 |
| $\widehat{f}_p$ (Hz) | 5.927 ± 1.067 | 5.933 ± 1.067 | 6.115 ± 1.065 | 5.881 ± 1.083 |

To check whether the performances of the new algorithms are statistically different, we calculated the statistical significance with the corresponding test for the SC, kurtosis and frequency. We found no significant differences between piCA and pSAD as we expected after seeing Figure 5, since the results are quite similar for many recordings. On the other hand, when comparing piCA and pSAD with STC and ST-BSS in all the cases, there were statistically significant differences (*p* < 0.05) for SC and kurtosis parameters. All the algorithms estimated the frequency with no statistically significant differences.

The *f*-waves obtained by pSAD (top) and piCA (middle) for one of the recordings, each scaled by the factor associated with its projection onto lead V1, can be compared with the signal recorded in lead V1 (bottom). As can be seen, the two estimates are almost identical (which is not surprising, since the SC and kurtosis values in Figure 5 are the same for this patient). During the intervals without ventricular activity, the estimated and V1 signals are very similar (the algorithms basically canceled the baseline); during the QRS complexes, the algorithms were able to subtract the high-amplitude ventricular component, leaving the atrial signal without discontinuities.

However, the *f*-waves obtained by the two algorithms are not exactly the same for all 48 recordings. The recordings for which the estimated signals clearly differ are numbers 2 and 8 for AFL and number 2 for AF; we analyze these three cases in detail. For patient 8 with AFL, the kurtosis value is high for the pSAD algorithm. Observing the signal in time (Figure 7: atrial signal recovered by pSAD (top) and by piCA (middle), both scaled by the factor associated with their projection onto lead V1, and lead V1 (bottom)), we can see that this is due to an ectopic beat around second 5.8 that pSAD was not able to cancel. If this beat is excluded from the kurtosis estimate, the kurtosis drops to 0.9, corresponding to a nearly Gaussian distribution, as expected. This result confirms the usefulness of kurtosis as an index of extraction quality: since it is very sensitive to large signal values, it is a very good detector of residual QRST complexes.

The estimated *f*-waves for another of these cases are shown in Figure 8. This case does not correspond to an algorithm failure; it is due to a problem with the recording. Nevertheless, the algorithms recover a quasiperiodic component, and pSAD even obtains an acceptable kurtosis value (it is able to cancel the beats between seconds 6 and 8 of the recording).

The solution is algebraic, and there is no adaptive learning. The first recovered signal is clearly the cleanest atrial component (recall that one advantage of piCA over classical ICA-based solutions is that no postprocessing is needed to identify the atrial component, since piCA orders the recovered components by periodicity). The second one could also be considered an atrial signal, although its *f*-waves are contaminated by some residual QRST complexes, for example, at seconds 1 and 2.5. In fact, this second atrial component is very similar to the signal recovered by pSAD. Since pSAD extracts only one source, it cannot recover the atrial subspace when this subspace contains more than one component. In this case, the problem arises because some of the QRS complexes happen to be periodic with half the *f*-wave period; a signal with period *P*/2 is also periodic with period *P*, so the signal estimated by pSAD is also periodic with the correct period.
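The argument about period *P*/2 can be checked numerically. This minimal Python sketch (assuming NumPy) uses a normalized periodicity error, the energy of y(t + τ) - y(t) relative to the energy of y(t), as a generic stand-in for a periodicity measure; it is not necessarily the exact criterion optimized by pSAD or piCA. An impulse train with period *P*/2 (a hypothetical stand-in for QRS complexes) has zero error at lag *P*, just like an idealized *f*-wave with period *P*, whereas the *f*-wave itself is far from periodic at lag *P*/2.

```python
import numpy as np

def periodicity_error(y, lag):
    """Energy of y[t + lag] - y[t] relative to the energy of y[t] (lag >= 1).
    It is 0 when y repeats exactly every `lag` samples."""
    y = np.asarray(y, dtype=float)
    diff = y[lag:] - y[:-lag]
    return np.sum(diff ** 2) / np.sum(y[:-lag] ** 2)

P = 40                                          # hypothetical f-wave period (samples)
n = 20 * P
fwave = np.sin(2 * np.pi * np.arange(n) / P)    # idealized f-wave with period P
spikes = np.zeros(n)
spikes[:: P // 2] = 1.0                         # QRS-like impulses every P/2 samples

print(periodicity_error(fwave, P))        # ~0: periodic at lag P
print(periodicity_error(spikes, P))       # 0: period P/2 implies period P
print(periodicity_error(fwave, P // 2))   # ~4: the f-wave flips sign at lag P/2
```

A criterion that only tests lag *P* therefore cannot tell the two signals apart, which is consistent with pSAD locking onto a component contaminated by QRS complexes in this recording.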

*f*-waves. Figure 10 shows the atrial signal extracted for recording 33 with AF after the first, second and fifth iterations. After only two iterations, the QRS complexes that are still visible after the first iteration have been canceled. The remaining large values decrease with every iteration, yielding a very good estimate of the *f*-waves after five iterations.

Finally, we compared the computational time of both algorithms. The mean and standard deviation of the time required to estimate the atrial activity for each patient were 0.0114 ± 0.0016 s for piCA and 0.0110 ± 0.0040 s for pSAD (with the number of iterations fixed at 20).

### 5.3 Influence of the estimation of the initial period

## 6 Discussion

In this section, we discuss the characteristics of the proposed algorithms, emphasizing their advantages and drawbacks, and their relationship to the solutions based on the cancellation of the QRST complexes and on the BSS-ICA approach, represented by the STC and ST-BSS methods, respectively.

The algorithms piCA and pSAD exploit the pseudoperiodicity of the atrial activity in the time domain. Unlike BSS-ICA solutions, they require neither whitening nor higher-order cumulants. They rely only on the sources having nonidentical spectra, and each exploits periodicity in a different way. The piCA algorithm is based on a cost function that measures periodicity; defining this cost function appropriately yields an algebraic solution in which the estimated components are ordered according to the periodicity criterion. Compared with ICA-based algorithms, this has the great advantage of avoiding the typical ordering problem caused by the inherent indeterminacies of ICA, and the independence assumption is not required. The pSAD algorithm, on the other hand, exploits the structure of the spatial correlation matrices of the sources at different lags; periodicity is used to select the lags, adapting the general algorithm to the atrial arrhythmia problem.
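For readers who want to experiment with the idea, the following Python sketch (assuming NumPy) implements one standard formulation of periodic component analysis. It maximizes the ratio $\mathbf{w}^T\mathbf{C}_x(\tau)\mathbf{w} / \mathbf{w}^T\mathbf{C}_x(0)\mathbf{w}$ between the symmetrized lag-τ covariance and the zero-lag covariance of the leads, which for stationary signals is equivalent to minimizing the normalized periodicity error $\varepsilon \approx 2(1-\lambda)$. This is an assumption: the cost function and normalization used in this paper may differ. The closed-form solution is a generalized eigendecomposition, so the components come out ordered from most to least periodic, which is the ordering property discussed above. The function name `pica`, the synthetic sources and the mixing matrix are hypothetical.

```python
import numpy as np

def pica(x, tau):
    """Periodic component analysis (sketch).

    x   : (channels, samples) array of leads.
    tau : assumed atrial period in samples (tau >= 1).
    Returns (W, lam): rows of W are demixing vectors sorted from most to least
    periodic; lam holds the matching ratios w' C_tau w / w' C_0 w.
    """
    x = x - x.mean(axis=1, keepdims=True)
    n = x.shape[1]
    c0 = x @ x.T / n                                  # zero-lag covariance
    ct = x[:, tau:] @ x[:, :-tau].T / (n - tau)       # lag-tau covariance
    ct = 0.5 * (ct + ct.T)                            # symmetrize
    # Standard Cholesky reduction of ct w = lam c0 w, with c0 = L L'
    L = np.linalg.cholesky(c0)
    m = np.linalg.solve(L, np.linalg.solve(L, ct).T)  # L^-1 ct L^-T
    lam, u = np.linalg.eigh(0.5 * (m + m.T))
    order = np.argsort(lam)[::-1]                     # most periodic first
    W = np.linalg.solve(L.T, u[:, order]).T           # w = L^-T u
    return W, lam[order]

# Hypothetical test: one periodic source hidden in a 3-lead mixture
rng = np.random.default_rng(1)
n, P = 4000, 40
t = np.arange(n)
s = np.vstack([
    np.sin(2 * np.pi * t / P),      # periodic "atrial" source
    rng.laplace(size=n),            # heavy-tailed, aperiodic source
    rng.standard_normal(n),         # Gaussian noise source
])
A = rng.standard_normal((3, 3))     # unknown mixing matrix
x = A @ s

W, lam = pica(x, P)
y1 = W[0] @ x                       # most periodic component
print(lam.round(2))
print(abs(np.corrcoef(y1, s[0])[0, 1]))
```

Because the eigenvalues are sorted, checking whether extraction succeeded only requires inspecting `y1`, and the next rows of `W` give further candidates for the atrial subspace.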

The results show that, although the two approaches implement the periodicity hypothesis quite differently, piCA and pSAD obtained similar results for synthetic and real recordings in terms of quality parameters and computational time. Since the piCA decomposition recovers signals in descending order of similarity to the given period, if the error is very large, it is easy to detect that none of the recovered signals corresponds to atrial activity: we only have to analyze the first piCA component to determine whether the algorithm worked. In addition, we can inspect the first few piCA components to determine whether there are further candidates for atrial signals, which together define the atrial subspace. For pSAD, since only one signal is obtained, it is also very simple to assess the quality of the extraction (or at least whether it can be considered successful according to the criteria established in this paper, which depend on the SC and kurtosis values of the estimated *f*-waves).

Both algorithms require an approximate value of the dominant atrial frequency as a parameter. This means that, unlike classical BSS-ICA methods, they are not blind, since they depend on this parameter. This value is obtained with the ISSA algorithm, which works well even at very fast heart rates, since it averages over several beats when filling the gaps during the QRST intervals. Nevertheless, we have analyzed the influence of the initial frequency estimate obtained by ISSA, ${\tilde{f}}_{p}$, on the performance of the algorithms. The piCA algorithm is very robust to poor estimates of the initial atrial period; that is, its performance does not deteriorate much over the studied interval of initial periods. This is because piCA searches for the signal closest to being periodic with the initial period: even when the initial value is not the correct one, the algorithm still looks for a periodic signal in the interval, and the only one is the atrial activity. Of course, the better the initial estimate, the better the quality of the extraction. The pSAD algorithm is less robust: it obtains a good estimate of the AA only when the initial period deviates by up to 5 samples in absolute value (± 1 Hz). The reason is that when the initial frequency is far from the correct one, the assumption ${d}_{0}={d}_{1}=\cdots ={d}_{Q}=\frac{1}{\sqrt{Q+1}}$ (Equation 12) no longer holds, since the time lags *τ* = 0, *P*, ..., *QP* are then not multiples of the true period.
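The sensitivity of pSAD to the initial period can be illustrated with the lag structure alone. Under the equal-weight assumption of Equation 12, the correlations of the atrial source at lags *τ* = 0, *P*, ..., *QP* should all be approximately equal to its zero-lag value. This Python sketch (assuming NumPy) evaluates the normalized autocorrelation of an idealized sinusoidal *f*-wave at those lags, once with the true period and once with a period 5 samples too long. The period of 40 samples and *Q* = 4 are hypothetical values, and the sketch does not reproduce the pSAD update itself.

```python
import numpy as np

def normalized_autocorr(y, lags):
    """r(tau) / r(0) for each lag, with r(tau) = mean_t y[t] * y[t + tau]."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    r0 = np.mean(y * y)
    return np.array([np.mean(y[lag:] * y[:len(y) - lag]) / r0 for lag in lags])

true_P = 40          # hypothetical true atrial period (samples)
Q = 4                # number of nonzero lags, tau = P, ..., QP
n = 4000
y = np.sin(2 * np.pi * np.arange(n) / true_P)   # idealized f-wave

for P in (true_P, true_P + 5):
    lags = [q * P for q in range(Q + 1)]        # tau = 0, P, ..., QP
    print(P, np.round(normalized_autocorr(y, lags), 2))
```

With the correct period all ratios are essentially 1, consistent with equal weights; with the mistuned period they oscillate and reach about -1 at the last lag for this sinusoid, so equal weighting no longer matches the correlation structure of the atrial source.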

The results were compared using SC and kurtosis. When two methods are compared on the same recording, the one with the larger SC value is usually considered to have performed better. We must remark that this is not an absolute error measure, because the true atrial signal of a real recording is unknown by definition, unless a clean recording of the atrial activity alone can be obtained through an invasive procedure. On the other hand, the different statistical properties of the atrial activity (most often a sub-Gaussian variable close to Gaussian, i.e., a distribution with a small negative kurtosis) and of the ventricular activity (a super-Gaussian variable, i.e., a heavy-tailed distribution with positive kurtosis) can be used to measure how well the ventricular rhythm has been canceled in the estimated *f*-waves. Note that kurtosis is very sensitive to outliers, so any residual ventricular activity shows up as a large kurtosis value, far from the theoretical value for the atrial rhythm (around zero). For both synthetic and real recordings, both algorithms obtained better results than STC and ST-BSS. Only for kurtosis in AF patients did ST-BSS obtain a better result than the proposed methods. This was expected, since the first step of ST-BSS removes sources with kurtosis greater than 1.5 before the SOBI algorithm [20] is applied in the second step.
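The two spectral quality indices can be computed as in the following Python sketch (assuming NumPy). The peak frequency is the periodogram maximum, searched here within the 3-12 Hz atrial band, and SC is the fraction of the total power that falls in a band around the peak. The band [0.82 f_p, 1.17 f_p], the plain unwindowed periodogram, the function name and the synthetic test signals are assumptions for illustration; the paper's own definitions may differ in these details.

```python
import numpy as np

def peak_freq_and_sc(y, fs, search=(3.0, 12.0), lo=0.82, hi=1.17):
    """Return (peak frequency f_p in Hz, spectral concentration in [0, 1]).

    f_p is the periodogram maximum inside `search`; SC is the power in
    [lo * f_p, hi * f_p] divided by the total power (band is an assumption).
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    in_search = (freqs >= search[0]) & (freqs <= search[1])
    fp = freqs[in_search][np.argmax(power[in_search])]
    band = (freqs >= lo * fp) & (freqs <= hi * fp)
    return fp, power[band].sum() / power.sum()

fs = 250.0                                   # hypothetical sampling rate (Hz)
t = np.arange(int(10 * fs)) / fs             # 10 s segment
clean = np.sin(2 * np.pi * 6.0 * t)          # idealized f-waves at 6 Hz
noisy = clean + 0.5 * np.random.default_rng(3).standard_normal(t.size)

for name, sig in (("clean", clean), ("noisy", noisy)):
    fp, sc = peak_freq_and_sc(sig, fs)
    print(f"{name}: f_p = {fp:.2f} Hz, SC = {100 * sc:.1f} %")
```

Both signals have the same peak frequency, but broadband contamination spreads power outside the band around the peak and lowers SC, which is why a larger SC is read as a cleaner extraction.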

One limitation of the new algorithms, as is usual in BSS-ICA methods, is that they do not preserve the amplitude of the atrial signal. All of them are based on the model **x**(*t*) = **As**(*t*), so the source vector can be multiplied by a constant factor and the mixing matrix divided by the same factor without changing the recorded ECG. This is not the case for the methods based on the cancellation of the QRST complexes: since they subtract templates of the QRST complexes, they preserve the amplitude of the original ECG. The main problem of this framework is its loss of performance when a high-quality QRST cancellation template is difficult to obtain. This is the case in clinical practice when no more than 10 s of recording are available [21], as in our study, and it explains the poor results obtained by STC for some recordings, as mentioned in the Results section. Other limitations are their high sensitivity to variations in QRST morphology and the difficulty of optimally selecting the complexes used to generate the template [22].
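The scale indeterminacy of the model **x**(*t*) = **As**(*t*) is easy to verify numerically, and it holds for each source separately. The following Python sketch (assuming NumPy, with a hypothetical random mixing matrix and sources) rescales one source and the matching column of the mixing matrix and checks that the observed leads do not change. It also shows one common way to give an extracted component a physically meaningful amplitude: back-projecting it onto a single lead through the corresponding mixing coefficient. We assume this is what scaling by the projection onto lead V1 means in the Results figures; the helper `back_project` and the row index used for V1 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))     # hypothetical mixing matrix (3 leads x 3 sources)
s = rng.standard_normal((3, 500))   # hypothetical sources
x = A @ s                           # observed leads, x(t) = A s(t)

# Rescale source 0 by an arbitrary c and compensate in column 0 of A
c = 7.3
s_scaled = s.copy()
s_scaled[0] *= c
A_scaled = A.copy()
A_scaled[:, 0] /= c
same_ecg = np.allclose(A_scaled @ s_scaled, x)   # c cannot be identified from x

def back_project(A_est, y, k, lead):
    """Contribution of component k to one lead: A_est[lead, k] * y[k]."""
    return A_est[lead, k] * y[k]

V1 = 0   # hypothetical row index of lead V1
same_projection = np.allclose(back_project(A, s, 0, V1),
                              back_project(A_scaled, s_scaled, 0, V1))
print(same_ecg, same_projection)
```

In practice `A_est` would be obtained from the estimated demixing matrix, for example with `np.linalg.pinv(W)`. The back-projected signal is independent of the arbitrary scale, but it is only as accurate as the estimated mixing coefficients, whereas QRST-cancellation methods work directly in the amplitude units of the recorded ECG.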

## 7 Conclusion

We have presented a new approach to the extraction of the atrial activity in atrial arrhythmias. We have shown that the periodicity of the atrial signal can be exploited in two different ways: in a classical BSS approach based on second-order statistics, where it guides the selection of the time lags at which the correlation function is computed (pSAD), and in a novel way, by introducing a cost function related to periodicity (piCA). The methods depend on a prior estimate of the period or main atrial frequency. As the results have shown, both methods work very well, and we have analyzed how the quality of this initial estimate influences their performance. In addition, we have compared the results with two well-established methodologies and discussed the limitations and advantages of all of them. The proposed methods work very well for high-dimensional recordings such as 12-lead ECGs and when a rough estimate of the main atrial frequency is not difficult to obtain.

## Endnotes

^{a}This paper is in part supported by the Valencia Regional Government (Generalitat Valenciana) through project GV/2010/002 (Conselleria d'Educacio) and by the Universidad Politecnica de Valencia under grant no. PAID-06-09-003-382.

## Notes

### Acknowledgements

The authors would like to thank Roberto Sassi for his collaboration in the estimation of the initial frequencies, and Francisco Castells and Jose Millet for sharing the synthetic and real AF databases obtained with the help of the cardiologists Ricardo Ruiz and Roberto Garcia-Civera during the project TIC2002-00957.

## Supplementary material

## References

- [1] Rieta J, Castells F, Sanchez C, Zarzoso V, Millet J: *IEEE Trans Biomed Eng*. 2004, **51**(7):1176. doi:10.1109/TBME.2004.827272
- [2]
- [3] Sörnmo L, Stridh M, Husser D, Bollmann A, Olsson S: *Philos Trans A*. 2009, **367**(1887):235. doi:10.1098/rsta.2008.0162
- [4] Bollmann A, Husser D, Mainardi L, Lombardi F, Langley P, Murray A, Rieta J, Millet J, Olsson S, Stridh M, Sörnmo L: *Europace*. 2006, **8**(11):911. doi:10.1093/europace/eul113
- [5] Stridh M, Sornmo L, Meurling C, Olsson S: *IEEE Trans Biomed Eng*. 2004, **51**(1):100. doi:10.1109/TBME.2003.820331
- [6] Asano Y, Saito J, Matsumoto K, Kaneko K, Yamamoto T, Uchida M: *Am J Cardiol*. 1992, **69**(12):1033. doi:10.1016/0002-9149(92)90859-W
- [7]
- [8] Manios E, Kanoupakis E, Chlouverakis G, Kaleboubas M, Mavrakis H, Vardas P: *Cardiovasc Res*. 2000, **47**(2):244. doi:10.1016/S0008-6363(00)00100-0
- [9]
- [10] Castells F, Igual J, Rieta J, Sanchez C, Millet J: *Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'03)*. 2003, **5**.
- [11] Castells F, Rieta J, Millet J, Zarzoso V: *IEEE Trans Biomed Eng*. 2005, **52**(2):258. doi:10.1109/TBME.2004.840473
- [12] Petrutiu S, Ng J, Nijm G, Al-Angari H, Swiryn S, Sahakian A: *IEEE Eng Med Biol Mag*. 2006, **25**(6):24.
- [13] Stridh M, Bollmann A, Olsson S, Sornmo L: *IEEE Eng Med Biol Mag*. 2006, **25**(6):31.
- [14] Langley P, Bourke J, Murray A: *Computers in Cardiology*. 2000.
- [15] Sassi R, Corino V, Mainardi L: *Ann Biomed Eng*. 2009, **37**(10):2082-921. doi:10.1007/s10439-009-9757-3
- [16] Llinares R, Igual J, Salazar A, Camacho A: *Digit Signal Process*. 2011, **21**(2):391. doi:10.1016/j.dsp.2010.06.005
- [17]
- [18]
- [19] Llinares R, Igual J, Miró-Borrás J: *Comput Biol Med*. 2010, **40**(11-12):943. doi:10.1016/j.compbiomed.2010.10.006
- [20] Belouchrani A, Abed-Meraim K, Cardoso J, Moulines E: *IEEE Trans Signal Process*. 1997, **45**(2):434. doi:10.1109/78.554307
- [21] Lemay M, Vesin J, van Oosterom A, Jacquemet V, Kappenberger L: *IEEE Trans Biomed Eng*. 2007, **54**(3):542.
- [22] Alcaraz R, Rieta J: *Physiol Meas*. 2008, **29**(12):1351. doi:10.1088/0967-3334/29/12/001

## Copyright information

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.