Reducing the PAPR in FBMC-OQAM systems with low-latency trellis-based SLM technique

  • S.S. Krishna Chaitanya Bulusu
  • Hmaied Shaiek
  • Daniel Roviras
Open Access


Filter-bank multi-carrier (FBMC) modulations, and more specifically FBMC-offset quadrature amplitude modulation (OQAM), are seen as an interesting alternative to orthogonal frequency division multiplexing (OFDM) for the 5th generation radio access technology. In this paper, we investigate the problem of peak-to-average power ratio (PAPR) reduction for FBMC-OQAM signals. Recently, it has been shown that FBMC-OQAM with the trellis-based selected mapping (TSLM) scheme is not only superior to any scheme based on a symbol-by-symbol approach but also outperforms OFDM with the classical SLM scheme. This paper is an extension of that work, in which we analyze the TSLM in terms of computational complexity, required hardware memory, and latency. We propose an improvement to the TSLM that requires much less hardware memory than the originally proposed TSLM and also has low latency. Additionally, the impact of the time duration of the partial PAPR on the performance of TSLM is studied, and its lower bound has been identified by proposing a suitable time duration. Also, a thorough and fair comparison of performance has been made with an existing trellis-based scheme proposed in the literature. The simulation results show that the proposed low-latency TSLM yields better PAPR reduction performance with relatively low hardware memory requirements.


5G, Dynamic programming, Computational complexity, FBMC-OQAM, PAPR, SLM, Trellis-based

1 Introduction

Filter-bank multi-carrier (FBMC)-based systems, combined with offset quadrature amplitude modulation (OQAM), are being seriously considered for future communication systems. FBMC-OQAM has many attractive features such as excellent frequency localization, a power spectral density (PSD) with very low side lobes, and an improved robustness to time-variant channel characteristics and carrier frequency offsets. With these properties, FBMC-OQAM seems to be a more suitable candidate than orthogonal frequency division multiplexing (OFDM) as a radio waveform for 5G radio access technology (RAT), especially for asynchronous devices [1]. However, FBMC-OQAM, as a multi-carrier technique, has a high peak-to-average power ratio (PAPR), so there is an essential need for novel PAPR reduction methods. In this paper, we mainly focus on PAPR reduction using probabilistic schemes.

Although several classifications of the PAPR reduction methods for OFDM exist, a notable one has five categories: clipping effect transformations [2]; coding [3]; frame superposition: tone reservation (TR) [4]; expansible constellation point: tone injection (TI) [5] and active constellation extension (ACE) [6]; and probabilistic schemes: selected mapping (SLM) [7] and partial transmit sequence (PTS) [8]. The classical schemes proposed for OFDM cannot be directly applied to FBMC-OQAM, owing to its overlapping symbol structure. Of late, some PAPR reduction schemes have been suggested for FBMC-OQAM systems, namely, ACE [9], iterative clipping [10, 11], ACE combined with TR [12], and TR [13, 14].

Coming to recently proposed probabilistic schemes, three symbol-by-symbol-based schemes have been proposed in [15, 16, 17]. In [18], a trellis-based PTS scheme with multi-block joint optimization (MBJO) has been introduced. Inspired by this trellis-based approach, a novel trellis-based SLM (TSLM) scheme has been presented in [19]. However, the existing TSLM technique needs a very large hardware memory, which also impacts the latency. So, in this paper, we propose a low-latency TSLM, which needs very little hardware memory, thereby avoiding latency issues. A thorough and fair comparison of performance has been made with existing probabilistic schemes: overlapped SLM (OSLM) [16], dispersive SLM (DSLM) [17], and MBJO-PTS [18]. The simulation results show that there is a tradeoff between hardware memory and PAPR reduction and also that the low-latency TSLM yields better performance with relatively low computational complexity, low latency, and less hardware memory.

The rest of the paper is organized as follows: Section 2 gives a brief overview of the FBMC-OQAM signal structure and the impact of its overlapping nature. Section 3 presents the analysis of PAPR in FBMC-OQAM signals, along with a short introduction to the classical SLM scheme. In Section 3.3, we briefly discuss the exhaustive search. Section 4 presents the idea of the trellis-based approach and its capability of achieving an optimal PAPR reduction performance, along with the TSLM algorithm. In the same section, we propose the low-latency TSLM algorithm. In Section 5, the computational complexity of the probabilistic schemes is derived. In Section 6, the simulation results are presented, and the conclusion of the paper is given in Section 7.

2 Overview of FBMC-OQAM system

Let us consider that we need to transmit M×N complex symbols in an FBMC-OQAM system over N tones. Then, we transmit real symbols at interval \(\frac {T}{2}\), where T is the symbol period [20]. In OQAM mapping, the M complex input symbol vectors \(\{\mathbf{X}_{0},\mathbf{X}_{1},\ldots,\mathbf{X}_{M-1}\}\) are mapped into 2M real symbols \(\{a_{0,n},a_{1,n},\ldots,a_{2M-1,n}\}\). After this OQAM mapping, the real symbols undergo poly-phase filtering that involves IFFT transformations along with filtering by a synthesis filter bank. The obtained continuous-time base-band FBMC-OQAM signal x(t) can be written as [21]
$$\begin{array}{*{20}l} x(t)=&~\mathcal G\left\{\mathbf{X}_{0}, \mathbf{X}_{1},\ldots,\mathbf{X}_{M-1}\right\}\\ =&\sum\limits_{{m'}=0}^{2M-1}\sum\limits_{n=0}^{N-1}a_{m',n}h(t-m'T/2)e^{j\frac{2\pi}{T}nt}e^{j\varphi_{m',n}}, \end{array} $$
where
  • x(t)≠0 for \(t\in \left[0, \left (M-\frac {1}{2}\right)T+4T\right)\)

  • \(\mathcal G\{.\}\) is the FBMC-OQAM modulation function

  • \(a_{m^{\prime },n}\) are the OQAM-mapped real symbols obtained from \(\mathbf{X}_{m}\)

  • h(t) is the prototype filter impulse response

  • \(\varphi _{m^{\prime },n}\) is the phase term, equal to \(\frac {\pi }{2}(m'+n)-\pi m'n\)

The prototype filter used in this paper is the one designed in the European PHYDYAS project, whose most significant parameter is the duration of its impulse response, also known as the overlapping factor, K. For K=4, h(t) is given in [22]. FBMC-OQAM signals overlap in time. We can see in Fig. 1 that the duration of the impulse response of the rectangular filter used in OFDM is T, whereas the duration of h(t) spreads beyond one symbol period, which causes adjacent FBMC-OQAM symbols to overlap.
Fig. 1

Illustration of the ideal mean power profile of FBMC-OQAM symbols
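The signal model of this section can be evaluated directly on a sampling grid, which makes the overlap between symbols easy to inspect. The following is a minimal sketch, not the poly-phase implementation: the function names are hypothetical, the PHYDYAS coefficients for K=4 are the commonly quoted values (an assumption here, since the text defers to [22]), the unit-energy normalisation is a choice made for the example, and the OQAM mapping is simplified so that the real part of each symbol is sent first and the imaginary part T/2 later.

```python
import numpy as np

def phydyas_filter(N, K=4):
    """Sampled prototype filter of length K*N (assumed PHYDYAS coefficients)."""
    H = [1.0, 0.971960, np.sqrt(2) / 2, 0.235147]  # assumed values for K = 4
    k = np.arange(K * N)
    h = np.ones(K * N)
    for i in range(1, K):
        h += 2 * (-1) ** i * H[i] * np.cos(2 * np.pi * i * k / (K * N))
    return h / np.sqrt(np.sum(h ** 2))  # unit energy (a choice made here)

def fbmc_oqam_modulate(X, K=4):
    """Direct evaluation of x(t) with N samples per symbol period T.

    X has shape (M, N): M complex symbol vectors over N tones (N even).
    """
    M, N = X.shape
    h = phydyas_filter(N, K)
    # Simplified OQAM mapping (assumption): real part at 2m, imaginary part at 2m+1.
    a = np.empty((2 * M, N))
    a[0::2] = X.real
    a[1::2] = X.imag
    length = (2 * M - 1) * N // 2 + K * N  # (M - 1/2)T + KT in samples
    x = np.zeros(length, dtype=complex)
    n = np.arange(N)
    for mp in range(2 * M):
        # Phase term (pi/2)(m' + n) - pi m' n from the signal model
        phase = np.exp(1j * (np.pi / 2 * (mp + n) - np.pi * mp * n))
        start = mp * N // 2
        k = np.arange(start, start + K * N)  # absolute sample indices
        tones = (a[mp] * phase) @ np.exp(2j * np.pi * np.outer(n, k) / N)
        x[start:start + K * N] += h * tones
    return x
```

With M=3 and N=8 the output has 52 samples, that is (M−1/2)T+4T on this grid, which matches the support of x(t) stated above.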

3 Probabilistic PAPR reduction schemes for OFDM and their adaptation for FBMC-OQAM

3.1 PAPR

For a continuous-time base-band FBMC-OQAM signal x(t) that is transmitted during a symbol period T, the PAPR is defined by
$$\begin{array}{*{20}l} \text{PAPR}_{x(t)}=\frac{\max_{t\in T}|{x(t)}|^{2}}{\frac{1}{T}{\int\limits_{0}^{T}}|{x(t)}|^{2}.dt}. \end{array} $$

The complementary cumulative distribution function (CCDF) of the PAPR of a signal quantifies how frequently the PAPR exceeds a given threshold value γ, and it is defined as \(\Pr\{\text{PAPR}_{x[n]}\geq \gamma\}\).
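Both quantities are straightforward to estimate from sampled signals. The sketch below uses hypothetical helper names and assumes the CCDF is read as the fraction of observed PAPR values that are at least γ; the PAPR is returned in dB.

```python
import numpy as np

def papr_db(x):
    """PAPR of a sampled signal: peak power over mean power, in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

def ccdf(papr_values_db, thresholds_db):
    """Empirical CCDF: fraction of PAPR values at or above each threshold."""
    values = np.asarray(papr_values_db)[:, None]
    thresholds = np.asarray(thresholds_db)[None, :]
    return (values >= thresholds).mean(axis=0)
```

In practice, `papr_db` would be applied to each oversampled symbol (or block of symbols) and the resulting values passed to `ccdf` over a grid of thresholds.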

3.2 Selected mapping for OFDM signals

SLM was introduced in [7], where we generate U complex phase rotation vectors \(\boldsymbol{\phi}^{(u)}\), for \(0\leq u\leq U-1\), of length N as:
$$\begin{array}{*{20}l} {\boldsymbol{\phi}}^{(u)}=\left\{ \begin{array}{ll} \left(1,\ldots,1\right)^{\mathsf{T}},&u=0,\\ \left({\phi}_{0}^{(u)},\ldots,{\phi}_{N-1}^{(u)}\right)^{\mathsf{T}},&1\leq u\leq U-1, \end{array}\right. \end{array} $$
where \(\phi _{k}^{(u)}\) is the kth element of \(\boldsymbol{\phi}^{(u)}\) defined as
$$\begin{array}{*{20}l} {}\phi_{k}^{(u)} \!= e^{j\psi^{(u)}_{k}} \!\in\! \mathbb{C},\! ~0 \!\leq \! u \!\leq\! U \,-\, 1,\! ~0 \!\leq\! k\leq\! N \,-\, 1,\! ~\psi^{(u)}_{k} \!\in\,[\!0,2\pi). \end{array} $$
The frequency-domain input symbols X with N tones are phase rotated by the U phase rotation vectors of size N as given below
$$\begin{array}{*{20}l} \mathbf{X}^{(u)}=\mathbf{X}~\odot~{\boldsymbol{\phi}}^{(u)},~0\leq u\leq U-1, \end{array} $$
where ⊙ denotes the carrier-wise point-to-point multiplication. By applying the IFFT operation, we obtain the U time-domain signal patterns \(\{x^{(0)}(t),x^{(1)}(t),\ldots,x^{(U-1)}(t)\}\). The target of the optimization problem is to identify the signal \(x^{(u_{\text {min}})}(t)\) that has the least PAPR so that
$$\begin{array}{*{20}l} u_{\text{min}}=\underset{0\leq u \leq U-1} {\mathrm{arg~min}}\left[\text{PAPR}_{x^{(u)}(t)}\right]. \end{array} $$

The index of the respective phase rotation vector, \(u_{\text{min}}\), is sent to the receiver as side information (SI), comprising \(\log_{2}U\) bits. If the SI is error-protected, then the BER of SLM is the same as that of the original OFDM.
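A minimal sketch of this classical SLM selection for one OFDM symbol is given below. The function name is hypothetical, the phase elements are drawn from {1,−1} as in the simulations reported later, and oversampling is omitted for brevity, so the PAPR values are those of the critically sampled signal.

```python
import numpy as np

def slm_ofdm(X, U, rng):
    """Pick the rotated OFDM symbol with the least PAPR among U candidates."""
    N = X.size
    phases = np.ones((U, N), dtype=complex)           # u = 0: all-ones vector
    phases[1:] = rng.choice([1.0, -1.0], size=(U - 1, N))
    candidates = np.fft.ifft(X[None, :] * phases, axis=1)
    power = np.abs(candidates) ** 2
    papr = power.max(axis=1) / power.mean(axis=1)     # linear scale
    u_min = int(np.argmin(papr))                      # index sent as SI
    return u_min, candidates[u_min], papr
```

Because the unrotated symbol is always candidate 0, the selected PAPR can never be worse than that of the original signal.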

Recently, some symbol-by-symbol-based schemes have been proposed for FBMC-OQAM, such as OSLM [16] and DSLM [17]. The sub-optimality of any symbol-by-symbol approach is dealt with in [19], where it has been shown that whatever improvement has been achieved for one symbol can be undone by the symbol that immediately follows it.

3.3 Exhaustive search

In order to achieve the optimal performance in PAPR reduction, one needs to consider all the possible U phase rotations for all M symbols and pick the best one out of the \(U^{M}\) different combinations. In a practical sense, it is meaningless to perform this exhaustive search, since it adds prohibitive complexity to the implementation of any SLM-based scheme. To deal with a similar problem in the case of PTS, a trellis-based PTS scheme with multi-block joint optimization (MBJO) has been introduced in [18]. Nevertheless, for small values of U and M, simulation results will be presented in order to quantify the gap between the proposed method, TSLM, and the optimal exhaustive search.

4 Overview on trellis-based approach and TSLM algorithm

In order to circumvent the high computational complexity of the exhaustive search, we opt for dynamic programming, which can substantially reduce the number of paths one needs to examine [23]. At any transition between two stages, we have \(U^{2}\) paths to compare, and for M FBMC-OQAM symbols, we have M−1 transitions in total. Therefore, the TSLM scheme needs to search only \(U^{2}(M-1)\) paths. This is achieved by eliminating certain paths after evaluating them based on a metric. If we have to transmit M input symbol vectors \(\{\mathbf{X}_{0},\mathbf{X}_{1},\ldots,\mathbf{X}_{M-1}\}\), then we need to find Θ, which is the optimal set of M different phase rotation vectors that give the best PAPR
$$\begin{array}{*{20}l} \boldsymbol{\Theta}=\left\{\boldsymbol{\phi}^{\left(u_{\text{min}}^{0}\right)}, \boldsymbol{\phi}^{\left(u_{\text{min}}^{1}\right)},\ldots,\boldsymbol{\phi}^{\left(u_{\text{min}}^{M-1}\right)}\right\}, \end{array} $$
where \(\left \{u_{\text {min}}^{0}, u_{\text {min}}^{1},\ldots,u_{\text {min}}^{M-1}\right \}\) are the indices of the optimal phase rotation vectors for the M input symbol vectors, which are to be sent to the receiver as SI. With M FBMC-OQAM symbols and U phase rotation vectors, we need to find the best path in the trellis of Fig. 2 that gives the lowest PAPR. Choosing an optimal path in the trellis means finding the multiplicative vectors by solving (6), with the help of a trellis diagram.
Fig. 2

Illustration of the trellis diagram between M stages composed of U states

For \(0\leq m\leq M-1\), every mth FBMC-OQAM symbol \(x_{m}(t)\), obtained from modulation of the input symbol vector \(\mathbf{X}_{m}\), is represented as the mth stage in the trellis at time instant mT. At each stage, there are U different states, representing the rotated FBMC-OQAM symbols. Among these states, the ith trellis state indicates rotation by the phase vector \(\boldsymbol{\phi}^{(i)}\). Between every two stages, there exist \(U^{2}\) possible paths. The joint FBMC-OQAM modulation of the mth and (m+1)th rotated input symbol vectors \(\mathbf {X}_{m}^{(u)}\) and \(\mathbf {X}_{m+1}^{(v)}\), respectively, is represented in the trellis by the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\) between the uth state in the mth stage and the vth state in the (m+1)th stage, where ⇒ represents a transition between two successive stages.

The partial PAPR calculated between two stages with multiple states serves as the basis of the path metric, which aids in identifying the U optimal paths that arrive at successive stages. Unlike a full PAPR, a partial PAPR of a signal x(t) is computed over a particular time interval \(T_{0}\). For the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\), its path metric \(\Gamma ^{(u,v)}_{({m},{m}+1)}\) can be written as
$$\begin{array}{*{20}l} \Gamma^{(u,v)}_{({m},{m}+1)}=f\left(\text{PPAPR}^{(u,v)}_{{m},{m}+1}\right), \end{array} $$
where f(.) is any convex function and \(\text {PPAPR}^{(u,v)}_{{m},{m}+1}\) is the partial PAPR of the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\), which is to be computed over the duration \(T_{0}\) as
$$\begin{array}{*{20}l} {}\text{PPAPR}^{(u,v)}_{{m},{m}+1}=\frac{\max_{t\in T_{0}}|x_{{m},{m}+1}^{(u,v)}(t)|^{2}}{\frac{1}{T_{0}}\int_{T_{0}}|x_{{m},{m}+1}^{(u,v)}(t)|^{2}.dt},~0\leq u,v \leq U-1, \end{array} $$

where \(T_{0}=[mT+T_{a},mT+T_{b})\) is an arbitrary interval within the interval \([mT,mT+4.5T)\). It has to be noted that \(T_{a}\geq 0\) and \(T_{b}<4.5T\). Similarly, we define a state metric \(\Psi_{(u,m)}\) at the mth stage as a measure of optimality of the cumulative path metrics of the optimal paths that arrived at this state from previous stages through various transitions. It can be evaluated simply by adding the path metric \(\Gamma ^{(w,u)}_{(m-1\Rightarrow m)}\) of the arriving optimal path \(\zeta ^{(w,u)}_{(m-1\Rightarrow m)}\) from the wth state of the previous (m−1)th stage to the state metric \(\Psi_{(w,m-1)}\) of the wth state from which this optimal path departs.

The whole optimization problem in this regard can be viewed as a continuum of overlapping optimization sub-problems, i.e., finding a FBMC-OQAM signal with least PAPR is equivalent to obtaining the accumulation of the least peaks. This is reflected in the state metric of a given state at any stage.
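The partial PAPR and the resulting path metric can be sketched on a sampled signal as follows. The helper names are hypothetical; N is taken to be the number of samples per symbol period T, the window limits are given in units of T, and the defaults (an exponential f and the window from 2T to 4T) are the choices reported later in the simulation section rather than requirements of the definition.

```python
import numpy as np

def partial_papr(x, start, stop):
    """Peak-to-mean power ratio of x restricted to samples [start, stop)."""
    power = np.abs(x[start:stop]) ** 2
    return power.max() / power.mean()

def path_metric(x, m, N, Ta=2.0, Tb=4.0, f=np.exp):
    """Path metric f(PPAPR) over the window [mT + Ta, mT + Tb).

    x is the jointly modulated candidate signal, sampled with N samples
    per symbol period; Ta and Tb are expressed in units of T.
    """
    start = int(round((m + Ta) * N))
    stop = int(round((m + Tb) * N))
    return f(partial_papr(x, start, stop))
```

The state metric of a state is then the running sum of such path metrics along its survivor path.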

4.1 TSLM algorithm

In the TSLM, every two symbols are rotated with different phase rotation vectors that are i.i.d. and are FBMC-OQAM modulated. The optimal states between two successive stages are chosen among the others, based on the least PAPR criterion computed over a given time interval \(T_{0}\). The TSLM algorithm involves the following steps:
  • Step 1—Initialization: Firstly, we generate M complex input symbol vectors \(\{\mathbf{X}_{0},\mathbf{X}_{1},\ldots,\mathbf{X}_{M-1}\}\) and U phase rotation vectors \(\{\boldsymbol{\phi}^{(0)},\boldsymbol{\phi}^{(1)},\ldots,\boldsymbol{\phi}^{(U-1)}\}\) of length N as per (3). We initialize the counter m and the state metrics for all states of the first stage as below.
    $$\begin{array}{*{20}l} m&=0, \end{array} $$
    $$\begin{array}{*{20}l} \Psi_{(u,0)}&=0,~u=0,\ldots,U-1. \end{array} $$

    As long as the condition \(0\leq m\leq M-2\) is satisfied, we perform steps 2, 3, 4, 5, and 6 in a repeated manner.

  • Step 2—Phase rotation: Two input symbol vectors \(\mathbf{X}_{m}\) and \(\mathbf{X}_{m+1}\) are phase rotated with U different phase rotation vectors, as per (5), giving \(\left \{\mathbf {X}_{m}^{(0)},\mathbf {X}_{m}^{(1)},\ldots,\mathbf {X}_{m}^{(U-1)}\right \}\) and \(\left \{\mathbf {X}_{m+1}^{(0)},\mathbf {X}_{m+1}^{(1)},\ldots,\mathbf {X}_{m+1}^{(U-1)}\right \}\), respectively.

  • Step 3—FBMC-OQAM modulation: For \(0\leq u,v\leq U-1\), FBMC-OQAM modulation is done jointly for all combinations of the patterns of the mth and (m+1)th input symbols, along with the preceding symbols, such that
    $$ \begin{aligned} x_{{m},{m}+1}^{(u,v)}(t) \!=&\mathcal G\left\{\! \ldots,\mathbf{X}_{m-2}^{\left(\boldsymbol{\lambda}\left(\left(\boldsymbol{\lambda}(u,{m-1})\right),{m-2}\right)\right)}, \mathbf{X}_{m-1}^{\left(\boldsymbol{\lambda}(u,{m-1})\right)},\mathbf{X}_{m}^{(u)}, \mathbf{X}_{m+1}^{(v)}\right\}, \end{aligned} $$

    where λ(u,m−1) is the surviving phase rotation at the uth state of stage m.

  • Step 4—Path metric calculation: For each of the \(U^{2}\) patterns of the modulated FBMC-OQAM signal \(x_{m,m+1}^{(u,v)}(t)\), we compute the partial PAPR as per Eq. (9). For the path \(\zeta ^{(u,v)}_{(m\Rightarrow m+1)}\), we calculate its path metric \(\Gamma ^{(u,v)}_{({m},{m}+1)}\) according to (8).

  • Step 5—Survivor path identification: The states of stage m that are related to the survivor paths leading to stage m+1 are stored in a state matrix λ(v,m) of order U×M, as given below
    $$\begin{array}{*{20}l} {}\boldsymbol{\lambda}(\! v,{m}) \,=\,\! \underset{u\in [0,U-1]}{\mathrm{arg~min}}{\!\left[\! \Psi_{(u,{m})} \,+\, \Gamma^{(u,v)}_{({m},{m}+1)}\! \right]},\! ~v \,=\, 0,\ldots,U \,-\, 1. \end{array} $$
  • Step 6—State metric update: The state metric \(\Psi_{(v,m+1)}\), for the stage m+1, can be updated as follows:
    $$\begin{array}{*{20}l} {}\Psi_{(v,{m}+1)} \,=\, \Psi_{\left(\boldsymbol{\lambda}(v,{m}),{m}\right)} + \Gamma^{\left(\boldsymbol{\lambda}(v,{m}),v\right)}_{({m},{m}+1)},~v=0,\ldots,U-1. \end{array} $$
  • Step 7—Incrementation: Increment the value of m by 1. If \(m\leq M-2\), go to step 2; otherwise (i.e., \(m=M-1\)), go to step 8.

  • Step 8—Traceback: Once the state metrics for all the M stages have been computed, identify the state that has the least state metric as shown below
    $$\begin{array}{*{20}l} \boldsymbol{\Theta}(M-1)&=\underset{u\in [0,U-1]}{\mathrm{arg~min}}{\left[\Psi_{(u,M-1)}\right]}. \end{array} $$
    Then, start tracing back from last stage to the first one in order to find the unique survivor path Θ by identifying the optimal states at each stage as below
    $$\begin{array}{*{20}l} \boldsymbol{\Theta}(k)&=\boldsymbol{\lambda}(\boldsymbol{\Theta}(k+1),k), \end{array} $$

    where k=M−2,M−3,…,1,0. This survivor path Θ is the set of optimal phase rotation vectors that is obtained after solving the optimization problem by dynamic programming and its indices \(\{u_{\text {min}}^{0}, u_{\text {min}}^{1},\ldots,u_{\text {min}}^{M-1}\}\) are supposed to be transmitted to the receiver as SI.
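The steps above can be sketched compactly in code. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the modulator is a direct (non-poly-phase) evaluation with commonly quoted PHYDYAS coefficients and a simplified real-then-imaginary OQAM mapping, and the joint modulation of step 3 is obtained by exploiting the linearity of the modulator, so each candidate signal is the survivor signal of state u plus the waveform of the (m+1)th symbol in state v. The defaults for the window and f follow the choices reported in the simulation section.

```python
import numpy as np

def phydyas(N, K=4):
    H = [1.0, 0.971960, np.sqrt(2) / 2, 0.235147]  # assumed PHYDYAS values
    k = np.arange(K * N)
    h = np.ones(K * N)
    for i in range(1, K):
        h += 2 * (-1) ** i * H[i] * np.cos(2 * np.pi * i * k / (K * N))
    return h / np.sqrt(np.sum(h ** 2))

def symbol_waveform(Xm, m, h):
    """Waveform of one complex symbol vector (half-symbols 2m and 2m+1)."""
    N, L = Xm.size, h.size
    n = np.arange(N)
    out = np.zeros(L + N // 2, dtype=complex)
    for i, a in enumerate((Xm.real, Xm.imag)):  # simplified OQAM mapping
        mp = 2 * m + i
        k = mp * N // 2 + np.arange(L)          # absolute sample indices
        phase = np.exp(1j * (np.pi / 2 * (mp + n) - np.pi * mp * n))
        carriers = np.exp(2j * np.pi * np.outer(n, k) / N)
        out[i * N // 2: i * N // 2 + L] += h * ((a * phase) @ carriers)
    return out

def tslm(X, phases, Ta=2.0, Tb=4.0, K=4, f=np.exp):
    """Return the selected state per stage and the resulting signal."""
    M, N = X.shape
    U = phases.shape[0]
    h = phydyas(N, K)
    length = (M - 1) * N + K * N + N // 2

    def place(m, u):  # rotated symbol m in state u, on the global time axis
        w = np.zeros(length, dtype=complex)
        w[m * N: m * N + K * N + N // 2] = symbol_waveform(X[m] * phases[u], m, h)
        return w

    sig = [place(0, u) for u in range(U)]   # survivor signal per state
    psi = np.zeros(U)                        # step 1: state metrics at stage 0
    lam = np.zeros((M - 1, U), dtype=int)    # survivor table lambda(v, m)
    for m in range(M - 1):
        lo, hi = int((m + Ta) * N), int((m + Tb) * N)
        nxt = [place(m + 1, v) for v in range(U)]            # step 2
        gamma = np.empty((U, U))
        for u in range(U):
            for v in range(U):
                p = np.abs(sig[u][lo:hi] + nxt[v][lo:hi]) ** 2   # step 3
                gamma[u, v] = f(p.max() / p.mean())              # step 4
        cand = psi[:, None] + gamma          # cand[u, v]
        lam[m] = cand.argmin(axis=0)         # step 5
        psi = cand.min(axis=0)               # step 6
        sig = [sig[lam[m, v]] + nxt[v] for v in range(U)]
    theta = np.empty(M, dtype=int)           # step 8: traceback
    theta[-1] = int(np.argmin(psi))
    for k in range(M - 2, -1, -1):
        theta[k] = lam[k, theta[k + 1]]
    return theta, sig[theta[-1]]
```

Keeping one survivor signal per state costs memory but avoids re-modulating the preceding symbols at every stage; an implementation that stores only indices would instead rebuild those signals when needed.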

4.2 Proposed low-latency TSLM in terms of hardware memory and latency

When we consider implementation complexity, we need to take two things into account: computational complexity and hardware memory. The former is dealt with in our analysis in the next section. The originally proposed TSLM [19] needs a state matrix λ of order U×M, which means we need to store in total MNU time-domain complex samples in memory before we start tracing back. This adds a latency of M stages and requires a very large hardware memory. A latency of M stages means that we have to trace back through M stages for the identification of the survivor paths. Hardware memory can significantly impact the implementation cost, and high latency is undesirable in some critical communication systems.

We have studied the impact of the traceback depth parameter, which heavily affects not only the PAPR reduction performance but also the latency and hardware memory requirements. It has to be noted that the choice of the traceback depth depends upon the prototype filter overlapping factor K. So, in this paper, we propose a low-latency TSLM that requires less hardware memory and also has lower latency when compared to the originally proposed TSLM. In the new proposal, only the indices of the optimal states on the survivor paths are stored instead of time-domain samples, reducing the memory requirements to MU. When a new FBMC symbol pair (m,m+1) is processed (step 2 to step 6), we definitively freeze the rotation vector at stage m. It is then possible to compute the modulated signal from mT to (m+1)T. Thus, we can progressively accumulate the modulated signal related to individual symbols, in order to obtain the total signal.

Later, in the simulation results, we shall show that for any traceback depth greater than K, the PAPR reduction performance of the low-latency TSLM is the same as that of the originally proposed TSLM. In our analysis, we have found that there is a tradeoff between latency and PAPR reduction performance. The PAPR reduction performance of the low-latency TSLM varies from sub-optimal to quasi-optimal, depending upon the choice of the traceback depth. However, it has to be noted that both the original TSLM and the low-latency TSLM have the same computational complexity.
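The effect of a limited traceback depth can be illustrated on a stored survivor table. The sketch below is one plausible reading of the idea, not the exact freezing rule of the proposed scheme: the function name and the array layout are hypothetical, `depth` is assumed to be at least 1, and the sketch does not model how a frozen decision feeds back into the modulation of later candidates.

```python
import numpy as np

def truncated_traceback(lam, psi_hist, depth):
    """Decide the state of each stage using a limited traceback depth.

    lam[m, v]      : surviving state at stage m for state v at stage m + 1
    psi_hist[m, u] : state metric of state u at stage m (m = 0..M-1)
    depth          : number of stages traced back before a decision is frozen
    """
    M = psi_hist.shape[0]
    decided = np.empty(M, dtype=int)

    def walk_back(m, stop):
        # Follow survivors from the best state at stage m down to stage `stop`;
        # returns the states of stages stop..m in increasing stage order.
        states = [int(np.argmin(psi_hist[m]))]
        for k in range(m - 1, stop - 1, -1):
            states.append(int(lam[k, states[-1]]))
        return states[::-1]

    # Once stage m has been processed, the decision for stage m - depth is frozen.
    for m in range(depth, M):
        decided[m - depth] = walk_back(m, m - depth)[0]
    # The last stages are flushed with a final traceback from stage M - 1.
    tail_start = max(M - depth, 0)
    decided[tail_start:] = walk_back(M - 1, tail_start)
    return decided
```

With a depth of at least M−1, this reduces to the full traceback of step 8; with a depth of 1, each stage is decided as soon as its successor has been processed.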

5 Computational complexity analysis of trellis-based probabilistic schemes

This section aims at a fair comparison of the PAPR reduction performances of the TSLM and MBJO-PTS [18] schemes in terms of computational complexity. A fair comparison of any PTS and SLM scheme is not possible if both schemes do not exhibit the same computational complexity [24]. The complexity analysis in this paper includes both complex multiplications and additions. The following considerations hold generally for any SLM and PTS schemes that are applied in FBMC-OQAM systems. However, in the performance comparison between the two schemes, only the complex multiplications are considered, since they dominate the overall complexity in common hardware implementations [25]. We give general expressions for the computational complexity, so that they can be readily adapted to any given probabilistic scheme.

5.1 Derivation of computational complexity in TSLM for multiplications

In any SLM-based scheme, the implementation complexity is due to phase rotation, FBMC-OQAM modulation, and metric calculation. Let us denote the complexity related to these three operations in the TSLM scheme as \(c_{\text{rot}}\), \(c_{\text{mod}}\), and \(c_{\text{met}}\), respectively. The phase rotation of the mth input symbol vector \(\mathbf{X}_{m}\) needs a number of complex multiplications equal to the number of its tones N. So, the complexity \(c_{\text{rot}}\) is given as
$$\begin{array}{*{20}l} c_{\text{rot}}=N. \end{array} $$
In the poly-phase filtering operation, we perform an IFFT and filtering with h(t). OQAM mapping involves complex-to-real symbol mapping. It may seem that one has to perform two real IFFT operations. Nevertheless, it is possible to compute two real IFFTs simultaneously as a single complex IFFT operation without increasing the number of complex multiplications [26]. The same can be applied to the filtering with h(t). Thus, the computational complexity involved in FBMC-OQAM modulation, \(c_{\text{mod}}\), is given as
$$\begin{array}{*{20}l} c_{\text{mod}}&=\underbrace{\frac{N}{2}\log_{2} N}_{\text{IFFT}}+\underbrace{4N}_{\text{filtering}}. \end{array} $$
In the metric calculation operation, we need N complex multiplications to find the peak. The complexity \(c_{\text{met}}\) depends on \(T_{0}\) and is given as
$$\begin{array}{*{20}l} c_{\text{met}}=dN, \end{array} $$

where \(T_{0}\) is the duration of time in terms of N and d is a constant that represents the number of successive symbol intervals considered for metric calculation.

The computational complexity in the case of TSLM is summarized in Table 1, and its general expression is given below
$$\begin{array}{*{20}l} c_{\text{TSLM}}&=M\left[U.c_{\text{rot}}+U.c_{\text{mod}}+\left(1-{\frac{1}{M}}\right)U^{2}.c_{\text{met}}\right]\\ &=MN\left[\left({\frac{1}{2}}\log_{2}N+5\right)U+d\left(1-{\frac{1}{M}}\right)U^{2}\right]. \end{array} $$
Table 1

Multiplication computational complexity in TSLM

$$\begin{array}{lll} \text{Operation}&\text{Complexity per operation}&\text{Number of operations}\\ \hline \text{Phase rotation}&N&MU\\ \text{FBMC-OQAM modulation}&\frac{N}{2}\log_{2} N+4N&MU\\ \text{Metric calculation}&dN&(M-1)U^{2} \end{array} $$
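The general expression for the TSLM multiplication count can be checked numerically against its factored form. In the short sketch below the function names are hypothetical, and the example values (M=10, N=64, U=2, d=2) are illustrative only, with d=2 assumed to correspond to a metric window of two symbol periods.

```python
from math import log2

def c_tslm(M, N, U, d):
    """Multiplications for TSLM, built from the three per-operation costs."""
    c_rot = N                          # phase rotation
    c_mod = N / 2 * log2(N) + 4 * N    # IFFT + filtering
    c_met = d * N                      # metric calculation
    return M * (U * c_rot + U * c_mod + (1 - 1 / M) * U ** 2 * c_met)

def c_tslm_closed(M, N, U, d):
    """Same count in the factored form MN[(0.5 log2 N + 5)U + d(1 - 1/M)U^2]."""
    return M * N * ((0.5 * log2(N) + 5) * U + d * (1 - 1 / M) * U ** 2)

print(c_tslm(10, 64, 2, 2), c_tslm_closed(10, 64, 2, 2))
```

Both forms agree up to floating-point rounding, which confirms that the constant 5 collects the rotation cost (1) and the filtering cost (4).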

5.2 Derivation of computational complexity in MBJO-PTS for multiplications

In a PTS scheme, we individually perform phase rotation in the time domain on the V sub-blocks and then add them, leading to \(W^{V}\) different signal patterns, where W is the total number of candidate phases that can be chosen for a sub-block. The MBJO-PTS scheme is a trellis-based adaptation of the classical PTS scheme to the FBMC-OQAM system by multi-block joint optimization and is presented in [18]. Unlike SLM, in any PTS-based scheme, we can perform phase rotation in the time domain. This avoids the need for multiple FBMC-OQAM modulation operations. Thus, the complexity due to FBMC-OQAM modulation in a PTS-based scheme, \(\hat {c}_{\text {mod}}\), can be reduced by using an \(\frac {N}{V}\)-point IFFT:
$$\begin{array}{*{20}l} \hat{c}_{\text{mod}}&={\frac{N}{2V}}\log_{2}{\frac{N}{V}}+4N. \end{array} $$
Since we consider a certain time duration \(T_{0}\) for the partial PAPR calculation, we need dN complex multiplications within that time duration. The computational complexity involved in the phase rotation operation for the MBJO-PTS scheme, \(\hat {c}_{\text{rot}}\), is given by
$$\begin{array}{*{20}l} \hat{c}_{\text{rot}}&=dNV. \end{array} $$
The computational complexity involved in metric calculation for MBJO-PTS scheme \(\hat {c}_{met}\) is given by
$$\begin{array}{*{20}l} \hat{c}_{\text{met}}=dN. \end{array} $$
General expression for MBJO-PTS computation complexity for M FBMC-OQAM symbols has been derived similarly based on the information in Table 2
$$\begin{array}{*{20}l} {}c_{\text{MBJO}}&=M\left[W.\hat{c}_{\text{rot}}+V.\hat{c}_{\text{mod}}+\left(1-{\frac{1}{M}}\right)W^{2V}.\hat{c}_{\text{met}}\right]\\ &=MN\left[\left({\frac{1}{2}}\log_{2}{\frac{N}{V}}+4V+dVW\right)+d\left(1-{\frac{1}{M}}\right)W^{2V}\right]. \end{array} $$
Table 2

Multiplication computational complexity in MBJO-PTS

$$\begin{array}{lll} \text{Operation}&\text{Complexity per operation}&\text{Number of operations}\\ \hline \text{Phase rotation}&dNV&MW\\ \text{FBMC-OQAM modulation}&{\frac {N}{2V}}\log _{2}{\frac {N}{V}}+4N&MV\\ \text{Metric calculation}&dN&(M-1)W^{2V} \end{array} $$
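The MBJO-PTS count can be checked in the same way. The function names in this sketch are hypothetical and the example values are illustrative only.

```python
from math import log2

def c_mbjo(M, N, V, W, d):
    """Multiplications for MBJO-PTS, built from the per-operation costs."""
    c_rot = d * N * V                             # time-domain phase rotation
    c_mod = N / (2 * V) * log2(N / V) + 4 * N     # N/V-point IFFT + filtering
    c_met = d * N                                 # metric calculation
    return M * (W * c_rot + V * c_mod + (1 - 1 / M) * W ** (2 * V) * c_met)

def c_mbjo_closed(M, N, V, W, d):
    """Factored form MN[(0.5 log2(N/V) + 4V + dVW) + d(1 - 1/M)W^(2V)]."""
    return M * N * ((0.5 * log2(N / V) + 4 * V + d * V * W)
                    + d * (1 - 1 / M) * W ** (2 * V))

print(c_mbjo(10, 64, 2, 2, 2), c_mbjo_closed(10, 64, 2, 2, 2))
```

The metric term grows as \(W^{2V}\), so even small increases in V or W quickly dominate the total.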

From (20) and (25), it is clear that, in FBMC-OQAM with TSLM and MBJO-PTS, the complexities involved in rotation and metric calculation are linear w.r.t. N, whereas the modulation complexities with TSLM and MBJO-PTS are of order \(\mathcal {O}(\frac {N}{2}\log _{2}(N))\) and \(\mathcal {O}(\frac {N}{2V}\log _{2}(\frac {N}{V}))\), respectively. This implies that the modulation operation has a much more significant complexity than the remaining ones. From the point of view of the number of phase rotations, the complexity is solely dominated by U in TSLM. On the contrary, it is distributed between V and W in MBJO-PTS.

5.3 Condition for identical computational complexity

In order to enable a fair comparison, the condition for identical computational complexity in both the TSLM and MBJO-PTS schemes is given by
$$\begin{array}{*{20}l} c_{\text{TSLM}}=c_{\text{MBJO}}. \end{array} $$
By substituting (20) and (25) in (27), we obtain
$$\begin{array}{*{20}l} &MN\left[\left({\frac{1}{2}}\log_{2}N+5\right)U+d\left(1-{\frac{1}{M}}\right)U^{2}\right]\\ &=MN \!\left[\!\left(\!{\frac{1}{2}\!}\log_{2}{\frac{N}{V}\!} \,+\, 4V \,+\, dVW\! \right) \,+\, d \!\left(1-{\frac{1}{M}}\right)W^{2V}\right]. \end{array} $$
For a large value of M, the term \(1-\frac {1}{M}\to 1\), and therefore, \(\frac{1}{M}\) can be neglected. Eq. (28) then simplifies to
$$\begin{array}{*{20}l} {}{dU^{2} \,+\,\! \left(\!{\frac{1}{2}\!}\log_{2}\! N \,+\, 5\! \right)\! U \,-\, {\frac{1}{2}\!}\log_{2}{\!\frac{N}{V}} \,-\, 4V\! \,-\, dVW\! \,-\, dW^{2V}\,=\,0}. \end{array} $$
The positive root of the quadratic equation (29) in U, rounded down to an integer and denoted by \(U_{\text{root}}\), is given by
$$\begin{array}{*{20}l} U_{\text{root}}=\left\lfloor{\frac{-{\frac{1}{2}}\log_{2}N-5+\sqrt{\Delta}}{2d}}\right\rfloor, \end{array} $$
where Δ is the discriminant, which is given by
$$\begin{array}{*{20}l} {}\Delta \,=\, {\!\left(\!{\frac{1}{2}}\log_{2}N \,+\, 5 \!\right)\!}^{2} \,+\, 4d \!\left(\!{\frac{1}{2}\!}\log_{2}{\!\frac{N}{V}} \,+\, 4V \,+\, dVW \,+\, dW^{2V}\right). \end{array} $$
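The equal-complexity condition can be evaluated numerically for a given MBJO-PTS configuration. The function name below is hypothetical, and the example parameters (N=64, V=2, W=2, d=2) are illustrative only.

```python
from math import floor, log2, sqrt

def u_root(N, V, W, d):
    """Largest integer U whose TSLM cost does not exceed the MBJO-PTS cost
    (large-M approximation, following the quadratic in U)."""
    b = 0.5 * log2(N) + 5                                   # linear coefficient
    rhs = 0.5 * log2(N / V) + 4 * V + d * V * W + d * W ** (2 * V)
    delta = b * b + 4 * d * rhs                             # discriminant
    return floor((-b + sqrt(delta)) / (2 * d))

print(u_root(64, 2, 2, 2))  # → 3
```

For these values, b=8 and Δ=468, so the unrounded root is about 3.41 and a TSLM with U=3 is the closest configuration that stays within the MBJO-PTS multiplication budget.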

5.4 Derivation of addition computational complexity in TSLM and MBJO-PTS

The computational complexity due to complex additions for M FBMC-OQAM symbols, in the TSLM and MBJO-PTS schemes, is summarized in Table 3. The expressions for the computational complexity due to complex additions for TSLM and MBJO-PTS can be derived in a similar fashion to those for complex multiplications. However, in the case of MBJO-PTS, we need to take into account the extra V additions needed per symbol, due to sub-block re-addition.
Table 3

Comparison of the addition computational complexities of the TSLM and MBJO-PTS

\(N\log_{2}N+3N\)

\({\frac {N}{V}}\log _{2}\frac {N}{V}+3N\)

\((M-1)U^{2}\)

\(2(V-1)dN\)

\((M-1)W^{2V}\)

6 Simulation results

The objective of the simulations is to analyze the performance of the low-latency TSLM scheme in comparison with OFDM when the classical SLM scheme is used. Simulations are done for an FBMC-OQAM signal generated from \(10^{5}\) 4QAM symbols with 64 tones. The PHYDYAS prototype filter [22], which spans 4T, was used by default unless specified otherwise. The elements of the complex phase rotation vectors \(\boldsymbol{\phi}^{(u)}\) were chosen from {1,−1}. In general, most of the PAPR reduction schemes are implemented over discrete-time signals. So, we need to sample the continuous-time FBMC-OQAM signal x(t), thereby obtaining its discrete-time signal s[n]. In order to approximate the PAPR well, we have oversampled the modulated signal by a factor of 4 [27] and then implemented the TSLM scheme on the discrete-time signal s[n]. The exponential function has been used as the function f in (8) when calculating the path metrics. We have also examined the impact of a higher-order constellation on PAPR reduction with TSLM but found the results for 16QAM to be more or less the same as those for 4QAM.

6.1 Impact of variation of T0 duration

When step 3 of the TSLM algorithm is performed, we are interested in the PAPR related to the mth and (m+1)th input symbols over the duration \(T_{0}=[mT+T_{a},mT+T_{b})\). Looking at Fig. 1, we can notice that these two symbols have an impact on the overall signal mainly in the interval \([mT+T,mT+4T]\) (i.e., \(T_{a}^{\text{min}}=T\) and \(T_{b}^{\text{max}}=4T\)). As shown in Fig. 3, choosing \(T_{a}=2T\) and \(T_{b}=4T\) seems to be the lower bound, as it yields better performance than the remaining intervals. If we choose intervals with \(T_{a}>2T\) or \(T_{b}<3T\), then there is a significant degradation in performance. In conclusion, it was found that \(T_{a}=2T\) and \(T_{b}=4T\) are a quasi-optimal choice, while also having a lower complexity.
Fig. 3

CCDF of PAPR for FBMC-OQAM symbols with partial PAPR calculated over different \(T_{0}\), U=2, and PHYDYAS filter

6.2 Comparison of TSLM and exhaustive search approach

In an exhaustive search over M symbols, all \(U^{M}\) possible phase rotations are tested and the best one is chosen. With the trellis-based approach, only \(U^{2}\) possible phase rotations are tested in step 3 of the TSLM algorithm and U of them are kept as surviving paths. By avoiding the exhaustive search, trellis-based approaches sacrifice optimality. Thus, any trellis-based approach lags behind the exhaustive search approach.

So, we tried to analyze how well the TSLM fares in terms of PAPR reduction w.r.t. the exhaustive search approach. Since it is not possible to simulate an exhaustive search over \(10^{5}\) symbols, we have considered 10 symbols with U=2 and performed 10,000 Monte Carlo runs, which we consider sufficient. This means that we have to perform a search over 1024 different patterns and pick the one with the least PAPR. In Fig. 4, we have plotted the CCDF of PAPR for TSLM and the exhaustive search. We can notice from this figure that TSLM is indeed a quasi-optimal approach, because we lose a mere 0.65 dB at \(10^{-3}\) of the CCDF of PAPR while reducing the computational complexity from \(\mathcal {O}(U^{M})\) to \(\mathcal {O}\left ((M-1)U^{2}\right)\).
Fig. 4

CCDF of PAPR for FBMC-OQAM symbols with TSLM and with an exhaustive search, \(T_{0}=[2T,4T)\) with U=2 and PHYDYAS filter
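As a worked count for the setting of this comparison (U=2 and M=10), the number of complete rotation patterns examined by the exhaustive search and the number of trellis paths examined by TSLM are

$$\begin{array}{*{20}l} U^{M}=2^{10}=1024, \qquad (M-1)U^{2}=9\times 4=36. \end{array} $$

The gap widens exponentially with M, since the first count grows as \(U^{M}\) while the second grows only linearly in M.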

6.3 Impact of the size of U

As in any SLM scheme, the number of phase rotation vectors U impacts the performance of the PAPR reduction. With OFDM, we have only U possible phase rotations for PAPR reduction in the time interval T because we have a symbol-by-symbol approach. With FBMC-OQAM, in contrast, we have \(U^{M}\) possible phase rotations for reducing the PAPR in the time interval (M+3.5)T. The ratio of the number of possible phase rotations to the impacted time interval is always better for FBMC-OQAM, which explains why the trellis-based approach can outperform OFDM for the same number of phase rotation vectors U.

To illustrate the impact of U, we have considered \(T_{0}=[mT+2T,mT+4T)\). The different sizes considered are U={2,4,8}. The values at \(10^{-3}\) of the CCDF of PAPR in Fig. 5 have been summarized in Table 4. We can see from this table that FBMC-OQAM with TSLM has outperformed OFDM with classical SLM by 0.35, 0.24, and 0.02 dB at the \(10^{-3}\) value of the CCDF of PAPR when U=2, U=4, and U=8, respectively. It is worth noting that at the \(10^{-3}\) value of the CCDF of PAPR when U=2, we are able to achieve a 1.73-dB PAPR reduction from the original signal with an SI of 1 bit. Such exploitation is possible with the trellis-based approach instead of symbol-by-symbol optimization. Another observation is that the gap between the CCDF curves of OFDM and FBMC-OQAM narrows as U increases.
Fig. 5

CCDF of PAPR for FBMC-OQAM symbols, \(T_{0}=[2T,4T)\) with U=2,4,8 and PHYDYAS filter
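For readers less familiar with the quantities plotted in these figures, the following sketch shows how a PAPR value in dB and an empirical CCDF point can be computed from time samples. It is a generic illustration using only the Python standard library. It does not reproduce the partial PAPR over \(T_{0}\) used by TSLM, and the function names are ours.

```python
import math


def papr_db(samples):
    """PAPR in dB of a block of (possibly complex) time samples."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


def empirical_ccdf(papr_values_db, threshold_db):
    """Fraction of observed PAPR values strictly above the threshold."""
    exceed = sum(1 for p in papr_values_db if p > threshold_db)
    return exceed / len(papr_values_db)


if __name__ == "__main__":
    # A single impulse among four samples has a PAPR of 10*log10(4) dB.
    print(papr_db([1, 0, 0, 0]))
    # Half of these four values exceed 2.5 dB.
    print(empirical_ccdf([1.0, 2.0, 3.0, 4.0], 2.5))
```

Reading a curve "at \(10^{-3}\)" then amounts to finding the threshold for which `empirical_ccdf` returns 0.001, which requires many more than 1000 simulated symbols to be estimated reliably.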

Table 4

CCDF of PAPR at \(10^{-3}\) value (in dB) for N=64

Modulation type
Reduction scheme
Classical SLM
Trellis-based SLM

6.4 Impact of traceback depth on latency and hardware memory

Even though TSLM is quasi-optimal, as mentioned earlier, it is important to take into account the hardware memory and latency induced by this algorithm. We can observe in Fig. 1 that most of the energy of an FBMC-OQAM symbol lies in its two succeeding symbol periods rather than in its own period interval. This is due to the fact that the prototype filter overlapping factor is K=4. It is therefore of considerable interest to consider traceback depths of 1, 2, and 3. For example, a traceback depth of 2 means that, at any mth stage, we have to trace back until the (m−2)th stage in order to identify the survivor paths. If we do not limit the traceback depth, then we have to wait for the processing of all M symbols (105 in our simulation). For these values, along with a traceback depth of 105, we have plotted the CCDF of PAPR in Fig. 6. In the legend of that figure, a traceback depth of 105 indicates the original TSLM, and traceback depths of 1, 2, and 3 indicate low-latency TSLM.
Fig. 6

CCDF of PAPR for FBMC-OQAM symbols with traceback depths of 105, 3, 2, and 1, \(T_{0}=[2T,4T)\), U=2, and PHYDYAS filter

The case of a traceback depth of 1 may look like DSLM [17], but it is different. In DSLM, the choice of the optimal rotation of a given mth input symbol vector \(X_{m}\) depends only on the past input symbol vectors \(X_{m-1},\ldots,X_{0}\), whose optimal rotations have already been fixed. For low-latency TSLM with a traceback depth of 1, the choice at the mth stage depends not only on past input symbol vectors but also on one succeeding input symbol vector \(X_{m+1}\), as we perform joint modulation in step 3 of the TSLM algorithm. When we move to the next (m+1)th stage in the trellis, the optimal choice (i.e., the survivor path) may change in a way that would have altered the decision at the previous stage. Since that decision is already fixed, the mth stage has to bear with the incorrect decision, and this in turn degrades the PAPR reduction. The probability of an incorrect decision also increases with U, leading to a larger loss with respect to the optimum for higher values of U. As seen in Fig. 6, the PAPR reduction performance of low-latency TSLM with a traceback depth of 1 lags that of TSLM with a traceback depth of 105 by around 0.8 dB at the \(10^{-3}\) value of CCDF of PAPR.

With a traceback depth of 2, however, this gap is closed to a large extent. Although the performance is still sub-optimal, low-latency TSLM with a traceback depth of 2 lags TSLM with a traceback depth of 105 by only around 0.37 dB at the \(10^{-3}\) value of CCDF of PAPR. Finally, we have observed that low-latency TSLM with a traceback depth of 3 reaches the quasi-optimal performance of TSLM. In this case, the latency is substantially reduced from 105 stages to 3 stages and we need much less hardware memory, since we store just 2NU complex time samples instead of 105NU. The latency and the number of complex time samples to be stored for the different traceback depths have been summarized in Table 5, where we can see the tradeoff between latency and PAPR reduction performance. If there is a constraint on latency or hardware memory, then a low-latency TSLM with a traceback depth of 2 or 3 can be considered, which offers tolerable sub-optimal and quasi-optimal performance, respectively.
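The effect of a limited traceback depth can be illustrated on a generic trellis. The sketch below is not the authors' implementation: it assumes that the cost of a sequence is the sum of pairwise branch costs (standing in for f applied to the partial PAPR of two jointly modulated symbols), and it releases the decision for stage m−d as soon as stage m has been processed. The names `truncated_viterbi`, `branch_cost`, and `depth` are ours.

```python
import itertools
import random


def sequence_cost(branch_cost, path):
    """Sum of the branch costs along a full sequence of states."""
    return sum(branch_cost[m][path[m]][path[m + 1]]
               for m in range(len(path) - 1))


def brute_force_cost(branch_cost, num_states):
    """Cost of the best sequence found by exhaustive search."""
    num_stages = len(branch_cost) + 1
    return min(sequence_cost(branch_cost, p)
               for p in itertools.product(range(num_states), repeat=num_stages))


def truncated_viterbi(branch_cost, num_states, depth):
    """Trellis search that keeps at most `depth` stages of back-pointers.

    branch_cost[m][i][j] is the cost of going from state i at stage m
    to state j at stage m+1. With depth >= number of stages - 1 this is
    the ordinary (full traceback) dynamic-programming search.
    """
    num_stages = len(branch_cost) + 1
    metric = [0.0] * num_states
    history = []   # back-pointers of the most recent stages only
    decided = []   # released decisions, stage 0 first
    for m in range(1, num_stages):
        new_metric, pointers = [], []
        for j in range(num_states):
            best_i = min(range(num_states),
                         key=lambda i: metric[i] + branch_cost[m - 1][i][j])
            pointers.append(best_i)
            new_metric.append(metric[best_i] + branch_cost[m - 1][best_i][j])
        metric = new_metric
        history.append(pointers)
        if len(history) == depth:
            # Trace back `depth` stages from the current best state and
            # release the decision for stage m - depth.
            state = min(range(num_states), key=metric.__getitem__)
            for ptrs in reversed(history):
                state = ptrs[state]
            decided.append(state)
            history.pop(0)
    # End of frame: release the remaining stages from the best final state.
    state = min(range(num_states), key=metric.__getitem__)
    tail = [state]
    for ptrs in reversed(history):
        state = ptrs[state]
        tail.append(state)
    decided.extend(reversed(tail))
    return decided


if __name__ == "__main__":
    random.seed(1)
    stages, states = 8, 2
    costs = [[[random.random() for _ in range(states)]
              for _ in range(states)] for _ in range(stages - 1)]
    print("exhaustive:", brute_force_cost(costs, states))
    for d in (1, 2, 3, stages - 1):
        path = truncated_viterbi(costs, states, d)
        print("depth", d, "cost", sequence_cost(costs, path))
```

In this sketch, only `depth` arrays of back-pointers are held at any time, which is the source of the memory saving, and a decision is available after `depth` stages instead of at the end of the frame. A small depth can only match or exceed the cost found with full traceback.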
Table 5

Impact of traceback depth on latency and hardware memory for N=64 and U=2

Traceback depth
Latency
Complex time samples to be stored
\(10^{-3}\) value of CCDF of PAPR (dB)
12.8×106 T


6.5 Impact of choice of the metric function

The choice of the metric function f(.) in Eq. (8) has some impact on the PAPR mitigation performance. Two different functions, namely a linear and an exponential function, have been chosen to understand the impact of this choice on the performance of the TSLM scheme. As shown in Fig. 7, for low values of the PAPR, the PAPR reduction performance with the exponential function is almost the same as that with the linear one, albeit lagging minutely. A very small performance gain can be seen at high values of the PAPR. This can be explained by the fact that the exponential function gives more weight to higher peaks than the linear one when identifying the set of optimal phase rotation vectors. Although we use the exponential function in all our simulations, we suggest that a linear metric function can be sufficient.
Fig. 7

CCDF of PAPR for FBMC-OQAM symbols, \(T_{0}=[2T,4T)\), U=2, and different metric functions
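The different weighting of peaks by the two metric functions can be seen on a toy example. The sketch below assumes that a path metric is the sum of f applied to the partial PAPR of each stage, which is our reading of the role of f(.) and not a transcription of Eq. (8). The two candidate paths and their values are hypothetical.

```python
import math


def path_metric(partial_paprs, f):
    """Accumulated metric of a path: sum of f over its partial PAPR values."""
    return sum(f(p) for p in partial_paprs)


def linear(x):
    return x


def exponential(x):
    return math.exp(x)


def choose_path(paths, f):
    """Return the candidate path with the smallest accumulated metric."""
    return min(paths, key=lambda p: path_metric(p, f))


if __name__ == "__main__":
    path_a = [5.0, 5.0]   # two moderate peaks (hypothetical values)
    path_b = [3.0, 6.5]   # one low and one high peak (hypothetical values)
    print("linear picks     ", choose_path([path_a, path_b], linear))
    print("exponential picks", choose_path([path_a, path_b], exponential))
```

With these values the linear metric prefers `path_b` (9.5 against 10), whereas the exponential metric penalizes the single high peak and prefers `path_a`.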

6.6 Comparison of TSLM with existing probabilistic schemes

Among the SLM-based schemes, TSLM has already been compared with DSLM in [19], where it has been shown to be superior to any scheme based on a symbol-by-symbol approach. DSLM in turn performs better than OSLM, as shown in [28]. MBJO-PTS is a trellis-based scheme, which yields quasi-optimal performance among the PTS schemes. A fair comparison of a PTS scheme and an SLM scheme is not possible unless both schemes exhibit the same computational complexity [24]. So, we compare MBJO-PTS with TSLM in terms of the number of multiplications, keeping the number of tones, the type of modulation, the prototype filter, and the \(T_{0}\) duration identical. The value of W is 2, as per the proposed MBJO-PTS scheme [18]. The value of \(U_{\text{root}}\) calculated for V={2,4} according to (30) is found to be 3 and 14, respectively. The comparison of the performance of MBJO-PTS for V={2,4} and W=2 with that of the TSLM scheme for the corresponding values U={3,14} can be seen in Fig. 8. The numbers of complex multiplications and additions needed to implement the TSLM and MBJO-PTS algorithms over 105 FBMC-OQAM symbols have been summarized in Table 6.
Fig. 8

CCDF of PAPR for FBMC-OQAM symbols, \(T_{0}=[2T,4T)\) with U=8 with PHYDYAS filter

Table 6

Computational complexities of the TSLM and MBJO-PTS for N=64

PAPR reduction scheme
TSLM (U=3)
MBJO-PTS (V=2, W=2)
TSLM (U=14)
MBJO-PTS (V=4, W=2)


At a CCDF of PAPR equal to \(10^{-3}\) in Fig. 8, we can infer that FBMC-OQAM with TSLM leads the MBJO-PTS scheme in PAPR reduction by roughly 0.7 and 0.2 dB for U=3 and U=14, respectively.

We count each complex multiplication as equivalent to three complex additions. With this convention, we can compute from Table 6 the relative reduction in computational complexity of TSLM w.r.t. MBJO-PTS. We have found that the proposed TSLM method with U={3,14} reduces the overall complexity, expressed in complex additions, by 19.65 and 23.42% compared with the MBJO-PTS method with V={2,4} and W=2, respectively.
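The conversion used for this comparison can be written out as a small helper. The operation counts in the example are hypothetical placeholders chosen for easy arithmetic, not the entries of Table 6. Only the weighting of three additions per multiplication is taken from the text.

```python
ADDS_PER_MULT = 3  # weighting stated in the text


def equivalent_additions(num_mults, num_adds):
    """Total cost expressed in complex additions."""
    return ADDS_PER_MULT * num_mults + num_adds


def relative_reduction(cost_new, cost_reference):
    """Fractional saving of cost_new with respect to cost_reference."""
    return 1.0 - cost_new / cost_reference


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tslm_cost = equivalent_additions(num_mults=100, num_adds=200)
    pts_cost = equivalent_additions(num_mults=150, num_adds=175)
    print(f"{100 * relative_reduction(tslm_cost, pts_cost):.2f}% fewer additions")
```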

7 Conclusions

Since FBMC-OQAM signals have a high PAPR, there is a strong need for suitable PAPR reduction schemes. This paper is an extension of the recently proposed TSLM. We have derived the computational complexity of the TSLM scheme and proposed a low-latency TSLM, which yields either a tolerable sub-optimal performance or the same performance as TSLM while having very low latency and requiring less hardware memory. We have also studied the impact of the time duration of the partial PAPR on the performance of TSLM and identified its lower bound by proposing a suitable time duration. A thorough and fair comparison of performance has been carried out with an existing trellis-based scheme proposed in the literature, and the simulation results show that low-latency TSLM yields better performance with relatively low latency.



The work done in this paper is financially supported by the French National Research Agency (ANR) project ACCENT5 with grant agreement code: ANR-14-CE28-0026-02.

Competing interests

The authors declare that they have no competing interests.


References

  1. BF Boroujeny, OFDM Versus Filter Bank Multi-carrier. IEEE Signal Proc. Mag. 8(3), 92–112 (2006).
  2. X Li, LJ Cimini, Effects of clipping and filtering on the performance of OFDM. IEEE Commun. Lett. 2(5), 131–133 (1998).
  3. TA Wilkinson, AE Jones, in 45th IEEE Veh. Technol. Conf, 2. Minimisation of the peak-to-mean envelope power ratio of multi-carrier transmission schemes by block coding (Chicago, 1995), pp. 825–829.
  4. J Tellado, J Cioffi, in IEEE CTMC, GLOBECOM. Peak power reduction for multicarrier transmission (IEEE Publication, Sydney, 1998).
  5. H Ochiai, A novel trellis shaping design with both peak and average power reduction for OFDM systems. IEEE Trans. Commun. 52(11), 1916–1926 (2004).
  6. BS Krongold, DL Jones, PAR reduction in OFDM via active constellation extension. IEEE Trans. Broadcast. 49, 258–268 (2003).
  7. RW Bauml, RFH Fischer, JB Huber, Reducing the peak-to-average power ratio of multi-carrier modulation by selected mapping. IEE Electron. Lett. 32(22), 2056–2057 (1996).
  8. SH Muller, JB Huber, OFDM with reduction peak to average power ratio by optimum combination of partial transmit sequences. IEEE Electron. Lett. 33, 368–369 (1997).
  9. N van der Neut, B Maharaj, F de Lange, G Gonzalez, F Gregorio, J Cousseau, in EURASIP Journal on Advances in Signal Processing, 2014, no. 172. PAPR reduction in FBMC using an ACE-based linear programming optimization (Springer Publications, 2014).
  10. Z Kollar, P Horvath, in Hindawi Journal of Computer Networks and Communications, 2012, no. 382736. PAPR Reduction of FBMC by Clipping and its Iterative Compensation (Hindawi Publications, 2012).
  11. Z Kollar, L Varga, B Horvath, P Bakki, J Bito, in Hindawi Scientific World Journal, 2014, no. 841680. Evaluation of Clipping Based Iterative PAPR Reduction Techniques for FBMC Systems (Hindawi Publications, 2014).
  12. B Horvath, P Horvath, in IEEE European Wireless Conference. Establishing Lower Bounds on the Peak-to-Average-Power Ratio in Filter Bank Multicarrier Systems (Budapest, 2015), pp. 1–6.
  13. S Lu, D Qu, Y He, Sliding Window Tone Reservation Technique for the Peak-to-Average Power Ratio Reduction of FBMC-OQAM Signals. IEEE Wireless Commun. Lett. 1(4), 268–271 (2012).
  14. KC Bulusu, H Shaiek, D Roviras, in IEEE International Symposium on Wireless Communication Systems. Reduction of PAPR of FBMC-OQAM Signals by Dispersive Tone Reservation Technique (Brussels, 2015), pp. 561–565.
  15. G Cheng, H Li, B Dong, S Li, An improved selective mapping method for PAPR reduction in OFDM/OQAM system. Scientific Research Communications and Network Journal. 5(3C), 53–56 (2013).
  16. A Skrzypczak, P Siohan, JP Javaudin, in 63rd IEEE Veh. Technol. Conf, 4. Reduction of the peak-to-average power ratio for OFDM-OQAM modulation (Melbourne, 2006), pp. 2018–2022.
  17. KC Bulusu, H Shaiek, D Roviras, R Zayani, in 11th IEEE International Symposium on Wireless Communication Systems. PAPR Reduction for FBMC-OQAM Systems Using Dispersive SLM Technique (Barcelona, 2014), pp. 568–572.
  18. D Qu, S Lu, T Jiang, Multi-Block Joint Optimization for the Peak-to-Average Power Ratio Reduction of FBMC-OQAM Signal. IEEE Trans. Signal Process. 61(7), 1605–1613 (2013).
  19. KC Bulusu, H Shaiek, D Roviras, in IEEE International Conference on Communications. Potency of Trellis-Based SLM over the Symbol-by-Symbol Approach in Reducing PAPR for FBMC-OQAM Signals (London, 2015), pp. 4757–4762.
  20. P Siohan, C Siclet, N Lacaille, Analysis and design of OFDM/OQAM systems based on filter bank theory. IEEE Trans. Signal Process. 50, 1170–1183 (2002).
  21. BL Floch, M Alard, C Berrou, Coded orthogonal frequency division multiplex. IEEE Proc. 83(6), 982–996 (1995).
  22. M Bellanger, in IEEE International Conference on Acoustic, Speech and Signal Processing. Specification and design of prototype filter for filter bank based multi-carrier transmission (Salt Lake City, 2001), pp. 2417–2420.
  23. R Bellman, Applied Dynamic Programming (Princeton University Press, New Jersey, 1962).
  24. C Siegl, RFH Fischer, in IEEE International ITG Workshop on Smart Antennas. Comparison of partial transmit sequences and selected mapping for peak-to-average power ratio reduction in MIMO OFDM (Darmstadt, 2008), pp. 324–331.
  25. A Burg, VLSI Circuits for MIMO Communication Systems, Ph.D. Thesis, ETH Zurich (2006).
  26. E Chu, A George, Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms, Computational Mathematics Series (CRC Press, Boca Raton, 1999).
  27. J Tellado, Peak to Average Ratio Reduction for Multi-carrier Modulation, Ph.D. Thesis, Stanford University, Stanford, CA, USA (1999).
  28. KC Bulusu, Performance Analysis and PAPR Reduction Techniques for Filter-Bank based Multi-Carrier Systems with Non-Linear Power Amplifiers, Ph.D. Thesis, Conservatoire National des Arts et Métiers, Paris, France (2016).

Copyright information

© The Author(s) 2016

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. CEDRIC/LAETITIA Laboratory, Conservatoire National des Arts et Métiers, Paris, France
