Multiprediction particle filter for efficient parallelized implementation
Abstract
Particle filter (PF) is an emerging signal processing methodology that can effectively deal with nonlinear and non-Gaussian signals through a sample-based approximation of the state probability density function. The particle generation of the PF is a data-independent procedure and can be implemented in parallel. However, the resampling procedure in the PF is sequential in nature and difficult to parallelize. According to Amdahl's law, the sequential portion of a task limits the maximum speedup of a parallelized implementation. Moreover, a large particle number is usually required to obtain an accurate estimation, and the complexity of the resampling procedure is closely related to the number of particles. In this article, we propose a multi-prediction (MP) framework with two selection approaches. The proposed MP framework can reduce the particle number required for a target estimation accuracy, and thus the sequential operation of the resampling can be reduced. Besides, the overhead of the MP framework can be easily compensated by parallel implementation. The proposed MP-PF alleviates the global sequential operation by increasing the local parallel computation. In addition, the MP-PF is very suitable for the multi-core graphics processing unit (GPU) platform, which is a popular parallel processing architecture. We give prototypical implementations of the MP-PFs on a multi-core GPU platform. For the classic bearing-only tracking experiments, the proposed MP-PF can be 25.1 and 15.3 times faster than the sequential importance resampling PF (SIR-PF) with 10,000 and 20,000 particles, respectively. Hence, the proposed MP-PF can enhance the efficiency of the parallelization.
Keywords
particle filter, parallelization, GPU
1. Introduction
Hidden state estimation of a dynamic system from noisy measurements is an important problem in many research areas. The Bayesian approach is a common framework for state estimation, which obtains the probability density function (PDF) of the hidden state. For linear system models with Gaussian noise, the Kalman filter (KF) can track the mean and covariance of the state PDF. However, the KF does not work well in nonlinear systems with non-Gaussian noise. Particle filter (PF) [1, 2, 3, 4, 5] is an emerging signal processing methodology that succeeds in dealing with nonlinear and non-Gaussian signals through a sample-based approximation of the state PDF. Because nonlinear dynamic systems with non-Gaussian noise appear widely in real-world applications, such as surveillance, object tracking, and computer and robot vision, the PF outperforms the classical KF in these applications.
According to Amdahl's law [12], the sequential portion of a task limits the speedup of a parallelized implementation. The resampling procedure is a sequential task that significantly limits the acceleration of the parallelized PF. In general, the complexity of the resampling procedure is proportional to the size of the posterior particle set. Traditionally, prior application-domain knowledge can be incorporated into the system model to reduce the uncertainty of the system state, as in [13, 14]. However, this approach is application-dependent and hard to transfer to other applications.
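To make this limit concrete, a short sketch (ours, not from the paper) of Amdahl's law: with a sequential fraction s of the work, the speedup on c cores is 1/(s + (1 - s)/c), which is capped at 1/s no matter how many cores are used.

```python
def amdahl_speedup(serial_fraction: float, n_cores: int) -> float:
    """Amdahl's law: overall speedup when only the parallel part scales with cores."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# If resampling were 10% of the work, even 240 cores give under 10x overall speedup.
print(amdahl_speedup(0.10, 240))    # ~9.64
print(amdahl_speedup(0.10, 10**9))  # approaches the 1/0.10 = 10x ceiling
```

This is why reducing the size of the sequential resampling step matters more than adding cores.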
In this article, we propose a multi-prediction (MP) sampling approach that benefits the parallelized PF. The proposed MP-sampling approach consists of the MP operation, weight updating, and local particle selection, as shown in Figure 1b. In the proposed MP operation, multiple predicted particles are generated from a specific basis particle, and the prediction number is defined as P. The SIR-PF with N_1 basis particles generates N_1 predicted particles, whereas the proposed MP-PF with N_2 basis particles generates N_2 × P predicted particles. When P is large, the basis particle number of the proposed MP-PF can be significantly reduced for the same predicted particle number. Hence, the proposed MP-PF can suppress the complexity of the resampling procedure and benefit the parallelized PF. The proposed MP-PF does incur an overhead of additional prediction computations from the MP operation. However, because the prediction procedure is data independent for each basis particle, the MP operation can be easily implemented in parallel. In summary, the proposed MP-PF reduces the sequential global data operation resulting from the resampling procedure by increasing the local computation overhead, and thereby improves the execution time of the parallelized PF. It should be noted that our approach is not proposed to replace the algorithms in [7, 8, 9, 10, 11]. The proposed MP-PF can be combined with the modified resampling algorithms in [7, 8, 9, 10, 11] to further improve the efficiency of the parallelized PF. To clarify the benefit of our approach, we compare the proposed MP-PF with the regular SIR-PF.
Recently, multi-core graphics processing units (GPUs) have become popular in the signal processing domain [15, 16, 17] for their capability of massive parallel computation. The main feature of multi-core GPUs is their high efficiency in processing many parallel local computations. However, the latency of global memory access on a GPU is much larger, because the GPU does not have levels of cache for global data. If the executed task consists of many sequential operations or uncoalesced global data accesses [18], the processing cores have to stall, resulting in low utilization. The proposed MP-PF trades additional local computations for a reduced amount of global data access. To verify the benefit of the proposed MP-PFs, we implement them on NVIDIA multi-core GPUs. Our prototype results show that the proposed MP-PFs can be more than 10× faster than the SIR-PF on the multi-core GPU platform.
The rest of this article is organized as follows. A review of the conventional SIR-PF is given in Section 2. The proposed MP-PF is presented in Section 3. Simulation results of the proposed MP-PFs are shown in Section 4. The implementation on the NVIDIA GPU and comparisons are presented in Section 5. Finally, Section 6 concludes this article.
2. Review of SIR-PF
The system is described by the state-space model x_t = f_t(x_{t-1}, n_t) and y_t = h_t(x_t, v_t), where x_t is the system state vector that we want to track; n_t is the random vector describing the system uncertainty; y_t is the observable measurement vector; and v_t is the measurement noise vector. The PF algorithm works even when f_t and h_t are nonlinear or n_t and v_t have non-Gaussian distributions. The PF algorithm needs the following information about the system x and observation y:

P(x_{0}): The PDF of the initial system state.

P(x_t | x_{t-1}): The transition PDF of the system state.

P(y_t | x_t): The observation likelihood function of y_t given the system state.
For the nonlinear/non-Gaussian scenario, Equations 3 and 4 cannot be obtained analytically. The SIR-PF approximates the posterior P(x_t | y_{1:t}) with a particle set $\{x_t^{(i)}, w_t^{(i)}\}_{i=1}^{N}$, where $w_t^{(i)}$ is the associated weight of each particle. The SIR-PF algorithm with N particles is described as follows.
Initialization
Generate N initial particles $x_0^{(1)}, \ldots, x_0^{(N)}$ from the predefined initial state distribution P(x_0). All particles have equal initial weights, $w_0^{(i)} = 1/N$.
 (a)
Prediction: Draw the predicted particles $x_t^{(i)}$ through the state transition model. For i = 1, ..., N, the $n_t^{(i)}$ are independent of each other. These predicted particles can be utilized to approximate the prior prediction distribution P(x_t | y_{1:t-1}).
 (b)Weight updating: After receiving the measurement, each particle updates its weight according to the likelihood function $P(y_t \mid x_t^{(i)})$, as shown in Equation 5: $w_t^{(i)} = w_{t-1}^{(i)} \cdot P(y_t \mid x_t^{(i)})$. (5)
 (c)Weight normalization: The normalization procedure makes the sum of particle weights equal to one. The particles with normalized updated weights represent the posterior state distribution. The normalization procedure is represented as $w_t^{(i)} = w_t^{(i)} / \sum_{i=1}^{N} w_t^{(i)}$. (6)
 (d)
Resampling: After the weight updating operation, some particle weights may degenerate to small values near zero. In general, systematic resampling (SR) is widely used as the standard implementation of the resampling procedure. The SR procedure draws a new particle set with independent indices j_1, ..., j_N such that $P(j_k = i) \propto w_t^{(i)}$ and sets $\hat{x}_t^{(k)} = x_t^{(j_k)}$. Besides, all particle weights are reset to 1/N.
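The four steps above can be sketched in a few lines of Python (a minimal illustration with hypothetical model functions f and likelihood, not the paper's implementation):

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: one uniform offset, N evenly spaced positions."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, n - 1)  # guard against float round-off at the tail

def sir_pf_step(particles, weights, y, f, likelihood, rng):
    """One SIR-PF iteration: predict, weight, normalize, resample."""
    particles = f(particles, rng)                  # (a) prediction
    weights = weights * likelihood(y, particles)   # (b) weight updating
    weights = weights / weights.sum()              # (c) normalization
    idx = systematic_resample(weights, rng)        # (d) resampling
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

For example, with a scalar random-walk model: `f = lambda x, rng: x + rng.normal(0.0, 0.1, x.shape)` and a Gaussian likelihood `lambda y, x: np.exp(-0.5 * (y - x)**2)`, one call to `sir_pf_step` advances the whole particle set by one measurement.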
The data flow of the SIR-PF with N_1 particles is shown in Figure 1a. The posterior particles at time t - 1 serve as the basis particles to generate the predicted prior particles at time t. There is a tradeoff between estimation accuracy and particle number. The SIR-PF with larger N increases the estimation accuracy. However, because the resampling operation is executed on the posterior particle set, the SIR-PF with larger N also raises the complexity of the resampling operation.
3. Proposed MP-PF algorithm
The data flow of the proposed MP-PF with N_2 basis particles and P predictions is shown in Figure 1b. Our proposed MP-PF is developed based on the SIR-PF: we replace the sampling procedure of the SIR-PF with our proposed MP-sampling approach. There are two modifications in the proposed MP-sampling approach: (1) the MP operation and (2) the local particle selection (LPS) operation.
3.1 Proposed MP operation
The proposed MP operation is inspired by the unpredictable behavior of the target. Due to the uncertainty in the system transition model described by P(x_t | x_{t-1}), the state propagation has many, even infinitely many, possible outputs. In the SIR-PF, however, each particle makes only one prediction for the next time instant, and it is hard to predict the motion of the target perfectly. Hence, the SIR-PF needs to store many particles to predict the system transition behavior. In our proposed MP operation, each basis particle makes multiple predictions according to the system model to track the uncertain system state. With the same number of basis particles, the MP-PF produces a predicted prior particle set of larger size than the SIR-PF. Hence, the MP-PF provides more prediction-state diversity to track the system state.
Let $x_{t-1}^{(i)}$ be a specific basis particle at time t - 1. The local predicted particle set $\{x_{\text{local}}^{(j)}\}_{j=1}^{P}$ is a sample-based representation of the transition PDF $P(x_t \mid x_{t-1}^{(i)})$. In the predicted prior distribution, each predicted particle has equal weight as well as equal importance, and none of the predicted particles can be removed. After weight updating, the importance of each particle is no longer equal, and local predicted particles with low importance can be removed. To maintain the same number of basis particles for the next iteration, the MP-sampling approach uses the LPS procedure to reserve only one representative particle in each local particle set.
Pseudo code of the MP operation

1: /* Multi-Prediction Operation */
2: for i = 1 to N do
3:   x_temp^(1) ~ f(x_{t-1}^(i), n_{t-1}^(1))  // generate the 1st predicted particle
4:   w_temp^(1) = w_{t-1}^(i) · p(y_t | x_temp^(1))
5:   x_t^(i) = x_temp^(1)
6:   w_t^(i) = w_temp^(1)
7:   for prediction count j = 2 to P do
8:     x_temp^(j) ~ f(x_{t-1}^(i), n_{t-1}^(j))
9:     w_temp^(j) = w_{t-1}^(i) · p(y_t | x_temp^(j))
10:    LPS(x_t^(i), w_t^(i), x_temp^(j), w_temp^(j))
11:  end for
12: end for
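A direct Python transcription of this pseudocode (our sketch, not the paper's GPU code; the LPS step is passed in as a callback so either selection scheme from Section 3.2 can be plugged in):

```python
import numpy as np

def mp_sampling(basis, weights, y, f, likelihood, lps, P, rng):
    """MP-sampling: each basis particle makes P predictions; the LPS callback
    keeps one representative particle per local group."""
    N = len(basis)
    sel_x = np.empty(N)
    sel_w = np.empty(N)
    for i in range(N):  # groups are data-independent, hence parallelizable
        x_sel = f(basis[i], rng)                   # 1st predicted particle
        w_sel = weights[i] * likelihood(y, x_sel)
        for j in range(2, P + 1):                  # remaining P - 1 predictions
            x_tmp = f(basis[i], rng)
            w_tmp = weights[i] * likelihood(y, x_tmp)
            x_sel, w_sel = lps(x_sel, w_sel, x_tmp, w_tmp, rng)
        sel_x[i], sel_w[i] = x_sel, w_sel
    return sel_x, sel_w
```

Since the outer loop carries no cross-iteration state, each basis particle's group can run on its own thread, which is exactly what makes the overhead easy to absorb in parallel.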
3.2 Proposed LPS mechanisms
From each basis particle, a group of predicted particles is generated. As mentioned above, the importance of each particle is not equal after weight updating. Hence, after weight updating, fewer particles need to be stored. In the proposed MP-sampling approach, the LPS procedure reserves one representative particle in each group. The representative particle is selected based on the weight distribution of the local predicted particle set. Two LPS approaches are described in the following.
3.2.1 Maximizing importance selection scheme
Pseudo code of the MIS-based LPS procedure

1: /* MIS-based LPS Procedure */
2: Input:
3:   previously selected particle: {x_t^(i), w_t^(i)}
4:   newly generated particle: {x_temp^(j), w_temp^(j)}
5: Selection:
6: if (w_temp^(j) > w_t^(i))
7:   x_t^(i) = x_temp^(j)
8:   w_t^(i) = w_temp^(j)
9: end if
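In Python, the MIS selection is a one-line comparison (our sketch; the unused rng argument merely keeps a uniform LPS call signature):

```python
def mis_lps(x_sel, w_sel, x_new, w_new, rng=None):
    """MIS: keep whichever candidate has the larger weight.
    rng is unused; it is accepted only for a uniform LPS call signature."""
    if w_new > w_sel:
        return x_new, w_new
    return x_sel, w_sel
```

Chained over all P candidates of a local group, the survivor is exactly the candidate with the maximum weight, which is why MIS needs only comparisons and no divisions or random numbers.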
3.2.2 Systematic-resampling-like selection scheme
Pseudo code of the SRS-based LPS procedure

1: /* SRS-based LPS Procedure */
2: Input:
3:   previously selected particle: {x_t^(i), w_t^(i)}
4:   newly generated particle: {x_temp^(j), w_temp^(j)}
5: Selection:
6: u ~ U[0, 1]  // uniform random variable
7: w_t^(i) += w_temp^(j)
8: if ((w_temp^(j) / w_t^(i)) > u)
9:   x_t^(i) = x_temp^(j)
10: end if
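This procedure is weighted reservoir sampling with a reservoir of size one: keep a running sum of the local weights and accept each new candidate with probability equal to its share of the mass seen so far. A hedged Python sketch (ours, not the paper's code):

```python
import random

def srs_lps(x_sel, w_run, x_new, w_new, rng=random):
    """SRS: w_run is the running sum of local weights seen so far.
    Accept the new candidate with probability w_new / (w_run + w_new)."""
    w_run += w_new
    if w_new / w_run > rng.random():  # u ~ U[0, 1]
        x_sel = x_new
    return x_sel, w_run
```

Chained over j = 2, ..., P (starting from the first prediction with w_run = w^(1)), candidate j survives with probability w^(j) / Σ_k w^(k), matching the SR selection probability without storing the whole local group.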
3.3 Prediction number and LPS scheme evaluation flow
Before describing the evaluation flow, we first analyze the two LPS schemes. There are two considerations when choosing an LPS scheme:
3.3.1 Complexity
Complexity comparison between the SRS and the MIS schemes

| LPS scheme | Distance calculation | Likelihood computation | Compare | Div/Mul | Generation of uniform R.V. |
|---|---|---|---|---|---|
| MIS | P − 1 (normal likelihood) | P | P − 1 | 0 | 0 |
| SRS | P | P | P − 1 | P − 1 | P |
3.3.2 Robustness to measurement noise
In the SRS scheme, the representative particle is selected based on the PDF of the whole local predicted particle set. Hence, predicted particles with similar weights have similar chances of being chosen as the representative particle in the SRS scheme. In the MIS scheme, however, the predicted particle with the highest weight is always selected as the representative particle. In summary, the weights of the local particle set determine the result of the LPS procedure. In general, the measurement contains a noise term. Because the weights of the particle set are updated based on the likelihoods of the measurement, the weights are also affected by the measurement noise. When the variance of the noise is high, the MIS scheme may suffer accuracy degradation, because it always selects the predicted particle with the highest weight and thus trusts the measurement too much.
In summary, for a target accuracy, we should evaluate both schemes and select the one with the lower execution time. The prediction number P and the basis particle number N are the main design parameters of the proposed MP-PF. By increasing P, the MP-PF can reduce the basis particle number as well as the global sequential operation. However, the total execution time may increase if P is too large. Therefore, for a target accuracy, a proper setting of (N, P) and the LPS scheme should be evaluated for a specific parallel architecture.
4. Simulation results and discussion
The proposed MP-PF does not utilize prior knowledge related to the application. In this section, we verify the proposed MP-PF with two widely used benchmark simulation models. In Section 4.1, we use a simple system transition model to evaluate the two LPS schemes at different measurement noise strengths. In Section 4.2, we use the BOT model, which has high transition uncertainty, to demonstrate the benefit of the proposed MP-PF.
4.1 Robustness to measurement noise
In this model, the term related to the hidden state is divided by 20, so noise with ${\sigma}_v^2 = 1$ is a large noise. In Figure 3, the MIS-based MP-PF suffers huge accuracy degradation due to the high measurement noise, especially for large P. When the noise strength is large, the particle with the highest weight is not necessarily correct, and the representative particle should instead be selected according to the probability distribution. However, the MIS scheme always selects the particle with the highest weight in the local particle set, and this simple but hasty approach does not follow the statistics of the local predicted particle set.
When the noise strength is lower, as shown in Figures 4b and 5b, the estimation accuracy of the MIS scheme improves. Nevertheless, the MIS scheme is still not robust to the measurement noise. Because the SRS scheme selects the representative particle in a probabilistic sense, it is more robust to the measurement noise than the MIS scheme. The accuracy of the SRS scheme is always better than that of the SIR-PF, as shown in Figures 3, 4 and 5.
MSE comparison results at the same prediction number

|  | SIR-PF | MP-PF (SRS scheme) | | | | |
|---|---|---|---|---|---|---|
| N | 500 | 10 | 25 | 50 | 100 | 250 |
| P | 1 | 50 | 20 | 10 | 5 | 2 |
| MSE | 21.25 | 51.38 | 28.32 | 23.59 | 22.18 | 21.39 |
4.2. The system model with high transition uncertainty
where v_k is additive Gaussian noise, v_k ~ N(0, r). In our simulation, $\sqrt{q} = 0.001$ and $\sqrt{r} = 0.005$, the same as the setting in [1]. We calculate the position error as the difference between the estimated position and the true position.
As mentioned in Section 3, the MIS scheme selects the representative particle deterministically. We can observe two drawbacks of the MIS scheme from the above simulations: (a) low robustness to measurement noise; (b) performance degradation at large prediction numbers. These drawbacks result from the simplification in the representative particle selection. The benefit of the MIS scheme is its simplicity. From the simulations, the MIS scheme is feasible for low prediction numbers and low measurement noise. In contrast, the SRS scheme follows the posterior weight distribution to select the representative particle. Because the SRS scheme selects the local representative particle in a probabilistic sense, it has higher stability and robustness than the MIS scheme.
5. Implementation of the MPPFs on GPU
5.1. Parallelized MPPF on NVIDIA multicore GPU
The proposed MP-PF increases the prediction computation to reduce the complexity of the resampling procedure. Because the MP-sampling operation can be executed independently for all basis particles, the prediction computation overhead is easily compensated by parallel execution. In this subsection, we give the architecture of the MP-PF implemented on an NVIDIA GPU. The NVIDIA multi-core GPU accelerates applications with a single-instruction-multiple-threads (SIMT) execution model and a hierarchical memory.
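Because each local group touches only its own basis particle, the whole MP-sampling step maps to one thread (or one vector lane) per basis particle. A NumPy sketch of this mapping (our illustration with a hypothetical random-walk model and Gaussian likelihood, standing in for one SIMT kernel launch):

```python
import numpy as np

def mp_sampling_batched(basis, y, P, sigma_n, sigma_v, rng):
    """All N*P predictions in one batched operation: no cross-group data flow,
    mirroring a one-thread-per-basis-particle SIMT kernel."""
    N = len(basis)
    preds = basis[:, None] + rng.normal(0.0, sigma_n, (N, P))   # MP operation
    w = np.exp(-0.5 * ((y - preds) / sigma_v) ** 2)             # weight updating
    keep = np.argmax(w, axis=1)                                 # MIS-based LPS
    rows = np.arange(N)
    return preds[rows, keep], w[rows, keep]
```

Every row of the (N, P) arrays is independent, so the only globally shared data left per iteration is the much smaller N-particle set that enters the resampling step.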
5.2. Implementation result of the SIRPF on GPU
Hardware information for evaluation

| GPU | NVIDIA GeForce GTX 280 |
|---|---|
| CUDA version | 2.3 |
| Number of SMs | 30 |
| Number of cores | 240 |
| Clock frequency | 1.3 GHz |
5.3. Design example for loose target accuracy
Execution time comparison between the MP-PF and the SIR-PF

| N | P | Position error | Execution time (speedup) |
|---|---|---|---|
| SIR-PF | | | |
| 10000 | 1 | 8.2052 × 10^−2 | 610.7 ms (1×) |
| Proposed MP-PF (MIS) | | | |
| 1100 | 10 | 8.1061 × 10^−2 | 67.7 ms (9.0×) |
| 650 | 20 | 8.1426 × 10^−2 | 41.1 ms (14.9×) |
| 400 | 50 | 8.1702 × 10^−2 | 27.0 ms (22.6×) |
| 350 | 100 | 8.1376 × 10^−2 | 24.8 ms (24.6×) |
| 300 | 200 | 8.1692 × 10^−2 | 24.3 ms (25.1×) |
| 300 | 500 | 8.2049 × 10^−2 | 36.8 ms (16.6×) |
| Proposed MP-PF (SRS) | | | |
| 1450 | 10 | 8.1067 × 10^−2 | 89.9 ms (6.8×) |
| 950 | 20 | 8.1225 × 10^−2 | 58.9 ms (10.4×) |
| 650 | 50 | 8.0923 × 10^−2 | 44.0 ms (13.9×) |
| 550 | 100 | 8.0021 × 10^−2 | 41.8 ms (14.6×) |
| 500 | 200 | 7.9687 × 10^−2 | 44.7 ms (13.7×) |
| 450 | 500 | 7.9891 × 10^−2 | 59.8 ms (10.2×) |
The MP-PF can use only hundreds of particles to meet the estimation accuracy of the SIR-PF with 10,000 particles. Besides, when the particle number is small, the particle with higher weight may be more important for representing the PDF, so the MIS scheme is a proper choice for small particle number settings. Hence, the MIS MP-PF can use fewer particles than the SRS scheme to achieve this accuracy threshold.
5.4. Design example for strict target accuracy
In the second design example, we first set a strict target accuracy, 0.06, which is the simulated accuracy of the SIR-PF with 20,000 particles. The prediction number set for evaluation is {10, 20, 50, 100, 200, 500}, the same as in the previous example. From the simulation results in Section 4.2, it should be noted that the MIS-based MP-PF can hardly achieve the threshold 0.06 with large P. Therefore, for the accuracy threshold 0.06, the MIS MP-PF cannot use more predictions to reduce the execution time, and we skip the discussion of the MIS scheme for this target accuracy.
Execution time comparison between the MP-PF and the SIR-PF

| N | P | Position error | Execution time (speedup) |
|---|---|---|---|
| SIR-PF | | | |
| 20000 | 1 | 6.3124 × 10^−2 | 1211.1 ms (1×) |
| Proposed MP-PF (SRS) | | | |
| 3050 | 10 | 6.2311 × 10^−2 | 187.0 ms (6.5×) |
| 2000 | 20 | 6.2263 × 10^−2 | 124.8 ms (9.7×) |
| 1400 | 50 | 6.2784 × 10^−2 | 91.2 ms (13.2×) |
| 1150 | 100 | 6.2624 × 10^−2 | 79.2 ms (15.3×) |
| 1050 | 200 | 6.2450 × 10^−2 | 79.4 ms (15.3×) |
| 950 | 500 | 6.2818 × 10^−2 | 95.2 ms (12.7×) |
6. Conclusions
In this article, the MP framework with two LPS schemes is proposed to reduce the number of basis particles. Of the two proposed LPS schemes, the SRS scheme is robust to the measurement noise and does not suffer from accuracy saturation, while the MIS scheme works well for small prediction numbers P or particle numbers N. By reducing the basis particle number, the complexity of the resampling, the sequential part of the PF task, can be suppressed significantly. The MP framework increases the prediction computation, and this computation can be easily implemented in parallel owing to its data-independent nature. In other words, the MP-PF increases the overhead of the parallel task and significantly reduces the complexity of the sequential task. To demonstrate the benefit of the MP-PF for parallel architectures, we implement the MP-PFs and the SIR-PF on a multi-core GPU platform. For the classic BOT experiments, the proposed MP-PF is at best 25.1 and 15.3 times faster than the SIR-PF with 10,000 and 20,000 particles, respectively.
Appendix
Derivation of the proposed SRS scheme
In general, the SR procedure needs to collect the information of all predicted particles, which results in additional latency and memory. Fortunately, the SRS procedure used in the proposed MP framework selects only one particle, so we modify the SR procedure into a sequential comparing operation, as shown in Table 1, to save the memory and latency overhead. In the following, we demonstrate that the proposed SRS scheme also follows the probability defined in Equation 14 to select the representative particle.
From Equations 16 and 17, the SRS scheme follows the same probability described in Equation 14 to select the representative particle.
Acknowledgements
Financial support from the NSC (grant no. NSC 97-2220-E-002-012) is greatly appreciated.
References
 1. Gordon N, Salmond D, Smith AF: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc F Radar Signal Process 1993, 140:107-113.
 2. Doucet A, de Freitas N, Gordon N (eds): Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science. Springer, New York; 2001.
 3. Ristic B, Arulampalam S: Beyond the Kalman Filter: Particle Filters for Tracking. Artech House, Boston; 2004.
 4. Arulampalam MS, Maskell S, Gordon N, Clapp T: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process 2002, 50(2):174-188.
 5. Cappé O, Godsill SJ, Moulines E: An overview of existing methods and recent advances in sequential Monte Carlo. Proc IEEE 2007, 95(5):899-924.
 6. Bolić M: Architectures for Efficient Implementation of Particle Filters. Ph.D. dissertation, State University of New York, Stony Brook; 2004.
 7. Bolić M, Djurić PM, Hong S: Resampling algorithms and architectures for distributed particle filters. IEEE Trans Signal Process 2005, 53(7):2442-2450.
 8. Sankaranarayanan AC, Chellappa R, Srivastava A: Algorithmic and architectural design methodology for particle filters in hardware. Proc IEEE International Conference on Computer Design (ICCD) 2005, 275-280.
 9. Sankaranarayanan AC, Srivastava A, Chellappa R: Algorithmic and architectural optimizations for computationally efficient particle filtering. IEEE Trans Image Process 2008, 17(5):737-748.
 10. Miao L, Zhang J, Chakrabarti C, Papandreou-Suppappola A: A new parallel implementation for particle filters and its application to adaptive waveform design. Proc IEEE Workshop on Signal Processing Systems (SiPS) 2010, 19-24.
 11. Manjunath BB, Williams AS, Chakrabarti C, Papandreou-Suppappola A: Efficient mapping of advanced signal processing algorithms on multiprocessor architectures. Proc IEEE Workshop on Signal Processing Systems (SiPS) 2008, 269-274.
 12. Hill MD, Marty MR: Amdahl's law in the multicore era. IEEE Computer 2008, 41(7):33-38.
 13. Chao CH, Chu CY, Wu AY: Location-constrained particle filter for RSSI-based indoor human positioning and tracking system. Proc IEEE Workshop on Signal Processing Systems (SiPS) 2008, 73-76.
 14. Evennou F, Marx F, Novakov E: Map-aided indoor mobile positioning system using particle filter. Proc IEEE Wireless Communications and Networking Conference (WCNC) 2005, 13-17.
 15. Shams R, Sadeghi P, Kennedy RA, Hartley RI: A survey of medical image registration on multicore and the GPU. IEEE Signal Process Mag 2010, 27(2):50-60.
 16. Bisceglie MD, Santo MD, Galdi C, Lanari R, Ranaldo N: Synthetic aperture radar processing with GPGPU. IEEE Signal Process Mag 2010, 27(2):69-78.
 17. Cheung NM, Fan X, Au OC, Kung MC: Video coding on multicore graphics processors. IEEE Signal Process Mag 2010, 27(2):79-89.
 18. NVIDIA: NVIDIA CUDA Programming Guide. http://www.nvidia.com/object/cudahomenew.html
 19. Lindholm E, Nickolls J, Oberman S, Montrym J: NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro 2008, 28(2):39-55.
Copyright information
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.