Sequential Monte Carlo Methods for Joint Detection and Tracking of Multiaspect Targets in Infrared Radar Images

  • Marcelo G. S. Bruno
  • Rafael V. Araújo
  • Anton G. Pavlov
Open Access
Research Article
Part of the following topical collections:
  1. Track Before Detect Algorithms

Abstract

We present in this paper a sequential Monte Carlo methodology for joint detection and tracking of a multiaspect target in image sequences. Unlike the traditional contact/association approach found in the literature, the proposed methodology enables integrated, multiframe target detection and tracking incorporating the statistical models for target aspect, target motion, and background clutter. Two implementations of the proposed algorithm are discussed using, respectively, a resample-move (RS) particle filter and an auxiliary particle filter (APF). Our simulation results suggest that the APF configuration outperforms slightly the RS filter in scenarios of stealthy targets.

Keywords

Target Motion Sequential Monte Carlo Background Clutter Joint Detection Image Grid 

1. Introduction

This paper investigates the use of sequential Monte Carlo filters [1] for joint multiframe detection and tracking of randomly changing multiaspect targets in a sequence of heavily cluttered remote sensing images generated by an infrared airborne radar (IRAR) [2]. For simplicity, we restrict the discussion primarily to a single target scenario and indicate briefly how the proposed algorithms could be modified for multiobject tracking.

Most conventional approaches to target tracking in images [3] are based on suboptimal decoupling of the detection and tracking tasks. Given a reference target template, a two-dimensional (2D) spatial matched filter is applied to a single-frame of the image sequence. The pixel locations where the output of the matched filter exceeds a pre-specified threshold are treated then as initial estimates of the true position of detected targets. Those preliminary position estimates are subsequently assimilated into a multiframe tracking algorithm, usually a linearized Kalman filter, or alternatively discarded as false alarms originating from clutter.

Depending on its level of sophistication, the spatial matched filter design might or might not take into account the spatial correlation of the background clutter and random distortions of the true target aspect compared to the reference template. In any case, however, in a scenario with dim targets in heavily cluttered environments, the suboptimal association of a single-frame matched filter detector and a multiframe linearized tracking filter is bound to perform poorly [4].

As an alternative to the conventional approaches, we introduced in [5, 6] a Bayesian algorithm for joint multiframe detection and tracking of known targets, fully incorporating the statistical models for target motion and background clutter and overcoming the limitations of the usual association of single-frame correlation detectors and Kalman filter trackers in scenarios of stealthy targets. An improved version of the algorithm in [5, 6] was later introduced in [7] to enable joint detection and tracking of targets with unknown and randomly changing aspect. The algorithms in [5, 6, 7] were however limited by the need to use discrete-valued stochastic models for both target motion and target aspect changes, with the "absent target" hypothesis treated as an additional dummy aspect state. A conventional hidden Markov model (HMM) filter was used then to perform joint minimum probability of error multiframe detection and maximum a posteriori (MAP) tracking for targets that were declared present in each frame. A smoothing version of the joint multiframe HMM detector/tracker, based essentially on a 2D version of the forward-backward (Baum-Welch) algorithm, was later proposed in [4]. Furthermore, we also proposed in [4] an alternative tracker based on particle filtering [1, 8] which, contrary to the original HMM tracker in [7], assumed a continuous-valued kinematic (position and velocity) state and a discrete-valued target aspect state. However, the particle filter algorithm in [4] enabled tracking only (assuming that the target was always present in all frames) and used decoupled statistically independent models for target motion and target aspect.

To better capture target motion, we drop in this paper the previous constraint in [5, 6, 7] and, as in the later sections of [4], allow the unknown 2D position and velocity of the target to be continuous-valued random variables. The unknown target aspect is still modeled however as a discrete random variable defined on a finite set Open image in new window , where each symbol is a pointer to a possibly rotated, scaled, and/or sheared version of the target's reference template. In order to integrate detection and tracking, building on our previous HMM work in [7], we extend the set Open image in new window to include an additional dummy state that represents the absence of a target of interest in the scene. The evolution over time of the target's kinematic and aspect states is described then by a coupled stochastic dynamic model where the sequences of target positions, velocities, and aspects are mutually dependent.

Contrary to alternative feature-based trackers in the literature, the proposed algorithm in this paper detects and tracks the target directly from the raw sensor images, processing pixel intensities only. The clutter-free target image is modeled by a nonlinear function that maps a given target centroid position into a spatial distribution of pixels centered around the (quantized) centroid position, with shape and intensity being dependent on the current target aspect. Finally, the target is superimposed to a structured background whose spatial correlation is captured by a noncausal Gauss-Markov random field (GMRf) model [9, 10, 11]. The GMRf model parameters are adaptively estimated from the observed data using an approximate maximum likelihood (AML) algorithm [12].

Given the problem setup described in the previous pa ragraph, the optimal solution to the integrated detection/ tracking problem requires the recursive computation at each frame Open image in new window of the joint posterior distribution of the target's kinematic and aspect states conditioned on all observed frames from instant Open image in new window up to instant Open image in new window . Given, however, the inherent nonlinearity of the observation and (possibly) motion models, the exact computation of that posterior distribution is generally not possible. We resort then to mixed-state particle filtering [13] to represent the joint posterior by a set of weighted samples (or particles) such that, as the number of particles goes to infinity, their weighted average converges (in some statistical sense) to the desired minimum mean-square error (MMSE) estimate of the hidden states. Following a sequential importance sampling (SIS) [14] approach, the particles may be drawn recursively from the coupled prior statistical model for target motion and aspect, while their respective weights may be updated recursively using a likelihood function that takes into account the models for the target's signature and for the background clutter.

We propose two different implementations for the mixed-state particle filter detector/tracker. The first implementation, which was previously discussed in a conference paper (see [15]) is a resample-move (RS) filter [16] that uses particle resampling [17] followed by a Metropolis-Hastings move step [18] to combat both particle degeneracy and particle impoverishment (see [8]). The second implementation, which was not included in [15], is an auxiliary particle filter (APF) [19] that uses the current observed frame at instant Open image in new window to preselect those particles at instant Open image in new window which, when propagated through the prior dynamic model, are more likely to generate new samples with high likelihood. Both algorithms are original with respect to the previous particle filtering-based tracking algorithm that we proposed in [4], where the problem of joint detection and tracking with coupled motion and aspect models was not considered.

Related Work and Different Approaches in the Literature

Following the seminal work by Isard and Blake [20], particle filters have been extensively applied to the solution of visual tracking problems. In [21], a sequential Monte Carlo algorithm is proposed to track an object in video subject to model uncertainty. The target's aspect, although unknown, is assumed, however, to be fixed in [21], with no dynamic aspect change. On the other hand, in [22], an adaptive appearance model is used to specify a time-varying likelihood function expressed as a Gaussian mixture whose parameters are updated using the EM [23] algorithm. As in our work, the algorithm in [22] also processes image intensities directly, but, unlike our problem setup, the observation model in [22] does not incorporate any information about spatial correlation of image pixels, treating instead each pixel as independent observations. A different Bayesian algorithm for tracking nonrigid (randomly deformable) objects in three-dimensional images using multiple conditionally independent cues is presented in [24]. Dynamic object appearance changes are captured by a mixed-state shape model [13] consisting of a discrete-valued cluster membership parameter and a continuous-valued weight parameter. A separate kinematic model is used in turn to describe the temporal evolution of the object's position and velocity. Unlike our work, the kinematic model in [24] is assumed statistically independent of the aspect to model.

Rather than investigating solutions to the problem of multiaspect tracking of a single target, several recent references, for example, [25, 26], use mixture particle filters to tackle the different but related problem of detecting and tracking an unknown number of multiple objects with different but fixed appearance. The number of terms in the nonparametric mixture model, that represents the posterior of the unknowns, is adaptively changed as new objects are detected in the scene and initialized with a new associated observation model. Likewise, the mixture weights are also recursively updated from frame to frame in the image sequence.

Organization of the Paper

The paper is divided into 6 sections. Section 1 is this introduction. In Section 2, we present the coupled model for target aspect and motion and review the observation and clutter models focusing on the GMRf representation of the background and the derivation of the associated likelihood function for the observed (target + clutter) image. In Section 3, we detail the proposed detector/tracker in the RS and APF configurations. The performance of the two filters is discussed in Section 4 using simulated infrared airborne radar (IRAR) data. A preliminary discussion on multitarget tracking is found in Section 5, followed by an illustrative example with two targets. Finally, we present in Section 6 the conclusions of our work.

2. The Model

In the sequel, we present the target and clutter models that are used in this paper. We use lowercase letters to denote both random variables/vectors and realizations (samples) of random variables/vectors; the proper interpretation is implied in context. We use lowercase Open image in new window to denote probability density functions (pdfs) and uppercase Open image in new window to denote the probability mass functions (pmfs) of discrete random variables. The symbol Open image in new window is used to denote the probability of an event Open image in new window in the Open image in new window -algebra of the sample space.

State Variables

Let Open image in new window be a nonnegative integer number and let superscript Open image in new window denote the transpose of a vector or matrix. The kinematic state of the target at frame Open image in new window is defined as the four-dimensional continuous (real-valued) random vector Open image in new window that collects the positions, Open image in new window and Open image in new window , and the velocities, Open image in new window and Open image in new window , of the target's centroid in a system of 2D Cartesian coordinates Open image in new window . On the other hand, the target's aspect state at frame Open image in new window , denoted by Open image in new window , is assumed to be a discrete random variable that takes values in the finite set Open image in new window where the symbol " Open image in new window " is a dummy state that denotes that the target is absent at frame Open image in new window , and each symbol Open image in new window , Open image in new window , is in turn a pointer to one possibly rotated, scaled, and/or sheared version of the target's reference template.

2.1. Target Motion and Aspect Models

The random sequence Open image in new window , is modeled as first-order Markov process specified by the pdf of the initial kinematic state Open image in new window , the transition pdf Open image in new window , the transition probabilities Open image in new window Open image in new window Open image in new window , Open image in new window , and the initial probabilities Open image in new window , Open image in new window .

Aspect Change Model

Assume that, at any given frame, for any aspect state Open image in new window , the clutter-free target image lies within a bounded rectangle of size Open image in new window . In this notation, Open image in new window and Open image in new window denote the maximum pixel distances in the target image when we move away, respectively, up and down, from the target centroid. Analogously, Open image in new window and Open image in new window are the maximum horizontal pixel distances in the target image when we move away, respectively, left and right, from the target centroid.

Assume also that each image frame has the size of Open image in new window pixels. We introduce next the extended grid Open image in new window that contains all possible target centroid locations for which at least one target pixel still lies in the sensor image. Next, let Open image in new window be a matrix of size Open image in new window such that Open image in new window for any Open image in new window and
Assuming that a transition from a "present target" state to the "absent target" state can only occur when the target moves out of the image, we model the probability of a change in the target's aspect from the state Open image in new window to the state Open image in new window , Open image in new window , as
where the two-dimensional vector Open image in new window denotes the quantized target centroid position defined on the extended image grid and obtained from the four-dimensional continuous kinematic state Open image in new window by making

where Open image in new window and Open image in new window are the spatial resolutions of the image, respectively, in the directions Open image in new window and Open image in new window . The parameter Open image in new window in (2) denotes in turn the probability of a new target entering the image once the previous target became absent. For simplicity, we restrict the discussion in this paper to the situation where there is at most one single target of interest present in the scene at each image frame. The specification Open image in new window , Open image in new window , corresponds to assuming the worst-case scenario where, given that a new target entered the scene, there is a uniform probability that the target will take any of the Open image in new window possible aspect states. Finally, the term Open image in new window in (2) is the probability of a target moving out of the image at frame Open image in new window given its kinematic and aspect states at frame Open image in new window .

Motion Model

For simplicity, we assume that, except in the situation where there is a transition from the "absent target" state to the "present target" state, the conditional pdf Open image in new window is independent of the current and previous aspect states, respectively, Open image in new window and Open image in new window . In other words, unless Open image in new window and Open image in new window , we make
where Open image in new window is an arbitrary pdf (not necessarily Gaussian) that models the target motion. Otherwise, if Open image in new window and Open image in new window , we reset the target's position and make
where Open image in new window is typically a noninformative (e.g., uniform) prior pdf defined in a certain region (e.g., upper-left corner) of the image grid. Given the independence assumption in (4), it follows that, for any Open image in new window

2.2. Observation Model and Likelihood Function

Next, we discuss the target observation model. Previous references mentioned in Section 1, for example, [21, 22, 24, 25, 26], are concerned mostly with video surveillance of near objects (e.g., pedestrian or vehicle tracking), or other similar applications (e.g., face tracking in video). For that class of applications, effects such as object occlusion are important and must be explicitly incorporated into the target observation model. In this paper by contrast, the emphasis is on a different application, namely, detection and tracking of small, quasipoint targets that are observed by remote sensors (usually mid-to high-altitude airborne platforms) and move in highly structured, generally smooth backgrounds (e.g., deserts, snow-covered fields, or other forms of terrain). Rather than modeling occlusion, our emphasis is instead on additive natural clutter.

Image Frame Model

Assuming a single target scenario, the Open image in new window th frame in the image sequence is modeled as the Open image in new window matrix:
where the matrix Open image in new window represents the background clutter and Open image in new window is a nonlinear function that maps the quantized target centroid position, Open image in new window , (see (3)) into a spatial distribution of pixels centered at Open image in new window and specified by a set of deterministic and known target signature coefficients dependent on the aspect state Open image in new window . Specifically, we make [4]

where Open image in new window is an Open image in new window matrix whose entries are all equal to zero, except for the element Open image in new window which is equal to 1.

For a given fixed template model Open image in new window , the coefficients Open image in new window in (8) are the target signature coefficients responding to that particular template. The signature coefficients are the product of a binary parameter Open image in new window , that defines the target shape for each aspect state, and a real coefficient Open image in new window , that specifies the pixel intensities of the target, again for the various states in the alphabet Open image in new window . For simplicity, we assume that the pixel intensities and shapes are deterministic and known at each frame for each possible value of Open image in new window . In particular, if Open image in new window takes the value Open image in new window denoting absence of target, then the function Open image in new window in (7) reduces to the identically zero matrix, indicating that sensor observations consist of clutter only.

Remark. Equation (8) assumes that the target's template is entirely located within the sensor image grid. Otherwise, for targets that are close to the image borders, the summation limits in (8) must be changed accordingly to take into account portions of the target that are no longer visible.

Clutter Model

In order to describe the spatial correlation of the background clutter, we assume that, after suitable preprocessing to remove the local means, the random field Open image in new window , Open image in new window , Open image in new window , is modeled as a first-order noncausal Gauss-Markov random field (GMrf) described by the finite difference equation [9]

where Open image in new window , with Open image in new window if Open image in new window and zero otherwise. The symbol Open image in new window denotes here the expectation (or expected value) of a random variable/vector.

Likelihood Function Model

Let Open image in new window , Open image in new window , and Open image in new window be the one-dimensional equivalent representations, respectively, of Open image in new window , Open image in new window and Open image in new window in (7), obtained by row-lexicographic ordering. Let also Open image in new window denote the covariance matrix associated with the random vector Open image in new window , assumed to have zero mean after appropriate preprocessing. For a GMrf model as in (9), the corresponding likelihood function for a fixed aspect state Open image in new window , Open image in new window , is given by [4]
is referred to in our work as the data term and
is called the energy term. On the other hand, for Open image in new window , Open image in new window reduces to the likelihood of the absent target state, which corresponds to the probability density function of Open image in new window assuming that the observation consists of clutter only, that is,
Writing the difference equation (9) in compact matrix notation, it can be shown [9, 10, 11] by the application of the principle of orthogonality that Open image in new window has a block-tridiagonal structure of the form

where Open image in new window denotes the Kronecker product, Open image in new window is Open image in new window identity matrix, and Open image in new window is a Open image in new window matrix whose entries Open image in new window if Open image in new window and are equal to zero otherwise.

Using the block-banded structure of Open image in new window in (14), it can be further shown that Open image in new window may be evaluated as the output of a modified 2D spatial matched filter using the expression
where Open image in new window , Open image in new window , are obtained, from (3), and Open image in new window is the output of a 2D differential operator

with Dirichlet (identically zero) boundary conditions.

Similarly, the energy term Open image in new window can be also efficiently computed by exploring the block-banded structure of Open image in new window . The resulting expression is the difference between the autocorrelation of the signature coefficients Open image in new window and their lag-one cross-correlations weighted by the respective GMrf model parameters Open image in new window or Open image in new window . Before we leave this section, we make two additional remarks.

Remark 2.

As before, (15) is valid for Open image in new window and Open image in new window . For centroid positions close to the image borders, the summation limits in (15) must be varied accordingly (see [6] for details).

Remark 3.

Within our framework, a crude non-Bayesian single frame maximum likelihood target detector could be built by simply evaluating the likelihood map Open image in new window for each aspect state Open image in new window and finding the maximum over the image grid of the sum of likelihood maps weighted by the a priori probability for each state Open image in new window (usually assumed to be identical). A target would be considered present then if the weighted likelihood peak exceeded a certain threshold. In that case, the likelihood peak would also provide an estimate for the target location. The integrated joint detector/tracker presented in Section 3 outperforms, however, the decoupled single-frame detector discussed in this remark by fully incorporating the dynamic motion and aspect motion into the detection process and enabling multiframe detection within the context of a track-before-detect philosophy.

3. Particle Filter Detector/Tracker

3.1. Sequential Importance Sampling

Given a sequence of observed frames Open image in new window , our goal is to generate, at each instant Open image in new window , a properly weighted set of samples (or particles) Open image in new window , Open image in new window , with associated weights Open image in new window such that, according to some statistical criterion, as Open image in new window goes to infinity,
A possible mixed-state sequential importance sampling (SIS) strategy (see [4, 13]) for the recursive generation of the particles Open image in new window and their proper weights is described in the algorithm below.
  1. (1)
     
  2. (2)
    Importance Sampling For Open image in new window
    1. (i)

      Draw Open image in new window according to (2).

       
    2. (ii)

      Draw Open image in new window according to (4) or (5).

       
    3. (iii)
      Update the importance weights
       

    using the likelihood function in Section 2.2.

     
End-For

3.2. Resample-Move Filter

The sequential importance sampling algorithm in Section 3.1 is guaranteed to converge asymptotically with probability one; see [27]. However, due to the increase in the variance of the importance weights, the raw SIS algorithm suffers from the "particle degeneracy" phenomenon [8, 14, 17]; that is, after a few steps, only a small number of particles will have normalized weights close to one, whereas the majority of the particles will have negligible weight. As a result of particle degeneracy, the SIS algorithm is inefficient, requiring the use of a large number of particles to achieve adequate performance.

Resampling Step

A possible approach to mitigate degeneracy is [17] to resample from the existing particle population with replacement according to the particle weights. Formally, after the normalization of importance weights Open image in new window , we draw Open image in new window with Open image in new window , and build a new particle set Open image in new window , Open image in new window , such that Open image in new window Open image in new window Open image in new window After the resampling step, the new selected trajectories Open image in new window Open image in new window Open image in new window Open image in new window Open image in new window Open image in new window are approximately distributed (see, e.g., [28]) according to the mixed posterior pdf Open image in new window so that we can reset all particle weights to Open image in new window .

Move Step

Although particle resampling according to the weights reduces particle degeneracy, it also introduces an undesirable side effect, namely, loss of diversity in the particle population as the resampling processes generate multiple copies of a small number or, in the extreme case, only one high-weight particle. A possible solution, see [16], to restore sample diversity without altering the sample statistics is to move the current particles Open image in new window to new locations Open image in new window using a Markov chain transition kernel Open image in new window , that is, invariant to the conditional mixture pdf Open image in new window . Provided that the invariance condition is satisfied, the new particle trajectories Open image in new window remain distributed according to Open image in new window and the associated particle weights may be kept equal to Open image in new window . A Markov chain that satisfies the desired invariance condition can be built using the following Metropolis-Hastings strategy [15, 18].

For Open image in new window , the following algorithm holds.
  1. (i)

    Draw Open image in new window according to (2).

     
  2. (ii)

    Draw Open image in new window according to (4) or (5).

     
  3. (iii)
     
  4. (iv)
     

End-For.

3.3. Auxiliary Particle Filter

An alternative to the resample-move filter in Section 3.2 is to use the current observation Open image in new window to preselect at instant Open image in new window a set of particles that, when propagated to instant Open image in new window according to the system dynamics, is more likely to generate samples with high likelihood. That can be done using an auxiliary particle filter (APF) [19] which samples in two steps from a mixed importance function:
where Open image in new window and Open image in new window are drawn according to the mixed prior Open image in new window . The proposed algorithm is summarized into the following steps.
  1. (1)
    Pre-sampling Selection Step For Open image in new window
    1. (i)

      Draw Open image in new window according to (2).

       
    2. (ii)

      Draw Open image in new window according to (4) or (5).

       
    3. (iii)
      Compute the first-stage importance weights
       

    using the likelihood function model in Section 2.2.

     
End-For
  1. (2)
    Importance Sampling with Auxiliary Particles For Open image in new window
    1. (i)
       
    2. (ii)

      Sample Open image in new window according to (2).

       
    3. (iii)

      Sample Open image in new window according to (4) or (5).

       
    4. (iv)
      Compute the second-stage importance weights

      End-For.

       
    5. (v)

      Normalize the weights Open image in new window such that Open image in new window .

       
     
  2. (3)
    Post-sampling Selection Step For Open image in new window
     

3.4. Multiframe Detector/Tracker

The final result at instant Open image in new window of either the RS algorithm in Section 3.2 or the APF algorithm in Section 3.3 is a set of equally weighted samples Open image in new window that are approximately distributed according to the mixed posterior Open image in new window . Next, let Open image in new window denote the hypothesis that the target of interest is present in the scene at frame Open image in new window . Conversely, let Open image in new window denote the hypothesis that the target is absent. Given the equally weighted set Open image in new window , we compute then the Monte Carlo estimate, Open image in new window , of the posterior probability of target absence by dividing the number of particles for which Open image in new window by the total number of particles Open image in new window . The minimum probability of error test to decide between hypotheses Open image in new window and Open image in new window at frame Open image in new window is approximated then by the decision rule
or, equivalently,

Finally, if Open image in new window is accepted, the estimate Open image in new window of the target's kinematic state at instant Open image in new window is obtained from the Monte Carlo approximation of Open image in new window , which is computed by averaging out the particles Open image in new window such that Open image in new window .

4. Simulation Results

In this section, we quantify the performance of the proposed sequential Monte Carlo detector/tracker, both in the RS and APF configurations, using simulated infrared airborne radar (IRAR) data. The background clutter is simulated from real IRAR images from the MIT Lincoln Laboratory database, available at the CIS website, at Johns Hopkins University. An artificial target template representing a military vehicle is added to the simulated image sequence. The simulated target's centroid moves in the image from frame to frame according to the simple white-noise acceleration model in [3, 4] with parameters Open image in new window and Open image in new window  second. A total of four rotated, scaled, or sheared versions of the reference template is used in the simulation.

The target's aspect changes from frame to frame following a known discrete-valued hidden Markov chain model where the probability of a transition to an adjacent aspect state is equal to 40%. In the notation of Section 2.1, that specification corresponds to setting Open image in new window , Open image in new window , Open image in new window if Open image in new window , and Open image in new window otherwise. All four templates are equally likely at frame zero, that is, Open image in new window for Open image in new window . The initial Open image in new window and Open image in new window positions of the target's centroid at instant zero are assumed to be uniformly distributed, respectively, between pixels 50 and 70 in the Open image in new window coordinate and pixels 10 and 20 in the Open image in new window coordinate. The initial velocities Open image in new window and Open image in new window are in turn Gaussian-distributed with identical means ( Open image in new window or Open image in new window pixels/frame) and a small standard deviation ( Open image in new window ).

Finally, the background clutter for the moving target sequence was simulated by adding a sequence of synthetic GMrf samples to a matrix of previously stored local means extracted from the database imagery. The GMrf samples were synthetized using correlation and prediction error variance parameters estimated from real data using the algorithms developed in [11, 12] see [4] for a detailed pseudocode.

Two video demonstrations of the operation of the proposed detector/tracker are available for visualization by clicking on the links in [29]. The first video (peak target-to-clutter ratio, or PTCR 10 dB) illustrates the performance over 50 frames of an 8 000-particle RS detector/tracker implemented as in Section 3.2, whereas the second video (PTCR 6.5 dB) demonstrates the operation over 60 frames of a 5 000-particle APF detector/tracker implemented as in Section 3.3. Both video sequences show a target of interest that is tracked inside the image grid until it disappears from the scene; the algorithm then detects that the target is absent and correctly indicates that no target is present. Next, once a new target enters the scene, that target is acquired and tracked accurately until, in the case of the APF demonstration, it leaves the scene and no target detection is once again correctly indicated.

Both video demos show the ability of the proposed algorithms to (1) detect and track a present target both inside the image grid and near its borders, (2) detect when a target leaves the image and indicate that there is no target present until a new target appears and (3), when a new target enters the scene, correctly detect that the target is present and track it accurately. In the sequel, for illustrative purposes only, we show in the paper the detection/tracking results for a few selected frames using the RS algorithm and a dataset that is different from the one shown in the video demos.

Figure 1(a) shows the initial frame of the sequence with the target centered in the (quantized) coordinates Open image in new window and superimposed on clutter. The clutter-free target template, centered at the same pixel location, is shown as a binary image in Figure 1(b). The simulated PTCR in Figure 1(b) is 10.6 dB.
Figure 1

(a) First frame of the cluttered target sequence, PTCR = 10. 6 dB; (b) target template and position in the first frame shown as a binary image.

Next, Figure 2(a) shows the tenth frame in the image sequence. Once again, we show in Figure 2(b) the corresponding clutter-free target image as a binary image. Note that the target from frame 1 has now undergone a random change in aspect in addition to translational motion.
Figure 2

(a) Tenth frame of the cluttered target sequence, PTCR = 10. 6 dB, with target translation, rotation, scaling, and shearing; (b) target template and position in the tenth frame shown as a binary image.

The tracking results corresponding to frames 1 and 10 are shown, respectively, in Figures 3(a) and 3(b). The actual target positions are indicated by a cross sign ('+'), while the estimated positions are indicated by a circle ('o'). Note that the axes in Figures 1(a) and 1(b) and Figures 2(a) and 2(b) represent integer pixel locations, while the axes in Figures 3(a) and 3(b) represent real-valued Open image in new window and Open image in new window , coordinates assuming spatial resolutions of Open image in new window meters/pixel such that the Open image in new window pixel range in the axes of Figures 1(a) and 1(b) and Figures 2(a) and 2(b) corresponds to a Open image in new window meter range in the axes of Figures 3(a) and 3(b).
Figure 3

Tracking results: actual target position (+), estimated target position (o); (a) initial frame, (b) tenth frame.

In this particular example, the target leaves the scene at frame 31 and no target reappears until frame 37. The SMC tracker accurately detects the instant when the target disappears and shows no false alarms over the 6 absent target frames as illustrated in Figures 4(a) and 4(b) where we show, respectively, the clutter+background-only thirty-sixth frame and the corresponding tracking results indicating in this case that no target has been detected. Finally, when a new target reappears, it is accurately acquired by the SMC algorithm. The final simulated frame with the new target at position Open image in new window is shown for illustration purposes in Figure 5(a). Figure 5(b) shows the corresponding tracking results for the same frame.
Figure 4

(a) Thirty-sixth frame of the cluttered target sequence with no target present; (b) detection result indicating absence of target.

Figure 5

(a) Fifty-first frame of the cluttered target sequence, PTCR = 10. 6 dB, with a new target present in the scene; (b) tracking results: actual target position (+), estimated target position (o).

In order to obtain a quantitative assessment of tracking performance, we ran 100 independent Monte Carlo simulations using, respectively, the 5000-particle APF detector/tracker and the 8000-particle RS detector/tracker. Both algorithms correctly detected the presence of the target over a sequence of 20 simulated frames in all 100 Monte Carlo runs. However, with PTCR = 6.5 dB, the 5000-particle APF tracker diverged (i.e., failed to estimate the correct target trajectory) in 3 out of the 100 Monte Carlo trials, whereas the RS tracker diverged in 5 out of 100 runs. When we increased the PTCR to 8.1  dB, the divergence rates fell to 2 out of 100 for the APF, and 3 out of 100 for the RS filter. Figures 6(a) and 6(b) show, in the case of PTCR = 6.5 dB, the root mean square (RMS) error curves (in number of pixels) for the target's position estimates, respectively, in coordinates Open image in new window and Open image in new window generated by both the APF and the RS trackers. The RMS error curves in Figure 6 were computed from the estimation errors recorded in each of the 100 Monte Carlo trials, excluding the divergent realizations. Our simulation results suggest that, despite the reduction in the number of particles from 8000 to 5000, the APF tracker still outperforms the RS tracker, showing similar RMS error performance with a slightly lower divergence rate. For both filters, in the nondivergent realizations, the estimation error is higher in the initial frames and decreases over time as the target is acquired and new images are processed.
Figure 6

RMS error for the target's position estimate, respectively, for the APF (divergence rate, 3%) and resample-move (divergence rate, 5%) trackers, PTCR = 6. 5 d; (a) Open image in new window coordinate, (b) Open image in new window coordinate.

5. Preliminary Discussion on Multitarget Tracking

We have considered so far a single target with uncertain aspect (e.g., random orientation or scale). In theory, however, the same modeling framework could be adapted to a scenario where we consider multiple targets with known (fixed) aspect. In that case, the discrete state Open image in new window , rather than representing a possible target model, could denote instead a possible multitarget configuration hypothesis. For example, if we knew a priori that there is a maximum of Open image in new window targets in the field of view of the sensor at each time instant, then Open image in new window would take Open image in new window possible values corresponding to the hypotheses ranging from "no target present" to "all targets present" in the image frame at instant Open image in new window . The kinematic state Open image in new window , on the other hand, would have variable dimension depending on the value assumed by Open image in new window , as it would collect the centroid locations of all targets that are present in the image given a certain target configuration hypothesis. Different targets could be assumed to move independently of each other when present and to disappear only when they move out of the target grid as discussed in Section 2. Likewise, a change in target configuration hypotheses would result in new targets appearing in uniformly random locations as in (5).

The main difficulty associated with the approach described in the previous paragraph is however that, as the number of targets increases, the corresponding growth in the dimension of the state space is likely to exacerbate particle depletion, thus causing the detection/tracking filters to diverge if the number of particles is kept constant. That may render the direct application of the joint detection/tracking algorithms in this paper unfeasible in a multitarget scenario. The basic tracking routines discussed in the paper may be still viable though when used in conjunction with more conventional algorithms for target detection/acquisition and data association. For a review of alternative approaches to multitarget tracking, mostly for video applications, we refer the reader to [30, 31, 32, 33].

5.1. Likelihood Function Modification in a Multitarget Scenario

In the alternative scenario with multiple (at most Open image in new window ) targets, where Open image in new window represents one of Open image in new window possible target configurations, the likelihood function model in (10) depends instead on a sum of data terms
and a sum of energy terms

where Open image in new window is the long-vector representation of the clutter-free image of the Open image in new window th target under the target configuration hypothesis Open image in new window , assumed to be identically zero for target configurations under which the Open image in new window th target is not present. The sum of the data terms corresponds to the sum of the outputs of different correlation filters matched to each of the Open image in new window possible (fixed) target templates taking into account the spatial correlation of the clutter background. The energy terms, Open image in new window , are on the other hand constant with Open image in new window for most possible locations of targets Open image in new window and Open image in new window on the image grid, except when either one of the two targets or both are close to the image borders. Finally, for Open image in new window , the energy terms are zero for present targets that are sufficiently apart from each other and, therefore, most of the time, they do not affect the computation of the likelihood function. The terms Open image in new window must be taken into account, however, for overlapping targets; in this case, they may be computed efficiently exploring the sparse structure of Open image in new window and Open image in new window . For details, we refer the reader to future work.

5.2. Illustrative Example with Two Targets

We conclude this preliminary discussion on multitarget tracking with an illustrative example where we track two simulated targets moving on the same real clutter background from Section 4 for 22 consecutive frames. This example differs, however, from the simulations in Section 4 in the sense that, rather than performing joint detection and tracking of the two targets, the algorithm assumes a priori that two targets are always present in the scene and performs target tracking only. The two targets are preacquired (detected) in the initial frame such that their initial positions are known up only to a small uncertainty. For this particular simulation, with PTCR 12.5 dB, that preliminary acquisition was done by applying the differential filter in (16) to the initial frame, and then applying the output of the differential filter to a bank of two spatial matched filters as in (15), designed according to the signature coefficients, respectively, for targets 1 and 2. The outputs of the two matched filters minus the corresponding energy terms for targets 1 and 2, respectively, are finally added together and thresholded to provide the initial estimates of the location of the two targets. Note that the cross-energy terms discussed in Section 5.1 may be ignored in this case since we are assuming that the two targets are initially sufficiently far apart. Frames 1 and 10 of the simulated cluttered sequence with the two targets are shown in Figures 7(a) and 7(b) for illustration purposes.
Figure 7

Simulated image sequence with two moving targets: (a) first frame, (b) tenth frame; PTCR 12. 5 dB.

Once the two targets are initially acquired, we track them jointly from the raw sensor images (i.e., without any other conventional decoupled data association/tracking method) using an auxiliary particle filter that assumes the modified likelihood function of Section 5.1 and two independent motion models, respectively, for targets 1 and 2. The tracking filter uses Open image in new window particles. Figure 8 shows the actual and estimated trajectories, respectively, for targets 1 and 2. As we can see from the plots, the experimental results are worse than those obtained in the single target, multiaspect case, but the filter was still capable of tracking the pixel locations of the centroids of the two targets fairly accurately with errors ranging from zero to two or three pixels at most and without using any ad hoc data association routine.
Figure 8

Illustrative example of two-target tracking using an auxiliary particle filter.

6. Conclusions and Future Work

We discussed in this paper a methodology for joint detection and tracking of multiaspect targets in remote sensing image sequences using sequential Monte Carlo (SMC) filters. The proposed algorithm enables integrated, multiframe target detection and tracking incorporating the statistical models for target motion, target aspect, and spatial correlation of the background clutter. Due to the nature of the application, the emphasis is on detecting and tracking small, remote targets under additive clutter, as opposed to tracking nearby objects possibly subject to occlusion.

Two different implementations of the SMC detector/tracker were presented using, respectively, a resample-move (RS) particle filter and an auxiliary particle filter (APF). Simulation results show that, in scenarios with heavily obscured targets, the APF and RS configurations have similar tracking performance, but the APF algorithm has a slightly smaller percentage of divergent realizations. Both filters, on the other hand, were capable of correctly detecting the target in each frame, including accurately declaring absence of target when the target left the scene and, conversely, detecting a new target when it entered the image grid. The multiframe track-before-detect approach allowed for efficient detection of dim targets that may be near invisible in a single-frame but become detectable when seen across multiple frames.

The discussion in this paper was restricted to targets that assume only a finite number of possible aspect states defined on a library of target templates. As an alternative for future work, an appearance model similar to the one described in [24] could be used instead, allowing the discrete-valued aspect states Open image in new window to denote different classes of continuous-valued target deformation models, as opposed to fixed target templates. Similarly, the framework in this paper could also be modified to allow for multiobject tracking as indicated in Section 5.

Notes

Acknowledgment

Part of the material in this paper was presented at the 2005 IEEE Aerospace Conference.

References

  1. 1.
    Doucet A, Godsill S, Andrieu C: On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 2000, 10(3):197-208. 10.1023/A:1008935410038CrossRefGoogle Scholar
  2. 2.
    Bounds JK: The Infrared airborne radar sensor suite. In RLE Tech. Rep. 610. Massachusetts Institute of Technology, Cambridge, Mass, USA; 1996.Google Scholar
  3. 3.
    Bar-Shalom Y, Li X: Multitarget-Multisensor Tracking: Principles and Techniques. YBS Publishing, Storrs, Conn, USA; 1995.Google Scholar
  4. 4.
    Bruno MGS: Bayesian methods for multiaspect target tracking in image sequences. IEEE Transactions on Signal Processing 2004, 52(7):1848-1861. 10.1109/TSP.2004.828903CrossRefGoogle Scholar
  5. 5.
    Bruno MGS, Moura JMF: Optimal multiframe detection and tracking in digital image sequences. Proceedings of the IEEE International Acoustics, Speech, and Signal Processing (ICASSP '00), June 2000, Istanbul, Turkey 5: 3192-3195.Google Scholar
  6. 6.
    Bruno MGS, Moura JMF: Multiframe detector/tracker: optimal performance. IEEE Transactions on Aerospace and Electronic Systems 2001, 37(3):925-945. 10.1109/7.953247CrossRefGoogle Scholar
  7. 7.
    Bruno MGS, Moura JMF: Multiframe Bayesian tracking of cluttered targets with random motion. Proceedings of the International Conference on Image Processing (ICIP '00), September 2000, Vancouver, BC, Canada 3: 90-93.Google Scholar
  8. 8.
    Arulampalam MS, Maskell S, Gordon N, Clapp T: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 2002, 50(2):174-188. 10.1109/78.978374CrossRefGoogle Scholar
  9. 9.
    Moura JMF, Balram : Recursive structure of noncausal Gauss-Markov random fields. IEEE Transactions on Information Theory 1992, 38(2):334-354. 10.1109/18.119691MATHCrossRefGoogle Scholar
  10. 10.
    Moura JMF, Bruno MGS: DCT/DST and Gauss-Markov fields: conditions for equivalence. IEEE Transactions on Signal Processing 1998, 46(9):2571-2574. 10.1109/78.709549CrossRefGoogle Scholar
  11. 11.
    Moura JMF, Balram N: Noncausal Gauss Markov random fields: parameter structure and estimation. IEEE Transactions on Information Theory 1993, 39(4):1333-1355. 10.1109/18.243450MATHCrossRefGoogle Scholar
  12. 12.
    Schweizer SM, Moura JMF: Hyperspectral imagery: clutter adaptation in anomaly detection. IEEE Transactions on Information Theory 2000, 46(5):1855-1871. 10.1109/18.857796MATHCrossRefGoogle Scholar
  13. 13.
    Isard , Blake : A mixed-state condensation tracker with automatic model-switching. Proceedings of the 6th International Conference on Computer Vision, January 1998, Bombay, India 107-112.Google Scholar
  14. 14.
    Doucet A, Freitas JFG, Gordon N: An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice. Edited by: Doucet A, Freitas NFG, Gordon NJ. Springer, New York, NY, USA; 2001.CrossRefGoogle Scholar
  15. 15.
    Bruno MGS, de Araújo RV, Pavlov AG: Sequential Monte Carlo filtering for multi-aspect detection/tracking. Proceedings of the IEEE Aerospace Conference, March 2005, Big Sky, Mont, USA 2092-2100.Google Scholar
  16. 16.
    Gilks WR, Berzuini C: Following a moving target—Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2001, 63(1):127-146. 10.1111/1467-9868.00280MATHMathSciNetCrossRefGoogle Scholar
  17. 17.
    Gordon N, Salmond D, Ewing C: Bayesian state estimation for tracking and guidance using the bootstrap filter. Journal of Guidance, Control, and Dynamics 1995, 18(6):1434-1443. 10.2514/3.21565CrossRefGoogle Scholar
  18. 18.
    Robert CP, Casella G: Monte Carlo Statistical Methods, Springer Texts in Statistics. Springer, New York, NY, USA; 1999.CrossRefGoogle Scholar
  19. 19.
    Pitt MK, Shephard N: Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association 1999, 94(446):590-599. 10.2307/2670179MATHMathSciNetCrossRefGoogle Scholar
  20. 20.
    Isard M, Blake A: Condensation—conditional density propagation for visual tracking. International Journal of Computer Vision 1998, 29(1):5-28. 10.1023/A:1008078328650CrossRefGoogle Scholar
  21. 21.
    Li B, Chellappa R: A generic approach to simultaneous tracking and verification in video. IEEE Transactions on Image Processing 2002, 11(5):530-544. 10.1109/TIP.2002.1006400CrossRefGoogle Scholar
  22. 22.
    Zhou SK, Chellappa R, Moghaddam B: Visual tracking and recognition using appearance-adaptive models in particle filters. IEEE Transactions on Image Processing 2004, 13(11):1491-1506. 10.1109/TIP.2004.836152CrossRefGoogle Scholar
  23. 23.
    Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1977, 39(1):1-38.MATHMathSciNetGoogle Scholar
  24. 24.
    Giebel J, Gavrila DM, Schnörr C: A bayesian framework for multi-cue 3D object tracking. Proceedings of the 8th European Conference on Computer Vision (ECCV '04), May 2004, Prague, Czech Republic 4: 241-252.Google Scholar
  25. 25.
    Vermaak J, Doucet A, Pérez P: Maintaining multi-modality through mixture tracking. Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), October 2003, Nice, France 2: 1110-1116.CrossRefGoogle Scholar
  26. 26.
    Okuma K, Taleghani A, de Freitas N, Little JJ, Lowe DG: A boosted particle filter: multitarget detection and tracking. Proceedings of the 8th European Conference on Computer Vision (ECCV '04), May 2004, Prague, Czech Republic 3021: 28-39.Google Scholar
  27. 27.
    Geweke J: Bayesian inference in econometric models using Monte Carlo integration. Econometrica 1989, 57(6):1317-1339. 10.2307/1913710MATHMathSciNetCrossRefGoogle Scholar
  28. 28.
    Liu JS, Chen R, Logvinenko T: A theoretical framework for sequential importance sampling with resampling. In Sequential Monte Carlo Methods in Practice. Edited by: Doucet A, Freitas JFG, Gordon NJ. Springer, New York, NY, USA; 2001:225-246.CrossRefGoogle Scholar
  29. 29.
    Video Demonstration 1 & 2 http://www.ele.ita.br/~bruno
  30. 30.
    Ng W, Li J, Godsill S, Vermaak J: Tracking variable number of targets using sequential Monte Carlo methods. Proceedings of the IEEE/SP 13th Workshop on Statistical Signal Processing, July 2005, Bordeaux, France 1286-1291.Google Scholar
  31. 31.
    Ng W, Li J, Godsill S, Vermaak J: Multitarget tracking using a new soft-gating approach and sequential Monte Carlo methods. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 4: 1049-1052.Google Scholar
  32. 32.
    Hue C, Le Cadre J-P, Pérez P: Tracking multiple objects with particle filtering. IEEE Transactions on Aerospace and Electronic Systems 2002, 38(3):791-812. 10.1109/TAES.2002.1039400CrossRefGoogle Scholar
  33. 33.
    Czyz J, Ristic B, Macq B: A color-based particle filter for joint detection and tracking of multiple objects. Proceedingd of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 217-220.Google Scholar

Copyright information

© Marcelo G. S. Bruno et al. 2008

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Authors and Affiliations

  • Marcelo G. S. Bruno
    • 1
  • Rafael V. Araújo
    • 1
  • Anton G. Pavlov
    • 1
  1. 1.Instituto Tecnológico de AeronáuticaSão José dos CamposBrazil

Personalised recommendations