1 Introduction

On September 8, 2014, Tiantuo-2 (TT-2), designed independently by the National University of Defense Technology, was successfully launched into orbit. It is the first Chinese interactive earth observation microsatellite to use a video imaging system. Its mass is 67 kg and its orbital altitude is 490 km. An experiment based on interactive control strategies with a human in the loop was carried out to realize continuous tracking and monitoring of moving targets.

Video satellites have attracted wide attention from researchers at home and abroad, and several have been launched into orbit, such as SkySat-1 and SkySat-2 by Skybox Imaging, the TUBSAT series by the Technical University of Berlin [15], and the video satellite by Chang Guang Satellite Technology. They obtain video images with different on-orbit performance.

The growing number of space objects has an increasing impact on human space activities, and cataloging and monitoring them has become a hot issue in the field of space environment [4, 10, 13, 16]. Compared with ground-based observation, space-based observation is not restricted by weather or geographical location and avoids atmospheric disturbance of the object's signal, which gives it a unique advantage [4, 10, 19]. Using video satellites to observe space objects is therefore an effective approach.

Target detection and tracking in satellite video images is an important part of space-based observation. For the general problem of target detection and tracking in video, optical flow [6], block matching [8], template detection [3, 12] and other methods have been proposed. Most of these methods are based on gray level, however, and require sufficient texture information [26]. Small target detection in optical images with a star background faces three main difficulties: (1) Because the target occupies only one or a few pixels in the image, its shape is not available. (2) Because of background stars and the noise introduced by the space environment detecting equipment, the target is almost submerged in the bright spots of the complex background, which greatly increases the difficulty of detection. (3) Attitude motion of the target changes its brightness, and the target may even be lost in several frames.

To address these difficulties, many scholars have proposed algorithms, which fall mainly into two classes: Track before Detection (TBD) and Detect before Tracking (DBT). Multistage Hypothesizing Testing (MHT) [1] and dynamic programming based algorithms [9, 18, 25] belong to TBD, which is effective when the image signal-to-noise ratio is very low, but they suffer from high computational complexity and hard thresholds [14]. In practice, DBT is usually adopted for target detection in star images [2, 7, 22]. A space target detection algorithm for video based on motion information was proposed in [24], which partly solved the first and second difficulties. The third difficulty remains unsolved, however, and may lead to losing the target.

To address the third difficulty, an adaptive space target detector based on prior information is proposed in this paper. Considering the continuity of target motion and brightness change, adaptive thresholding based on local image properties and on prior information from the detections in previous frames and from the Kalman filter is used to segment a single frame image. Experimental results on video images from TT-2 demonstrate the effectiveness of the algorithm.

2 Characteristic Analysis of Satellite Video Image

A video satellite image of space contains the deep space background, stars, targets, and noise introduced by imaging devices and cosmic rays. The mathematical model is given by [20]

$$\begin{aligned} f(x,y,k)=f_{B}(x,y,k)+f_{s}(x,y,k)+f_{T}(x,y,k)+n(x,y,k) \end{aligned}$$
(1)

where \(f_{B}(x,y,k)\) is the gray level of the deep space background, \(f_{s}(x,y,k)\) is the gray level of stars, \(f_{T}(x,y,k)\) is the gray level of targets and n(x, y, k) is the gray level of noise. (x, y) is the pixel coordinate in the image and k is the frame number.

In the video image, stars and weak small targets occupy only one or a few pixels. It is difficult to distinguish the target from background stars by morphological characteristics or photometric features. Besides, attitude motion of the target changes its brightness, and the target may even be lost in several frames. It is therefore almost impossible to detect the target in a single frame image, and it is necessary to use the continuity of target motion and brightness change across multiple frames.

Figure 1 shows local images from the TT-2 video, where (a) is a star and (b) is a target (debris). It is impossible to distinguish them by morphological characteristics or photometric features.

Fig. 1. Images of star and target

3 Target Detection via Prior Information

When the attitude of the video satellite is stabilized, background stars move extremely slowly in the video and can be considered static over several frames. At the same time, noise is random (dead pixels appear in a fixed position) and only targets move continuously. This is the most important distinction between their motion characteristics in the image. Because of platform jitter, targets cannot be detected by a simple frame difference method. An adaptive space target detector based on prior information is therefore proposed. The prior information comes from the detections in previous frames and from the Kalman filter. The procedure of the target detector is shown in Fig. 2.

Fig. 2. Procedure of target detector

3.1 Image Denoising

Noise in the video image mainly includes space radiation noise, space background noise, CCD dark current noise and so on. A bilateral filter, which is a simple, non-iterative scheme for edge-preserving smoothing [17], can be used for denoising. The weights of the filter have two components. The first is the same weighting used by the Gaussian filter. The second takes into account the difference in intensity between the neighboring pixels and the evaluated one. The diameter of the filter is set to 5, and the weights are given by

$$\begin{aligned} w_{ij}=\frac{1}{K}\exp \left( -\frac{(x_i-x_c)^2+(y_j-y_c)^2}{2\sigma _{s}^{2}}\right) \exp \left( -\frac{(f(x_i,y_j)-f(x_c,y_c))^2}{2\sigma _{r}^{2}}\right) \end{aligned}$$
(2)

where K is the normalization constant, \((x_i,y_j)\) is a pixel inside the filter window, \((x_c,y_c)\) is the center of the filter, f(x, y) is the gray level at (x, y), \(\sigma _s\) is set to 10 and \(\sigma _r\) is set to 75.
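
As an illustration of Eq. 2, the following is a minimal sketch that computes the normalized bilateral weights for one square patch and the resulting filtered value of its center pixel. The function names, the list-of-rows image layout and the restriction to a single patch are assumptions made here for brevity, not part of the original method description.

```python
import math

def bilateral_weights(patch, sigma_s=10.0, sigma_r=75.0):
    """Normalized bilateral weights for a square patch given as a list of rows.

    The center pixel of the patch plays the role of (x_c, y_c) in Eq. 2.
    """
    c = len(patch) // 2
    center_value = patch[c][c]
    raw = []
    for i, row in enumerate(patch):
        raw_row = []
        for j, value in enumerate(row):
            # Spatial (Gaussian) component: squared distance to the center.
            spatial = ((i - c) ** 2 + (j - c) ** 2) / (2.0 * sigma_s ** 2)
            # Range component: squared gray-level difference to the center.
            gray = (value - center_value) ** 2 / (2.0 * sigma_r ** 2)
            raw_row.append(math.exp(-spatial) * math.exp(-gray))
        raw.append(raw_row)
    k = sum(sum(r) for r in raw)  # normalization constant K
    return [[w / k for w in r] for r in raw]

def filter_pixel(patch, sigma_s=10.0, sigma_r=75.0):
    """Filtered gray level of the patch center: weighted sum of the patch."""
    weights = bilateral_weights(patch, sigma_s, sigma_r)
    return sum(w * v
               for w_row, p_row in zip(weights, patch)
               for w, v in zip(w_row, p_row))
```

With a filter diameter of 5, `patch` would be the 5×5 neighborhood of the evaluated pixel. How pixels near the image border are handled is not stated in the text and is left out of this sketch.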

3.2 Single Frame Binary Segmentation

In order to distinguish between stars and targets, it is necessary to remove the background in each frame, but stray light leads to an uneven gray level distribution of the background. Figure 3(a) is a single frame of the video and Fig. 3(b) is its gray level histogram. The single-peak shape of the histogram shows that the classical global threshold method, which is traditionally used to segment star images [5, 20, 21, 22, 23, 25], cannot be used to segment the video image.

Fig. 3. Single frame image and its gray level histogram

Whether a bright spot is a star or a target, its gray level is greater than that of the pixels in its neighborhood. Variable thresholding based on local image properties is therefore used to segment the image. The standard deviation \(\sigma _{xy}\) and mean value \(m_{xy}\) are calculated for the neighborhood of every point (x, y) in the image; they are descriptors of the local contrast and average gray level. The variable threshold based on local contrast and average gray level is then given by

$$\begin{aligned} T_{xy}=a\sigma _{xy}+bm_{xy} \end{aligned}$$
(3)

where a and b are constants greater than 0. b is the contribution of the local average gray level to the threshold and can be set to 1. a is the contribution of the local contrast to the threshold and is the main parameter to set according to the target characteristics.
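
A minimal sketch of Eq. 3 for a single pixel is given below. The values a = 1.1, b = 1 and the 7×7 neighborhood are the settings reported in the experiments. The `image[y][x]` layout, the inclusion of the pixel itself in its neighborhood, the clipping of the window at the image border and the use of the population standard deviation are assumptions of this sketch.

```python
import math

def local_threshold(image, x, y, a=1.1, b=1.0, half=3):
    """T_xy = a * sigma_xy + b * m_xy over a (2*half+1)^2 neighborhood of (x, y).

    `image` is a list of rows, indexed as image[y][x]; the window is clipped
    at the image border.
    """
    rows, cols = len(image), len(image[0])
    values = [image[r][c]
              for r in range(max(0, y - half), min(rows, y + half + 1))
              for c in range(max(0, x - half), min(cols, x + half + 1))]
    mean = sum(values) / len(values)                           # m_xy
    var = sum((v - mean) ** 2 for v in values) / len(values)   # sigma_xy^2
    return a * math.sqrt(var) + b * mean
```

In a perfectly flat region the standard deviation is zero, so the threshold equals the local mean and no pixel exceeds it, while an isolated bright pixel raises the threshold far less than its own gray level.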

On the other hand, the brightness of a moving space target changes with its attitude. If a were held constant, the target would be lost in some frames. Considering the continuity of target motion, if (x, y) is detected as a probable target coordinate in the kth frame, the probability of detecting the target is much greater in the \(5\times 5\) window of (x, y) in the \((k+1)\)th frame. Taking into account the continuity of the target's brightness change as well, if no probable target is detected in the \(5\times 5\) window of (x, y) in the \((k+1)\)th frame, \(a_k\) can be reduced by a factor \(\rho \) \((\rho <1)\), i.e. replaced by \(a_k \rho \). So the adaptive thresholding based on local image properties and on prior information from the detections in previous frames and from the Kalman filter is given by

$$\begin{aligned} \begin{array}{l} T_k (x,y)=a_k(x,y)\sigma _{xy}+bm_{xy}\\ a_k(x,y)=a_{k-1}(x,y)\rho ^{P(x,y,k|k-1)} \end{array} \end{aligned}$$
(4)

where \(P(x,y,k|k-1)\) is the probability that (x, y) in the kth frame lies in the \(5\times 5\) window of the predicted coordinate of a probable target detected in the \((k-1)\)th frame; it equals 0 or 1. The predicted coordinate is derived from the Kalman filter, as shown in the next section.

The difference between Eqs. 3 and 4 is the adaptive coefficient \(a_k(x,y)\), which is based on prior information from the detections in previous frames and from the Kalman filter. \(a_k(x,y)\) is initially set to a value a little greater than 1 and is reset to this initial value when the gray level at (x, y) becomes large again (greater than 150).
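
The coefficient update of Eq. 4, together with the reset rule, can be sketched as follows. The initial value 1.1 and the factor 0.8 are the experimental settings; the function names, the rounding of the predicted coordinate to the nearest pixel and the choice to test the reset condition before applying the reduction are assumptions of this sketch.

```python
def in_window(x, y, predicted, half=2):
    """P(x, y, k | k-1): 1 if (x, y) lies in the 5x5 window around the
    predicted target coordinate, 0 otherwise."""
    px, py = predicted
    inside = abs(x - round(px)) <= half and abs(y - round(py)) <= half
    return 1 if inside else 0

def update_coefficient(a_prev, x, y, predicted, gray,
                       rho=0.8, a_init=1.1, reset_level=150):
    """a_k(x, y) = a_{k-1}(x, y) * rho ** P(x, y, k | k-1), with reset.

    `gray` is the current gray level at (x, y); when it exceeds `reset_level`
    the coefficient returns to its initial value (assumed to take precedence).
    """
    if gray > reset_level:
        return a_init
    return a_prev * rho ** in_window(x, y, predicted)
```

Under these settings, a pixel that stays inside the predicted window for three consecutive dim frames ends up with a coefficient of 1.1 × 0.8³ ≈ 0.563, which lowers the threshold of Eq. 4 around the expected target position.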

The binary segmentation of the image is given by

$$\begin{aligned} g(x,y) = \left\{ \begin{array}{ll} 1 &{} \quad \text {if}\, f(x,y)>T_{xy}\\ 0 &{} \quad \text {if}\, f(x,y)\le T_{xy} \end{array} \right. \end{aligned}$$
(5)

where f(x, y) is the gray level of the original image at (x, y) and g(x, y) is the gray level of the segmented image at (x, y).

3.3 Coordinate Extraction

In an ideal optical system, a point target occupies one pixel in the CCD focal plane, but under practical imaging conditions, circular aperture diffraction spreads the target over multiple pixels in the focal plane. The coordinate of the target in the pixel frame is therefore determined by the position of its gray level center. A simple gray weighted centroid algorithm, with a positioning accuracy of 0.1–0.3 pixels [11], is used to calculate the coordinates by

$$\begin{aligned} (x_S,y_S)=\frac{\sum \limits _{(x,y)\in S} f(x,y)\cdot (x,y)}{\sum \limits _{(x,y)\in S} f(x,y)} \end{aligned}$$
(6)

where S is a target area after segmentation, f(x, y) is the gray level of the original image at (x, y), and \((x_S,y_S)\) is the coordinate of S.
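
The centroid of Eq. 6 can be sketched as below. The text does not state how the target areas S are extracted from the binary image, so the 8-connected flood fill used here is an assumption, as are the function names and the `frame[y][x]` layout.

```python
def regions(mask):
    """Group 8-connected foreground pixels of a binary mask into regions.

    Each region is a list of (x, y) pixel coordinates (assumed definition of S).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for y in range(rows):
        for x in range(cols):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            stack, region = [(x, y)], []
            seen[y][x] = True
            while stack:
                cx, cy = stack.pop()
                region.append((cx, cy))
                # Visit the 8 neighbors, clipped at the image border.
                for ny in range(max(0, cy - 1), min(rows, cy + 2)):
                    for nx in range(max(0, cx - 1), min(cols, cx + 2)):
                        if mask[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            found.append(region)
    return found

def centroid(frame, region):
    """Gray weighted centroid (x_S, y_S) of one region, as in Eq. 6."""
    total = sum(frame[y][x] for x, y in region)
    x_s = sum(frame[y][x] * x for x, y in region) / total
    y_s = sum(frame[y][x] * y for x, y in region) / total
    return x_s, y_s
```

For a two-pixel region with gray levels 100 at (1, 2) and 300 at (2, 2), the centroid is (1.75, 2.0): the brighter pixel pulls the estimate toward itself, which is what gives the method its sub-pixel resolution.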

The Kalman filter is an efficient recursive filter that estimates the internal state of a linear dynamic system from a series of noisy measurements. It can be used to predict the coordinate of a probable target in the next frame.

Assume that the state vector of the target at the kth frame is \(\mathbf {x}_k=(x_k,y_k,vx_k,vy_k)^T\), i.e. its coordinate and velocity (unit: pixel/\(\varDelta t\)) in the pixel frame. The system equation is

$$\begin{aligned} \left\{ \begin{array}{l} \mathbf {x}_k=F\mathbf {x}_{k-1}+\mathbf {w}=\left[ \begin{array}{cccc} 1 &{} 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{array}\right] \mathbf {x}_{k-1}+\mathbf {w}\\ \quad \\ \mathbf {z}_k=H\mathbf {x}_k+\mathbf {v}=\left[ \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \end{array}\right] \mathbf {x}_{k}+\mathbf {v} \end{array} \right. \end{aligned}$$
(7)

where F is the state transition matrix, H is the measurement matrix, \(\mathbf {z}_k\) is the measurement vector, i.e. the coordinate at the kth frame, \(\mathbf {w}\) is the process noise which is assumed to be zero mean Gaussian white noise with covariance Q, denoted as \(\mathbf {w}\sim N(0, Q)\), and \(\mathbf {v}\) is the measurement noise which is assumed to be zero mean Gaussian white noise with covariance R, denoted as \(\mathbf {v}\sim N(0, R)\).
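
Written out component by component, the state equation in Eq. 7 is a constant-velocity model in which the frame interval \(\varDelta t\) is the unit of time. The component names of \(\mathbf {w}\) used below are notation introduced here only for illustration:

$$\begin{aligned} \begin{array}{l} x_k=x_{k-1}+vx_{k-1}+w_{x}\\ y_k=y_{k-1}+vy_{k-1}+w_{y}\\ vx_k=vx_{k-1}+w_{vx}\\ vy_k=vy_{k-1}+w_{vy} \end{array} \end{aligned}$$

Similarly, the measurement equation reads \(\mathbf {z}_k=(x_k,y_k)^T+\mathbf {v}\): only the position is measured, and the velocity is inferred by the filter.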

Let \(\hat{\mathbf {x}}_{k|l}\) be the estimate of \(\mathbf {x}_k\) given measurements up to and including the lth frame, where \(l\le k\), and let \(P_{k|k}\) be the a posteriori error covariance matrix, which measures the accuracy of the state estimate. Then \(\hat{\mathbf {x}}_{k|k}\) and \(P_{k|k}\) represent the state of the filter. The procedure of the Kalman filter is as follows:

  • Initialization. Initialize \(\hat{\mathbf {x}}_{0|0}\) and \(P_{0|0}\). For \(\hat{\mathbf {x}}_{0|0}=(x_{0|0},y_{0|0},vx_{0|0},vy_{0|0})^T\), \(x_{0|0},y_{0|0}\) are obtained by single frame binary segmentation and the gray weighted centroid algorithm, and \(vx_{0|0},vy_{0|0}\) are set to zero.

  • Prediction. The prediction phase uses the state estimate from the previous frame to produce an estimate of the state at the current frame. The predicted coordinates will be used in single frame binary segmentation and trajectory association.

    $$\begin{aligned} \begin{array}{l} \hat{\mathbf {x}}_{k|k-1}=F\hat{\mathbf {x}}_{k-1|k-1}\\ P_{k|k-1}=FP_{k-1|k-1}F^T+Q \end{array} \end{aligned}$$
    (8)
  • Update. The update phase combines the prediction with the current measurement to refine the state estimate.

    $$\begin{aligned} \begin{array}{l} K_k=P_{k|k-1}H^T(HP_{k|k-1}H^T+R)^{-1}\\ \hat{\mathbf {x}}_{k|k}=\hat{\mathbf {x}}_{k|k-1}+K_k(\mathbf {z}_k-H\hat{\mathbf {x}}_{k|k-1})\\ P_{k|k}=(I-K_k H)P_{k|k-1} \end{array} \end{aligned}$$
    (9)

    where \(K_k\) is an intermediate variable called the Kalman gain.
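
The prediction and update steps of Eqs. 8 and 9 can be sketched with plain Python lists as follows. F and H are taken from Eq. 7, and Q and R use the values reported in the experiments; the small matrix helpers and the column-vector layout (a list of one-element rows) are choices made for this sketch only, so that it runs without external libraries.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def inv2(m):
    """Inverse of a 2x2 matrix (enough for the innovation covariance)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

F = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]  # state transition
H = [[1, 0, 0, 0], [0, 1, 0, 0]]                              # measurement matrix
Q = identity(4, 1e-4)                                         # process noise
R = identity(2, 0.2)                                          # measurement noise

def predict(x, P):
    """Eq. 8: propagate the state estimate and covariance by one frame."""
    x_pred = matmul(F, x)
    P_pred = add(matmul(matmul(F, P), transpose(F)), Q)
    return x_pred, P_pred

def update(x_pred, P_pred, z):
    """Eq. 9: correct the prediction with the measured coordinate z."""
    S = add(matmul(matmul(H, P_pred), transpose(H)), R)
    K = matmul(matmul(P_pred, transpose(H)), inv2(S))  # Kalman gain
    x_new = add(x_pred, matmul(K, sub(z, matmul(H, x_pred))))
    P_new = matmul(sub(identity(4), matmul(K, H)), P_pred)
    return x_new, P_new
```

A track would start from `x = [[x0], [y0], [0.0], [0.0]]` with the centroid coordinates of the first detection, then alternate `predict` and `update` once per frame. In a frame where the target is not detected, one plausible choice (not specified in the text) is to call `predict` only.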

3.4 Trajectory Association

After a frame is processed, some potential target coordinates are obtained. They are associated with existing trajectories, or used to generate new trajectories, by a nearest neighbor filter. The radius of the neighborhood is determined by the target characteristics: it needs to be greater than the distance the target image moves in one frame and large enough to tolerate losing the target for several frames.
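
A minimal sketch of such a nearest neighbor association is given below. The dictionary layout of a trajectory, the greedy one-detection-at-a-time assignment and the function name are assumptions of this sketch. The radius of 5 pixels is the experimental setting, and the `predicted` coordinate of each trajectory is assumed to be supplied by the Kalman prediction.

```python
import math

def associate(trajectories, detections, radius=5.0):
    """Attach each detection to the nearest existing trajectory within
    `radius`, or start a new trajectory if none is close enough.

    Each trajectory is a dict with a "predicted" (x, y) coordinate and a
    "points" list of associated (x, y) detections.
    """
    existing = list(trajectories)  # new trajectories do not compete this frame
    for det in detections:
        best, best_dist = None, radius
        for traj in existing:
            px, py = traj["predicted"]
            dist = math.hypot(det[0] - px, det[1] - py)
            if dist <= best_dist:
                best, best_dist = traj, dist
        if best is None:
            # Zero initial velocity: the first prediction is the detection itself.
            trajectories.append({"predicted": det, "points": [det]})
        else:
            best["points"].append(det)
    return trajectories
```

This greedy form allows two detections to join the same trajectory in one frame; a stricter one-to-one assignment would be a possible refinement that the text does not discuss.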

When a trajectory has 20 points, it is judged whether it is a target. The velocity in the state vector is used: if the norm of the sum of the velocities of the points in the trajectory is greater than a given threshold, it is a target; otherwise it is not. That is, the test judges whether the point is moving. The threshold mainly serves to reject image motion caused by the instability of the satellite platform and other noise, and is usually set to 2.
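
The decision rule can be sketched as follows, with the 20-point length and the threshold of 2 taken from the text. The function name and the convention of returning `None` while the trajectory is still too short are assumptions of this sketch.

```python
import math

def is_target(velocities, threshold=2.0, min_points=20):
    """Decide whether a trajectory belongs to a moving target.

    `velocities` is a list of (vx, vy) estimates, one per trajectory point.
    Returns None while the trajectory has fewer than `min_points` points.
    """
    if len(velocities) < min_points:
        return None
    sum_vx = sum(vx for vx, _ in velocities)
    sum_vy = sum(vy for _, vy in velocities)
    return math.hypot(sum_vx, sum_vy) > threshold
```

Summing the velocities before taking the norm makes jitter cancel out: velocities that alternate in sign add up to roughly zero, while a consistent drift of 0.3 pixel per frame accumulates to 6 pixels over 20 points and exceeds the threshold.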

Targets are thus detected based on the continuity of their motion over multiple frames, and trajectories are updated.

4 Experimental Results

The adaptive space target detector is verified using video images from TT-2. TT-2 has 4 space video sensors, and the video used in this paper is from the high-resolution camera, whose focal length is 1000 mm, FOV is \(2^{\circ }30'\), and pixel size is \(d_x=d_y=8.33\, \upmu \)m. The video frame rate is 25 frames per second and the resolution is \(960\times 576\).

A sequence of 1000 consecutive frames taken during attitude stabilization is processed. In Eq. 4, \(a_k(x,y)\) is initially set to 1.1 and b is set to 1. The reduction factor \(\rho \) is set to 0.8. The neighborhood in image segmentation is set to \(7\times 7\). In Eq. 7, Q is set to \(10^{-4}I_4\) and R is set to \(0.2I_2\), where \(I_n\) is the \(n\times n\) identity matrix. In the initialization of the Kalman filter, \(P_{0|0}\) is set to \(\left[ \begin{array}{cccc} 0.2 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0.2 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ \end{array} \right] \). The radius of the neighborhood in trajectory association is set to 5.

In Fig. 4, 3 target areas are obtained after segmenting the 9th frame (as shown in (a)) and 5 target areas are obtained after segmenting the 30th frame (as shown in (b)). These areas include targets, stars and noise, which cannot be distinguished in a single frame. Figure 4 also shows the brightness change of the target.

Fig. 4. Result of segmenting the 9th and 30th frame image

Figure 5 is derived by overlaying the 1000 frame images. The trajectory of the target can be found in the white box of Fig. 5. Figure 5 shows that the brightness of the target varies considerably.

Fig. 5. Overlay of the 1000 frame images

The adaptive space target detector detects the target well, as shown in Fig. 6. Because the full \(960\times 576\) image is too large to display, Fig. 6 shows only a local region, and a white box marks the target every 50 frames. Figure 7 gives the trajectory of the target.

Fig. 6. Detecting the target in the image

Fig. 7. Trajectory of the target

Besides, for these 1000 frames, if the coefficient in Eq. 4 is held constant, the target is lost in 421 frames, whereas Eq. 4 with the adaptive coefficient detects the target in 947 frames. The share of frames in which the target is detected thus rises from 57.9% (579 of 1000) to 94.7%.

As another example, the adaptive space target detector is used to process an overexposed video of 187 frames taken during attitude stabilization. Overlaying the 187 frames gives Fig. 8. The trajectory of the target in the black box of Fig. 8 is easily missed by the naked eye.

Fig. 8. Overlay of the 187 frame overexposed images

The adaptive space target detector detects the target well, as shown in Fig. 9. Figure 9 shows local images of the 30th, 60th, 90th, 120th, 150th, and 180th frames, and a white box marks the target in each frame. Figure 10 gives the trajectory of the target.

Fig. 9. Detecting the target in the overexposed images

Fig. 10. Trajectory of the target in an overexposed video

5 Conclusion

In this paper, an adaptive space target detector for video satellite images based on prior information is proposed. Considering the continuity of target motion and brightness change, adaptive thresholding based on local image properties and on prior information from the detections in previous frames and from the Kalman filter is used to segment a single frame image. The algorithm then uses the correlation of target motion across multiple frames to detect the target. Experimental results on video images from TT-2 demonstrate the effectiveness of the algorithm.