1 Introduction

With the prevalence of infrared cameras and sensors, infrared video plays an increasingly important role in daily life and industrial production. For example, infrared videos of wild animal activity captured by infrared surveillance equipment have brought great convenience to wildlife researchers. When the brightness of the target does not differ distinctly from that of the background in an infrared video sequence, or when an application requires it, it is necessary to separate the moving objects from the background. How to effectively extract the moving target in an infrared video is therefore a problem worth studying.

This paper aims to develop an effective scheme for extracting the moving objects in infrared video sequences. Motivated by the regularization models proposed in [1, 2] for other applications, we take a similar regularization approach to separating moving objects from the background.

2 Detailed Algorithm

In this section, we present the algorithm in detail. First, each frame of the infrared video sequence is reshaped into a column vector, and all columns are then combined into a new matrix \( D \in {\mathbb{R}}^{m \times n} \), so that m is the number of pixels per frame and n is the number of frames. Because the video content is composed of the background and the moving objects, the matrix D can be represented as the sum of a background component and a moving object component. If we denote the background component by \( A \in {\mathbb{R}}^{m \times n} \) and the moving object component by \( E \in {\mathbb{R}}^{m \times n} \), then the matrix D can be expressed as

$$ D = A + E. $$
(1)

In a video sequence, adjacent frames share most of their background information, especially when the frame rate is high. Thus, the matrix A has many nearly identical columns and is a low-rank matrix. Because the moving objects occupy only a small part of each frame, the number of nonzero entries of E is far smaller than its total number of entries. Thus, the matrix E is a sparse matrix. Based on these observations, if we can accurately decompose the matrix D into the sum of a low-rank matrix and a sparse matrix, the moving objects can be extracted from the background.
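The stacking step can be sketched in Python with NumPy as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the helper names `frames_to_matrix` and `matrix_to_frames`, the frame size and the synthetic moving block are all assumptions made for the example.

```python
import numpy as np

def frames_to_matrix(frames):
    """Stack frames of shape (n_frames, height, width) into D of shape (height*width, n_frames)."""
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]
    # Each frame is flattened to one row, then transposed so that it becomes one column of D.
    return frames.reshape(n_frames, -1).T

def matrix_to_frames(M, height, width):
    """Inverse of frames_to_matrix: each column of M becomes a height x width frame."""
    return M.T.reshape(-1, height, width)

# Synthetic example (hypothetical data): a static background plus a small moving block.
rng = np.random.default_rng(0)
height, width, n_frames = 24, 32, 10
background = rng.uniform(0.0, 1.0, size=(height, width))
frames = np.repeat(background[None, :, :], n_frames, axis=0)
for t in range(n_frames):
    frames[t, 5:8, 2 * t:2 * t + 3] += 1.0  # a 3x3 block that shifts right in every frame

D = frames_to_matrix(frames)                                        # observed matrix
A_true = frames_to_matrix(np.repeat(background[None], n_frames, axis=0))  # identical columns
E_true = D - A_true                                                 # nonzero only on the block
```

In this toy case the background matrix has identical columns, so its rank is 1, and the object matrix has only 9 nonzero entries per column, which mirrors the low-rank and sparsity observations above.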

2.1 Notation

Before presenting the details of decomposing D into a low-rank matrix A and a sparse matrix E, we first introduce some notation to simplify the discussion. The L1 norm and the Frobenius norm of a matrix \( X \in {\mathbb{R}}^{{n_{1} \times n_{2} }} \) are defined by

$$ \left\| X \right\|_{1} = \sum\limits_{i = 1}^{{n_{1} }} {\sum\limits_{j = 1}^{{n_{2} }} {\left| {x_{i,j} } \right|} } \;{\text{and}}\;\left\| X \right\|_{F} = (\sum\limits_{i = 1}^{{n_{1} }} {\sum\limits_{j = 1}^{{n_{2} }} {\left| {x_{i,j} } \right|^{2} } } )^{1/2} , $$
(2)

respectively, where \( x_{i,j} \) is the \( (i,j) \)-th element of X. Assuming that r is the rank of X, the singular value decomposition of X is given by

$$ X = U\Sigma V^{T} ,\quad \Sigma = \text{diag}(\{ \sigma_{i} \}_{1 \le i \le r} ), $$
(3)

where U and V are \( n_{1} \times r \) and \( n_{2} \times r \) matrices with orthonormal columns, respectively. The nuclear norm of X is defined as the sum of its singular values, i.e.

$$ \left\| X \right\|_{ * } = \sum\limits_{i = 1}^{r} {\left| {\sigma_{i} } \right|} . $$
(4)

The shrinkage operator \( S_{\tau } :\,{\mathbb{R}} \to {\mathbb{R}} \) is defined by

$$ S_{\tau } (x) = \text{sgn} (x)\hbox{max} (\left| x \right| - \tau ,0). $$
(5)

where \( \tau \ge 0 \). The operator \( S_{\tau } \) is extended to matrices by applying it element-wise.
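As an illustration, the element-wise shrinkage operator in (5) can be written in a few lines of NumPy. The function name `soft_threshold` and the example matrix are assumptions made for this sketch.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage S_tau(x) = sgn(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

# Entries with |x| <= tau are set to zero; all others move toward zero by tau.
X = np.array([[3.0, -0.5],
              [-2.0, 1.0]])
Y = soft_threshold(X, 1.0)
```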

The singular value shrinkage operator \( D_{\tau } (X) \) is defined [3] by

$$ D_{\tau } (X) = US_{\tau } (\Sigma )V^{T} . $$
(6)
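The singular value shrinkage operator can be sketched in the same way: compute the singular value decomposition, shrink the singular values, and reassemble the matrix. The function name `singular_value_threshold` and the diagonal test matrix are assumptions made for this example.

```python
import numpy as np

def singular_value_threshold(X, tau):
    """Singular value shrinkage: apply S_tau to the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Singular values are nonnegative, so S_tau reduces to max(s - tau, 0).
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# A diagonal matrix makes the effect easy to see: its singular values are 3, 1 and 0.2.
X = np.diag([3.0, 1.0, 0.2])
Y = singular_value_threshold(X, 0.5)
```

With \( \tau = 0.5 \), the two larger singular values are reduced by 0.5 and the smallest one is set to zero, so the result has lower rank than the input.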

Note that \( S_{\tau } (X) \) and \( D_{\tau } (X) \) are the solutions of the following two minimization problems, respectively:

$$ \mathop {\hbox{min} }\limits_{Y} \tau \left\| Y \right\|_{1} + \frac{1}{2}\left\| {Y - X} \right\|_{F}^{2} ,\;\mathop {\hbox{min} }\limits_{Y} \tau \left\| Y \right\|_{ * } + \frac{1}{2}\left\| {Y - X} \right\|_{F}^{2} . $$
(7)

2.2 Sparse and Low-Rank Decomposing

In order to extract the sparse matrix E and the low-rank matrix A exactly, we could solve the following minimization problem:

$$ \mathop {\hbox{min} }\limits_{{A,E \in {\mathbb{R}}^{m \times n} }} \text{rank}(A) + \lambda \left\| E \right\|_{0} \;{\text{s.t.}}\;D = A + E, $$
(8)

where \( \lambda \) is a suitable regularization parameter, \( \text{rank}( \cdot ) \) denotes the rank of a matrix, and \( \left\| \cdot \right\|_{0} \) denotes the pseudo-norm that counts the number of nonzero entries.

The minimization problem (8) is non-convex and, in general, very hard to solve. Following the approaches in [4, 5], we instead solve the following convex minimization problem to estimate A and E:

$$ \mathop {\hbox{min} }\limits_{{A,E \in {\mathbb{R}}^{m \times n} }} \left\| A \right\|_{ * } + \lambda \left\| E \right\|_{1} \;{\text{s}} . {\text{t}} .\;D = A + E. $$
(9)

Here \( \left\| \cdot \right\|_{1} \) is the element-wise sum of absolute values of a matrix, as defined in (2).

The minimization model (9) was proposed in [1, 2] to extract low-dimensional structure from a data matrix. It can be viewed as an alternative to Principal Component Analysis (PCA). The approach is termed Principal Component Pursuit (PCP) and has been applied to background subtraction in video surveillance. In that application, the observed video matrix (an array of image frames) is decomposed into a low-rank matrix (the static background) and a sparse matrix (the moving objects).

In our approach, we relax the constrained problem (9) into the following unconstrained form, in which the equality constraint is replaced by a quadratic penalty with parameter \( \mu > 0 \):

$$ \mathop {\hbox{min} }\limits_{{A,E \in {\mathbb{R}}^{m \times n} }} \left\| A \right\|_{ * } + \lambda \left\| E \right\|_{1} + \frac{1}{2\mu }\left\| {D - A - E} \right\|_{F}^{2} . $$
(10)

Here, the value of \( \lambda \) is set as suggested in [1]:

$$ \lambda = \frac{1}{\sqrt {\hbox{max} (m,n)} }, $$
(11)

where m and n are the numbers of rows and columns of the matrix D, respectively.
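As a worked example, consider the setting reported in Section 3 for the sequence “irw1”: 30 frames of 240 × 320 pixels. Assuming each frame is vectorized in full, so that m is the number of pixels per frame and n is the number of frames, this gives

$$ m = 240 \times 320 = 76800,\quad n = 30,\quad \lambda = \frac{1}{\sqrt {\hbox{max} (76800,30)} } = \frac{1}{\sqrt {76800} } \approx 3.61 \times 10^{ - 3} . $$

Since the number of pixels is much larger than the number of frames in this setting, \( \lambda \) is determined by the frame size alone.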

In recent years, several efficient methods have been developed for solving L1 norm related minimization problems. One of them is the accelerated proximal gradient (APG) method, which performs very well on L1 norm and nuclear norm related minimization problems (e.g. [6–9]). Another promising approach is the alternating direction method of multipliers (ADMM), which can also solve such problems efficiently (e.g. [10–12]). In our approach, we use the APG method to solve the minimization problem (10).

The general APG method aims at solving the following minimization problem:

$$ \mathop {\hbox{min} }\limits_{X} \quad g(X) + f(X) $$
(12)

where g is a non-smooth function and f is a smooth function whose gradient is Lipschitz continuous with constant \( L_{f} \). Algorithm 1 describes the specific scheme of APG.

To apply the APG method, the minimization problem (10), with its objective multiplied by \( \mu \), can be cast in the form (12) by setting

$$ \left\{ {\begin{array}{*{20}c} {X = (A,E)} \\ {g(X) = \mu \left\| A \right\|_{ * } + \lambda \mu \left\| E \right\|_{1} } \\ {f(X) = \frac{1}{2}\left\| {D - A - E} \right\|_{F}^{2} } \\ \end{array} } \right.. $$
(13)

When Algorithm 1 is applied to solve (10), the minimization problem in Step 4 of Algorithm 1 becomes (noting that \( L_{f} = 2 \) in our case)

$$ \mathop {\hbox{min} }\limits_{A,E} \; \mu \left\| A \right\|_{ * } + \lambda \mu \left\| E \right\|_{1} + \left\| {A - G_{k}^{A} } \right\|_{F}^{2} + \left\| {E - G_{k}^{E} } \right\|_{F}^{2} . $$
(14)

Since A and E are separable in the above minimization, their solutions can be obtained separately by applying the singular value shrinkage operator to \( G_{k}^{A} \) and the soft shrinkage operator to \( G_{k}^{E} \), i.e. \( A_{k + 1} = D_{\mu /2} (G_{k}^{A} ) \), \( E_{k + 1} = S_{\lambda \mu /2} (G_{k}^{E} ) \).
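The thresholds follow directly from (7). Dividing the objective in (14) by 2 and separating the terms in A and E gives

$$ \mathop {\hbox{min} }\limits_{A} \frac{\mu }{2}\left\| A \right\|_{ * } + \frac{1}{2}\left\| {A - G_{k}^{A} } \right\|_{F}^{2} ,\quad \mathop {\hbox{min} }\limits_{E} \frac{\lambda \mu }{2}\left\| E \right\|_{1} + \frac{1}{2}\left\| {E - G_{k}^{E} } \right\|_{F}^{2} , $$

which are the two problems in (7) with \( \tau = \mu /2 \), \( X = G_{k}^{A} \) for the nuclear norm problem and \( \tau = \lambda \mu /2 \), \( X = G_{k}^{E} \) for the L1 norm problem.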

The detailed algorithm for solving the minimization problem (10) is described in Algorithm 2.
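For orientation, the iteration described above can be sketched in Python with NumPy. This is not a reproduction of Algorithm 2, whose listing is not included in this text. The sketch assumes the standard accelerated proximal gradient momentum rule, a fixed penalty parameter `mu`, zero initialization and a stopping test on the change between iterates. The function name `apg_decompose` and the synthetic test data are likewise hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def apg_decompose(D, mu, lam=None, tol=1e-7, max_iter=1000):
    """Sketch of APG for min mu*||A||_* + lam*mu*||E||_1 + 0.5*||D - A - E||_F^2."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # value of lambda from (11)
    A = np.zeros_like(D, dtype=float)
    E = np.zeros_like(D, dtype=float)
    A_prev, E_prev = A, E
    t, t_prev = 1.0, 1.0
    iters = 0
    for _ in range(max_iter):
        iters += 1
        # Extrapolation (momentum) step; the rule used here is an assumption.
        beta = (t_prev - 1.0) / t
        YA = A + beta * (A - A_prev)
        YE = E + beta * (E - E_prev)
        # Gradient step with step size 1/L_f = 1/2; both blocks share the residual.
        R = D - YA - YE
        GA = YA + 0.5 * R
        GE = YE + 0.5 * R
        A_prev, E_prev = A, E
        A = singular_value_threshold(GA, mu / 2.0)   # A_{k+1} = D_{mu/2}(G_k^A)
        E = soft_threshold(GE, lam * mu / 2.0)       # E_{k+1} = S_{lam*mu/2}(G_k^E)
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Stopping test on the change between iterates (an assumed criterion).
        change = np.hypot(np.linalg.norm(A - A_prev), np.linalg.norm(E - E_prev))
        if change <= tol * max(1.0, np.linalg.norm(D)):
            break
    return A, E, iters

# Synthetic check (hypothetical data): rank-1 matrix plus 5% sparse spikes.
rng = np.random.default_rng(0)
m, n = 60, 20
A_true = np.outer(rng.standard_normal(m), rng.standard_normal(n))
E_flat = np.zeros(m * n)
idx = rng.choice(m * n, size=60, replace=False)
E_flat[idx] = 5.0 * rng.choice([-1.0, 1.0], size=60)
E_true = E_flat.reshape(m, n)
D = A_true + E_true
A, E, iters = apg_decompose(D, mu=0.1)
```

The value `mu=0.1` is chosen only for this toy problem. Because `mu` is kept fixed here, the recovered A and E are slightly shrunk versions of the true components, and the choice of `mu` trades recovery accuracy against convergence speed.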

After the low-rank matrix A and the sparse matrix E are obtained by Algorithm 2, they are reshaped back to the format of the original infrared video sequence.

3 Experimental Results and Analysis

In this section, we evaluate the performance of the proposed method on three infrared video sequences: “irw1”, “irw2” and “plane”. Our algorithm is compared with the inexact augmented Lagrange multipliers (ALM) algorithm [11], chosen for its high efficiency in solving such minimization problems. For a fair comparison, the error tolerance \( \varepsilon \) is set to \( 1.0 \times 10^{ - 7} \) and the maximum number of iterations \( K \) is set to 1000 in both algorithms. In each experiment, 30 frames of the infrared video sequence are input to the two algorithms. The frame sizes of “irw1”, “irw2” and “plane” are 240 × 320, 240 × 320 and 200 × 256, respectively. All experiments are performed on a desktop computer (CPU 2.30 GHz, RAM 3.25 GB) with MATLAB R2012b. Figures 1, 3 and 5 show the objects and backgrounds extracted from the three infrared videos by Algorithm 2. Figures 2, 4 and 6 show the corresponding results of the ALM algorithm. The runtime, the number of iterations and the rank of the extracted low-rank matrix A for the two algorithms are listed in Table 1.

Fig. 1. Extraction result of background and object from “irw1” by Algorithm 2

Fig. 2. Extraction result of background and object from “irw1” by the ALM algorithm

Table 1. Comparison of the results of extracting objects from different infrared video sequences by the two algorithms.

From the above figures, it can be seen that the moving objects, whether large or small, fast or slow, are completely extracted by Algorithm 2. In Fig. 2, one foot of the man is not assigned to the moving object component by the ALM algorithm. From Figs. 3 and 4, we find that, for the small object (the plane), the plane extracted by Algorithm 2 has clearer edges than that extracted by the ALM algorithm. From Figs. 5 and 6, it can be seen that, for the slowly moving man, part of the contour of the man is not assigned to the object component by the ALM algorithm. As can be seen from Table 1, compared with the ALM algorithm, Algorithm 2 has the following distinct advantages: the recovered background has a lower rank, the running time is shorter, and fewer iterations are needed to reach convergence. These advantages are important for the rapid analysis and processing of large amounts of infrared video data.

Fig. 3. Extraction result of background and object from “plane” by Algorithm 2

Fig. 4. Extraction result of background and object from “plane” by the ALM algorithm

Fig. 5. Extraction result of background and object from “irw2” by Algorithm 2

Fig. 6. Extraction result of background and object from “irw2” by the ALM algorithm

To verify the validity of the proposed algorithm on optical videos, Figs. 7 and 8 show the object and background extracted from the optical video “highway” by Algorithm 2 and the ALM algorithm, respectively.

Fig. 7. Extraction result of background and object from optical video “highway” by Algorithm 2

Fig. 8. Extraction result of background and object from optical video “highway” by the ALM algorithm

From Figs. 7 and 8, it can be seen that both algorithms are still able to extract the moving objects in an optical video. The backgrounds extracted by the two algorithms show no obvious visual difference, but more car tracks, which belong to the background, appear in Fig. 8(b) than in Fig. 7(c).

4 Conclusions

In this paper, we presented a scheme for extracting the moving objects from infrared video sequences. We convert the problem of extracting the moving objects from a video into a sparse and low-rank matrix decomposition problem. The resulting L1 norm related minimization problem can be solved efficiently by many recently developed numerical methods. The effectiveness of the proposed algorithm is also validated on other types of video (e.g., optical videos). The experiments show that, compared with the ALM algorithm, our algorithm has distinct advantages in extracting moving objects from infrared and optical videos.