
1 Introduction

The use of digital images and video sequences has increased greatly in recent years because of the rapid growth of the Internet and multimedia systems. In many applications of digital images and videos, such as security and the evaluation of image validity, it is often necessary to identify a particular image in a database containing a large number of images. Such databases generally store images in compressed form to reduce the amount of data. In this connection, several international standards for searching images and videos have been developed [1–4]. “Identification” in this work is defined as the operation of finding, in an image database, an image that is identical to a given original image. In this paper, a robust scheme for identifying JPEG XR coded images is proposed.

JPEG XR is an image coding standard from the JPEG committee [5, 6]. It allows lossy and lossless coding of still images and videos. It supports not only 8-bit images but also images with more than 8 bits and floating-point representations. Thus, it can support various kinds of images, including high dynamic range (HDR) images [7–9] produced by a new generation of digital cameras. The proposed scheme is therefore applicable to identifying many kinds of images.

So far, several schemes have been developed for identifying compressed images [10–19]. The schemes described in [13–17] are for the JPEG standard, and those in [17–19] are for the JPEG 2000 standard; in these schemes, properties of transform coefficients, i.e. DCT (Discrete Cosine Transform) and DWT (Discrete Wavelet Transform) coefficients, play an important role in image identification. In addition, they have been extended to identification schemes that operate in the encrypted domain so that images/videos can be handled securely [20, 21]. However, no such scheme exists yet for the JPEG XR standard. Moreover, the previous schemes cannot be applied to JPEG XR images, because JPEG XR is the only image coding standard that uses a lapped biorthogonal transform (LBT), which differs from both the DCT and the DWT [22].

For these reasons, a scheme for identifying JPEG XR coded images is considered in this paper. The aim of the proposed scheme is to identify JPEG XR images that are generated from the same original image under various compression ratios. The proposed scheme does not produce false negative matches at any compression ratio. A new property of the positive and negative signs of LBT coefficients is utilized to identify the images. The experimental results show that the proposed scheme is effective not only for still images but also for video sequences, in terms of retrieval performance measures such as false-positive, false-negative and true-positive matches.

Fig. 1. Process of image identification

2 Background

2.1 Image Identification Model

Consider two or more compressed images that have the same or different compression ratios. These images originate from the same image and are compressed by the same compression method. In this paper, the identification of such images is referred to as image identification. In other words, if the images do not originate from the same image, or are not compressed by the same compression method, they are not identified with each other.

A simplified model of the image identification system is shown in Fig. 1. The system consists of three components: a client (user), an image identifier, and a database. The database may contain various types of data, such as compressed images, parameters (features) of the images, and image information (metadata). An identification is initiated by the user, who sends a query, which can be any of the data types mentioned above, to the image identifier. The image identifier then checks whether the query is available in the database. If the queried information is available, it can be sent or confirmed directly to the user. This paper focuses on queries based on properties of JPEG XR images.

2.2 Applications

The identification model described above has numerous applications. Some examples are described below.

  • a. Security

    In a compressed image environment, it is important to detect any alterations of an image caused by disturbances other than the compression itself, for instance malicious attacks such as intentional cropping or the addition or removal of objects.

  • b. Detection of Errors in Images

    In image and video communications, a slight quality degradation due to compression noise is commonly accepted. However, quality degradation due to other causes, such as transmission and decoding errors, is usually unacceptable. A fast and automatic method for identifying such errors is required in these applications.

  • c. Evaluation of Image Validity

    Consider two images with very similar content, for example chest X-ray images of two patients. Such images may be labelled by name, date, or content description. However, this approach is very sensitive to human error, such as mislabelling. Mislabelled images can cause a misdiagnosis, which in turn could threaten a patient’s life. Therefore, a more efficient and safer method to guarantee image validity is required.

  • d. Image Information Retrieval

    In addition to querying for identical images, querying for image information (metadata) is comparably important. For images, the metadata may include the photographer’s name, the image format, and the date and time. Digital libraries are one area where metadata identification is important.

2.3 JPEG XR

JPEG XR is an image coding standard from the JPEG committee. It allows lossy and lossless coding for still images and videos. It supports not only fixed point representation but also floating point representation. Thus, it can support various kinds of images including HDR images for a new generation of digital cameras.

The block diagram of JPEG XR encoding is illustrated in Fig. 2. JPEG XR is based on a block transform design and uses some of the same high-level building blocks as most image compression schemes, such as color conversion, spatial transformation, scalar quantization, coefficient scanning, and entropy coding. The encoding consists of the following basic steps:

  (1) Performing a color conversion.

  (2) Dividing an image into non-overlapping consecutive \(16\times 16\) blocks, called macroblocks, and then dividing each macroblock into consecutive \(4\times 4\) blocks, called blocks (see Fig. 3(a)).

  (3) Applying two basic operators, i.e. the core transform and the optional overlap filtering, to the blocks, where the operators are executed hierarchically at two levels, as shown in Fig. 3(b).

  (4) Quantizing the coefficients under the control of quantization parameters (QPs).

  (5) Executing adaptive coefficient scanning to convert the two-dimensional array of transform coefficients within a block into a one-dimensional vector to be encoded. Finally, the coefficients are entropy encoded.

In step (3), the 1st-level core transform yields one temporary DC coefficient and 15 HP coefficients for each block, and the 16 temporary DC coefficients of each macroblock are gathered as shown in Fig. 3(b). The 2nd-level core transform is then applied to them. As a result, one DC coefficient, 15 LP coefficients and \(15\times 16\) HP coefficients are calculated for each macroblock, where the core transform, referred to as the lapped biorthogonal transform (LBT), is common to both levels. Therefore, the transform coefficients are often called LBT coefficients, which consist of DC, LP and HP coefficients.
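The two-level hierarchy can be illustrated with a minimal Python sketch. It is not the JPEG XR core transform: as an assumption for illustration, an unnormalized 4×4 Walsh-Hadamard transform stands in for the core transform, and overlap filtering and quantization are omitted, so only the bookkeeping of DC, LP and HP coefficients (1 + 15 + 15×16 = 256 per macroblock) mirrors the description above. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# ASSUMPTION: unnormalized 4x4 Walsh-Hadamard matrix used as a stand-in;
# this is NOT the JPEG XR core transform.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def transform4x4(block: Matrix) -> Matrix:
    """Separable 2D transform Y = H X H^T of a 4x4 block."""
    hx = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)]
          for i in range(4)]
    return [[sum(hx[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def ac_coefficients(coeffs: Matrix) -> List[float]:
    """The 15 coefficients other than position (0, 0), in raster order."""
    return [coeffs[i][j] for i in range(4) for j in range(4) if (i, j) != (0, 0)]


def two_level_transform(mb: Matrix) -> Tuple[float, List[float], List[List[float]]]:
    """Return (DC, 15 LP, 16 x 15 HP) for one 16x16 macroblock."""
    temp_dc = [[0.0] * 4 for _ in range(4)]  # one temporary DC per 4x4 block
    hp: List[List[float]] = []
    for by in range(4):
        for bx in range(4):
            block = [row[4 * bx:4 * bx + 4] for row in mb[4 * by:4 * by + 4]]
            c = transform4x4(block)          # 1st-level transform
            temp_dc[by][bx] = c[0][0]
            hp.append(ac_coefficients(c))    # 15 HP coefficients per block
    c2 = transform4x4(temp_dc)               # 2nd level on the 16 temporary DCs
    return c2[0][0], ac_coefficients(c2), hp
```

For a constant macroblock, all LP and HP coefficients of this stand-in transform are zero and the DC coefficient equals the sum of the 256 samples.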

Overlap filtering may be used to reduce blocking artifacts. JPEG XR has three overlapping modes: in mode 0, no overlap filtering is performed; in mode 1, only the 1st-level overlap filtering is performed; and in mode 2, both levels of overlap filtering are performed.

Fig. 2. Basic block diagram of JPEG XR encoding (* There are three modes)

Fig. 3. Lapped biorthogonal transform used in JPEG XR

3 Proposed Identification Scheme

The aim of the proposed scheme is to identify JPEG XR images that are generated from the same original image under various compression ratios. The proposed scheme does not produce false negative matches at any compression ratio. A new property of the positive and negative signs of LBT coefficients is utilized to identify the images.

3.1 Notation and Terminologies

Several notations and terminologies used in the following sections are listed here.

  • x represents an image. x can be “Q” for image Q, “D” for image D and “O” for the original image, where all images have the same size.

  • B represents the number of blocks in an image.

  • M represents the number of macroblocks in an image.

  • N represents the number of coefficients in a \(4\times 4\) core transform, and the number of blocks in a macroblock, where \(N=16\).

  • \(DC_x(m)\) represents the DC coefficient of the \(m^{\text{ th }}\) macroblock in image x, where \(0\le m < M\).

  • \(LP_x(m, n)\) represents the \(n^{\text{ th }}\) LP coefficient of the \(m^{\text{ th }}\) macroblock in image x, where \(0\le m < M, 1\le n < N\).

  • \(HP_x(b, n)\) represents the \(n^{\text{ th }}\) HP coefficient of the \(b^{\text{ th }}\) block in image x, where \(0\le b < B, 1\le n < N\).

  • P represents the number of all coefficients in an image, where \(P = MN + B(N-1)\).

  • \(\text{ sgn }(c)\) represents the sign of a real value c as

    $$\begin{aligned} \text{ sgn }(c) = \left\{ \begin{array}{ll} -1, &{} c < 0\;,\\ 0, &{} c = 0\;,\\ 1, &{} c > 0\;. \end{array}\right. \end{aligned}$$
    (1)
  • \(C_x(k)\) represents the LBT coefficient sequence given by

    (2)

    where \(\text{ mod }(x, d)\) denotes the remainder when x is divided by d, and \(\lfloor x\rfloor \) denotes the integer part of x. The length of \(C_x(k)\) is \(P = MN+B(N-1)\) (see Fig. 4).
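The sign function of Eq. (1) and the construction of \(C_x(k)\) can be sketched in Python as follows. The DC, LP, HP grouping follows from the choices \(L = M\), \(L = MN\) and \(L = P\) discussed in Sect. 3.2; the ordering inside the LP and HP groups (macroblocks or blocks interleaved via mod and floor) is an assumption made for this sketch, and Eq. (2) and Fig. 4 define the actual order. The names `sgn` and `build_sequence` are hypothetical.

```python
from typing import List, Sequence


def sgn(c: float) -> int:
    """Sign function of Eq. (1): -1, 0 or 1."""
    return (c > 0) - (c < 0)


def build_sequence(dc: Sequence[float],
                   lp: Sequence[Sequence[float]],
                   hp: Sequence[Sequence[float]],
                   n_coeffs: int = 16) -> List[float]:
    """Concatenate DC, LP and HP coefficients into C_x(k), 0 <= k < P.

    dc[m]      holds DC_x(m)     for 0 <= m < M
    lp[m][n-1] holds LP_x(m, n)  for 1 <= n < N
    hp[b][n-1] holds HP_x(b, n)  for 1 <= n < N
    ASSUMED order: LP index k -> (mod(k, M), floor(k / M)),
    HP index r = k - MN -> (mod(r, B), floor(r / B) + 1).
    """
    m_count, b_count = len(dc), len(hp)
    p = m_count * n_coeffs + b_count * (n_coeffs - 1)  # P = MN + B(N - 1)
    seq: List[float] = []
    for k in range(p):
        if k < m_count:                                # DC part: 0 <= k < M
            seq.append(dc[k])
        elif k < m_count * n_coeffs:                   # LP part: M <= k < MN
            m, n = k % m_count, k // m_count           # n in 1..N-1
            seq.append(lp[m][n - 1])
        else:                                          # HP part: MN <= k < P
            r = k - m_count * n_coeffs
            b, n = r % b_count, r // b_count + 1       # n in 1..N-1
            seq.append(hp[b][n - 1])
    return seq
```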

Fig. 4. LBT coefficient sequence \(C_x(k)\)

Fig. 5. Examples of LBT (DC and LP) coefficients. Image \(D_1\) has the same signs as image Q except for zero-valued coefficients

3.2 Identification Scheme

The proposed scheme focuses on the positive and negative signs of LBT coefficients, which can be obtained from JPEG XR bitstreams by entropy decoding. It has been verified that quantized LBT coefficients have the following property.

  • When images Q and \(D_i\) are generated from the same original image O, the positive and negative signs of the LBT coefficients of the two images are identical at corresponding locations, even when the quantization parameters (QPs) differ. Namely, the relation is given as

    $$\begin{aligned} \text{ sgn }(C_Q(k)) = \text{ sgn }(C_{D_i}(k)), (0\le k < P)\;, \end{aligned}$$
    (3)

    where this property does not apply to zero-valued coefficients.

The above property, which can be explained theoretically, is illustrated in Fig. 5. Figure 5(a) and (b) are examples of quantized LBT coefficients of images Q and \(D_1\), which are generated from the same original image O. The positive and negative signs of the LBT coefficients of the two images are identical at corresponding locations, except for zero-valued coefficients. On the other hand, image \(D_2\) in Fig. 5(c), which is generated from a different original image, does not have the same signs as those in Fig. 5(a). In this manner, there is no guarantee that two images generated from different original images have the same signs. Note that the number of zero-valued coefficients depends on the quantization parameters (QPs).
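One way to see why the property holds can be sketched numerically. Assume that Q and \(D_i\) share the same unquantized LBT coefficients (same original image, same transform settings such as the overlapping mode) and that each coefficient is quantized by a sign-symmetric scalar quantizer of the form \(\text{sgn}(c)\lfloor |c|/\Delta + r\rfloor\) with \(0 \le r < 1\). This quantizer is an assumption; JPEG XR's exact quantization rule is not reproduced here. Under it, quantization can only keep a sign or turn a coefficient into zero, so two non-zero quantized values can never have opposite signs. The Python sketch below checks this for random coefficients and several step sizes; `quantize` and `signs_compatible` are hypothetical helpers.

```python
import random


def quantize(c: float, step: float, rounding: float = 0.5) -> int:
    """ASSUMED sign-symmetric quantizer: sgn(c) * floor(|c| / step + rounding)."""
    magnitude = int(abs(c) / step + rounding)  # int() floors non-negative values
    return magnitude if c >= 0 else -magnitude


def signs_compatible(a: int, b: int) -> bool:
    """False only if a and b are both non-zero with opposite signs."""
    return a == 0 or b == 0 or (a > 0) == (b > 0)


random.seed(0)
coeffs = [random.uniform(-500.0, 500.0) for _ in range(10_000)]  # shared unquantized coefficients
for step_q, step_d in [(1.0, 8.0), (4.0, 40.0), (13.0, 2.5)]:    # two different step sizes
    q = [quantize(c, step_q) for c in coeffs]
    d = [quantize(c, step_d) for c in coeffs]
    assert all(signs_compatible(x, y) for x, y in zip(q, d))
print("no sign conflicts")
```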

Let image Q be a JPEG XR coded image given by the user (a query image), and let image \(D_i\) be a JPEG XR image from a database \({\varvec{D}}\), where \(D_i \in {\varvec{D}}\) (see Fig. 6). The positive and negative signs of the quantized LBT coefficients of images Q and \(D_i\) at corresponding locations are compared, and the results are used to decide whether the images were compressed from the same original image.

Fig. 6. Image identification for JPEG XR images

When compressed image Q is compared with image \(D_i\) (\(i = 1, 2, \cdots \)), the identification algorithm proceeds according to the following steps.

  (a) Set the value of L, where L is the number of LBT coefficients used for identification (\(1 \le L \le P\)).

  (b) Set \(k := 0\).

  (c) For the \(k^{\text{ th }}\) coefficients of images Q and \(D_i\), extract the positive and negative signs. If \(\text{ sgn }(C_Q(k)) = 0\) or \(\text{ sgn }(C_{D_i}(k)) = 0\), proceed to step (e).

  (d) If \(\text{ sgn }(C_Q(k)) \ne \text{ sgn }(C_{D_i}(k))\), the algorithm decides that images Q and \(D_i\) were not compressed from the same original image, and the process is halted. Otherwise, proceed to step (e).

  (e) Set \(k := k + 1\).

  (f) If \(k = L\), it is decided that image Q has the same original image as image \(D_i\). Otherwise, return to step (c).
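Steps (a) to (f) can be sketched in Python as below, together with a hypothetical `identify` helper that applies the test to every entry of a database. The coefficient sequences are assumed to be already entropy-decoded and ordered as \(C_x(k)\); bitstream parsing is not shown, and all names are illustrative.

```python
from typing import List, Sequence


def sgn(c: float) -> int:
    """Sign function of Eq. (1)."""
    return (c > 0) - (c < 0)


def same_origin(c_q: Sequence[float], c_d: Sequence[float], L: int) -> bool:
    """Steps (a)-(f): True if the first L coefficient signs never conflict."""
    if not 1 <= L <= min(len(c_q), len(c_d)):
        raise ValueError("L must satisfy 1 <= L <= P")
    for k in range(L):                       # steps (b), (e), (f)
        s_q, s_d = sgn(c_q[k]), sgn(c_d[k])  # step (c): extract signs
        if s_q == 0 or s_d == 0:
            continue                         # zero-valued coefficient: skip
        if s_q != s_d:                       # step (d): conflicting signs
            return False                     # not from the same original image
    return True                              # k reached L without a conflict


def identify(c_q: Sequence[float], database: Sequence[Sequence[float]],
             L: int) -> List[int]:
    """Hypothetical helper: indices i of database images D_i judged to match Q."""
    return [i for i, c_d in enumerate(database) if same_origin(c_q, c_d, L)]
```

Because the loop halts at the first conflict, as in step (d), a non-matching image may be rejected before all L coefficients are examined, whereas a match always examines all L positions.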

When \(L = M\) is chosen, only the DC coefficients are used for identification; when \(L = MN\), the DC and LP coefficients are used; and when \(L = P\), all LBT coefficients are used.
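The three settings of L follow directly from the image size. The sketch below computes M, B, P and the corresponding values of L for a single colour component, assuming for illustration that the image dimensions are padded up to multiples of 16; `coefficient_counts` is a hypothetical helper.

```python
from typing import Dict


def coefficient_counts(width: int, height: int, n: int = 16) -> Dict[str, int]:
    """Hypothetical helper: M, B, P and the three settings of L."""
    # ASSUMPTION: one colour component whose dimensions are padded up to
    # multiples of 16 samples (one macroblock).
    m = ((width + 15) // 16) * ((height + 15) // 16)  # M: macroblocks
    b = m * n                                          # B: 4x4 blocks, N per macroblock
    p = m * n + b * (n - 1)                            # P = MN + B(N - 1)
    return {
        "M": m, "B": b, "P": p,
        "L_dc": m,          # L = M: DC coefficients only
        "L_dc_lp": m * n,   # L = MN: DC and LP coefficients
        "L_all": p,         # L = P: all LBT coefficients
    }
```

For a 512×512 component, this gives M = 1024, B = 16384 and P = 262144, i.e. one LBT coefficient per sample position.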

4 Simulation

To evaluate the performance of the proposed scheme, several simulations are conducted.

4.1 Simulation Conditions

The simulation conditions are presented in Table 1. Two still images with \(8\times 3\) bpp (bits per pixel), two still HDR images in the OpenEXR format (\(16\times 4\) bpp) [23] and three video sequences, i.e. “Mobile”, “Flower” and “Deadline”, were used in the simulation (Fig. 7). “Mobile” and “Flower” belong to a class of images with large object movements between subsequent frames, whereas “Deadline” does not have large object movements. All still images were compressed with nine different quantization parameters (QPs). In the following section, for example, “Mobile” frame No. 5 compressed with \(QP=10\) will be referred to as “Mobile5-10”. The JPEG XR reference software 1.8 [24] was used in the simulation. The simulation was run on a PC with a 2.7 GHz processor and 16 GB of main memory.

Table 1. Simulation conditions

Fig. 7. Images and videos used in simulation

Table 2. Querying results for still images

4.2 Evaluation for Still Images

Four still images, including the HDR ones, were compressed with the nine different quantization parameters (QPs) shown in Table 1 to generate \(4\times 9=36\) compressed images. The four images with \(QP=50\) were stored in the database \({\varvec{D}}\), and all 36 compressed images were used as query images. The original uncompressed versions were not included in the simulation. Identification was accomplished by querying a compressed image.
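The counting of TP, TN, FP and FN matches can be sketched as follows. The sketch assumes, as in the video experiment, that every query image is compared with every database image (36 × 4 = 144 comparisons for the still images) and that a pair is a true match exactly when both images come from the same original image. The identification test is passed in as a function so that the sketch is self-contained; all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

# (source image id, LBT coefficient sequence); the id serves as ground truth.
Item = Tuple[Hashable, Sequence[float]]
Matcher = Callable[[Sequence[float], Sequence[float]], bool]


def confusion_counts(queries: Sequence[Item], database: Sequence[Item],
                     match: Matcher) -> Dict[str, int]:
    """Compare every query with every database entry and tally the outcomes."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for q_src, q_coeffs in queries:
        for d_src, d_coeffs in database:
            predicted = match(q_coeffs, d_coeffs)  # the scheme's decision
            actual = q_src == d_src                # same original image?
            if predicted:
                counts["TP" if actual else "FP"] += 1
            else:
                counts["FN" if actual else "TN"] += 1
    return counts
```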

Querying results for still images are shown in Table 2. Table 2 summarizes the numbers of true-positive (TP), true-negative (TN), false-positive (FP) and false-negative (FN) matches. The table also shows the false-positive rate (FPR) and true-positive rate (TPR) [25], defined by

$$\begin{aligned} FPR = FP / (FP + TN)\;,\end{aligned}$$
(4)
$$\begin{aligned} TPR = TP / (TP + FN)\;. \end{aligned}$$
(5)

Moreover, the \(F_1\)-score (\(F_1\)) [25] is a measure used in the field of information retrieval to evaluate the performance of search, document classification, and query classification. A higher \(F_1\)-score means better performance. The value \(F_1\) is given by

$$\begin{aligned} F_1 = \frac{2}{1/\text{ precision } + 1 / \text{ recall }}\;, \end{aligned}$$
(6)
$$\begin{aligned} \text{ precision } = \frac{TP}{TP+FP}\;,\end{aligned}$$
(7)
$$\begin{aligned} \text{ recall } = \frac{TP}{TP+FN}\;. \end{aligned}$$
(8)
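Equations (4) to (8) translate directly into code. In the sketch below, Eq. (6) is computed in the equivalent form \(2\cdot\text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})\), and undefined ratios (zero denominators) are reported as 0, which is an assumed convention rather than one stated in the text; `retrieval_metrics` is a hypothetical name.

```python
from typing import Dict


def retrieval_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """FPR, TPR, precision, recall and F1 from Eqs. (4)-(8)."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # ASSUMED convention for 0/0

    fpr = ratio(fp, fp + tn)              # Eq. (4)
    tpr = ratio(tp, tp + fn)              # Eq. (5)
    precision = ratio(tp, tp + fp)        # Eq. (7)
    recall = ratio(tp, tp + fn)           # Eq. (8), equal to TPR
    # Eq. (6): harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"FPR": fpr, "TPR": tpr, "precision": precision,
            "recall": recall, "F1": f1}
```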

No false-positive or false-negative matches occurred under any overlapping mode (OM) or compression ratio. In other words, querying with all QPs resulted in perfect identification for all images.

4.3 Evaluation for Videos

The three video sequences shown in Table 1 were used to confirm the effectiveness of the proposed scheme. Originally, there were 100 uncompressed frames for each video sequence. All video frames were compressed with three different quantization parameters, i.e. \(QP = 10, 50\) and 90. As a result, 300 compressed frames were generated from each sequence, and 900 compressed frames were used in total in the simulation. The frames of the three video sequences with \(QP=50\), i.e. 300 frames in total, were stored in the database D, and all 900 compressed frames were used as query images. The original uncompressed versions were not included in the simulation. Therefore, \(900\times 300 = 270{,}000\) comparisons were carried out to evaluate the proposed scheme.

Fig. 8. \(F_1\)-scores for experimental results (Flower)

Fig. 9. Compressed frames with different QPs (Mobile frames 50 and 49)

Querying results for videos are shown in Table 3. The results confirm that a larger L yields higher identification accuracy, as does a smaller QP, because both conditions supply a larger number of positive and negative signs of LBT coefficients to the image identifier. In particular, for \(L = P\), querying with all QPs resulted in perfect identification for all video sequences. The same trends can be seen in the \(F_1\)-scores shown in Fig. 8. In addition, the \(F_1\)-scores for “Deadline” are lower than those for “Mobile” and “Flower”, since “Deadline” does not include large object movements between subsequent frames. Notably, there were no false negatives under any condition.

Figure 9 shows examples of compressed frames with different QPs, where the peak signal-to-noise ratio (PSNR) is given as a measure of image quality. These examples show that successive frames are very similar and that compressed frames generally contain a large amount of quantization noise. Even in such a situation, the proposed scheme can detect the slight differences between frames.
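The text does not spell out the PSNR formula; the sketch below uses the standard definition \(\mathrm{PSNR} = 10\log_{10}(\text{peak}^2/\mathrm{MSE})\) on two flattened pixel sequences, with a peak value of 255 assumed for 8-bit samples (HDR images would need a different peak). `psnr` is a hypothetical helper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float],
         peak: float = 255.0) -> float:
    """Standard PSNR in dB between two equally sized, flattened images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```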

Table 3. Querying results for video images (Overlapping mode \(=1\))

5 Conclusion

A novel scheme for identifying JPEG XR images in the compressed domain has been proposed in this paper. Conventional schemes for compressed images cannot be applied to JPEG XR images, due to the use of an LBT. A new property of the positive and negative signs of LBT coefficients has been exploited to identify the images robustly. The proposed scheme does not produce false negative matches at any compression ratio. The experimental results have shown that the proposed scheme is effective not only for still images but also for video sequences, in terms of retrieval performance measures such as false-positive, false-negative and true-positive matches. In particular, when DC and LP coefficients are used, i.e. \(L = MN\), querying with all QPs resulted in near-perfect identification for all images and videos. In future work, the proposed scheme will be extended to an identification scheme in the encrypted domain.