1 Introduction

In recent years, light field imaging has become one of the most promising technologies for virtual reality, owing to the abundant information it records from a three-dimensional (3D) scene. The light field (LF) describes the set of light rays traveling in every direction through every point in 3D space [1]. When first introduced, the light field was expressed as a seven-dimensional (7D) function. However, the 7D model is difficult to realize in practice, so it was simplified to a four-dimensional (4D) representation [2]. Generally, the 4D light field is parameterized by the coordinates of a ray's intersections with two planes in arbitrary positions. These two parameterized planes correspond to the micro-lens plane and the plane of pixels under the micro-lenses, which encode the spatial and angular information, respectively.

This paper analyzes images captured by cameras equipped with a micro-lens array. The set of pixels behind each micro-lens, called a super-pixel, records the ray directions, and the number of pixels in a super-pixel determines the angular resolution [3]. Sub-aperture images (SAIs) are formed by extracting the pixels at the same position from every super-pixel, and the number of micro-lenses determines the spatial resolution of the light field [4]. The most common light field image processing applications operate on SAIs, especially light field compression and reconstruction [5,6,7,8]. Thanks to its multiple viewing angles, a light field image contains abundant detail; consequently, a light field processing system requires much more storage than general 3D content, and many researchers therefore devote themselves to efficient compression and reconstruction algorithms for light fields. LF compression and reconstruction need metrics to assess the artifacts introduced by these processing algorithms. Research on light field acquisition and display likewise requires appropriate metrics to evaluate quality accurately in pursuit of a more compelling visual experience. However, there is still no standard subjective evaluation method or suitable objective metric for light fields.
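To make this indexing concrete, the following minimal Python sketch (our illustration, not code from the cited works) treats a decoded light field as a 4D array LF[u, v, s, t], where (u, v) is the pixel position inside each super-pixel and (s, t) is the micro-lens position; fixing (u, v) yields one sub-aperture image.

```python
import numpy as np

def extract_sai(lf_4d: np.ndarray, u: int, v: int) -> np.ndarray:
    """Return the sub-aperture image at angular position (u, v).

    lf_4d is assumed to be shaped (U, V, S, T): (U, V) indexes the pixel
    inside each super-pixel (angular samples), (S, T) the micro-lens grid
    (spatial samples).
    """
    return lf_4d[u, v]  # all micro-lenses, same intra-super-pixel pixel

# Hypothetical dimensions echoing the paper: 15 x 15 angular, 434 x 625 spatial.
lf = np.zeros((15, 15, 434, 625), dtype=np.uint8)
central_view = extract_sai(lf, 7, 7)
print(central_view.shape)  # (434, 625)
```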

A few subjective quality assessment databases have been designed in [9,10,11], which serve as ground truth for developing objective metrics. Subjective assessment requires considerable manpower and material resources and is time-consuming because of the large amount of data involved. Furthermore, it cannot be embedded in an encoder, so it is urgent to study objective metrics specialized for light fields.

There are few objective metrics for LF in the state of the art. At present, classic algorithms such as PSNR and SSIM are mostly used to evaluate the performance of compression and reconstruction algorithms, and the final objective score for the overall quality is obtained by averaging the score of each image in the SAI array. Although the resolution of a single light field view is not high, the number of SAIs is typically 15 × 15, so the quality assessment of SAIs is very time-consuming. The most urgent task of light field image quality assessment (LFIQA) is therefore not only to improve accuracy but also to save as much time as possible. Other objective metrics have also been published. Computation efficiency is improved in [12] by extracting views along a circular motion animation of the scene around the central view, but this ignores the vignetting effect at the edges of the micro-lenses, which strongly affects the quality of light field images. A reduced-reference LFIQA metric is proposed in [13] based on the depth maps of the original and distorted LF images. It saves running time, but its results depend on the depth estimation method and do not fit well with the subjective scores.

SAIs have been widely studied for quality evaluation, whereas refocused images have only been used for image segmentation or depth estimation. We are inspired by the role of refocusing in depth map estimation [14]: according to ray tracing theory, the light intensity distribution can be refocused near the originally focused scene planes [4]. Because of this mapping process, the refocused images can represent the properties of the light field.

In this paper, the refocusing property of the LF is taken into account because the refocused images contain the distortion information mapped from the lenslet image. We find that multiple images focused on different objects in the scene can be obtained by setting different depth resolutions. The paper presents an image quality assessment framework based on refocusing to represent the properties of the light field.

The rest of the paper is organized as follows: Sect. 2 briefly describes the two frameworks of LF image quality evaluation, Sect. 3 analyzes and compares the two frameworks using several objective metrics, and finally concluding remarks are drawn in Sect. 4.

2 LFRIQA Framework

Most research on objective evaluation for light fields is conducted within the sub-aperture image quality assessment (SAIQA) framework. Objective evaluation of light field images is mainly applied to assess the artifacts introduced by compression and reconstruction algorithms. The SAIQA framework consists of three steps. First, the sub-aperture images are extracted from the 4D LF image, which is itself obtained from the lenslet image through a remapping process; note that the conversion between the 4D LF and the sub-aperture images is reversible. Second, a selected objective metric is used to compute a score for each sub-aperture image. Finally, the score of the light field is obtained by averaging the array of sub-aperture image scores. The SAIQA framework is visualized in Fig. 1, indicated with blue lines. The conventional objective metric under the sub-aperture framework is expressed as follows:

Fig. 1. The diagram of the SAIQA and RIQA frameworks. (Color figure online)

$$ LF_{SAIQA} = \frac{1}{kl}\sum_{i=1}^{k}\sum_{j=1}^{l} f_{(i,j)}\left( SAI_{ref}, SAI_{dis} \right) $$
(1)

where \( LF_{SAIQA} \) is the final perceived quality value, k and l denote the numbers of rows and columns of the sub-aperture image array (k = l = 9 in the following comparison test), \( f( \cdot ) \) represents the selected image quality metric such as PSNR or SSIM, and \( SAI_{ref} \) and \( SAI_{dis} \) indicate the reference and distorted SAIs at the corresponding position, respectively.
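For illustration, Eq. (1) can be sketched in a few lines of Python, assuming the reference and distorted SAIs of the central 9 × 9 views are already available as nested lists of images; PSNR from scikit-image stands in for \( f( \cdot ) \), and the helper name lf_saiqa is ours, not part of any cited implementation.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio as psnr

def lf_saiqa(sai_ref, sai_dis, metric=psnr):
    """Eq. (1): average a 2D metric over a k x l array of sub-aperture views.

    sai_ref, sai_dis: nested lists (k rows, l columns) of equally sized
    reference/distorted sub-aperture images (8-bit data assumed).
    """
    k, l = len(sai_ref), len(sai_ref[0])
    scores = [metric(sai_ref[i][j], sai_dis[i][j], data_range=255)
              for i in range(k) for j in range(l)]
    return float(np.mean(scores))  # LF_SAIQA
```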

In addition to the use of sub-aperture images in subjective LFIQA, refocused images have also been used as an evaluation strategy, since depth perception attracts observers more readily than the individual sub-aperture pictures; in other words, artifacts appearing in refocused images have a stronger influence on the perceived properties of the light field. We suppose that the images at the border of the sub-aperture array are more annoying to viewers than those in any other area. Averaging the quality of all the images therefore does not fit well with the human visual system (HVS); this could be addressed by assigning a weight array to the sub-aperture images. The artifacts introduced into the sub-aperture images by the encoding algorithm also affect the depth information, which can be sliced into several refocused images, and the refocusing model amplifies the weight of border distortions as far as possible. The refocused images are obtained by applying the refocusing process to the 4D LF images, as shown in Fig. 1 with red lines.
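The refocusing step (red path in Fig. 1) can be illustrated with the classical shift-and-sum formulation: each view is translated in proportion to its angular offset from the central view, scaled by a focus parameter, and the shifted views are averaged. The sketch below is only an illustration of this idea under stated assumptions (grayscale views, bilinear interpolation, and the slope parameter alpha are our choices, not the exact implementation used in the experiments).

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(lf_4d: np.ndarray, alpha: float) -> np.ndarray:
    """Shift-and-sum refocusing of a grayscale 4D light field LF[u, v, s, t].

    Each view is shifted in proportion to its angular offset from the
    central view (scaled by the focus parameter alpha), and the shifted
    views are averaged into a single refocused image.
    """
    U, V = lf_4d.shape[:2]
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0
    acc = np.zeros(lf_4d.shape[2:], dtype=np.float64)
    for u in range(U):
        for v in range(V):
            dy, dx = alpha * (u - uc), alpha * (v - vc)
            acc += nd_shift(lf_4d[u, v].astype(np.float64),
                            (dy, dx), order=1, mode='nearest')
    return acc / (U * V)
```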

Considering the vignetting effect on the perspective views at the border of the sub-aperture image array, a viewpoint becomes clearer the closer it is to the center. According to the most apparent distortion assumption [15], the observer's perception is strongly affected by the border images, so the quality of the effective viewpoints of the light field can be dragged down by the nearly useless corner views. Therefore, most subjective quality assessment methods select the central 9 × 9 views. We choose the same views for subjective assessment while still taking the border distortion into account as far as possible.

The comparison of the two objective quality evaluation frameworks in the following study uses the 4D LF synthesized from the central 9 × 9 views. Expression (2) defines the refocus-based LFIQA framework, which averages the objective score over the individual refocused images.

$$ LF_{RIQA} = \frac{1}{S}\sum_{i=1}^{S} f_{(i)}\left( R_{ref}, R_{dis} \right) $$
(2)

where S is the number of refocused images, \( f( \cdot ) \) represents the selected image quality metric such as PSNR or SSIM, and \( R_{ref} \) and \( R_{dis} \) indicate the reference and distorted refocused images at the corresponding refocus position, respectively.

In this paper, the light field images are refocused at equally spaced positions. In the following implementation, considering that the parameter variation of positive defocus is less pronounced than that of negative defocus, we choose 0.1 times the focal length as the smallest negative defocus value and 1.6 times the focal length as the largest positive defocus value. The refocusing interval is set to 0.15 to save time and keep the algorithm stable. We thus use 10 refocused images in place of the 81 sub-aperture images, compute the objective score on these refocused images, and average them into the final score.
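Combining Eq. (2) with these settings, the RIQA scoring loop can be sketched as follows; the sketch reuses the shift-and-sum refocus function from the previous listing, PSNR again stands in for \( f( \cdot ) \), and the exact defocus values that yield exactly ten planes (here 0.10, 0.25, …, 1.45) are an assumption about the endpoint handling.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio as psnr

def lf_riqa(lf_ref, lf_dis, refocus_fn, metric=psnr):
    """Eq. (2): average a 2D metric over refocused slices of two 4D LFs."""
    # Ten defocus positions at a 0.15 interval; which endpoint is dropped
    # to obtain exactly ten planes is our assumption.
    positions = 0.1 + 0.15 * np.arange(10)      # 0.10, 0.25, ..., 1.45
    scores = []
    for a in positions:
        r_ref = refocus_fn(lf_ref, a)           # reference refocused slice
        r_dis = refocus_fn(lf_dis, a)           # distorted refocused slice
        scores.append(metric(r_ref, r_dis, data_range=255))  # 8-bit assumed
    return float(np.mean(scores))               # LF_RIQA
```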

3 Performance Analysis of RIQA Framework

A few subjective evaluation methods for light field images currently exist; they differ slightly, but they are basically based on sub-aperture images and refocused images. In this paper, we compared the performance of the SAIQA and RIQA frameworks on the subjective LFIQA databases of Shanghai University (SHU) [9, 10] and VALID [11]. The details of the two databases are shown in Table 1.

Table 1. Comparison of existing IQA datasets of LFIs

The SHU database includes eight contents with five compression algorithms at six compression ratios (CRs); its artifacts comprise Gaussian blur, JPEG, JPEG 2000, motion blur, and white noise. VALID includes five contents with five compression algorithms at four quantization parameters (QPs), containing HEVC, VP9, and [16,17,18] artifacts. VALID provides both 10-bit depth (the original bit depth of the images) and 8-bit depth versions. Although the 8-bit part only covers HEVC and VP9, it comes with three subjective evaluations, so both bit depths are tested below. The angular resolutions of SHU and VALID are 15 × 15 and 13 × 13 respectively, and the corresponding spatial resolutions are 625 × 434 and 626 × 434. For validity and practicability, the analysis of objective metrics under the two frameworks uses the central 9 × 9 viewpoints and a 625 × 434 resolution.

To compare the performance of the two frameworks, we used nine representative full-reference IQA metrics: peak signal-to-noise ratio (PSNR), structural similarity index metric (SSIM) [19], multi-scale SSIM (MS-SSIM) [20], information content weighted SSIM (IW-SSIM) [21], feature similarity index metric (FSIM) [22], gradient similarity metric (GSM) [23], visual information fidelity (VIF) [24], visual saliency index (VSI) [25], and sparse feature fidelity (SFF) [26]. To better understand the correlation between the mean opinion score (MOS) and these objective metrics, Fig. 2(a–i) shows the scatter distributions of MOS versus the scores predicted by the nine objective metrics under the SAIQA framework on the SHU database, and Fig. 2(j–r) shows the corresponding scatter diagrams under the RIQA framework. The black lines are curves fitted with the five-parameter logistic function. The results show that, compared with the SAIQA framework, the objective scores predicted by RIQA correlate more strongly with MOS: in the RIQA scatter diagrams, the points cluster more tightly around the fitted curves than in the SAIQA ones.

Fig. 2. Scatter plots of subjective MOS versus the scores predicted by the objective metrics under the SAIQA framework (a–i) and the RIQA framework (j–r) on the SHU database.

The correlation between the predicted scores and MOS was calculated using the root-mean-square error (RMSE), Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC), and Kendall rank-order correlation coefficient (KROCC). The first two require a nonlinear regression step before fitting against MOS and characterize the accuracy of the correlation between MOS and the predicted scores, while SROCC and KROCC measure the prediction monotonicity of the objective IQA metrics. A better objective metric is expected to have higher absolute values of PLCC, KROCC, and SROCC and a lower RMSE.
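For reference, the four indices can be computed along the following lines; the five-parameter logistic used for the nonlinear regression follows the common VQEG form, which we assume here, and the helper names are ours.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr, kendalltau

def logistic5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from objective scores to the MOS scale."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def correlation_indices(obj, mos):
    """Return (PLCC, SROCC, KROCC, RMSE) between objective scores and MOS."""
    obj, mos = np.asarray(obj, float), np.asarray(mos, float)
    p0 = [np.max(mos), 1.0, np.mean(obj), 0.0, np.mean(mos)]
    params, _ = curve_fit(logistic5, obj, mos, p0=p0, maxfev=20000)
    fitted = logistic5(obj, *params)             # nonlinear regression step
    plcc = pearsonr(fitted, mos)[0]              # accuracy after fitting
    srocc = spearmanr(obj, mos)[0]               # monotonicity (rank-based)
    krocc = kendalltau(obj, mos)[0]
    rmse = float(np.sqrt(np.mean((fitted - mos) ** 2)))
    return plcc, srocc, krocc, rmse
```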

The performance of the two frameworks on the SHU database is shown in Table 2. Most objective metrics perform better under RIQA than under the SAIQA framework; judging from the four indices above, the RIQA framework improves both accuracy and monotonicity. The best results for the two frameworks are shown in bold. Under the SAIQA method, SFF obtains the best result of all the metrics (measured in terms of the four indices above). This may be because SFF relies on independent component analysis (ICA), which simulates the sparse representation of images in the primary visual cortex. SFF also performs better under RIQA than under SAIQA, while VSI outperforms all other metrics under the RIQA framework on the SHU database. VSI combines a visual saliency (VS) map and a gradient map as feature maps and additionally employs the VS map as a weighting function to reflect the importance of local regions. To some extent, the saliency map plays an important role in the evaluation of refocused images, which can be investigated in future work. In addition, GSM also surpasses the best SAIQA result, which partially confirms the effectiveness of IQA using refocused images.

Table 2. Performance of RIQA and SAIQA in SHU database

The performance of RIQA and SAIQA for the two bit-depth versions of VALID is listed in Table 3. For reasons of space, we do not show the fitted scatter diagrams for the individual distortion types. The 8-bit part of VALID contains three subjective evaluation methodologies (interactive, passive, and passive-interactive). Different subjective evaluation strategies lead to different fitting results, indicating that the study of subjective evaluation itself is very important. The objective metrics are still improved by RIQA, although the gain is not as prominent as on SHU. SAIQA is more consistent with the passive subjective evaluation method, while RIQA shows strong consistency with the interactive one. There is still considerable room for improvement in studying LFIQA with refocusing properties, such as determining the refocus range and extracting the key refocus locations. In short, the RIQA framework improves most objective metrics compared with traditional SAI-based IQA. Moreover, the RIQA framework requires much less running time than SAIQA; Table 4 records the detailed times of the two frameworks for different objective metrics on the different databases.

Table 3. Performance of RIQA and SAIQA in VALID database
Table 4. The average running time of RIQA and SAIQA frameworks on different objective metrics and different databases

The execution time is calculated by averaging over all contents, distortion types, and distortion levels for each objective metric. All experiments were run on a PC with a 3.70-GHz Intel Core i7-8700K CPU and 32 GB of RAM. Figure 3 shows that the RIQA framework saves more time for the metrics that are more time-consuming. Since extracting the sub-aperture images from the 4D LF takes longer than the refocusing process, the extraction time is not included in the reported figures.

Fig. 3. Execution time of the RIQA and SAIQA frameworks for different objective metrics on different databases.

It is easy to understand why the SAIQA framework takes longer: light field images generally have 15 × 15 viewpoints, and even though this paper only uses the central 9 × 9 viewpoints for effectiveness, many more images must still be evaluated, which inevitably costs a lot of time and is prohibitive for real-time evaluation of light field images. In contrast, only a few refocused images are needed, which is the main reason for the large time saving. It should be noted that, even with averaged execution times, the same metric yields different results on different databases, which can be attributed to the databases having different contents as well as to differences in how the computer allocates running memory. Nevertheless, the overall tendency follows the line distributions in Fig. 3. An objective metric suitable for light fields requires not only a high correlation with MOS but also a relatively short running time.

The Q in Table 4 represents the quotient between the two frameworks for the corresponding objective metric. Obviously, the refocusing property of the light field solves the time problem to a great extent. Interestingly, the database with more images takes less time per single LF image, which may be caused by the image-loading procedure, but this does not affect the comparison of running times between the two frameworks. Based on the considerations of performance and time, the refocused image, as a representation of the light field, clearly offers a broad research prospect in the field of LFIQA.

4 Conclusion

In this paper, we proposed a new LFIQA framework based on the refocusing property of the light field. The new method is evaluated with various objective metrics on the SHU and VALID databases. The RIQA framework has two advantages over the general SAIQA framework. First, it improves the performance of most objective metrics, even across different distortions. Second, it saves time to a large extent. In addition, the RIQA framework can directly assess compression or reconstruction algorithms that operate on lenslet images.