
1 Introduction

Over the last two decades, a great deal of research has been conducted on the design of robust No-Reference Image Quality Assessment (NR-IQA) algorithms. Since they do not use the reference image, these algorithms can be embedded in the development of new multimedia services. NR-IQA algorithms aim at predicting image quality from objective features extracted from distorted images. To reach this goal, these algorithms either assume a priori knowledge of the involved distortions, or follow a generic approach by directly assessing image quality regardless of the type of distortion [1]. For specific approaches, metrics are mainly designed to quantify distortions induced by image encoders such as JPEG and JPEG2000. In studies [2, 3], the block effect is estimated in the spatial domain, while studies [4, 5] quantify the same effect in the frequency domain. Assessment algorithms for the blur effect are proposed in [6, 7]. Distortions induced by JPEG2000, such as blur and ringing, are considered in [8, 9]. The blur effect is characterized by an increase of the spread of edges, while the ringing effect produces halos and/or rings near sharp object edges. As a result, the proposed metrics are generally based on the measurement of edge spreading. All of these metrics are interesting and some of them perform very well. However, they remain limited to the distortions they are designed to handle.

Generic approaches aim to be universal and seek to address all application fields. They usually follow two trends: the signal-based approach, which extracts and analyzes features in image signals, and the visually-based approach, which aims to mimic the properties of the human visual system (HVS). Signal-based approaches present a good trade-off between performance and complexity. Generally, these approaches require two steps: in the first one, relevant features are extracted, while in the second, these features are pooled in order to produce the quality score of the image under test. The first step has been the subject of several investigations [10,11,12]. In contrast, the second step usually relies on conventional combinations. To overcome this drawback, statistical modeling of natural images has been considered, where the feature extraction procedure is followed by a learning step. In the case of no-reference metrics, which is the context of this paper, these learning methods can accurately map features to subjective assessments [13,14,15,16,17, 23].

Since the result of the evaluation ultimately depends on the final observer, the visually-based trend seeks to design metrics according to HVS behavior when assessing quality. Several metrics based on one or more HVS properties have been proposed in the literature. The HVS sensitivity to image signals is used in [24, 25]. The HVS sensitivity to spatial frequencies, luminance and/or structural information is exploited in [26,27,28]. Nowadays, several HVS models have been proposed for image quality assessment and their experimental results are very promising.

The goal of this paper is to combine the advantages of visually-based methods with the promising results [29] of learning techniques in the context of NR-IQA metrics. More specifically, the proposed method uses the spatial frequency sensitivity of the HVS to decompose the test image, extracts the statistical features of each visual sub-band and combines them using a multivariate Gaussian distribution to assess quality.

The rest of this paper is organized as follows: the proposed metric, including the visual sub-band decomposition and the selected features, is presented in Sect. 2. The experimental results are shown and discussed in Sect. 3, and Sect. 4 concludes the paper.

2 The LEVIQI Index

2.1 General Framework

Figure 1 gives the overall framework of the proposed method.

The first block models the spatial frequency selectivity of the human visual system and performs a perceptual decomposition on both training and testing images. The steerable pyramid [30] is used to achieve the perceptual decomposition described in Sect. 2.2. The steerable pyramid is a multi-scale, multi-orientation and self-inverting image decomposition. With this decomposition, the image is divided into a set of sub-bands localized in scale and orientation. An example of the first-level image decomposition, where four oriented band-pass filters are used, is given in Fig. 2.
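To make the decomposition concrete, the following Python sketch (assuming NumPy) builds a simplified FFT-based multi-scale, multi-orientation filter bank with one-octave radial bands and raised-cosine angular windows. It only illustrates the principle; the metric itself relies on the actual steerable pyramid of [30], and the cutoff frequencies used here are assumptions of the sketch.

```python
import numpy as np

def oriented_subbands(img, n_scales=3, n_orients=6):
    """Toy FFT-based multi-scale / multi-orientation decomposition with
    one-octave radial bands and raised-cosine angular windows."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]            # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]            # horizontal frequencies
    rho = np.hypot(fx, fy)                     # radial frequency
    theta = np.arctan2(fy, fx)                 # orientation
    F = np.fft.fft2(img)
    bands = []
    for s in range(n_scales):                  # one-octave radial bands
        lo, hi = 0.25 / 2 ** (s + 1), 0.25 / 2 ** s
        radial = ((rho > lo) & (rho <= hi)).astype(float)
        for o in range(n_orients):             # angular wedges
            phi = o * np.pi / n_orients
            angular = np.cos(theta - phi) ** 2  # pi-periodic window
            bands.append(np.real(np.fft.ifft2(F * radial * angular)))
    return bands                               # n_scales * n_orients maps

img = np.random.rand(128, 128)                 # stand-in test image
subbands = oriented_subbands(img)              # 18 sub-bands (3 x 6)
```

With the default parameters, this yields the 18 sub-bands (three radial bands, six orientations) used throughout the paper.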

The second block extracts the statistical features of each filtered visual channel. Instead of looking to define new features, this paper will exploit some features derived from well-known metrics.

The computed features and the DMOS (Difference of Mean Opinion Scores) values of the training images are then used by the learning block to fit an IQA Multivariate Gaussian Distribution (MVGD). Due to its simple parametric form, the MVGD is widely used for modeling vector-valued features and has been a sound choice in many image and video applications. The resulting model, namely the LEarning-based and Visual-based Image Quality Index (LEVIQI), is given by:

$$\begin{aligned} \small LEVIQI\left( x\right) =\frac{1}{\left( 2\pi \right) ^{k/2}\left| \varSigma \right| ^{1/2}}\exp \left( -\frac{1}{2}\left( x-\beta \right) ^{T}\varSigma ^{-1}\left( x-\beta \right) \right) \end{aligned}$$
(1)

where \(x = \left( f_{1},...,f_{k}, DMOS\right) \) corresponds to the extracted features to which the DMOS of the training distorted images is appended. \(\beta \) and \(\varSigma \) denote the mean vector and covariance matrix of the MVGD model and are estimated using the maximum likelihood method. At test time, the features extracted from a test image are combined with candidate DMOS values lying between 0 and 100 with a step of 0.5, and fed into the learned LEVIQI model. The predicted DMOS of the test image is the candidate that maximizes the distribution \(p\left( x,\beta ,\varSigma \right) \).
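As an illustration of the learning and prediction steps, the following sketch (assuming NumPy/SciPy; the training features and DMOS values are random placeholders) fits the MVGD by maximum likelihood and scans the candidate DMOS values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical training data: one row per distorted image, k = 10
# selected features per row; dmos_train holds the subjective scores.
feats_train = np.random.rand(800, 10)
dmos_train = np.random.uniform(0, 100, 800)

# Maximum-likelihood fit of the MVGD on x = (f_1, ..., f_k, DMOS).
X = np.column_stack([feats_train, dmos_train])
beta = X.mean(axis=0)                          # mean vector
Sigma = np.cov(X, rowvar=False, bias=True)     # ML covariance estimate
mvgd = multivariate_normal(mean=beta, cov=Sigma)

def predict_dmos(feats_test):
    """Scan candidate DMOS values in [0, 100] with a step of 0.5 and
    return the one maximizing the learned density p(x, beta, Sigma)."""
    candidates = np.arange(0.0, 100.5, 0.5)
    logp = [mvgd.logpdf(np.append(feats_test, d)) for d in candidates]
    return candidates[int(np.argmax(logp))]

print(predict_dmos(np.random.rand(10)))
```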

Fig. 1. Overall framework of the proposed method.

Fig. 2. Steerable pyramid-based image decomposition. Illustration of the first-level image decomposition (radial selectivity) where four oriented band-pass filters are used (angular selectivity). The second level of decomposition is the result of the same process performed on low-pass filter 1 (low-pass filter 0 downsampled by a factor of 2).

The probabilistic model is trained on the LIVE IQA database, for which DMOS values are available. To ensure the robustness of the results, multiple training sets were constructed. In each, the image database was subdivided into distinct, completely content-separate training and test sets. For each training set, 80% of the LIVE IQA database content was chosen, the remaining 20% being used for the test set. Specifically, each training set contained the images derived from 23 original images, while each test set contained the images derived from the remaining 6 original images. 1000 randomly chosen training and test sets were obtained and the prediction of the quality scores was run over the 1000 iterations.
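A minimal sketch of this content-separate splitting procedure is given below (assuming NumPy; the content lookup table is a toy stand-in for the actual LIVE metadata):

```python
import numpy as np

rng = np.random.default_rng(0)
n_contents = 29                  # LIVE IQA: 29 original (source) images
n_images = 779                   # number of distorted images in LIVE
content_of = rng.integers(0, n_contents, n_images)  # toy content lookup

def content_separate_split(train_frac=0.8):
    """One random content-separate split: 23 source contents for the
    training set, the remaining 6 for the test set."""
    contents = rng.permutation(n_contents)
    n_train = int(round(train_frac * n_contents))   # 23 contents
    train_c, test_c = contents[:n_train], contents[n_train:]
    train_idx = np.flatnonzero(np.isin(content_of, train_c))
    test_idx = np.flatnonzero(np.isin(content_of, test_c))
    return train_idx, test_idx

splits = [content_separate_split() for _ in range(1000)]  # 1000 runs
```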

2.2 Visual Sub-band Decomposition

It is well known that the retinal image is processed by different frequency channels that are narrowly tuned around specific spatial frequencies and orientations. Numerous psychophysical experiments have been conducted to estimate the bandwidth of these channels, and quite different values have been obtained depending on the type of experiment.

According to these experiments, most of the proposed decompositions suggest a spatial frequency bandwidth of approximately one octave and an orientation sensitivity that varies between 20\(^\circ \) and 60\(^\circ \) depending on the spatial frequency.

This study uses the decomposition of Fig. 3. Discussed in [31], this decomposition uses a radial frequency selectivity that is symmetric on a log-frequency axis with bandwidths nearly constant at one octave. It consists of one isotropic low-pass and three band-pass channels. The angular selectivity is constant and equal to 30\(^{\circ }\).

Fig. 3. Perceptual decomposition.

2.3 Selected Features

All features considered in this paper are extracted from nine commonly used learning-based NR-IQA metrics: (1) BRISQUE [18], (2) QAC [19], (3) BLIINDS-II [17], (4) NIQE [23], (5) DIIVINE [16], (6) BIQI [15], (7) IL-NIQE [20], (8) SSEQ [21] and (9) OG-IQA [22]. The choice of these trial algorithms is motivated by the fact that the code of all of them is publicly available.

Yet, since around 200 features are available from all the trial algorithms, only the most relevant are selected. Furthermore, some of them model similar visual characteristics, e.g., luminance sensitivity, sub-band anisotropy, and so on, so it is not necessary to retain features that model similar characteristics.

All features are computed for all original images (and their associated degraded versions) of the LIVE IQA database [33]. Then, the Spearman Rank-Order Correlation Coefficient (SROCC) between the feature values and the subjective DMOS is computed. Finally, only the 10 most highly correlated attributes are retained to design LEVIQI, under the constraint that the selected features do not model similar characteristics.
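The selection step can be sketched as follows (assuming NumPy/SciPy; the feature matrix is a placeholder, and the |Pearson| > 0.9 redundancy test is a hypothetical stand-in for the "similar characteristics" constraint):

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder pool of candidate attributes computed on LIVE:
# one row per distorted image, one column per candidate feature.
features = np.random.rand(779, 200)
dmos = np.random.uniform(0, 100, 779)

srocc = np.empty(features.shape[1])
for j in range(features.shape[1]):
    rho, _ = spearmanr(features[:, j], dmos)
    srocc[j] = abs(rho)
ranked = np.argsort(srocc)[::-1]               # most correlated first

# Greedily keep the 10 best features, skipping candidates judged
# redundant with an already-selected one.
selected = []
for j in ranked:
    if all(abs(np.corrcoef(features[:, j], features[:, s])[0, 1]) < 0.9
           for s in selected):
        selected.append(j)
    if len(selected) == 10:
        break
print(selected)
```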

Table 1. Selected features to design LEVIQI

Table 1 presents the selected distinct features. Yet, these attributes have not been used exactly as defined in their associated NR-IQA schemes; they have been adapted to the visual sub-band decomposition.

GGD Fit Parameters: The first set of features, derived from BRISQUE, includes the shape parameter \(\alpha \) and the variance \(\sigma ^{2}\) of generalized Gaussian distribution (GGD) fit of the MSCN (mean subtracted contrast normalized) coefficients for each sub-band. The MSCN coefficients refer to the transformed luminances \(\hat{I}(i,j)\) given by:

$$\begin{aligned} \hat{I}(i,j)=\frac{I\left( i,j\right) -\mu \left( i,j\right) }{\sigma \left( i,j\right) +C}, \end{aligned}$$
(2)

where (i,j) are spatial indices, C=1, \( \mu \left( i,j\right) =\sum _{k=-3}^{3}\sum _{l=-3}^{3}\omega _{k,l}I_{k,l}\left( i,j\right) \) and \( \sigma \left( i,j\right) =\sqrt{\sum _{k=-3}^{3}\sum _{l=-3}^{3}\omega _{k,l}\left( I_{k,l}\left( i,j\right) -\mu \left( i,j\right) \right) ^{2}} \), where \(\omega \) is a 2D circularly-symmetric Gaussian weighting function.

The shape \(\alpha \) and variance \(\sigma ^{2}\) parameters of the GGD are computed over all sub-bands and then pooled by computing the lowest 10th percentile average of the local \(\alpha \) scores and the local \(\sigma ^{2}\) scores across the sub-bands.
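A possible implementation of this feature is sketched below, assuming NumPy/SciPy and the standard BRISQUE moment-matching estimator for the GGD parameters (the Gaussian filter of scale 7/6 is an assumed approximation of the 7x7 circularly-symmetric window):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma as G

def mscn(img, sigma_w=7/6, C=1.0):
    """MSCN coefficients, Eq. (2), with a Gaussian local window."""
    mu = gaussian_filter(img, sigma_w)                 # local mean
    var = gaussian_filter(img ** 2, sigma_w) - mu ** 2
    sigma = np.sqrt(np.maximum(var, 0))                # local std
    return (img - mu) / (sigma + C)

def ggd_fit(x):
    """Moment-matching GGD fit (shape alpha, variance sigma^2), using
    the estimator r(alpha) = G(1/a)G(3/a)/G(2/a)^2 = E[x^2]/E[|x|]^2."""
    x = x.ravel()
    rho = x.var() / (np.mean(np.abs(x)) ** 2 + 1e-12)
    alphas = np.arange(0.2, 10.0, 0.001)
    r = G(1 / alphas) * G(3 / alphas) / G(2 / alphas) ** 2
    return alphas[np.argmin((r - rho) ** 2)], x.var()

def lowest_decile_mean(scores):
    """Pooling: average of the lowest 10th percentile of the scores."""
    s = np.sort(np.asarray(scores))
    return s[:max(1, int(np.ceil(0.1 * len(s))))].mean()

bands = [np.random.randn(128, 128) for _ in range(18)]  # stand-ins
fits = [ggd_fit(mscn(b)) for b in bands]
f_alpha = lowest_decile_mean([a for a, _ in fits])
f_sigma2 = lowest_decile_mean([s for _, s in fits])
print(f_alpha, f_sigma2)
```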

Coefficient of Frequency Variation: This feature, derived from BLIINDS-II, is defined for each sub-band as:

$$\begin{aligned} \zeta =\frac{\sigma _{\left| X\right| }}{\mu _{\left| X\right| }}=\sqrt{\frac{\varGamma \left( 1/\gamma \right) \varGamma \left( 3/\gamma \right) }{\varGamma ^{2}\left( 2/\gamma \right) }-1} \end{aligned}$$
(3)

where \(\sigma _{\left| X\right| }\) and \(\mu _{\left| X\right| }\) are the standard deviation and mean of the frequency coefficient magnitudes \(\left| X\right| \), respectively.

The feature \(\zeta \) is computed for all sub-bands of the image and pooled by computing the highest 10th percentile average over all of the sub-band scores of the image.
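Given the fitted GGD shape parameter \(\gamma \) of a sub-band, \(\zeta \) follows directly from Eq. (3); the short sketch below also checks the closed form against the empirical ratio \(\sigma _{\left| X\right| }/\mu _{\left| X\right| }\) on a stand-in sub-band:

```python
import numpy as np
from scipy.special import gamma as G

def zeta(g):
    """Coefficient of frequency variation, Eq. (3), from the fitted GGD
    shape parameter g of the sub-band coefficients."""
    return np.sqrt(G(1/g) * G(3/g) / G(2/g) ** 2 - 1)

# Sanity check: for Gaussian coefficients (g = 2), the closed form
# matches the empirical sigma/mu ratio of the magnitudes (~0.755).
X = np.random.randn(256, 256)                  # stand-in sub-band
print(zeta(2.0), np.abs(X).std() / np.abs(X).mean())
```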

Energy Sub-band Ratio Measure: This attribute, also derived from BLIINDS-II, is used to capture the local spectral signature. It is defined as:

$$\begin{aligned} R_{n,k}=\frac{\left| E_{n,k}-\frac{1}{n-1}\sum _{j<n}E_{j,k}\right| }{E_{n,k}+\frac{1}{n-1}\sum _{j<n}E_{j,k}} \end{aligned}$$
(4)

where \(E_{n,k} = \sigma _{n,k}^{2}\) is the average energy in frequency sub-band n of radial band k, with \(n \in \{1,2,3,\dots ,6\}\) and \(k \in \{1,2,3\}\). The mean of \((R_{n,k})_{n \in [2,\dots ,6]}\) is computed on the three radial bands k of the image and pooled by computing the highest 10th percentile average of the scores of the image.
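A sketch of this measure, taking the sub-band variance as the energy estimate \(E_{n,k}\) (stand-in sub-bands, assuming NumPy), is given below:

```python
import numpy as np

def energy_ratio_feature(bands_by_radial):
    """Energy sub-band ratio measure, Eq. (4): bands_by_radial[k] holds
    the six oriented sub-bands of radial band k, and E_{n,k} is taken
    as the sub-band variance sigma_{n,k}^2."""
    means = []
    for bands in bands_by_radial:                 # k = 1, 2, 3
        E = np.array([b.var() for b in bands])    # E_{n,k}, n = 1..6
        R = [abs(E[n] - E[:n].mean()) / (E[n] + E[:n].mean())
             for n in range(1, 6)]                # n = 2..6 (1-based)
        means.append(np.mean(R))                  # mean over n, per k
    return means                                  # pooled afterwards

bands_by_radial = [[np.random.randn(64, 64) for _ in range(6)]
                   for _ in range(3)]             # stand-in sub-bands
print(energy_ratio_feature(bands_by_radial))
```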

Across Scale and Spatial Correlations: These two features are derived from features designed for DIIVINE. In order to capture the statistical dependencies between high-pass (HP) responses of natural images and their band-pass (BP) counterparts, a structural correlation is modeled as \( \rho =(2\sigma _{xy}+C_{2})/(\sigma _{x}^{2}+\sigma _{y}^{2}+C_{2})\) where \(\sigma _{xy}\) is the cross-variance between the windowed regions from the BP and HP bands, and \(\sigma _{x}^{2},\) \(\sigma _{y}^{2}\) are their windowed variances respectively; \(C_{2}\) is a stabilizing constant. The mean of the 18 correlation values (corresponding to the 18 sub-bands) is computed.

In order to capture spatial correlation statistics, the joint empirical distribution between the coefficient at (i,j) and the set of spatial locations at a distance of \(\tau \) is computed for each \(d_{1}^{\theta }\), \(\theta \in \left\{ 0^\circ ,30^\circ ,60^\circ ,90^\circ ,120^\circ ,150^\circ \right\} \). The correlation between these two variables, denoted X and Y, is estimated by:

$$\begin{aligned} \rho \left( \tau \right) =\frac{E_{PXY(x,y)}\left[ \left( X-E_{PX(x)}\left[ X\right] \right) ^{T}\left( Y-E_{PY(y)}\left[ Y\right] \right) \right] }{\sigma _{X}\sigma _{Y}} \end{aligned}$$
(5)

where \(E_{PX(x)}\left[ X\right] \) is the expectation of X with respect to the marginal distribution \(p_{X}(x)\) (similarly for Y and (X,Y)). \(\rho \left( \tau \right) \) is computed for various distances across distortions and the obtained curve is fitted with a \(3^{rd}\) order polynomial. The coefficients of the polynomial and the error between the fit and the actual \(\rho \left( \tau \right) \) form a 30-dimensional feature vector (5 values for each direction). The mean of \(\rho \left( \tau \right) \) is computed over the three radial bands and pooled by computing the highest 10th percentile average of the scores.
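The following sketch (assuming NumPy; shifts are rounded to integer pixels and the distance range \(\tau \in [1,25]\) is an assumption) estimates \(\rho \left( \tau \right) \) by correlating a sub-band with shifted copies of itself along each direction, then fits the \(3^{rd}\) order polynomial:

```python
import numpy as np

def rho_tau(band, theta_deg, taus):
    """Empirical correlation rho(tau), Eq. (5), between a sub-band and
    a copy of itself shifted by ~tau pixels along direction theta."""
    t = np.deg2rad(theta_deg)
    h, w = band.shape
    rhos = []
    for tau in taus:
        dy = int(round(tau * np.sin(t)))
        dx = int(round(tau * np.cos(t)))
        a = band[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
        b = band[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
        rhos.append(np.corrcoef(a.ravel(), b.ravel())[0, 1])
    return np.array(rhos)

band = np.random.randn(128, 128)                 # stand-in sub-band
taus = np.arange(1, 26)                          # assumed distance range
feats = []
for theta in (0, 30, 60, 90, 120, 150):          # the six directions
    r = rho_tau(band, theta, taus)
    coeffs = np.polyfit(taus, r, 3)              # 3rd-order polynomial fit
    err = np.abs(r - np.polyval(coeffs, taus)).mean()
    feats.extend(list(coeffs) + [err])           # 5 values per direction
print(len(feats))                                # 30-dimensional vector
```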

Local Spatial and Spectral Entropy Features: The spatial entropy is computed on the image obtained after applying the first non-directional low-pass filter.

Concerning the spectral entropy, a modified version of the attribute designed for SSEQ is defined. The variance across the six orientations per radial band, \(var\left( E\left[ R_{\theta }\right] \right) \), is computed, where \(E\left[ R_{\theta }\right] \) is the average of the Rényi entropies \(R_{\theta }\left[ n\right] \) computed over the three sub-bands of orientation \(\theta \) (one per radial band).

Finally, the mean of the three variance values per radial band is computed.

Chromatic Statistics: This attribute is derived from IL-NIQE and is computed on the color image before applying the DCP transform. From the RGB coordinate system, a logarithmic-scale opponent color space \(\mathcal {R}\mathcal {G}\mathcal {B}\) is defined as \(\mathcal {R}=\log R -\mu _R\), \(\mathcal {G}=\log G - \mu _G\) and \(\mathcal {B}=\log B -\mu _B\), where \(\mu _R\) (resp. \(\mu _G\) and \(\mu _B\)) is the mean of \(\log R\) (resp. \(\log G\) and \(\log B\)) over the image. Then, the following linear color transform is applied to the \(\mathcal {R}\mathcal {G}\mathcal {B}\) color space: \(l_1 = (\mathcal {R}+\mathcal {G}+\mathcal {B})/\sqrt{3}\), \(l_2 = (\mathcal {R}+\mathcal {G}-2\mathcal {B})/\sqrt{6}\), \(l_3 = (\mathcal {R}-\mathcal {G})/\sqrt{2}\). The distribution of each channel \(l_1, l_2, l_3\) conforms to a Gaussian probability law. In this paper, only the chromatic channels \(l_2\) and \(l_3\) are considered, since \(l_1\) refers to a luminance channel. Finally, the two model parameters \(\mu _C\) and \(\sigma ^2_C\) are estimated using a multivariate Gaussian model.
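A sketch of the chromatic feature computation, assuming NumPy and a small constant to avoid \(\log 0\), is given below:

```python
import numpy as np

def chromatic_params(rgb):
    """Log-opponent chromatic statistics (after IL-NIQE): mean vector
    and covariance of the (l2, l3) channels via a Gaussian fit."""
    eps = 1e-6                                    # avoid log(0)
    logR, logG, logB = (np.log(rgb[..., c] + eps) for c in range(3))
    Rc = logR - logR.mean()                       # \mathcal{R}
    Gc = logG - logG.mean()                       # \mathcal{G}
    Bc = logB - logB.mean()                       # \mathcal{B}
    l2 = (Rc + Gc - 2 * Bc) / np.sqrt(6)          # chromatic channel
    l3 = (Rc - Gc) / np.sqrt(2)                   # chromatic channel
    X = np.column_stack([l2.ravel(), l3.ravel()])
    mu_C = X.mean(axis=0)                         # Gaussian mean
    Sigma_C = np.cov(X, rowvar=False)             # Gaussian covariance
    return mu_C, Sigma_C

rgb = np.random.rand(64, 64, 3)                   # stand-in color image
mu_C, Sigma_C = chromatic_params(rgb)
```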

3 Performance Evaluation

3.1 Experimental Setup

Trial Databases: To allow comparison of NR-IQA algorithms, two publicly available databases are used: (1) the TID2013 database [32] and (2) the CSIQ database [34]. Since the LIVE database [33] has been used to train both the proposed metric and most of the trial NR-IQA schemes, it has not been used to evaluate the performance of the NR-IQA methods.

The TID2013 database contains images with multiple distortions. It consists of 25 original images to which 24 different types of distortions have been applied, with five degradation levels per distortion, generating a total of 3000 distorted images. It is worth noting that seven new types of degradation have been introduced with respect to the 17 types existing in the previous version of the database, known as TID2008. The database contains 524340 subjective ratings from 971 different observers, and ratings are reported in the form of MOS.

The CSIQ database consists of 30 original images, each distorted using six different types of distortion at four to five levels of distortion, generating a total of 866 distorted images. The database contains 5000 subjective ratings from 35 different observers, and ratings are reported in the form of DMOS.

Trial NR-IQA: To assess the performance of the proposed metric, LEVIQI is compared with the NR-IQA schemes mentioned in Sect. 2.3. Only the NR-IQA schemes from which at least one feature has been adapted to design the LEVIQI index, i.e., BRISQUE, BLIINDS-II, DIIVINE, SSEQ and IL-NIQE, are retained for the comparison.

Statistical Significance and Hypothesis Testing: Results obtained from the proposed metric are compared to results provided by all trial NR-IQA algorithms. To perform this evaluation, the Spearman Rank Order Correlation Coefficient (SROCC) is computed between the DMOS values and the predicted scores obtained from the NR-IQA algorithms. In addition, to ascertain which differences in performance between NR-IQA schemes are statistically significant, a hypothesis test is applied using the residuals between the DMOS values and the ratings provided by the IQA algorithms. This test is based on the t-test, which determines whether two population means are equal or not, and allows a statistically-based conclusion on the superiority (or not) of an NR-IQA algorithm.
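The evaluation protocol can be sketched as follows (assuming NumPy and SciPy >= 1.6 for the one-sided alternative; the scores are toy placeholders, and applying the t-test to absolute residuals is an assumption of this sketch):

```python
import numpy as np
from scipy.stats import spearmanr, ttest_ind

# Toy stand-ins: subjective DMOS and predicted scores of two metrics.
dmos = np.random.uniform(0, 100, 120)
pred_a = dmos + 8 * np.random.randn(120)       # hypothetical metric A
pred_b = dmos + 12 * np.random.randn(120)      # hypothetical metric B

rho_a, _ = spearmanr(pred_a, dmos)             # SROCC of each metric
rho_b, _ = spearmanr(pred_b, dmos)

# One-sided t-test on the absolute residuals between DMOS and the
# predicted ratings: a small p-value supports "A is better than B".
res_a = np.abs(dmos - pred_a)
res_b = np.abs(dmos - pred_b)
t_stat, p_val = ttest_ind(res_a, res_b, alternative='less')
print(rho_a, rho_b, p_val)
```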

3.2 Experimental Results

Table 2 gives the mean SROCC values computed between the MOS values of the TID2013 database and the predicted scores from the NR-IQA schemes whose extracted features are used to design LEVIQI. When considering the whole database, LEVIQI outperforms BRISQUE, BLIINDS-II, DIIVINE, SSEQ and IL-NIQE. In 67% of cases (16 subsets out of 24), the scores predicted by LEVIQI correlate better (higher SROCC value) with human judgments than those of any other tested NR-IQA. For the remaining 8 subsets, the performance of LEVIQI is very close to the best values. If we consider the multidistorted images associated with the last seven subsets of Table 2, LEVIQI performs better than the trial quality schemes except for 'Multiplicative Gaussian Noise' and 'Comfort Noise'. Yet, the obtained SROCC values remain highly competitive with the best quality index.

Table 2. SROCC values computed between predicted scores using NR-IQA schemes from which some attributes are extracted to design LEVIQI and MOS values for TID2013 Images database.
Table 3. SROCC values computed between predicted scores using NR-IQA schemes from which some attributes are extracted to design LEVIQI and DMOS values for the CSIQ Images database.

Similar results to those of Table 2 are given in Table 3 for the CSIQ Images database. For this database also, the global performance (SROCC value over cumulative subsets) of LEVIQI is higher than that of the trial metrics. More specifically, LEVIQI performs best in 4 cases out of 6. For the two remaining distortions, 'Gaussian Noise' and 'Additive Gaussian Pink Noise', LEVIQI presents the second best performance.

In addition, Table 4 gives the results obtained when a one-sided t-test is used to assess the statistical significance of NR-IQA/LEVIQI quality scores on the 6 multidistortion subsets of the TID2013 database (the last 6 subsets in Table 2). Each entry in this table is coded using six symbols. The position of each symbol corresponds to one subset (the first position corresponds to the 'Change of Color Saturation' subset, the second position to the 'Multiplicative Gaussian Noise' subset, etc.). Each symbol gives the result of the hypothesis test on the subset: '1' means the NR-IQA on the row is statistically better than the NR-IQA on the column, '0' means it is worse, and '-' is used when the NR-IQAs are indistinguishable. The results confirm that, most of the time, LEVIQI is more consistent with human judgments than the trial NR-IQA schemes from which some features have been extracted. As the multidistortion subsets are not common to the LIVE database, these results are all the more reliable.

Table 4. Statistical significance comparison of trial NR-IQA/LEVIQI quality scores on TID2013 database multidistortions subsets.

In a similar way, Table 5 gives the results obtained when a one-sided t-test is used to assess the statistical significance of NR-IQA/LEVIQI quality scores on the CSIQ database. One can notice that the distortions present in the CSIQ database are also present in the LIVE database. For these learned distortions, LEVIQI exceeds all of the trial NR-IQA schemes in several cases.

Table 5. Statistical significance comparison of NR-IQA/LEVIQI quality scores on CSIQ database subsets.

Finally, to compare the computational complexity of the proposed algorithm, we measured the average computation time required to assess an image of size \(512 \times 512\), using a computer with an Intel Core i7 processor at 2.2 GHz. Table 6 reports the measurement results, which are rough estimates only, as no code optimization has been done on our Matlab implementations. LEVIQI is faster than DIIVINE and BLIINDS-II, slower than BRISQUE and SSEQ, and comparable to IL-NIQE.

Table 6. Comparison of computational time (in seconds/image)

4 Conclusion

In this paper, a machine learning-based and human vision-based quality index called LEVIQI has been proposed for the purpose of no-reference quality evaluation. The model utilizes ten relevant attributes derived from five well-known and highly competitive NR-IQA schemes. The selected attributes address human vision characteristics such as frequency sensitivity, chromatic sensitivity, anisotropy and contrast sensitivity. These attributes have been adapted to be computed on 18 sub-bands generated from three radial bands and six orientations, so as to simulate human perceptual sensitivity. The obtained results show that the performance of LEVIQI is competitive with top-performing NR-IQA schemes. The potential of this model for real applications is under investigation, where a real-time implementation of the LEVIQI index using efficient vectorization is considered.