1 Introduction

Visual saliency has been studied extensively by computer vision researchers and cognitive scientists. Saliency detection is an important step in visual tasks such as image segmentation [1], visual tracking [2], and image and video compression [3]. Existing saliency detection models can be divided into spatial-domain models [4,5,6] and frequency-domain models [7,8,9,10,11,12], according to the domain in which they compute.

In the frequency domain, the information of an image is carried by its amplitude spectrum and phase spectrum. Existing frequency-domain saliency detection models compute the saliency map from the amplitude spectrum, the phase spectrum, or a combination of both. According to how they use this information, we divide existing frequency-domain models into four groups: (i) models that use the original phase spectrum [7]; (ii) models that use an optimized amplitude spectrum and the original phase spectrum [8, 10, 12, 13]; (iii) models that use an optimized amplitude spectrum and an optimized phase spectrum [11, 14, 15]; (iv) models that use the wavelet transform [16, 17]. We propose a model that detects salient objects using an optimized amplitude spectrum and the original phase spectrum, so the proposed model belongs to the second group.

Hou and Zhang [8] introduced frequency-domain computation into saliency detection. They combined the spectral residual (SR) with the phase spectrum to compute the saliency map. In the first group, Guo et al. [7] found that using only the phase spectrum of the Fourier Transform (PFT) achieves detection results similar to SR. Building on PFT, Guo et al. extended the algorithm to color images and used the phase spectrum of the quaternion Fourier Transform (PQFT) to improve the overall quality of the saliency map. In the second group, Li et al. [10], inspired by [7], proposed a saliency model based on the Hypercomplex Fourier Transform (HFT), which gave better detection results. In the third group, Li et al. [15] proposed Hypercomplex Spectral Contrast (HSC), which uses amplitude spectral contrast and phase spectral contrast to calculate the saliency map and improves detection accuracy by averaging multiscale saliency maps. Li et al. [11] computed the saliency map by combining an effective amplitude spectrum and an effective phase spectrum, obtained with purpose-designed amplitude spectrum filters and phase spectrum filters, respectively. In the fourth group, Nevrez [16] proposed a saliency detection model whose final saliency map is the fusion of a local saliency map and a global saliency map, both calculated by the wavelet transform.

Among the above algorithms, some discard the amplitude spectrum entirely, and others ignore the fact that different amplitudes affect the salient object differently. In our work, we find that smoothing only the peaks (the values above a selected threshold) in the log amplitude spectrum gives better detection results. Thus, we propose a new saliency detection model based on an amplitude spectrum that is optimized by smoothing the peaks in the log amplitude spectrum. Compared with state-of-the-art models, the proposed model makes fuller use of the amplitude spectral information and detects a more complete object region.

2 Amplitude Spectrum Analysis

In the frequency domain, the peaks in the log amplitude spectrum correspond to repetitive patterns [10]. In this paper, we define the peaks as the amplitudes higher than a selected threshold. Smoothing the log amplitude spectrum suppresses the repetitive background regions and thereby filters out non-salient regions. However, many salient regions are also repetitive patterns, and they correspond to local peaks in the log amplitude spectrum. These local peaks are also suppressed when the entire log amplitude spectrum is smoothed, so the information of the corresponding salient regions is filtered out as well. If we smooth only the peaks in the log amplitude spectrum and leave the local peaks unchanged, more salient information is kept and the detection results improve. Therefore, we propose a novel model that smooths only the peaks in the log amplitude spectrum; the saliency map is obtained by an inverse transform that combines the smoothed amplitude spectrum with the original phase spectrum.

To demonstrate the effectiveness of our method, we construct two one-dimensional signals \(f_{1}(t)\) and \(f_{2}(t)\). The first signal, \(f_{1}(t)\), is periodic. The second signal, \(f_{2}(t)\), is generated by doubling the frequency of samples 701–800 of \(f_{1}(t)\). In Fig. 1, rows 1 and 2 show the waveform and the corresponding log amplitude spectrum of \(f_{1}(t)\); rows 3 and 4 show the waveform and the corresponding log amplitude spectrum of \(f_{2}(t)\). In this paper, the high-frequency part of \(f_{2}(t)\) (samples 701–800) is called the salient segment, and the low-frequency part is called the non-salient segment.
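The construction of the two test signals can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the sine waveform, the signal length of 1000 samples, and the period of 50 samples are assumptions, since the text only states that \(f_{1}(t)\) is periodic and that samples 701–800 are frequency-doubled. The helper names `make_signals` and `log_amplitude` are hypothetical.

```python
import numpy as np

def make_signals(n=1000, period=50, seg=(700, 800)):
    """Return (f1, f2): a periodic signal and a copy with a frequency-doubled segment.

    The sine waveform, n and period are illustrative assumptions.
    """
    t = np.arange(n)
    f1 = np.sin(2 * np.pi * t / period)
    f2 = f1.copy()
    # Samples 701-800 (1-based) are indices 700..799 (0-based).
    f2[seg[0]:seg[1]] = np.sin(2 * np.pi * 2 * t[seg[0]:seg[1]] / period)
    return f1, f2

def log_amplitude(signal, eps=1e-12):
    """Log amplitude spectrum of a 1-D signal; eps avoids log(0)."""
    return np.log(np.abs(np.fft.fft(signal)) + eps)

f1, f2 = make_signals()
L1, L2 = log_amplitude(f1), log_amplitude(f2)

# The dominant bin is the same for both signals (the repetitive background),
# while f2 gains energy around twice that frequency (the salient segment).
print(np.argmax(L1[:500]), np.argmax(L2[:500]))
```

With these assumed parameters the background frequency falls exactly on one FFT bin, so the extra energy contributed by the doubled-frequency segment appears as a separate, lower local peak.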

Fig. 1. The non-salient segment leads to the three peaks; the salient segment leads to the local peaks (in red boxes). (Color figure online)

Fig. 2. Detection results of the two models. Smoothing only the peaks in the log amplitude spectrum gives the better detection result. (Color figure online)

According to [10], higher amplitudes in the amplitude spectrum correspond to repetitive patterns, so the three peaks in row 2 of Fig. 1 are caused by the repetitive patterns in \(f_{1}(t)\). Likewise, the three peaks in row 4 of Fig. 1 are caused by the repetitive non-salient segment, and the two local peaks (in red boxes) are caused by the repetitive salient segment in \(f_{2}(t)\). If the entire log amplitude spectrum in row 4 is smoothed, the three peaks are suppressed and so are the two local peaks, which weakens both the non-salient segment and the salient segment. If only the peaks in the log amplitude spectrum are smoothed, only the non-salient segment is weakened and the saliency map improves. We therefore propose a model that smooths only the peaks in the log amplitude spectrum, weakening only the non-salient segment.

The detection results obtained by smoothing the entire log amplitude spectrum and by smoothing only the peaks are shown in Fig. 2. With the proposed model, the detected non-salient segment has a lower amplitude (red dotted boxes) and the detected salient segment has a higher amplitude (red solid boxes). Thus, the proposed model obtains better detection results.
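The two variants compared here can be sketched for a 1-D signal as below. This is an illustrative sketch under stated assumptions, not the code behind Fig. 2: the function names, the wrap-around Gaussian smoothing, the sine test signal, and the threshold factor of 2 times the mean amplitude are all assumptions chosen for the example.

```python
import numpy as np

def smooth1d(x, sigma):
    """Gaussian smoothing with wrap-around borders (the DFT spectrum is periodic)."""
    r = int(3 * sigma + 0.5)
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    return np.convolve(np.pad(x, r, mode="wrap"), k, mode="valid")

def saliency_1d(signal, sigma=1.6, factor=None, eps=1e-12):
    """1-D saliency from a smoothed log amplitude spectrum and the original phase.

    factor=None smooths the entire log amplitude spectrum.
    Otherwise only bins with amplitude > factor * mean(amplitude) are smoothed.
    """
    spec = np.fft.fft(signal)
    amp, phase = np.abs(spec), np.angle(spec)
    log_amp = np.log(amp + eps)
    smoothed = smooth1d(log_amp, sigma)
    if factor is not None:
        peaks = amp > factor * amp.mean()
        smoothed = np.where(peaks, smoothed, log_amp)  # keep non-peak bins untouched
    return np.abs(np.fft.ifft(np.exp(smoothed + 1j * phase))) ** 2

# Hypothetical test signal: a sine whose samples 701-800 have doubled frequency.
t = np.arange(1000)
f2 = np.sin(2 * np.pi * t / 50)
f2[700:800] = np.sin(2 * np.pi * 2 * t[700:800] / 50)

whole = saliency_1d(f2)                  # smooth the entire log amplitude spectrum
peaks_only = saliency_1d(f2, factor=2.0) # smooth only the peaks
```

Comparing `whole[700:800]` against the rest of each output gives a rough picture of how strongly each variant highlights the salient segment; the numbers depend on the assumed waveform and threshold.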

3 The Methodology

According to the analysis in Sect. 2, the peaks in the log amplitude spectrum are caused by background regions and the local peaks are caused by salient regions. Thus, we propose a saliency detection model based on an optimized amplitude spectrum and the original phase spectrum. First, we optimize the amplitude spectrum by smoothing the peaks in the log amplitude spectrum under several different thresholds. Then, we obtain raw saliency maps by the inverse transform that combines each optimized amplitude spectrum with the original phase spectrum. Finally, we select the raw saliency map with the smallest entropy as the final saliency map.

3.1 Smoothing the Peaks in Log Amplitude Spectrum

In our model, we use a hypercomplex (quaternion) matrix to represent the color image. Equation 1 gives the structure of the hypercomplex matrix. The hypercomplex Fourier transform and its inverse are given in Eqs. 2 and 3, respectively, where \(\mu \) is a unit pure quaternion (\(\mu ^{2}=-1\)). For the properties of quaternions, see [7].

$$\begin{aligned} f(n,m)=a+bi+cj+dk \end{aligned}$$
(1)
$$\begin{aligned} F(u,v)=\frac{1}{\sqrt{MN}} \sum _{m=0}^{M-1} \sum _{n=0}^{N-1} e^{-\mu 2 \pi ((\frac{mv}{M})+(\frac{nu}{N}))}f(n,m) \end{aligned}$$
(2)
$$\begin{aligned} f(n,m)=\frac{1}{\sqrt{MN}} \sum _{v=0}^{M-1} \sum _{u=0}^{N-1} e^{\mu 2 \pi ((\frac{mv}{M})+(\frac{nu}{N}))}F(u,v) \end{aligned}$$
(3)

The RGB color system is the most commonly used color system, but it is device-dependent. The Lab color system is based on physiological characteristics and covers a larger color space than RGB. The L channel represents brightness, the a channel represents the range from red to green, and the b channel represents the range from yellow to blue. The Lab color system is more in line with the human visual perception system, so we use it to represent images in our work. Equation 4 represents the image as a pure imaginary hypercomplex matrix, where L, a and b are the three imaginary parts. After the Fourier transform, \(F(u,v)\) can be written in polar form, see Eq. 5. \(A(u,v)=\Vert F(u,v)\Vert \) and \(P(u,v)=angle(F(u,v))\) are the amplitude spectrum and the phase spectrum, respectively. \(L(u,v)=log(A(u,v))\) is the log amplitude spectrum.

$$\begin{aligned} f(n,m)=Li+aj+bk \end{aligned}$$
(4)
$$\begin{aligned} F(u,v)=A(u,v)e^{\mu P(u,v)} \end{aligned}$$
(5)

The original amplitude spectrum has a large spectral drop. Therefore, when optimizing the amplitude spectrum, we smooth the peaks in the log amplitude spectrum rather than in the original amplitude spectrum. To determine the peaks, we choose a suitable threshold: the peaks are the amplitudes above the threshold. The matrix \(\varGamma (u,v)\) in Eq. 6 records the positions of the peaks in the amplitude spectrum. In Eq. 7, \(L_{S}(u,v)\) is the log amplitude spectrum after smoothing the peaks and \(g(\sigma )\) is a Gaussian kernel.

$$\begin{aligned} \varGamma (u,v)={\left\{ \begin{array}{ll}1, &{} A(u,v)>threshold,\\ 0, &{} otherwise. \end{array}\right. } \end{aligned}$$
(6)
$$\begin{aligned} L_{S}(u,v)=[g(\sigma )*L(u,v)]\cdot \varGamma (u,v)+L(u,v)\cdot [1-\varGamma (u,v)] \end{aligned}$$
(7)
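Equations 6 and 7 can be sketched in NumPy as follows. This is a minimal sketch under assumptions: it works on an ordinary real-valued amplitude array rather than a hypercomplex spectrum, the Gaussian is truncated at about three standard deviations, and borders are handled by wrap-around. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma (an assumption)."""
    r = int(3 * sigma + 0.5)
    x = np.arange(-r, r + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_smooth(a, sigma):
    """Separable 2-D Gaussian smoothing with wrap-around borders."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    out = np.asarray(a, dtype=float)
    for axis in (0, 1):
        pad = [(r, r) if ax == axis else (0, 0) for ax in (0, 1)]
        padded = np.pad(out, pad, mode="wrap")
        out = np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")
    return out

def smooth_peaks(amplitude, threshold, sigma=1.6, eps=1e-12):
    """Return (L_S, Gamma): the peak-smoothed log amplitude and the peak mask."""
    log_amp = np.log(amplitude + eps)               # L(u,v); eps avoids log(0)
    gamma = (amplitude > threshold).astype(float)   # Eq. 6
    smoothed = gaussian_smooth(log_amp, sigma)      # g(sigma) * L(u,v)
    l_s = smoothed * gamma + log_amp * (1.0 - gamma)  # Eq. 7
    return l_s, gamma

# Usage on the spectrum of a random single-channel image (illustrative only).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
A = np.abs(np.fft.fft2(image))
L_S, Gamma = smooth_peaks(A, threshold=2.0 * A.mean())
```

Only the positions flagged by the mask take the smoothed value; every other position keeps its original log amplitude, which is what preserves the local peaks.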

3.2 Choose the Thresholds and Smoothing Scale

Choose the appropriate thresholds. In our work, the threshold is an important parameter. Because images are so diverse, it is difficult to find a single threshold that suits all images, so we use several thresholds. We collected statistics on the images in the ASD database [18] after normalizing the image size to \(128\times 128\). In Table 1, num1 is the mean number of peaks above the corresponding threshold, and \(key=mean(A(u,v))\) is the average amplitude of the amplitude spectrum.

If the number of peaks is less than one, smoothing the peaks leaves the amplitude spectrum essentially unchanged. If the number of peaks is too large, smoothing the peaks is similar to smoothing the entire log amplitude spectrum. Thus, when the number of peaks is either too large or too small, the detection results are poor. In Table 1, when the threshold equals key or 1.25key there are too many peaks, and when the threshold equals 2.75key there are too few, so these values are not suitable as thresholds. We therefore keep the thresholds given in Eq. 8 as the best thresholds, and we assume that the optimal threshold for determining the peaks is one of them.

Table 1. num1 is the mean number of peaks at each threshold.
Table 2. num2 is the number of images for which each threshold is the optimal threshold.

For the eight thresholds in Table 1, we count how often each threshold is selected as the optimal threshold on the ASD database [18]. For each image, we obtain eight raw saliency maps from the eight thresholds and calculate the entropy of each raw saliency map. The threshold whose saliency map has the smallest entropy is the optimal threshold of that image. The statistics are shown in Table 2, where num2 is the number of images for which the corresponding threshold is optimal. The probability that the optimal threshold lies among our chosen best thresholds (Eq. 8) is \((78+171+229+286+142)/1000=90.6\%\), so the chosen best thresholds are reasonable.

Choose the appropriate smoothing scale. In a saliency map, the brighter the salient regions and the darker the non-salient regions, the better the detection result. We therefore define the Salient Contrast Ratio (SCR) in Eq. 9 to measure detection quality. The greater the SCR, the better the detection result.

$$\begin{aligned} threshold=u\cdot key, (u=1.5, 1.75, 2, 2.25, 2.5) \end{aligned}$$
(8)
$$\begin{aligned} SCR=S\cdot \varTheta (*)/sum(\varTheta (*))-S\cdot (1-\varTheta (*))/sum(1-\varTheta (*)) \end{aligned}$$
(9)

where S is the raw saliency map and, according to the ground truth, \(\varTheta (*)\) equals one at salient positions and zero at non-salient positions. We determine the optimal smoothing scale \(\sigma \) from the images in the ASD database [18]. We calculate the saliency map for each smoothing scale in [0.4, 10] with a step of 0.1, and then compute the average SCR over all images in ASD for each scale. The SCR curve as a function of \(\sigma \) is shown in Fig. 3. The average SCR reaches its maximum at \(\sigma =1.6\), so we select \(\sigma =1.6\) as the best smoothing scale.
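Equation 9 can be read as the mean saliency inside the ground-truth region minus the mean saliency outside it. The sketch below follows that reading, which is an assumption: it interprets \(S\cdot \varTheta (*)\) as the sum of the element-wise product. The function name `salient_contrast_ratio` is hypothetical, and both regions are assumed to be non-empty.

```python
import numpy as np

def salient_contrast_ratio(saliency, ground_truth):
    """SCR (Eq. 9): mean saliency on salient pixels minus mean on non-salient pixels.

    ground_truth is a binary mask (1 = salient, 0 = non-salient).
    Both regions must contain at least one pixel.
    """
    s = np.asarray(saliency, dtype=float)
    theta = np.asarray(ground_truth, dtype=float)
    foreground = (s * theta).sum() / theta.sum()
    background = (s * (1.0 - theta)).sum() / (1.0 - theta).sum()
    return foreground - background

S = np.array([[0.9, 0.8],
              [0.1, 0.2]])
GT = np.array([[1, 1],
               [0, 0]])
scr = salient_contrast_ratio(S, GT)  # 0.85 - 0.15 = 0.7 (up to rounding)
```

Averaging this score over a dataset for each candidate \(\sigma \) and taking the maximizer is one way to carry out the scale selection described above.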

Fig. 3. SCR first increases and then decreases. SCR reaches its maximum at \(\sigma =1.6\).

Fig. 4. Raw saliency maps are shown in the 2nd–6th columns. The final saliency map is shown in the red box. (Color figure online)

3.3 Optimized Amplitude and Final Saliency Map

After selecting the thresholds in Eq. 8 and the smoothing scale \(\sigma =1.6\), we obtain five optimized log amplitude spectra \(L_{S}(u,v;k)\) according to Eqs. 7 and 8. The raw saliency maps are calculated (Eq. 10) by the inverse Fourier transform that combines each optimized amplitude spectrum with the original phase spectrum. In Eq. 10, \(h(\sigma _{0})\) is a low-pass filter (\(\sigma _{0}=5\)).

The entropy of an image represents how aggregated its gray-level distribution is: the smaller the entropy, the more concentrated the gray-level distribution. Thus, we can determine the final saliency map from the image entropy. We calculate the entropy of each raw saliency map, and the final saliency map is the raw saliency map with the smallest entropy (Eq. 11).

$$\begin{aligned} S(x,y;k)=h(\sigma _{0})*\Vert F^{-1}[exp(L_{S}(u,v;k)+i\cdot P(u,v))]\Vert ^2 \end{aligned}$$
(10)
$$\begin{aligned} S_{f}\Leftarrow arg min(entropy(S(x,y;k))) \end{aligned}$$
(11)
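Equations 10 and 11 can be sketched as below. This is an illustrative sketch, not the authors' implementation: it uses the ordinary complex 2-D FFT on a single channel instead of the hypercomplex transform, implements \(h(\sigma _{0})\) as a circular Gaussian blur in the frequency domain, and computes entropy from a 256-bin histogram of the min-max normalized map. All function names and these choices are assumptions.

```python
import numpy as np

def reconstruct(log_amp, phase):
    """Inverse FFT of exp(L + i*P): recombine a log amplitude with a phase spectrum."""
    return np.fft.ifft2(np.exp(log_amp + 1j * phase))

def gaussian_lowpass(img, sigma0=5.0):
    """Circular Gaussian blur, applied as a multiplication in the frequency domain."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma0) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

def raw_saliency(log_amp, phase, sigma0=5.0):
    """Eq. 10: low-pass filtered squared magnitude of the reconstruction."""
    return gaussian_lowpass(np.abs(reconstruct(log_amp, phase)) ** 2, sigma0)

def entropy(img, bins=256):
    """Shannon entropy (bits) of the gray-level histogram after scaling to [0, 1]."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    norm = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    hist, _ = np.histogram(norm, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def select_final(maps):
    """Eq. 11: return (index, map) of the raw saliency map with the smallest entropy."""
    k = int(np.argmin([entropy(m) for m in maps]))
    return k, maps[k]
```

A useful sanity check of the sketch is that passing the unmodified log amplitude and phase of an image to `reconstruct` returns the image itself, so any saliency comes only from the changes made to the amplitude spectrum.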

In Fig. 4, the raw saliency maps calculated from the optimized amplitude spectrum and the original phase spectrum under the different thresholds in Eq. 8 are shown in the 2nd–6th columns. The final saliency map, chosen by Eq. 11 as the raw saliency map with the smallest entropy, is shown in the red box. Our model, which computes the saliency map using the optimized Amplitude spectrum of the Hypercomplex Fourier Transform, is referred to as AHFT in this paper. The AHFT model is summarized in Algorithm 1.

figure a

4 Experimental Results

We tested AHFT on four public, widely used databases: ASD [18], MSRA10K [5], DUT-OMRON [19] and SED2 [1]. These four databases contain a large number of images with manually labeled ground truth. ASD is the most cited database. MSRA10K contains 10k images selected from the MSRA database. The images in DUT-OMRON have more complex backgrounds than those in MSRA10K. In SED2, each image contains two or more salient objects. We compared the detection results of our model with those of the highly cited IT model [4], the frequency-domain models SR [8], PFT [7], PQFT [7], DCT [9], QDCT [12], FT [18], WAVE [16] and HFT [10], and the recent UHM model [6].

Fig. 5. Comparison of saliency maps between our model and other models.

Figure 5 shows the saliency maps. Most algorithms detect only the boundary of the salient object. FT [18] and WAVE [16] can highlight the entire salient region, but not brightly enough. UHM [6] can also enhance the entire salient region, but it does not suppress the background region effectively, so its precision is poor. Only the proposed model detects the whole salient object region and suppresses the background region effectively.

Table 3. Comparison scores of AUC, \(F_{\beta }\) and MAE between our model and other models.

To evaluate the performance of our model comprehensively, we compare the detection results of the various algorithms using three objective evaluation criteria: AUC [11], \(F_{\beta }\) [5] and MAE [20]. The AUC score is the area under the receiver operating characteristic (ROC) curve, and the best AUC score is 1. \(F_{\beta }\) combines Precision and Recall into a single criterion and reflects the performance of saliency detection models more objectively. In Eq. 12, when \(\beta ^2 < 1\), Precision has a greater impact; when \(\beta ^2 > 1\), Recall has a greater impact. Following [5, 16], we also choose \(\beta ^2=0.3\) in our experiments. Mean absolute error (MAE) is a good measure of detection accuracy: the smaller the MAE, the better the detection result.

$$\begin{aligned} F_{\beta }=\frac{(1+\beta ^{2})\cdot Precision\cdot Recall}{\beta ^{2} \cdot Precision + Recall} \end{aligned}$$
(12)
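A small worked example of Eq. 12 helps to show how \(\beta ^2\) shifts the balance between Precision and Recall. The function name `f_beta` and the sample values are hypothetical; returning 0 when both inputs are 0 is an assumed convention.

```python
def f_beta(precision, recall, beta_sq=0.3):
    """F-measure of Eq. 12 with beta^2 = beta_sq; returns 0.0 if both inputs are 0."""
    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denom

high_precision = f_beta(0.8, 0.4)  # 0.416 / 0.64 = 0.65
high_recall = f_beta(0.4, 0.8)     # 0.416 / 0.92, about 0.452
```

With \(\beta ^2=0.3\), swapping the two values lowers the score from 0.65 to about 0.452, so a detector with high Precision is rewarded more than one with equally high Recall.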

The objective evaluation scores are shown in Table 3, where the bold numbers are the best values. In AUC, our method is clearly better than the others. On the whole, our model outperforms the state-of-the-art models.

5 Conclusions

By analyzing the amplitude spectra of signals, we find that the repeated background in a signal corresponds to the peaks in the log amplitude spectrum. The salient object is also a repetitive pattern, and it corresponds to the local peaks in the log amplitude spectrum. If the entire log amplitude spectrum is smoothed, both the peaks and the local peaks are suppressed. If only the peaks are smoothed, the background information is suppressed and the salient information is preserved, so the latter approach achieves better detection results than the former. Therefore, we propose a novel saliency model that combines the amplitude spectrum optimized by smoothing the peaks with the original phase spectrum to compute the saliency map. The proposed model detects more complete salient objects and obtains better detection results.