Abstract
Saliency detection is a prerequisite for many computer vision tasks. Existing frequency-domain models cannot always detect a complete object. We propose a novel salient object detection model based on an optimized amplitude spectrum. The model computes the saliency map in two steps. First, we optimize the amplitude spectrum by smoothing the peaks in the log amplitude spectrum, and compute raw saliency maps by combining the optimized amplitude spectrum with the original phase spectrum under different thresholds. Second, we compute the entropy of each raw saliency map and select the one with the smallest entropy as the final saliency map. Our model detects a more complete object region. Experiments on the ASD, MSRA10K, DUT-OMRON and SED2 databases demonstrate that the proposed model outperforms state-of-the-art models.
1 Introduction
Visual saliency has been studied extensively by computer vision researchers and cognitive scientists. Saliency detection is an important step in visual tasks such as image segmentation [1], visual tracking [2], and image and video compression [3]. Existing saliency detection models can be divided into spatial-domain models [4,5,6] and frequency-domain models [7,8,9,10,11,12], according to the domain in which they compute.
In the frequency domain, the information of an image is reflected in its amplitude spectrum and phase spectrum. Existing frequency-domain saliency detection models compute the saliency map from the amplitude spectrum, the phase spectrum, or a combination of both. According to how they use the frequency-domain information, we divide existing frequency-domain saliency detection models into four groups: (i) models that use the original phase spectrum [7]; (ii) models that use an optimized amplitude spectrum and the original phase spectrum [8, 10, 12, 13]; (iii) models that use an optimized amplitude spectrum and an optimized phase spectrum [11, 14, 15]; (iv) models that use the wavelet transform [16, 17]. We propose a model that detects salient objects using an optimized amplitude spectrum and the original phase spectrum, so the proposed model belongs to the second group.
Hou and Zhang [8] introduced frequency-domain computation into saliency detection. They combined the spectral residual (SR) with the phase spectrum to compute the saliency map. In the first group, Guo et al. [7] found that using only the phase spectrum of the Fourier Transform (PFT) achieves detection results similar to SR. Building on PFT, Guo et al. extended the algorithm to color images, using the phase spectrum of the quaternion Fourier Transform (PQFT) to improve the overall performance of the saliency map. In the second group, Li et al. [10], inspired by [7], proposed a saliency model based on the Hypercomplex Fourier Transform (HFT), which gave better detection results. In the third group, Li et al. [15] proposed Hypercomplex Spectral Contrast (HSC), which uses amplitude spectral contrast and phase spectral contrast to calculate the saliency map and improves detection accuracy by averaging multiscale saliency maps. Li et al. [11] computed the saliency map by combining an effective amplitude spectrum and an effective phase spectrum, obtained with specially designed amplitude spectrum filters and phase spectrum filters, respectively. In the fourth group, Imamoglu et al. [16] proposed a saliency detection model in which the final saliency map fuses a local saliency map and a global saliency map, both calculated by wavelet transform.
Among the above algorithms, some discard the amplitude spectrum entirely, and others ignore the effect that different amplitudes have on the salient object. In our work, we find that smoothing only the peaks (the amplitudes above our selected threshold) in the log amplitude spectrum yields better detection results. We therefore propose a new saliency detection model based on an amplitude spectrum that is optimized by smoothing the peaks in the log amplitude spectrum. Compared with state-of-the-art models, the proposed model makes fuller use of the amplitude spectral information and detects a more complete object region.
2 Amplitude Spectrum Analysis
In the frequency domain, the peaks in the log amplitude spectrum correspond to repetitive patterns [10]. In this paper, we define the peaks as the amplitudes higher than a selected threshold. Smoothing the log amplitude spectrum suppresses repetitive background regions and thereby filters out non-salient regions. However, many salient regions are also repetitive patterns, which correspond to local peaks in the log amplitude spectrum. These local peaks are suppressed as well when the entire log amplitude spectrum is smoothed, so the information of the corresponding salient regions is also filtered out. If we smooth only the peaks in the log amplitude spectrum and leave the local peaks unchanged, more salient information is kept and the detection results improve. Therefore, we propose a novel model that smooths only the peaks in the log amplitude spectrum; the saliency map is obtained by an inverse transform that combines the smoothed amplitude spectrum with the original phase spectrum.
To demonstrate the effectiveness of our method, we construct two one-dimensional signals \(f_{1}(t)\) and \(f_{2}(t)\). The first signal, \(f_{1}(t)\), is periodic. The second signal, \(f_{2}(t)\), is generated by doubling the frequency of samples 701–800 of \(f_{1}(t)\). In Fig. 1, rows 1 and 2 show the waveform and the corresponding log amplitude spectrum of \(f_{1}(t)\); rows 3 and 4 show the waveform and the corresponding log amplitude spectrum of \(f_{2}(t)\). In this paper, the high-frequency part of \(f_{2}(t)\) (samples 701–800) is called the salient segment, and the low-frequency part is called the non-salient segment.
According to [10], the higher amplitudes in the amplitude spectrum correspond to repetitive patterns. The three peaks in row 2 of Fig. 1 are therefore caused by the repetitive patterns in \(f_{1}(t)\). Likewise, the three peaks in row 4 of Fig. 1 are caused by the repetitive non-salient segment, and the two local peaks (in red boxes) are caused by the repetitive salient segment of \(f_{2}(t)\). If the entire log amplitude spectrum of \(f_{2}(t)\) is smoothed, the three peaks and the two local peaks are all suppressed, which weakens both the non-salient segment and the salient segment. If only the peaks are smoothed, only the non-salient segment is weakened and a better saliency map is obtained. We therefore propose a model that smooths only the peaks in the log amplitude spectrum, so that only the non-salient segment is weakened.
Fig. 2 shows the detection results obtained by smoothing the entire log amplitude spectrum and by smoothing only the peaks. With the proposed model, the detected non-salient segment has a lower amplitude (red dotted boxes) and the detected salient segment has a higher amplitude (red solid boxes). Thus, the proposed model obtains better detection results.
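The one-dimensional experiment can be approximated with a short NumPy sketch. The signal period (50 samples), the threshold of twice the mean amplitude, the smoothing width and the helper names `gaussian_smooth`, `reconstruct` and `mean_response` are assumptions made for illustration; the values actually used for Figs. 1 and 2 are not stated in the text, so the printed numbers need not match the figures.

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """Circular Gaussian smoothing of a 1-D array, done in the Fourier domain."""
    freqs = np.fft.fftfreq(x.size)
    transfer = np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)
    return np.real(np.fft.ifft(np.fft.fft(x) * transfer))

def reconstruct(log_amp, phase):
    """Squared magnitude of the inverse transform of exp(log_amp + i*phase)."""
    return np.abs(np.fft.ifft(np.exp(log_amp + 1j * phase))) ** 2

def mean_response(resp, segment):
    """Mean response inside and outside the given index range."""
    inside = np.zeros(resp.size, dtype=bool)
    inside[segment] = True
    return resp[inside].mean(), resp[~inside].mean()

# f1: periodic signal; f2: same signal with samples 701-800 frequency-doubled.
n = 1000
t = np.arange(n)
f1 = np.sin(2 * np.pi * t / 50)                      # assumed period: 50 samples
f2 = f1.copy()
f2[700:800] = np.sin(2 * np.pi * t[700:800] / 25)    # salient segment

spectrum = np.fft.fft(f2)
amp = np.abs(spectrum)
phase = np.angle(spectrum)
log_amp = np.log(amp + 1e-12)                        # small offset avoids log(0)

sigma = 3.0                                          # assumed smoothing width
log_amp_all = gaussian_smooth(log_amp, sigma)        # smooth the whole spectrum
peak_mask = amp > 2.0 * amp.mean()                   # assumed threshold
log_amp_peaks = np.where(peak_mask, log_amp_all, log_amp)  # smooth peaks only

resp_all = reconstruct(log_amp_all, phase)
resp_peaks = reconstruct(log_amp_peaks, phase)

segment = slice(700, 800)
print("entire spectrum smoothed (salient, non-salient):",
      mean_response(resp_all, segment))
print("only peaks smoothed      (salient, non-salient):",
      mean_response(resp_peaks, segment))
```

Changing `sigma` or the threshold multiplier shows how sensitive the separation between the two segments is to these choices.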
3 The Methodology
According to the analysis in Sect. 2, the peaks in the log amplitude spectrum are caused by background regions and the local peaks are caused by salient regions. We therefore propose a saliency detection model based on the optimized amplitude spectrum and the original phase spectrum. First, we optimize the amplitude spectrum by smoothing the peaks in the log amplitude spectrum according to different thresholds. Then, we obtain raw saliency maps by an inverse transform that combines the optimized amplitude spectrum with the original phase spectrum. Finally, we select the raw saliency map with the smallest entropy as the final saliency map.
3.1 Smoothing the Peaks in Log Amplitude Spectrum
In our model, we use a hypercomplex matrix to represent the color image. Equation 1 gives the structure of the hypercomplex matrix. The hypercomplex Fourier transform and its inverse are given in Eqs. 2 and 3, respectively. For the properties of quaternions, see [7].
The RGB color system is the most commonly used color system, but it is device-dependent. The Lab color system is based on physiological characteristics and covers a larger color space than RGB. The L channel represents brightness, the a channel represents the range from red to green, and the b channel represents the range from yellow to blue. The Lab color system is more in line with the human visual perception system, so we use it to represent images in our work. Equation 4 is a pure imaginary matrix, where L, a and b are the three imaginary parts. After the Fourier Transform, Eq. 2 can be rewritten as Eq. 5. \(A(u,v)=\Vert F(u,v)\Vert \) and \(p(u,v)=angle(F(u,v))\) are the amplitude spectrum and the phase spectrum, respectively, and \(L(u,v)=\log (A(u,v))\) is the log amplitude spectrum.
The original amplitude spectrum has a large dynamic range. When optimizing the amplitude spectrum, we therefore smooth the peaks in the log amplitude spectrum rather than in the original amplitude spectrum. To determine the peaks, we choose a suitable threshold: the peaks are the amplitudes above this threshold. In Eq. 6, the matrix \(\varGamma (u,v)\) records the positions of the peaks in the amplitude spectrum. In Eq. 7, \(L_{S}(u,v)\) is the log amplitude spectrum after smoothing the peaks and \(g(\sigma )\) is a Gaussian kernel.
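One plausible reading of the peak mask \(\varGamma (u,v)\) and the selective smoothing is sketched below in NumPy. It is an illustration under stated assumptions, not the authors' implementation: it works on a single-channel image with an ordinary 2-D FFT instead of the hypercomplex transform, it replaces the log amplitude by its Gaussian-smoothed value only where the mask is set, and the helper names `gaussian_blur` and `smooth_peaks` are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Circular 2-D Gaussian blur, applied as a product in the Fourier domain."""
    rows, cols = img.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

def smooth_peaks(amp, threshold, sigma=1.6):
    """Smooth the log amplitude only where the amplitude exceeds the threshold.

    Returns the selectively smoothed log amplitude and the boolean peak mask.
    Assumption: non-peak positions keep their original log amplitude.
    """
    log_amp = np.log(amp + 1e-12)          # small offset avoids log(0)
    peak_mask = amp > threshold            # plays the role of Gamma(u, v)
    smoothed = gaussian_blur(log_amp, sigma)
    return np.where(peak_mask, smoothed, log_amp), peak_mask

# Usage on a synthetic single-channel image (random values, fixed seed).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
amp = np.abs(np.fft.fft2(image))
log_amp = np.log(amp + 1e-12)
key = amp.mean()                           # average amplitude, as in the text
smoothed_log_amp, peak_mask = smooth_peaks(amp, 2.0 * key)
print("number of peaks above 2*key:", int(peak_mask.sum()))
```

The blur is circular because the discrete spectrum is periodic, which avoids edge effects around the zero-frequency bin.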
3.2 Choosing the Thresholds and Smoothing Scale
Choosing the appropriate thresholds. In our work, the threshold is an important parameter. Because images are so diverse, it is difficult to find a single threshold that suits all images, so we choose a set of thresholds. We collected statistics over the images in the ASD database [18] after normalizing the image size to \(128\times 128\). In Table 1, num1 is the mean number of peaks above the corresponding threshold, and \(key=mean(A(u,v))\) is the average amplitude of the amplitude spectrum.
If the number of peaks is less than one, smoothing the peaks in the log amplitude spectrum leaves the amplitude spectrum almost unchanged. If the number of peaks is too large, smoothing the peaks is similar to smoothing the entire log amplitude spectrum. Thus, when the number of peaks is either too large or too small, the detection results are poor. In Table 1, when the threshold equals key or 1.25key, there are too many peaks; when the threshold equals 2.75key, there are too few. These values are therefore not suitable as thresholds. Finally, we choose the best thresholds from the experimental results, given in Eq. 8, and we assume that the optimal threshold for determining the peaks is among those in Eq. 8.
For the eight thresholds in Table 1, we count how many times each threshold is selected as the optimal threshold on the ASD database [18]. We obtain eight raw saliency maps from the eight thresholds and calculate the entropy of each raw saliency map. The threshold of the raw saliency map with the smallest entropy is the optimal threshold for the image. The statistical results are shown in Table 2, where num2 is the number of images for which the corresponding threshold is optimal. The probability that the optimal threshold falls within our chosen best thresholds (Eq. 8) is \((78+171+229+286+142)/1000=90.6\%\). The chosen thresholds are therefore reasonable.
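The coverage figure can be checked with a few lines of Python. The threshold multipliers below are inferred from the text (eight candidates from key to 2.75key in steps of 0.25, with key, 1.25key and 2.75key rejected); this spacing is an assumption, since Table 1 and Eq. 8 are not reproduced here. Only the five num2 counts quoted in the text are used.

```python
# Candidate multipliers of key = mean(A(u, v)); the 0.25 step is inferred.
candidate_multipliers = [1.0 + 0.25 * i for i in range(8)]   # 1.0 ... 2.75

# Multipliers kept after rejecting key, 1.25*key and 2.75*key (assumed Eq. 8).
chosen_multipliers = [m for m in candidate_multipliers if 1.5 <= m <= 2.5]

# num2 counts quoted in the text for the five chosen thresholds.
num2_chosen = [78, 171, 229, 286, 142]
total_images = 1000

coverage = sum(num2_chosen) / total_images
print(chosen_multipliers)           # five multipliers
print(f"coverage = {coverage:.1%}")
```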
Choosing the appropriate smoothing scale. In a saliency map, the brighter the salient regions and the darker the non-salient regions, the better the detection result. We therefore define the Salient Contrast Ratio (SCR) in Eq. 9 to measure detection quality. The greater the SCR, the better the detection result.
where S is the raw saliency map and, according to the ground truth, \(\varTheta (*)\) equals one at salient positions and zero at non-salient positions. We determine the optimal smoothing scale \(\sigma \) from the images in the ASD database [18]. We compute the saliency map for each smoothing scale in [0.4, 10] with a step of 0.1, and then compute the average SCR over all images in ASD for each scale. Fig. 3 shows the SCR curve as \(\sigma \) increases. The average SCR reaches its maximum at \(\sigma =1.6\), so we select \(\sigma =1.6\) as the smoothing scale.
3.3 Optimized Amplitude and Final Saliency Map
After selecting the thresholds in Eq. 8 and the smoothing scale \(\sigma =1.6\), we obtain five optimized log amplitude spectra \(L_{S}(u,v;k)\) according to Eqs. 7 and 8. The raw saliency maps are calculated (Eq. 10) by the inverse Fourier Transform, combining the optimized amplitude spectrum with the original phase spectrum. In Eq. 10, \(h(\sigma _{0})\) is a low-pass filter with \(\sigma _{0}=5\).
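The computation of the raw saliency maps can be sketched as follows. This is a simplified illustration, not the authors' code: it uses a single-channel image and the ordinary 2-D FFT in place of the hypercomplex transform on Lab channels, it assumes the common form in which the squared magnitude of the inverse transform is low-pass filtered, it assumes a Gaussian for \(h(\sigma _{0})\), and it assumes the five thresholds are 1.5key to 2.5key in steps of 0.25. The function names are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Circular 2-D Gaussian blur, applied as a product in the Fourier domain."""
    rows, cols = img.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

def raw_saliency(gray, k, sigma=1.6, sigma0=5.0):
    """Raw saliency map for the threshold k * mean(amplitude)."""
    spectrum = np.fft.fft2(gray)
    amp = np.abs(spectrum)
    phase = np.angle(spectrum)                   # original phase is kept
    log_amp = np.log(amp + 1e-12)
    peaks = amp > k * amp.mean()
    # Smooth the log amplitude at peak positions only.
    log_amp_s = np.where(peaks, gaussian_blur(log_amp, sigma), log_amp)
    recon = np.fft.ifft2(np.exp(log_amp_s + 1j * phase))
    # Low-pass filter the squared magnitude; clip tiny negative rounding errors.
    sal = np.clip(gaussian_blur(np.abs(recon) ** 2, sigma0), 0.0, None)
    peak = sal.max()
    return sal / peak if peak > 0 else sal       # normalize to [0, 1]

# Usage: a bright square on a weakly textured background.
rng = np.random.default_rng(0)
image = 0.1 * rng.random((64, 64))
image[24:40, 24:40] += 1.0
multipliers = (1.5, 1.75, 2.0, 2.25, 2.5)        # assumed thresholds
raw_maps = {k: raw_saliency(image, k) for k in multipliers}
```

Each entry of `raw_maps` is one candidate map; the choice among them is made afterwards by the entropy criterion.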
The entropy of an image characterizes how concentrated its gray-level distribution is: the smaller the entropy, the more concentrated the distribution. We can therefore determine the final saliency map from the entropy. We calculate the entropy of each raw saliency map, and the final saliency map is the raw saliency map with the smallest entropy (Eq. 11).
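A minimal sketch of the entropy-based selection is given below. It assumes the Shannon entropy of a 256-bin gray-level histogram over maps scaled to [0, 1]; the bin count and the function names `map_entropy` and `select_final_map` are assumptions, since the exact form of Eq. 11 is not reproduced here.

```python
import numpy as np

def map_entropy(sal, bins=256):
    """Shannon entropy (in bits) of the gray-level histogram of a map in [0, 1]."""
    hist, _ = np.histogram(sal, bins=bins, range=(0.0, 1.0))
    probs = hist[hist > 0] / hist.sum()
    return float(-(probs * np.log2(probs)).sum())

def select_final_map(raw_maps):
    """Return the index of the map with the smallest entropy, plus all entropies."""
    entropies = [map_entropy(m) for m in raw_maps]
    return int(np.argmin(entropies)), entropies

# Usage: a concentrated map versus a noisy map.
rng = np.random.default_rng(0)
concentrated = np.zeros((64, 64))
concentrated[28:36, 28:36] = 1.0       # one compact bright region
noisy = rng.random((64, 64))           # gray levels spread over the whole range
best_index, entropies = select_final_map([concentrated, noisy])
print(best_index, entropies)
```

The compact map uses only two gray levels and therefore has much lower entropy than the noisy one, which is the behavior the selection rule relies on.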
In Fig. 4, columns 2–6 show the raw saliency maps calculated from the optimized amplitude spectrum and the original phase spectrum for the different thresholds in Eq. 8. The final saliency map, chosen by Eq. 11 as the raw saliency map with the smallest entropy, is marked by the red box in Fig. 4. In this paper, we refer to our model, which computes the saliency map from the optimized Amplitude spectrum of the Hypercomplex Fourier Transform, as AHFT. The AHFT model is summarized in Algorithm 1.
4 Experimental Results
We tested AHFT on four public benchmark databases: ASD [18], MSRA10K [5], DUT-OMRON [19] and SED2 [1]. These four databases contain a large number of images with manually labeled ground truth. ASD is the most cited database. MSRA10K contains 10k images selected from the MSRA database. The images in DUT-OMRON have more complex backgrounds than those in MSRA10K. In SED2, each image contains two or more salient objects. We compared the detection results of the following methods: our model, the highly cited IT model [4], the frequency-domain models SR [8], PFT [7], PQFT [7], DCT [9], QDCT [12], FT [18], WAVE [16] and HFT [10], and the more recent UHM model [6].
Figure 5 shows the saliency maps. Most algorithms detect only the boundary of the salient object. FT [18] and WAVE [16] highlight the entire salient region, but not brightly enough. UHM [6] also enhances the entire salient region, but it does not suppress the background effectively, so its precision is poor. Only the proposed model detects the whole salient object region and suppresses the background effectively.
To evaluate the performance of our model comprehensively, we compare the detection results of the various algorithms using three objective evaluation criteria: AUC [11], \(F_{\beta }\) [5] and MAE [20]. The AUC score is the area under the receiver operating characteristic (ROC) curve; the best AUC score is 1. \(F_{\beta }\) combines Precision and Recall into a single criterion and reflects the performance of saliency detection models more objectively. In Eq. 12, when \(\beta ^2 < 1\), Precision has a greater impact; when \(\beta ^2 > 1\), Recall has a greater impact. Following [5, 16], we choose \(\beta ^2=0.3\) in our experiments. The mean absolute error (MAE) measures detection accuracy: the smaller the MAE, the better the detection result.
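The two simpler criteria can be sketched in a few lines. The code assumes the standard definitions \(F_{\beta }=(1+\beta ^2)\,P\,R/(\beta ^2 P+R)\) and MAE as the mean absolute difference between the saliency map and the binary ground truth; the fixed binarization threshold and the function names are illustrative assumptions, since Eq. 12 and the binarization rule are not reproduced here.

```python
import numpy as np

def f_beta(precision, recall, beta2=0.3):
    """Weighted harmonic mean of precision and recall (standard F-beta form)."""
    denom = beta2 * precision + recall
    return 0.0 if denom == 0 else (1.0 + beta2) * precision * recall / denom

def precision_recall(sal, gt, thresh=0.5):
    """Precision and recall of the map binarized at an assumed fixed threshold."""
    pred = sal >= thresh
    true_pos = np.logical_and(pred, gt).sum()
    precision = true_pos / max(int(pred.sum()), 1)
    recall = true_pos / max(int(gt.sum()), 1)
    return float(precision), float(recall)

def mae(sal, gt):
    """Mean absolute error between a [0, 1] saliency map and binary ground truth."""
    return float(np.mean(np.abs(sal - gt.astype(float))))

# Usage: ground truth is a 4x4 square; the prediction covers only half of it.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
sal = np.zeros((8, 8))
sal[2:6, 2:4] = 1.0
p, r = precision_recall(sal, gt)
print(p, r, f_beta(p, r), mae(sal, gt))

# With beta2 = 0.3, losing precision costs more than losing the same recall.
print(f_beta(1.0, 0.5), f_beta(0.5, 1.0))
```

In the usage example the prediction is fully precise but recovers half of the object, so swapping the precision and recall values lowers the score when \(\beta ^2=0.3\).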
The objective evaluation scores are shown in Table 3, where bold numbers mark the best values. Our method is clearly better than the others in AUC. Overall, our model outperforms the state-of-the-art models.
5 Conclusions
By analyzing the amplitude spectra of signals, we find that the repeated background in a signal corresponds to the peaks in the log amplitude spectrum. The salient object is also a repetitive pattern, which corresponds to the local peaks in the log amplitude spectrum. If the entire log amplitude spectrum is smoothed, both the peaks and the local peaks are suppressed. If only the peaks are smoothed, the background information is suppressed and the salient information is preserved, so the latter approach achieves better detection results than the former. Therefore, we propose a novel saliency model that computes the saliency map by combining the original phase spectrum with an amplitude spectrum optimized by smoothing the peaks. The proposed model detects more complete salient objects and obtains better detection results.
References
1. Alpert, S., Galun, M., Brandt, A., Basri, R.: Image segmentation by probabilistic bottom-up aggregation and cue integration. TPAMI 34(2), 315–327 (2012)
2. Borji, A., Frintrop, S., Sihite, D.N., Itti, L.: Adaptive object tracking by learning background context. In: CVPR, pp. 23–30 (2012)
3. Xue, J.R., Li, C., Zheng, N.N.: Proto-object based rate control for JPEG2000: an approach to content-based scalability. TIP 20(4), 1177–1184 (2011)
4. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. TPAMI 20(11), 1254–1259 (1998)
5. Cheng, M.M., Mitra, N.J., Huang, X.L., Philip, H.S., Hu, S.M.: Global contrast based salient region detection. TPAMI 37(3), 569–582 (2015)
6. Tavakoli, H.R., Laaksonen, J.: Bottom-up fixation prediction using unsupervised hierarchical models. In: Chen, C.-S., Lu, J., Ma, K.-K. (eds.) ACCV 2016. LNCS, vol. 10116, pp. 287–302. Springer, Cham (2017). doi:10.1007/978-3-319-54407-6_19
7. Guo, C.L., Ma, Q., Zhang, L.M.: Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In: CVPR, pp. 1–8 (2008)
8. Hou, X.D., Zhang, L.Q.: Saliency detection: a spectral residual approach. In: CVPR, pp. 1–8 (2007)
9. Hou, X.D., Harel, J., Koch, C.: Image signature: highlighting sparse salient regions. TPAMI 34(1), 194–201 (2012)
10. Li, J., Levine, M.D., An, X.J., Xu, X., He, H.E.: Visual saliency based on scalespace analysis in the frequency domain. TPAMI 35(4), 996–1010 (2013)
11. Li, J., Duan, L.Y., Chen, X., Huang, T., Tian, Y.: Finding the secret of image saliency in the frequency domain. TPAMI 37(12), 2428–2440 (2015)
12. Schauerte, B., Stiefelhagen, R.: Predicting human gaze using quaternion DCT image signature saliency and face detection. In: WACV, pp. 137–144 (2012)
13. Fang, Y.M., Lin, W.S., Lee, B.S., Lau, C.T., Chen, Z.Z., Lin, C.W.: Bottom-up saliency detection model based on human visual sensitivity and amplitude spectrum. Trans. Multimedia 14(1), 187–198 (2012)
14. Li, C., Xue, J.R., Tian, Z.Q., Li, L., Zheng, N.N.: Saliency detection based on biological plausibility of hypercomplex Fourier spectrum contrast. Opt. Lett. 37(17), 3609–3611 (2012)
15. Li, C., Xue, J.R., Zheng, N.N., Lan, X.G., Tian, Z.Q.: Spatio-temporal saliency perception via hypercomplex frequency spectral contrast. Sensors 13(3), 3409–3431 (2013)
16. Imamoglu, N., Lin, W.S., Fang, Y.M.: A saliency detection model using low-level features based on wavelet transform. Trans. Multimedia 15(1), 96–105 (2013)
17. Murray, N., Vanrell, M., Otazu, X., Parraga, C.A.: Low-level spatiochromatic grouping for saliency estimation. TPAMI 35(11), 2810–2816 (2013)
18. Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: CVPR, pp. 1597–1604 (2009)
19. Yang, C., Zhang, L.H., Lu, H.C., Ruan, X., Yang, M.H.: Saliency detection via graph based manifold ranking. In: CVPR, pp. 3166–3173 (2013)
20. Zhu, W.J., Liang, S., Wei, Y.C., Sun, J.: Saliency optimization from robust background detection. In: CVPR, pp. 2814–2821 (2014)
Acknowledgments
The paper was supported in part by the National Natural Science Foundation (NSFC) of China under Grant No. 61365003 and Gansu Province Basic Research Innovation Group Project No. 1506RJIA031.
© 2017 Springer International Publishing AG
Li, C., Wan, Y., Liu, H. (2017). Salient Object Detection Based on Amplitude Spectrum Optimization. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science(), vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_47
DOI: https://doi.org/10.1007/978-3-319-70090-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70089-2
Online ISBN: 978-3-319-70090-8
eBook Packages: Computer Science, Computer Science (R0)