1 Introduction

With the growth of ocean exploration by unmanned underwater vehicles (UUVs), the recognition of underwater objects has become a major issue; in particular, acquiring a clear underwater image remains an open question. In past years, sonar has been widely used for the detection and recognition of objects in underwater environments. Because of the acoustic imaging principle, sonar images suffer from a low signal-to-noise ratio, low resolution, and other shortcomings. Consequently, optical vision sensors must be used instead for short-range identification because of the low quality of images restored by sonar imaging [1].

In contrast to common photographs, underwater optical images suffer from poor visibility owing to the medium, which causes scattering, color distortion, and absorption. Large suspended particles cause scattering similar to that observed in fog or turbid water. Color distortion occurs because different wavelengths are attenuated to different degrees in water; consequently, images of ambient underwater environments are dominated by a bluish tone, because longer wavelengths are attenuated more quickly. Absorption of light in water substantially reduces its intensity. The random attenuation of light causes a hazy appearance, as the light backscattered by water along the line of sight considerably degrades image contrast. In particular, objects at a distance of more than 10 m from the observation point are almost indistinguishable, because colors fade as characteristic wavelengths are filtered according to the distance traveled by light in water [2].

Many researchers have developed techniques to restore and enhance underwater images. Although most recent approaches can enhance image contrast, they have several drawbacks that reduce their practical applicability. First, the imaging equipment is difficult to use in practice (e.g., a range-gated laser imaging system, which is rarely applied in practice [3, 4]). Second, multiple input images are required [5] (e.g., images with different polarizations or different exposures) for fusing a high-quality image. Third, generic image processing approaches are not suitable for underwater images [6–8] because they ignore the imaging environment, in addition to being time consuming. Fourth, manual operation is needed during processing, which leads to a lack of intelligence [9].

Instead of using multiple input images, we focus on enhancement methods that use a single optical image. Fattal [9] estimated the scene radiance and derived the transmission map from a single image. However, this method cannot sufficiently process images with heavy haze, and it requires manual operation, which limits its scope of application. He et al. [7] analyzed a large number of natural outdoor images, found that most haze-free color images contain a dark channel, and proposed a scene-depth-based dark channel prior dehazing algorithm. However, this algorithm requires significant computation time, with a complexity of \(O(N^{2})\), and the processed images may exhibit artificial halos in some cases. To overcome this disadvantage, He et al. also proposed the guided image filter [10], which uses the foggy image as the reference image. However, this method leads to incomplete haze removal and does not meet the requirements for real-time processing. Ancuti et al. [11] compared Laplacian contrast, contrast, saliency, and exposedness features between a white-balanced image and a color-corrected image, and then applied an exposure fusion algorithm to obtain the final result. However, this method has two main disadvantages: the resulting images have dark corners, and the processing parameters are difficult to set, which is problematic because the exposure blending algorithm is sensitive to the chosen parameters. In Serikawa’s work [12], we proposed a guided trigonometric filter to refine the depth map. This optimization achieves better results, with the peak signal-to-noise ratio (PSNR) improved by 1 dB compared to traditional methods. However, it does not take the wavelength into account.

In an underwater environment, the captured images are significantly influenced by the inherent optical properties of the medium (e.g., wavelength-dependent attenuation, scattering, and absorption). Inspired by Chiang’s work [13], in the present paper we propose a novel shallow-ocean optical imaging model and a corresponding enhancement algorithm. We first estimate the depth map through the dark channel. Second, considering the positions of the lighting lamp, camera, and imaging plane, we develop a rational imaging model. The effects of scattering are removed by using a guided weighted median filter (GWMF). Finally, color correction is performed based on spectral properties. In the experiments conducted to verify the proposed model and algorithm, we used a commercial RGB camera and natural underwater light. The performance of the proposed method is evaluated both analytically and experimentally.

2 Shallow Water Imaging Model

Artificial light and atmospheric light traveling through the water are the sources of illumination in a shallow-ocean environment. The amount of illumination \(E_{\lambda }^{W}(x)\) formed after wavelength attenuation can be formulated according to the energy attenuation model as follows [13]:

$$\begin{aligned} \begin{array}{l} E_{\lambda }^{W} (x)=E_{\lambda }^{A} (x)\cdot Nrer(\lambda )^{D(x)}+E_{\lambda }^{I} (x)\cdot Nrer(\lambda )^{L(x)}, \\ \quad \lambda \in \{r,g,b\} \\ \end{array} \end{aligned}$$
(1)

Here, the ambient light travels through the water depth D(x) and the artificial light travels the distance L(x) before reaching the scene point x; the reflected light then travels the distance d(x) to the camera, forming the pixel \(I_{\lambda }(x)\), \(\lambda \in \{r,g,b\}\). Color distortion (absorption) and scattering occur during this process. Supposing the absorption and scattering (reflectivity) rate is \(\rho _{\lambda }(x)\), the light \(E_{\lambda }^{\omega }(x)\) emanating from point x equals the amount of illuminating light that is reflected:

$$\begin{aligned} \begin{array}{l} E_{\lambda }^{\omega } (x)=\left( {E_{\lambda }^{A}(x)\cdot Nrer(\lambda )^{D(x)}+E_{\lambda }^{I} (x)\cdot Nrer(\lambda )^{L(x)}} \right) \cdot \rho _{\lambda } (x), \\ \lambda \in \{r,g,b\} \end{array} \end{aligned}$$
(2)

By following the underwater dehazing model [14], the image \(I_{\lambda } (x)\) formed at the camera can be formulated as follows:

$$\begin{aligned} \begin{array}{l} I_{\lambda } (x)=\left[ {\left( {E_{\lambda }^{A} (x)\cdot Nrer(\lambda )^{D(x)}+E_{\lambda }^{I} (x)\cdot Nrer(\lambda )^{L(x)}} \right) \cdot \rho _{\lambda } (x)} \right] \cdot t_{\lambda } (x) \\ \quad \qquad +\left( {1-t_{\lambda } (x)} \right) \cdot B_{\lambda } ,\quad \lambda \in \{r,g,b\} \\ \end{array} \end{aligned}$$
(3)

where the background light \(B_{\lambda }\) represents the part of the object-reflected light \(J_{\lambda }\) and ambient light \(E_{\lambda }^{W}\) scattered toward the camera by particles in the water. The residual energy ratio \(t_{\lambda }(x)\) can be expressed as the ratio of the energy of a light beam with wavelength \(\lambda \) after and before traveling the distance d(x) within the water, \(E_{\lambda }^{residual}(x)\) and \(E_{\lambda }^{initial} (x)\), respectively:

$$\begin{aligned} t_{\lambda } (x)=\frac{E_{\lambda }^{residual} (x)}{E_{\lambda }^{initial} (x)}=10^{-\beta (\lambda )d(x)}=Nrer(\lambda )^{d(x)} \end{aligned}$$
(4)

where \(Nrer(\lambda )\) is the normalized residual energy ratio [12]. For Ocean Type I water, it satisfies:

$$\begin{aligned} N_{rer} (\lambda )=\left\{ {{\begin{array}{*{20}l} {0.8\sim 0.85\quad if\;\lambda =650\sim 750~\text {nm}\,(red)} \\ {0.93\sim 0.97\quad if\;\lambda =490\sim 550~\text {nm}\,(green)} \\ {0.95\sim 0.99\quad if\;\lambda =400\sim 490~\text {nm}\,(blue)} \\ \end{array} }} \right. \end{aligned}$$
(5)

Consequently, substituting Eq. (4) into Eq. (3), we obtain:

$$\begin{aligned} \begin{array}{l} I_{\lambda } (x)=\left[ {\left( {E_{\lambda }^{A} (x)\cdot Nrer(\lambda )^{D(x)}+E_{\lambda }^{I} (x)\cdot Nrer(\lambda )^{L(x)}} \right) \cdot \rho _{\lambda } (x)} \right] \\ \cdot ~Nrer (\lambda )^{d(x)} +\left( {1-Nrer(\lambda )^{d(x)}} \right) \cdot B_{\lambda } ,\quad \lambda \in \{r,g,b\} \\ \end{array} \end{aligned}$$
(6)

The above equation incorporates the light scattering during propagation from the object to the camera over the distance d(x), as well as the wavelength attenuation along the light-object path L(x), the scene depth D(x), and the object-camera path d(x). Once the light-object distance L(x), the scene depth D(x), and the object-camera distance d(x) are known, the final clean image can be recovered. Figure 1 shows a diagrammatic sketch of the proposed model.

Fig. 1. Diagram of the shallow-ocean optical imaging model
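To make the forward model concrete, the following Python sketch simulates Eq. (6) for an RGB image; the function name, variable names, and the specific Nrer values (picked from the Ocean Type I ranges in Eq. (5)) are illustrative assumptions rather than part of the proposed method.

```python
import numpy as np

# Normalized residual energy ratios per channel, picked from the
# Ocean Type I ranges in Eq. (5) (illustrative values).
NRER = {"r": 0.83, "g": 0.95, "b": 0.97}

def forward_model(E_ambient, E_artificial, rho, D, L, d, B):
    """Simulate Eq. (6): attenuated illumination, reflection, and
    scattering toward the camera.

    E_ambient, E_artificial : (H, W, 3) illumination images
    rho                     : (H, W, 3) reflectivity
    D, L, d                 : (H, W) scene depth, light-object and
                              object-camera distances in metres
    B                       : (3,) homogeneous background light
    """
    I = np.empty_like(rho)
    for c, lam in enumerate("rgb"):
        nrer = NRER[lam]
        # Light reaching the object after wavelength attenuation (Eq. 1)
        E_obj = E_ambient[..., c] * nrer ** D + E_artificial[..., c] * nrer ** L
        # Reflected light, attenuated again along d(x), plus backscatter
        t = nrer ** d                      # residual energy ratio, Eq. (4)
        I[..., c] = E_obj * rho[..., c] * t + (1.0 - t) * B[c]
    return I
```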

3 Underwater Scene Reconstruction

3.1 Camera-Object Distance Estimation

In [13], the author found that the red channel is the dark channel of underwater images. During our experiments, however, we found that the lowest of the RGB channels in turbid water is not always the red channel; the blue channel can also be significant. The reason is that artificial light is usually used in imaging: although red wavelengths are absorbed easily while traveling through water, the distance between the camera and the object is often not long enough for the red wavelengths to be absorbed significantly (see Fig. 2), so the blue channel may be the lowest. Consequently, in this paper, we take the minimum pixel value over the red and blue channels as the rough depth map.

Fig. 2. RGB histogram of an underwater image. (a) Underwater image. (b) RGB histogram

Following Eq. (6), the light \(J_{\lambda }(x)\) reflected from point x is

$$\begin{aligned} \begin{array}{l} J_{\lambda } (x)=\left( {E_{\lambda }^{A} (x)\cdot Nrer(\lambda )^{D(x)}+E_{\lambda }^{I} (x)\cdot Nrer(\lambda )^{L(x)}} \right) \cdot \rho _{\lambda } (x), \\ \quad \quad \lambda \in \{r,b\} \\ \end{array} \end{aligned}$$
(7)

We define the minimum pixel channel \(J_{dark}(x)\) for the underwater image \(J_{\lambda }(x)\) as

$$\begin{aligned} J_{dark}(x)=\min \limits _{\lambda } \min \limits _{y\in \varOmega (x)} J_{\lambda }(y), \lambda \in \{r,b\} \end{aligned}$$
(8)

If point x belongs to the foreground object, the value of the minimum pixel channel is very small. Applying the min operation over the local patch \(\varOmega (x)\) to the hazy image \(I_{\lambda } (x)\) in Eq. (6), we have

$$\begin{aligned} \begin{array}{l} \min \limits _{y\in \varOmega (x)} \left( {I_{\lambda } (y)} \right) =\min \limits _{y\in \varOmega (x)} \left\{ {J_{\lambda } (y)\cdot Nrer(\lambda )^{d(y)}+\left( {1-Nrer(\lambda )^{d(y)}} \right) \cdot B_{\lambda } } \right\} , \\ \quad \quad \quad \quad \lambda \in \{r,b\} \\ \end{array} \end{aligned}$$
(9)

Since \(B_{\lambda }\) is the homogeneous background light and the residual energy ratio \(Nrer(\lambda )^{d(y)}\) on the small local patch \(\varOmega (x)\) surrounding point x is essentially a constant \(Nrer(\lambda )^{d(x)}\), the min operation on the second term of Eq. (9) can be removed:

$$\begin{aligned} \begin{array}{l} \min \limits _{y\in \varOmega (x)} \left( {I_{\lambda } (y)} \right) =\min \limits _{y\in \varOmega (x)} \left\{ {J_{\lambda } (y)\cdot Nrer(\lambda )^{d(x)}+\left( {1-Nrer(\lambda )^{d(x)}} \right) \cdot B_{\lambda } } \right\} ,\\ \lambda \in \{r,b\} \\ \end{array} \end{aligned}$$
(10)

Rearranging the above equation and performing one more min operation over the red and blue channels yields:

$$\begin{aligned} \begin{array}{l} \min \limits _{\lambda } \left\{ {\frac{\min _{y\in \varOmega (x)} \left( {I_{\lambda } (y)} \right) }{B_{\lambda } }} \right\} =\min \limits _{\lambda } \left\{ {\frac{\min \limits _{y\in \varOmega (x)} J_{\lambda } (y)}{B_{\lambda } }\cdot Nrer(\lambda )^{d(x)}} \right\} \\ +\min \limits _{\lambda } \left( {1-Nrer(\lambda )^{d(x)}} \right) ,\lambda \in \{r,b\} \\ \end{array} \end{aligned}$$
(11)

According to the dark channel prior, the term containing \(\min _{y\in \varOmega (x)} J_{\lambda }(y)\) in the above equation approaches zero. Consequently, the estimated depth map is

$$\begin{aligned} \min \limits _{\lambda } \left( {Nrer(\lambda )^{d(x)}} \right) =1-\min \limits _{\lambda } \left\{ {\frac{{\min }_{y\in \varOmega (x)} \left( {I_{\lambda } (y)} \right) }{B_{\lambda } }} \right\} ,\lambda \in \{r,b\} \end{aligned}$$
(12)
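A minimal Python sketch of this coarse estimate, assuming the background light \(B_{\lambda }\) is already known and using a square local patch \(\varOmega (x)\), is given below; the function name, the patch size, and the use of SciPy's minimum filter are our illustrative choices.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def coarse_depth_term(I, B, patch=15):
    """Estimate min_lambda Nrer(lambda)^d(x) from Eq. (12) using the
    red and blue channels (indices 0 and 2 of an RGB image in [0, 1])."""
    ratios = []
    for c in (0, 2):                               # lambda in {r, b}
        local_min = minimum_filter(I[..., c], size=patch)  # min over Omega(x)
        ratios.append(local_min / max(B[c], 1e-6))
    dark = np.minimum(*ratios)                     # min over the two channels, Eq. (8)
    return 1.0 - dark                              # Eq. (12)
```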

3.2 Depth Map Refinement by GWMF

In the above subsection, we roughly estimated the camera-object distance d(x). This coarse depth map contains mosaic (blocking) artifacts and is not sufficiently accurate. Consequently, we use the proposed guided weighted median filter to reduce these artifacts. In this subsection, we first introduce our constant-time algorithm for the guided weighted median filter.

The traditional median filter is considered an effective way of removing outliers, but it usually introduces morphological artifacts such as rounded sharp corners. To address this problem, the weighted median filter [15] has been proposed. It is defined as

$$\begin{aligned} h(\mathbf {x},i)=\sum \limits _{\mathrm{\mathbf{y}}\in N(\mathrm{\mathbf{x}})} {W(\mathrm{\mathbf{x}},\mathrm{\mathbf{y}})\delta (V(\mathrm{\mathbf{y}})-i)} \end{aligned}$$
(13)

where W(x, y) is the weight assigned to a pixel y inside a local region centered at pixel x; the weight W(x, y) depends on a guidance image that can be different from V. N(x) is a local window around pixel x, i is the discrete bin index, and \(\delta \) is the Kronecker delta function, which is 1 when its argument is 0 and 0 otherwise.

The refined depth map computed by the guided weighted median filter is then defined as:

$$\begin{aligned} I_{x}^{WG} =\frac{\sum \nolimits _{y\in N(x)} {f_{S} (x,y)f_{R} (I_{x} ,I_{y} )I_{y} W_{y} } }{\sum \nolimits _{y\in N(x)} {f_{S} (x,y)f_{R} (I_{x} ,I_{y} )W_{y} } } \end{aligned}$$
(14)

where y is a pixel in the neighborhood N(x) of pixel x. The spatial filter \(f_{S}\) is a Gaussian kernel (kernels other than Gaussian are not excluded):

$$\begin{aligned} f_{S} (x,y)=\upsilon (x-y)=\textstyle {1 \over 2}e^{-\textstyle {{(x-y)(x-y)} \over {2\sigma _{D}^{2} }}} \end{aligned}$$
(15)

where x and y denote pixel spatial positions and \(\sigma _{D}\) sets the spatial scale. The range filter \(f_{R}\) weights pixels based on their photometric difference,

$$\begin{aligned} f_{R} (I_{x} ,I_{y} )=w(f(x)-f(y))=\frac{1}{2}e^{-\textstyle {{(f(x)-f(y))(f(x)-f(y))} \over {2\sigma _{R}^{2} }}} \end{aligned}$$
(16)

where f(·) denotes the image tonal values and \(\sigma _{R}\) sets the degree of tonal filtering. \(W_{y}\) is the weight map, defined as:

$$\begin{aligned} W_{y} =\sum \limits _{y\in N(x)} {f_{s} (y,q)f_{R} (y,q)e^{-(\vert \vert I_{y} -I_{q} \vert \vert _{2} )/2\sigma _{R} }} \end{aligned}$$
(17)

where q indexes the support pixels centered around pixel y. The final refined depth map is produced by:

$$\begin{aligned} h(\tilde{{d}}(x),i)=\sum \limits _{\mathrm{\mathbf{y}}\in N(\mathrm{\mathbf{x}})} {I_{x}^{WG} (d(x),x)\delta (V(x)-i)} \end{aligned}$$
(18)

This filter preserves edges and removes noise based on a dimensionality-reduction strategy, producing high-quality results while achieving significant speedups over existing techniques such as the bilateral filter [10], guided filter [14], trilateral filter [16], and weighted bilateral median filter [15]. The refined depth image is shown in Fig. 3.

Fig. 3. Depth map refinement by the weighted normalized convolution domain filter. (a) Input coarse depth image. (b) Refined depth image.
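As an illustration of the weighted averaging in Eqs. (14)-(16), the following brute-force Python sketch refines the coarse depth map guided by the grayscale input image; the window size, the \(\sigma \) values, and the simplification of the weight map \(W_{y}\) in Eq. (17) to 1 are our assumptions, and the loop-based implementation is for clarity only (it is not the constant-time algorithm).

```python
import numpy as np

def guided_weighted_filter(depth, guide, radius=7, sigma_d=3.0, sigma_r=0.1):
    """Brute-force sketch of Eq. (14): refine the coarse depth map using
    spatial (Eq. 15) and range (Eq. 16) weights computed on the guide image.
    The per-pixel weight map W_y of Eq. (17) is taken as 1 here."""
    H, W = depth.shape
    pad = radius
    d_pad = np.pad(depth, pad, mode="edge")
    g_pad = np.pad(guide, pad, mode="edge")

    # Precompute the spatial Gaussian kernel f_S (Eq. 15).
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    f_s = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_d ** 2))

    out = np.zeros_like(depth)
    for i in range(H):
        for j in range(W):
            g_win = g_pad[i:i + 2 * pad + 1, j:j + 2 * pad + 1]
            d_win = d_pad[i:i + 2 * pad + 1, j:j + 2 * pad + 1]
            # Range weight f_R (Eq. 16) from photometric differences on the guide.
            f_r = np.exp(-((g_win - guide[i, j]) ** 2) / (2.0 * sigma_r ** 2))
            w = f_s * f_r
            out[i, j] = np.sum(w * d_win) / np.sum(w)
    return out
```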

3.3 De-scattering

From the above subsection, we obtain the refined depth map d(x). To remove the scattering, we also need to solve for the reflectivity \(\rho _{\lambda }(x)\). We obtain it through the least-squares solution

$$\begin{aligned} \begin{array}{l} \rho _{\lambda } (x)=\left( {J_{\lambda } (x)^{T}\cdot J_{\lambda } (x)} \right) ^{-1}\cdot J_{\lambda } (x)^{T} \\ \;\;\;\;\cdot ~\left( {E_{\lambda }^{A} (x)\cdot Nrer(\lambda )^{D(x)}+E_{\lambda }^{I} (x)\cdot Nrer(\lambda )^{L(x)}} \right) , \quad \lambda \in \{r,g,b\} \\ \end{array} \end{aligned}$$
(19)

After removing the artificial light, Eq. (6) can be written as

$$\begin{aligned} \begin{array}{l} I_{\lambda }(x)=E_{\lambda }^{A}(x)\cdot Nrer(\lambda )^{D(x)}\cdot \rho _{\lambda }(x)\cdot Nrer(\lambda )^{d(x)} \\ \quad +\left( {1-Nrer(\lambda )^{d(x)}} \right) \cdot B_{\lambda },\quad \lambda \in \{r,g,b\} \\ \end{array} \end{aligned}$$
(20)

According to the dehazing model, we can obtain the de-scattered image by

$$\begin{aligned} \begin{array}{l} \tilde{{J}}_{\lambda } (x)=\frac{I_{\lambda } (x)-\left( {1-Nrer(\lambda )^{d(x)}} \right) \cdot B_{\lambda } }{Nrer(\lambda )^{d(x)}} \\ \quad \quad \;=E_{\lambda }^{A} (x)\cdot Nrer(\lambda )^{D(x)}\cdot \rho _{\lambda } (x)\cdot Nrer(\lambda )^{d(x)}, \quad \lambda \in \{r,g,b\} \\ \end{array} \end{aligned}$$
(21)

In this paper, we assume the light used for imaging is uniform; consequently, we do not need to correct for vignetting effects here (Fig. 4).

Fig. 4. De-scattered result. (a) Input image. (b) De-scattered image.
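A short Python sketch of the inversion in Eq. (21) is given below; the variable names and the lower bound on the transmission (used to avoid amplifying noise where the residual energy ratio is very small) are our assumptions.

```python
import numpy as np

def descatter(I, nrer_d, B, t_min=0.1):
    """Invert the scattering term as in Eq. (21).

    I       : (H, W, 3) observed image in [0, 1]
    nrer_d  : (H, W, 3) per-channel residual energy ratio Nrer(lambda)^d(x)
    B       : (3,) homogeneous background light
    t_min   : lower bound to avoid amplifying noise where transmission is tiny
    """
    t = np.maximum(nrer_d, t_min)
    J = (I - (1.0 - t) * B.reshape(1, 1, 3)) / t
    return np.clip(J, 0.0, 1.0)
```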

3.4 Color Correction

In [13], the author corrected the scene color simply according to the attenuation with water depth. In practice, however, the spectral response function of a camera describes the relative sensitivity of the imaging system as a function of the wavelength of light. We therefore use the chromatic transfer function \(\tau \) to weight the light traveling from the surface down to the object depth:

$$\begin{aligned} \tau _{\lambda } =\frac{E_{\lambda }^{surface} }{E_{\lambda }^{object} } \end{aligned}$$
(22)

where the transfer function \(\tau \) at wavelength \(\lambda \) is the ratio of the surface irradiance \(E_{\lambda }^{surface}\) to the object irradiance \(E_{\lambda }^{object}\). Based on the spectral response of the RGB camera, we convert the transfer function to the RGB domain:

$$\begin{aligned} \tau _{RGB} =\sum \nolimits ^k {\tau _{\lambda } \cdot C_{c} (\lambda )} \end{aligned}$$
(23)

where \(\tau _{RGB}\) is the weighted RGB transfer function, \(C_{c}(\lambda )\) is the underwater spectral characteristic function for color band c, \(c\in \{r,g,b\}\), and k is the number of discrete bands of the camera spectral characteristic function.

Finally, the corrected image is obtained from the weighted RGB transfer function by

$$\begin{aligned} J_{\lambda } (x)=\hat{{J}}_{\lambda } (x)\cdot \tau _{RGB} \end{aligned}$$
(24)

where \(J_{\lambda } (x)\) and \(\hat{{J}}_{\lambda }(x)\) are the color-corrected and uncorrected images, respectively. Figure 5 shows the color-corrected result.
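The conversion in Eqs. (23)-(24) can be sketched in Python as follows; the array shapes and function names are our assumptions, and the spectral characteristic matrix C would come from the camera's calibration data.

```python
import numpy as np

def rgb_transfer(tau_lambda, C):
    """Collapse the per-wavelength transfer function tau_lambda (Eq. 22)
    into an RGB gain vector tau_RGB (Eq. 23).

    tau_lambda : (k,) chromatic transfer function sampled at k bands
    C          : (k, 3) camera spectral characteristic per band and channel
    """
    return C.T @ tau_lambda            # (3,) weighted RGB transfer function

def color_correct(J_hat, tau_rgb):
    """Apply Eq. (24): scale each channel of the de-scattered image."""
    return np.clip(J_hat * tau_rgb.reshape(1, 1, 3), 0.0, 1.0)
```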

4 Experiments and Discussions

The performance of the proposed algorithm is evaluated both objectively and subjectively using ground-truth color patches. We also compare the proposed method with state-of-the-art methods. Both evaluations demonstrate the superior haze removal and color balancing capabilities of the proposed method over the others.

Fig. 5. Color correction result. (a) Input image. (b) Color-corrected image

In the simulation experiment, Fig. 6 shows the results and Table 1 shows the quantitative analysis. In the simulation, we used an OLYMPUS Tough TG-2 underwater camera; the water depth D(x) was 0.3 m, the camera-object distance d(x) was 0.8 m, and the light-object distance L(x) was 0.5 m. First, we captured the shallow scene in clean water; then, we captured the noisy image after adding turbid liquid to the tank. The computer used was equipped with Windows XP and an Intel Core 2 (2.0 GHz) processor with 2 GB of RAM. The size of the images is \(416\times 512\) pixels.

Fig. 6. Simulation results by different algorithms. (a) Noise-free image. (b) Noisy image. (c) Ancuti’s result. (d) Bazeille’s result. (e) Chiang’s result. (f) Fattal’s result. (g) He’s result. (h) Nicholas’s result. (i) Our result.

Bazeille’s method simply applies generic image processing techniques; because it ignores the physical underwater imaging model, it distorts the image seriously. Fattal’s approach performs well; however, it requires manual operation to determine the background and the objects. The algorithms proposed by Nicholas and He are very time consuming, with a computational complexity above \(O(N^{2})\). Ancuti et al. applied high-dynamic-range imaging ideas to underwater enhancement; the enhanced image relies on the pre-processed white-balanced and color-corrected images, which may be based on an incorrect assumption. Chiang et al. were the first to emphasize that wavelength-dependent attenuation strongly influences underwater images. However, their Laplacian matting for depth map refinement is time consuming, and they neglected the fact that the color distortion is related to the scene depth, the camera spectral properties, and the inherent optical properties of the water. The processing time of our method is 15.4 ms, and the result is also superior to the others.

In addition to the visual analysis above, we conducted a quantitative analysis based on statistical parameters of the images (see Table 1), including PSNR and SSIM. PSNR denotes the peak signal-to-noise ratio (values are above 0; higher is better), and SSIM denotes the structural similarity index (values range from 0 (worst) to 1 (best)). Table 1 lists the MSE, PSNR, and SSIM values measured on several images. The results indicate that our approach works well for haze removal.

Table 1. Comparative analysis of different underwater image enhancement methods
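For reference, a possible way to compute the metrics reported in Table 1 is sketched below, assuming a recent version of scikit-image; the function name and the [0, 1] float image convention are our choices.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(reference, restored):
    """Compute MSE, PSNR, and SSIM between the noise-free reference and a
    restored image (both float RGB arrays in [0, 1])."""
    mse = np.mean((reference - restored) ** 2)
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    ssim = structural_similarity(reference, restored, channel_axis=-1,
                                 data_range=1.0)
    return mse, psnr, ssim
```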

5 Conclusions

In this paper, we have explored and successfully implemented a novel image enhancement method for underwater optical images. We proposed a simple prior based on the difference in attenuation among the color channels, which allows the transmission depth map to be estimated correctly. As another contribution, we compensate the transmission with a guided weighted median filter, which offers edge preservation, noise removal, and reduced computation time. Moreover, the proposed spectral-based underwater color correction method successfully recovers distorted underwater images. Furthermore, the proposed method addresses the limitation imposed by possible artificial light sources. Extensive experiments show that the proposed method is suitable for underwater imaging and addresses the major problems of underwater optical imaging.