# Image inpainting based on sparse representations with a perceptual metric


## Abstract

This paper presents an image inpainting method based on sparse representations optimized with respect to a perceptual metric. In the proposed method, the structural similarity (SSIM) index is utilized as a criterion to optimize the representation performance of image data. Specifically, the proposed method enables the formulation of two important procedures in the sparse representation problem, 'estimation of sparse representation coefficients’ and 'update of the dictionary’, based on the SSIM index. Then, using the generated dictionary, approximation of target patches including missing areas via the SSIM-based sparse representation becomes feasible. Consequently, image inpainting for which procedures are totally derived from the SSIM index is realized. Experimental results show that the proposed method enables successful inpainting of missing areas.

## Keywords

Mean Square Error, Sparse Representation, Optimal Vector, Target Image, Image Inpainting

## 1 Introduction

In the field of image processing, there exist many studies on image restoration/enhancement such as image denoising[1, 2, 3], image deblurring[4, 5], and image inpainting[6]. Furthermore, it is well known that the performance of these studies has been rapidly improved in recent years[1, 2, 4]. Missing area reconstruction is one of the most attractive topics for study in the field of image restoration since it has a number of applications. Unnecessary object removal, missing block reconstruction in an error-prone environment in wireless communication, and restoration of corrupted old films are representative applications. Since missing area reconstruction can be used in many applications, it has various names including inpainting, image completion, error concealment, and blotch and scratch removal. In this paper, we use 'inpainting’ since this is one of the most common names in this research field.

Many inpainting methods for the above applications have been proposed[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. Most methods are broadly classified into two categories: missing structure reconstruction[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] and missing texture reconstruction[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. In addition, there have been proposed several inpainting methods which adopt the combined use of the structure and texture reconstruction approaches[20, 42].

Variational image inpainting methods, which aim at successful reconstruction of structure components, have traditionally been studied. Variational image inpainting is performed based on the continuity of the geometrical structure of images, and most variational inpainting methods solve partial differential equations (PDEs). One of the pioneering works was proposed by Masnou et al.[7]. Furthermore, Bertalmio et al. proposed a representative image inpainting technique based on PDEs. Several improved methods have also been proposed recently[12, 13, 14, 15]. Although these variational image inpainting methods enable successful reconstruction of structure components, images also contain other important components, i.e., texture components, for which alternative methods tend to produce better results. The remainder of this paper therefore focuses on the reconstruction of textures and discusses its details.

Results of pioneering work based on texture synthesis were reported by Efros et al.[21]. Their method is based on the Markov random field model, and inpainting is realized by copying known pixels within a target image. It is well known that successful inpainting of pure texture images can be realized using their method. In recent years, their ideas have been improved by many researchers[22, 23, 24, 25, 26, 27, 28, 29, 30].

Drori et al.[23] and Criminisi et al.[24] developed more accurate inpainting techniques. Drori et al. proposed a fragment-based image completion algorithm that can preserve not only textures but also structures within target images. Criminisi et al. proposed an exemplar-based inpainting method, which has become a benchmark in this field. Their method adopts a patch-based greedy sampling algorithm, making faster and simpler inpainting feasible. Many improved versions of this exemplar-based inpainting method[25, 26, 27, 28, 29] have recently been proposed. Specifically, Meur et al. proposed multiresolution analysis-based inpainting approaches using the exemplar-based method[28, 29]. Kwok et al. proposed a much faster inpainting method that introduces useful schemes for calculating patch similarities in exemplar-based inpainting[30]. They also reported that their method provided better results than previously reported methods in some cases.

The above existing methods based on texture synthesis and exemplar-based inpainting generally copy pixel values to missing areas directly. Thus, if target images contain uniform and simple textures, the methods can perform accurate inpainting. However, if the above conditions are not satisfied, it becomes difficult to approximate missing textures by only the best matched examples. Therefore, many inpainting methods that approximate patches including missing areas using subspaces generated from known areas within target images have been proposed. In these methods, target patches are generally represented by linear combinations of bases that span the obtained subspaces. The performance of inpainting therefore depends on the generated subspaces and linear coefficients for calculating the linear combination. Amano et al. proposed a principal component analysis (PCA)-based missing area inpainting method using back projection for lost pixels[31]. They utilized an eigenspace that enabled derivation of inverse projection for the inpainting. Several inpainting methods in which kernel methods are introduced into PCA-based subspace construction have also been proposed[32, 33, 34, 35]. Based on nonlinear eigenspaces, successful representation of image data becomes feasible, i.e., the methods are suitable for approximating nonlinear structures in images.

Recently, sparse representation for image inpainting has been intensively studied. Sparse representation enables adaptive selection of optimal bases suitable for approximating target images[36, 37]. This means that the subspaces utilized for the inpainting can be adaptively provided. Therefore, several inpainting methods using sparse representation have been proposed[38, 39, 40, 41, 42]. Furthermore, Xu et al. have shown the effective use of sparse representation for realizing image inpainting[41]. Specifically, their method adopts new sparsity-based modeling of patch priority and patch representation, which are two crucial steps for patch propagation in an exemplar-based inpainting approach. Based on similar ideas, several inpainting methods using neighbor embedding approaches have been proposed[43, 44]. These methods are derived from the perspective of manifold learning and provide good results. Furthermore, inpainting methods based on rank minimization have also been proposed[45].

The above-described existing methods are based on least squares approximation for inpainting. This means that inpainting minimizing the mean square error (MSE) of intensities, which is the most popular metric, is performed. However, several works[46, 47] show that MSE optimal algorithms cannot provide high visual quality. Thus, it may not be appropriate to use MSE as a quality measure for the inpainting. It should be noted that using kernel PCA (KPCA)[32, 33], methods such as those shown in[34] and[35] try to approximate nonlinear image features. These methods perform least squares approximation in high-dimensional nonlinear feature spaces, and it has been reported that improvement in performance was achieved in some cases.

Recently, image quality assessment has become popular in overcoming the problem of MSE and its variants. Criteria such as noise quality measure[48], information fidelity criterion[49], and visual information fidelity[50] are well known as perceptual distortion measures, and their performances have been evaluated in detail[51]. The structural similarity (SSIM) index[52] is utilized as one of the most representative quality measures in many fields of image processing. Since its formulation is simple and easy to analyze, the SSIM index can be applied to not only image quality assessment but also design of linear equalizers[53]. Therefore, successful inpainting based on this quality measure can be expected.

In this paper, we present an inpainting method based on sparse representations optimized with respect to a perceptual metric. In order to perform inpainting using sparse representation, the SSIM index is used for a criterion to optimize the representation performance.

Specifically, the proposed method introduces the SSIM-based criterion into two important procedures in the sparse representation problem, i.e., 'estimation of the sparse representation coefficients’ and 'update of the dictionary’. This is the biggest difference between the proposed method and existing methods. Then, by deriving the sparse representation of target patches including missing areas based on the generated dictionary, inpainting based on the SSIM index is realized. Note that in the above approach, since optimization problems maximizing the SSIM index are nonconvex, the computation scheme in[53] is adopted, and nonconvex optimization problems are reformulated as quasi-convex problems. In the proposed method, the optimal subspace can be adaptively provided for each target patch using sparse representation. Furthermore, since the SSIM index, which is a better perceptual criterion than the traditional MSE and its variants, is used, successful inpainting can be expected.

A similar approach has also been proposed by Rehman et al. for realizing noise removal and super-resolution[54]. On the other hand, we present a new scheme for realizing inpainting in this paper, and the target application is different from those in[54]. Basically, in our method, the algorithms for estimation of sparse representation coefficients and generation of the dictionary are different from those in the method of Rehman et al. Furthermore, the biggest difference between our method and the method in[54] is generation of the dictionary. Specifically, in the existing method[54], the dictionary is obtained by directly using the K-SVD algorithm[36], which is based on the MSE-based criterion, where SVD represents singular value decomposition. On the other hand, the proposed method tries to obtain the dictionary based on the SSIM-based criterion, and all of the procedures are based on the SSIM index.

This paper is organized as follows. First, in Section 2, we briefly explain sparse representation and the SSIM index, which are used in the proposed method, as preliminaries. Next, in Section 3, we explain the overview of the proposed method. An inpainting method via sparse representation based on the SSIM index is proposed in Section 4. Experimental results that verify the performance of the proposed method are shown in Section 5. Finally, conclusions are given in Section 6.

## 2 Preliminaries

In this section, we briefly explain sparse representation and the SSIM index used in the proposed method as preliminaries. They are presented in Sections 2.1 and 2.2, respectively.

### 2.1 Sparse representation

Sparse representation of signals is explained in this subsection. Specifically, we briefly explain the basic algorithm for sparse representation and the K-SVD algorithm[36], which is closely related to the proposed method.

Given an overcomplete dictionary **D** ∈ **R**^{n×K} whose columns are prototype signal-atoms **d**_{ j } ∈ **R**^{ n }(*j* = 1,2,…,*K*), a target signal **y** ∈ **R**^{ n } can be represented as a sparse linear combination of these atoms^{a}. Specifically, **y** is approximated as **y** ≅ **D** **x**(**x** ∈ **R**^{ K }), where **x** is a vector containing the representation coefficients of signal **y**, and it satisfies ||**y**-**D** **x**||_{ p } ≤ *ε*. In this subsection, we assume *p* = 2.

If *n* < *K* and **D** is a full-rank matrix, an infinite number of solutions are available for the above representation problem. Thus, a new constraint is introduced into this problem, and the solution is obtained by solving

$$\min_{\mathbf{x}}{||\mathbf{y}-\mathbf{Dx}||}_{2}^{2}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{1em}{0ex}}{||\mathbf{x}||}_{0}\le T,$$ (1)

where ||·||_{0} represents the *l*^{0}-norm. Furthermore, *T* determines the sparsity of the signals. The above equation provides the optimal representation coefficient vector **x** minimizing the distance ${||\mathbf{y}-\mathbf{Dx}||}_{2}^{2}$ under the constraint that the number of nonzero elements in **x** is *T* or less. For example, Figure 1a shows an example of the sparse representation of the target vector **y**, where ||**x**||_{0} = 6, i.e., the number of nonzero elements in **x** is six. By limiting the number of nonzero elements, we can obtain the solution of the above linear combination. It is well known that calculation of the optimal solution is a nondeterministic polynomial-time hard (NP-hard) problem[55]. Thus, several methods that approximately solve the above problem have been proposed; the simplest are the matching pursuit (MP)[56] and orthogonal MP (OMP)[57, 58, 59] algorithms. The basis pursuit algorithm is also a representative algorithm, which solves the problem by replacing the *l*^{0}-norm with an *l*^{1}-norm[60]. The focal underdetermined system solver is a similar algorithm using the *l*^{ p }-norm (*p* ≤ 1)[61].
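To make the greedy pursuit idea concrete, the following NumPy fragment sketches an OMP-style solver for Equation 1 (a minimal illustrative sketch with random placeholder data, not an implementation from this paper):

```python
import numpy as np

def omp(D, y, T):
    """Greedy OMP sketch for Equation 1: approximate y with at most T atoms,
    refitting the coefficients of the selected atoms by least squares."""
    support, x, residual = [], np.zeros(D.shape[1]), y.copy()
    for _ in range(T):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # orthogonal step: refit all selected coefficients jointly
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x_true = np.zeros(64)
x_true[[3, 17, 42]] = [1.5, -2.0, 0.7]
y = D @ x_true                          # a 3-sparse target signal
x_hat = omp(D, y, T=3)
```

The returned `x_hat` satisfies the sparsity constraint ||**x**||_{0} ≤ *T* by construction, while each refit step keeps the residual nonincreasing.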

Next, given a set of target signals **y**_{ i } (*i* = 1,2,…,*N*), there exist dictionary matrices providing the sparse solutions **x**_{ i }. The K-SVD algorithm[36] can provide the optimal dictionary matrix **D** and coefficient vectors **x**_{ i } (*i* = 1,2,…,*N*) by solving

$$\min_{\mathbf{D},\mathbf{X}}{||\mathbf{Y}-\mathbf{DX}||}_{F}^{2}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{1em}{0ex}}{||{\mathbf{x}}_{i}||}_{0}\le T\phantom{\rule{0.5em}{0ex}}(i=1,2,\dots ,N),$$ (2)

where **X** = [**x**_{1},**x**_{2},…,**x**_{ N }] and **Y** = [**y**_{1},**y**_{2},…,**y**_{ N }], and ||·||_{ F } represents the Frobenius norm. Equation 2 obtains the optimal dictionary matrix **D** and representation coefficient vectors **x**_{ i } (*i* = 1,2,…,*N*) minimizing the sum of ${||{\mathbf{y}}_{i}-\mathbf{D}{\mathbf{x}}_{i}||}^{2}$ (*i* = 1,2,…,*N*) under the constraint that the number of nonzero elements in each **x**_{ i } is *T* or less. Figure 1b shows the relationship between **Y** and **DX**, where the number of nonzero values in each **x**_{ i } of **X** is six in this example. The K-SVD algorithm approximately calculates the optimal solution of Equation 2 by iterating calculation of **x**_{ i } (*i* = 1,2,…,*N*) based on the OMP algorithm and update of the atoms **d**_{ j } (*j* = 1,2,…,*K*) in the dictionary matrix **D** using singular value decomposition (SVD). Specifically, the representation coefficient vectors **x**_{ i } are estimated one by one, and each atom **d**_{ j } in the dictionary matrix **D** is also updated one by one. As described above, SVD is adopted for updating **d**_{ j } since it effectively provides the approximately optimal solution.
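The alternation between sparse coding and SVD-based atom updates can be sketched as follows (a self-contained simplified sketch of one K-SVD iteration, using a basic OMP-style coder; it is not the reference implementation of[36]):

```python
import numpy as np

def sparse_code(D, y, T):
    # OMP-style greedy coding: at most T atoms, least-squares refit each step
    support, x, r = [], np.zeros(D.shape[1]), y.copy()
    for _ in range(T):
        j = int(np.argmax(np.abs(D.T @ r)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        r = y - D @ x
    return x

def ksvd_iteration(D, Y, T):
    """One K-SVD iteration: sparse-code every column of Y with the current
    dictionary, then update each atom d_j (and its coefficients) by the best
    rank-1 approximation (SVD) of the residual restricted to the signals
    that use d_j."""
    X = np.column_stack([sparse_code(D, y, T) for y in Y.T])
    for j in range(D.shape[1]):
        users = np.nonzero(X[j, :])[0]          # signals using atom j
        if users.size == 0:
            continue
        # error matrix with atom j's contribution removed
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]                       # best rank-1 atom update
        X[j, users] = s[0] * Vt[0, :]
    return D, X

rng = np.random.default_rng(1)
Y = rng.standard_normal((16, 40))
D0 = rng.standard_normal((16, 32))
D0 /= np.linalg.norm(D0, axis=0)
X0 = np.column_stack([sparse_code(D0, y, 4) for y in Y.T])
err_before = np.linalg.norm(Y - D0 @ X0, 'fro')
D1, X1 = ksvd_iteration(D0.copy(), Y, T=4)
err_after = np.linalg.norm(Y - D1 @ X1, 'fro')
```

Because each rank-1 SVD update is optimal for its atom, the Frobenius error of Equation 2 is nonincreasing over an iteration.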

### 2.2 Structural similarity index

The SSIM index[52] measures the similarity between two signal vectors **y**_{1} and **y**_{2} (∈ **R**^{ n }), and its specific definition is as follows:

$$\text{SSIM}({\mathbf{y}}_{1},{\mathbf{y}}_{2})=l{({\mathbf{y}}_{1},{\mathbf{y}}_{2})}^{\alpha}\cdot c{({\mathbf{y}}_{1},{\mathbf{y}}_{2})}^{\beta}\cdot s{({\mathbf{y}}_{1},{\mathbf{y}}_{2})}^{\gamma},$$ (3)

where *l*(**y**_{1},**y**_{2}) and *c*(**y**_{1},**y**_{2}) respectively compare the mean and variance of the two signal vectors. Furthermore, *s*(**y**_{1},**y**_{2}) measures their structural correlation. Therefore, from Equation 3, the similarity between two signal vectors is obtained from the three similarities of their luminance, contrast, and structure components, i.e., *l*(**y**_{1},**y**_{2}), *c*(**y**_{1},**y**_{2}), and *s*(**y**_{1},**y**_{2}), which are closely related to the human visual system (HVS); their details are shown below. Note that the parameters *α* > 0, *β* > 0, and *γ* > 0 determine the relative importance of the three components in Equation 3. Next, the three terms, *l*(**y**_{1},**y**_{2}), *c*(**y**_{1},**y**_{2}), and *s*(**y**_{1},**y**_{2}), are obtained as

$$l({\mathbf{y}}_{1},{\mathbf{y}}_{2})=\frac{2{\mu}_{{\mathbf{y}}_{1}}{\mu}_{{\mathbf{y}}_{2}}+{C}_{1}}{{\mu}_{{\mathbf{y}}_{1}}^{2}+{\mu}_{{\mathbf{y}}_{2}}^{2}+{C}_{1}},$$ (4)

$$c({\mathbf{y}}_{1},{\mathbf{y}}_{2})=\frac{2{\sigma}_{{\mathbf{y}}_{1}}{\sigma}_{{\mathbf{y}}_{2}}+{C}_{2}}{{\sigma}_{{\mathbf{y}}_{1}}^{2}+{\sigma}_{{\mathbf{y}}_{2}}^{2}+{C}_{2}},$$ (5)

$$s({\mathbf{y}}_{1},{\mathbf{y}}_{2})=\frac{{\sigma}_{{\mathbf{y}}_{1},{\mathbf{y}}_{2}}+{C}_{3}}{{\sigma}_{{\mathbf{y}}_{1}}{\sigma}_{{\mathbf{y}}_{2}}+{C}_{3}}.$$ (6)

In the above equations,${\mu}_{{\mathbf{y}}_{1}}$ and${\mu}_{{\mathbf{y}}_{2}}$ are the means of **y**_{1} and${\mathbf{y}}_{2},{\mathit{\sigma}}_{{\mathbf{y}}_{1}}^{2}$ and${\mathit{\sigma}}_{{\mathbf{y}}_{2}}^{2}$ are the variances of **y**_{1} and **y**_{2}, and${\sigma}_{{\mathbf{y}}_{1},{\mathbf{y}}_{2}}$ is the cross covariance between **y**_{1} and **y**_{2}. The constants *C*_{1},*C*_{2}, and *C*_{3} are necessary to avoid instability when the denominators are very close to zero.

By setting *α* = *β* = *γ* = 1 and ${C}_{3}=\frac{{C}_{2}}{2}$, the formulation of the SSIM index is simplified to

$$\text{SSIM}({\mathbf{y}}_{1},{\mathbf{y}}_{2})=\frac{(2{\mu}_{{\mathbf{y}}_{1}}{\mu}_{{\mathbf{y}}_{2}}+{C}_{1})(2{\sigma}_{{\mathbf{y}}_{1},{\mathbf{y}}_{2}}+{C}_{2})}{({\mu}_{{\mathbf{y}}_{1}}^{2}+{\mu}_{{\mathbf{y}}_{2}}^{2}+{C}_{1})({\sigma}_{{\mathbf{y}}_{1}}^{2}+{\sigma}_{{\mathbf{y}}_{2}}^{2}+{C}_{2})}.$$ (7)

Note that in the proposed method shown in Section 4, *C*_{1} = (*K*_{1}*I*_{max})^{2} and *C*_{2} = (*K*_{2}*I*_{max})^{2}, where *I*_{max} = 255, *K*_{1} = 0.01, and *K*_{2} = 0.03. Thus, *α*, *β*, *γ*, *C*_{1}, *C*_{2}, and *C*_{3} are set to the values shown in[52].

In[47] and[52], the effectiveness of the SSIM index as a quality measure and its superiority to MSE and its variants are presented in detail. Generally, MSE cannot reflect perceptual distortions: its value becomes high for images altered by distortions such as mean luminance shift, contrast stretch, spatial shift, spatial scaling, and rotation, even though these cause negligible loss of subjective image quality. Conversely, blurring severely deteriorates image quality, yet its MSE is lower than those of the above alterations. The SSIM index, on the other hand, is defined by separately calculating three similarities in terms of luminance, contrast, and structure, which are derived on the basis of HVS characteristics not accounted for by MSE. Therefore, it is a better quality measure that solves the above problem, as also confirmed in[47]. We can therefore expect that the use of this similarity measure for inpainting will provide successful results.
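The contrast between MSE and the SSIM index can be demonstrated numerically. The sketch below implements the simplified index of Equation 7 with the constants given above and applies it to a mean-luminance shift, one of the alterations just mentioned (an illustrative sketch on synthetic data, not code from this paper):

```python
import numpy as np

def ssim_index(y1, y2, K1=0.01, K2=0.03, I_max=255.0):
    """Simplified SSIM index of Equation 7 (alpha = beta = gamma = 1,
    C3 = C2 / 2), with the constants used in Section 4."""
    C1, C2 = (K1 * I_max) ** 2, (K2 * I_max) ** 2
    mu1, mu2 = y1.mean(), y2.mean()
    cov = ((y1 - mu1) * (y2 - mu2)).mean()
    return ((2 * mu1 * mu2 + C1) * (2 * cov + C2)) / \
           ((mu1 ** 2 + mu2 ** 2 + C1) * (y1.var() + y2.var() + C2))

rng = np.random.default_rng(2)
patch = rng.uniform(0.0, 230.0, size=64)   # raster-scanned patch intensities
shifted = patch + 20.0                     # mean-luminance shift only
mse = np.mean((patch - shifted) ** 2)      # about 400: MSE sees a large distortion
# the structure is untouched, so the SSIM index stays close to 1
```

Here the MSE of roughly 400 suggests heavy degradation, while the SSIM index remains near its maximum of 1, matching the perceptual judgment.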

Note that moment invariants take into account not only image features, such as means and variances, but also image degradations, such as translation, scaling, and rotation, to generate invariants and properly match images without setting any constants. Therefore, in the rest of this subsection, we discuss the advantages and disadvantages of the SSIM index by comparing it with moment invariants.

#### 2.2.1 Advantage

In the proposed method, we use the SSIM index to represent the visual quality of inpainting results. The SSIM index is defined based on several characteristics of the HVS. As shown in Equations 3 to 7, the SSIM index is related to luminance and contrast masking and to correlation; i.e., the SSIM index is obtained from the three elements in Equations 4 to 6. Specifically, the first term, defined in Equation 4, is consistent with Weber’s law, which states that the HVS is sensitive to relative, not absolute, luminance change. The second term, defined in Equation 5, is derived from the contrast masking characteristic that a contrast change is less perceptible at a high base contrast than at a low base contrast. Then, in the third term, defined in Equation 6, the structure comparison is conducted after luminance subtraction and contrast normalization; if we ignore *C*_{3}, this is equivalent to calculating the correlation coefficient. In this way, the SSIM index is derived by a bottom-up scheme according to the HVS. This means that the proposed method using the SSIM index can perform inpainting with consideration of the sensitivity of the HVS.

#### 2.2.2 Disadvantage

It is known that the SSIM index tends to be robust to translation, scaling, and rotation. However, as these gaps become larger, it becomes difficult for the SSIM index to provide an accurate visual quality estimate due to its definition. On the other hand, moment invariants can output several useful criteria that are invariant under translation, scaling, and rotation. Therefore, if a new visual quality measure can be derived from these moment invariants, successful inpainting based on the derived measure can also be expected. Furthermore, the SSIM index requires several parameters, in contrast to moment invariants.

Note that, in contrast to MSE and its variants, the SSIM index can only be calculated over areas; that is, the SSIM index is calculated in a block-wise scheme, not a pixel-wise scheme. Therefore, to use the SSIM index for inpainting, we have to adopt block-wise procedures.

## 3 Overview of our proposed framework

### 3.1 Generation of dictionary

First, in the generation of the dictionary, we clip known patches not including any missing areas from the target image, and the dictionary matrix **D** shown in Section 2.1 is calculated from these patches. In the same manner as in traditional sparse representation problems, we iteratively perform two procedures, 'calculation of the representation coefficients’ and 'update of the atoms included in the dictionary matrix **D**’. The procedures are similar to those of the traditional K-SVD algorithm[36]. The contribution of the proposed method, i.e., the difference from the traditional method, is the introduction of the SSIM index. Specifically, the representation coefficients and the atoms of the dictionary matrix are calculated in such a way that the SSIM-based approximation performance becomes highest. This means that the cost function ${||\mathbf{Y}-\mathbf{DX}||}_{F}^{2}$ in Equation 2 is replaced with an SSIM-based one. Note that in the calculation of the representation coefficients, the maximization problem of the SSIM index is nonconvex, and thus it is reformulated as a quasi-convex problem using the computation scheme in[53]. On the other hand, in the update of the atoms of the dictionary matrix, we use a simple steepest ascent algorithm, since introducing the computation scheme in[53] would incur high computational cost. In the K-SVD algorithm[36], the atoms can be effectively updated using SVD, but that scheme is based on least-squares approximation; therefore, we use the simple steepest ascent algorithm instead.

### 3.2 Inpainting of missing areas

In the inpainting of missing areas, we first clip a patch including missing areas from the target image. Note that we have to determine which patch should be first selected for the inpainting. In the proposed method, we calculate the patch priority for determining the inpainting order based on the method in[24]. Therefore, the patch maximizing the patch priority is selected, and its missing areas are reconstructed in the proposed method.

For the selected patch (denoted as the target patch) including missing areas, the inpainting procedures are performed. Specifically, the proposed method performs the sparse representation of the target patch to estimate the missing intensities. Note that the cost function in Equation 1 is replaced with an SSIM version. Thus, this is the difference from the traditional sparse representation approach and the biggest contribution in our method. The sparse representation of the target patch maximizing the SSIM index is then performed, where this nonconvex maximization problem is also reformulated as a quasi-convex problem using the computation scheme in[53]. In Figure2, the specific procedures for calculating the sparse representation are shown. Their details are shown in the following section. From the approximation results obtained by the above sparse representation, the proposed method outputs the estimated intensities within the missing areas of the target patch.

By iterating the patch selection based on the patch priority and its SSIM-based missing area reconstruction, we can inpaint the whole missing areas within the target image.
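For illustration only, the structure of this outer iteration can be sketched as follows. In this toy sketch, patch priority is simplified to the fraction of known pixels in each patch (a stand-in for the priority of[24]), and the SSIM-based sparse reconstruction is replaced by a mean fill (a placeholder for the procedure of Section 4.2); neither simplification is part of the proposed method:

```python
import numpy as np

def inpaint_loop(image, mask, patch=5):
    """Toy sketch of the outer loop only: repeatedly pick the unfilled pixel
    whose surrounding patch has the highest 'priority' (simplified here to
    the fraction of known pixels) and fill the unknown pixels of that patch
    (here with the mean of its known pixels)."""
    img = image.astype(float).copy()
    known = ~mask.copy()
    r = patch // 2
    while not known.all():
        best, best_p = None, -1.0
        for y, x in zip(*np.nonzero(~known)):
            y0, y1 = max(y - r, 0), min(y + r + 1, img.shape[0])
            x0, x1 = max(x - r, 0), min(x + r + 1, img.shape[1])
            p = known[y0:y1, x0:x1].mean()   # fraction of known pixels
            if p > best_p:
                best_p, best = p, (y0, y1, x0, x1)
        y0, y1, x0, x1 = best
        blk = known[y0:y1, x0:x1]
        fill = img[y0:y1, x0:x1][blk].mean() if blk.any() else img[known].mean()
        img[y0:y1, x0:x1][~blk] = fill       # reconstruct the missing pixels
        known[y0:y1, x0:x1] = True
    return img

grad = np.tile(np.arange(8.0), (8, 1)) * 10.0    # simple test image
hole = np.zeros((8, 8), dtype=bool)
hole[3:5, 3:5] = True                            # missing 2x2 block
out = inpaint_loop(np.where(hole, 0.0, grad), hole)
```

Known pixels are never overwritten, and the loop terminates once every pixel is marked as known.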

## 4 Image inpainting via SSIM-based sparse representation

The inpainting method via SSIM-based sparse representation is presented in this section. As described in the previous section, the proposed method is divided into two algorithms: generation of a dictionary and inpainting of missing areas. In the first algorithm, the dictionary is generated from known patches *f*_{ i } (*i* = 1,2,…,*N*) within the target image, where *N* is the number of known patches, and their size is *w* × *h* pixels. It should be noted that the proposed method performs calculation of the dictionary based on the new perceptually optimized criterion, i.e., the SSIM index. The details of this calculation are shown in Section 4.1. In the second algorithm, the proposed method clips a patch *f* including missing areas from the target image and estimates its unknown intensities. In this algorithm, sparse representation based on the SSIM index is introduced into the inpainting. Its details are shown in Section 4.2. For the following explanation, we denote the unknown and known areas within *f* as Ω and $\bar{\mathrm{\Omega}}$, respectively.

### 4.1 Generation of the dictionary

In this subsection, the algorithm for generating the dictionary is presented. In the proposed method, we calculate the dictionary matrix **D** in Equation 2 for reconstructing the missing areas within the target image. Note that the difference from Equation 2 is the use of the SSIM index: in contrast to Equation 2, which minimizes the MSE of the approximation results, the proposed method maximizes the SSIM index of the approximation results of the sparse representation. Similar to the K-SVD algorithm[36], since it is difficult to simultaneously obtain the dictionary matrix and the representation coefficients, we update these two iteratively. Specifically, for the calculation of the representation coefficients optimal in terms of the SSIM index, we use a simple estimation scheme similar to the matching pursuit algorithms. Furthermore, the resulting nonconvex optimization problem is reformulated as a quasi-convex problem using the calculation scheme in[53]. On the other hand, each atom of the dictionary matrix is updated one by one by a simple steepest ascent algorithm. The details are shown below.

As described above, known patches *f*_{ i } (*i* = 1,2,…,*N*) with sizes of *w* × *h* pixels are clipped from the target image at regular intervals. This means that the patches *f*_{ i } for generating the dictionary are selected from the known, undamaged parts of the target image. Next, for each patch *f*_{ i }, we define a vector **y**_{ i } ∈ **R**^{ wh } whose elements are its raster-scanned intensities. Using an overcomplete dictionary matrix **D** ∈ **R**^{wh×K} containing *K* prototype atoms **d**_{ j } ∈ **R**^{ wh } (*j* = 1,2,…,*K*), each vector **y**_{ i } is represented as a sparse linear combination of these atoms, **y**_{ i } ≅ **Dx**_{ i }, where it satisfies SSIM(**y**_{ i },**Dx**_{ i }) ≥ *η* for a fixed value *η* that corresponds to *ε* in the previous section. The vector **x**_{ i } ∈ **R**^{ K } contains the representation coefficients of **y**_{ i }.
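The patch clipping and raster-scan vectorization step can be illustrated as follows (a sketch in which *w* = *h* = 4 and the clipping interval are arbitrary illustrative choices, not the values used in the experiments):

```python
import numpy as np

def known_patch_vectors(image, mask, w=4, h=4, stride=4):
    """Clip w-by-h patches at regular intervals, keep only those that contain
    no missing pixels, and raster-scan each into a length-w*h column vector."""
    vecs = []
    H, W = image.shape
    for top in range(0, H - h + 1, stride):
        for left in range(0, W - w + 1, stride):
            if not mask[top:top + h, left:left + w].any():   # fully known
                vecs.append(image[top:top + h, left:left + w].ravel())
    return np.column_stack(vecs)   # columns are the vectors y_i in R^{wh}

rng = np.random.default_rng(3)
img = rng.uniform(0.0, 255.0, (16, 16))
miss = np.zeros((16, 16), dtype=bool)
miss[0:4, 0:4] = True                 # one missing 4x4 block
Y = known_patch_vectors(img, miss)    # 15 of the 16 clipped patches survive
```

The single patch overlapping the missing block is discarded, so the resulting matrix **Y** contains only fully known training vectors.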

If *wh* < *K* and **D** is a full-rank matrix, an infinite number of solutions are available for the representation problems. Therefore, in the same manner as Equation 1, the proposed method adopts the solution of

$$\max_{{\mathbf{x}}_{i}}\phantom{\rule{0.3em}{0ex}}\text{SSIM}({\mathbf{y}}_{i},\mathbf{D}{\mathbf{x}}_{i})\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{1em}{0ex}}{||{\mathbf{x}}_{i}||}_{0}\le T.$$ (8)

This means that the optimal vector of **x**_{ i } is obtained by maximizing the SSIM index between **y**_{ i } and *Dx*_{ i } under the constraint that the number of the nonzero elements in **x**_{ i } is *T* or less. The optimal representation coefficients can then be obtained by solving the above equation.

Furthermore, the optimal dictionary matrix **D** can be obtained by solving the following maximization problem:

$$\max_{\mathbf{D},\mathbf{X}}\sum_{i=1}^{N}\text{SSIM}({\mathbf{y}}_{i},\mathbf{D}{\mathbf{x}}_{i})\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{1em}{0ex}}{||{\mathbf{x}}_{i}||}_{0}\le T\phantom{\rule{0.5em}{0ex}}(i=1,2,\dots ,N).$$ (9)

This means that we calculate the dictionary matrix **D** maximizing the approximation performance of all **y**_{ i }(*i* = 1,2,…,*N*) in terms of the SSIM index under the constraint that the number of the nonzero elements in **x**_{ i }(*i* = 1,2,…,*N*) is *T* or less. In the proposed method, the optimal dictionary matrix **D** is estimated using a scheme similar to the K-SVD algorithm[36], where the procedures are based on the SSIM index. Specifically, this scheme is divided into two procedures, calculation of the optimal vector **x**_{ i }(*i* = 1,2,…,*N*) and update of the dictionary matrix **D**, and they are iteratively performed. We show each of the procedures below.

#### 4.1.1 Calculation of the optimal vector **x**_{ i }

By fixing the dictionary matrix **D**, the optimal vector **x**_{ i } is calculated for each **y**_{ i }. Specifically, **x**_{ i } can be calculated on the basis of Equation 8. In this optimization problem, we select *T* optimal atoms that provide the optimal linear combination based on the SSIM index. Therefore, we adopt the simplest algorithm that selects the optimal atoms one by one, and it is similar to several matching pursuit algorithms[56, 57, 58, 59]. Specifically, for each **y**_{ i }(*i* = 1,2,…,*N*), we first search one atom which provides its optimal approximation, maximizing the SSIM index. Furthermore, by adding another atom to the previously selected atoms, we calculate their SSIM-based linear combination approximating each **y**_{ i }, and then, the optimal atom maximizing the SSIM index with the previously selected atoms is selected. Then, by iterating this procedure *T* times, the *T* optimal atoms can be selected for each **y**_{ i }. Therefore, the procedures are quite simple. In each iteration, we simply select one atom in such a way that the linear combination of this atom and the previously selected atoms maximizes the SSIM index for approximating each **y**_{ i }(*i* = 1,2,…,*N*).
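The greedy selection itself can be sketched as follows. This is a simplified illustration: here the coefficients over the enlarged support are fitted by plain least squares (an assumption made for brevity), whereas the proposed method optimizes them for the SSIM index via the quasi-convex scheme described below; the atom ranking by SSIM value is the point being illustrated:

```python
import numpy as np

def ssim(y1, y2, C1=6.5025, C2=58.5225):
    # simplified SSIM index (Equation 7); constants as in Section 2.2
    mu1, mu2 = y1.mean(), y2.mean()
    cov = ((y1 - mu1) * (y2 - mu2)).mean()
    return ((2 * mu1 * mu2 + C1) * (2 * cov + C2)) / \
           ((mu1**2 + mu2**2 + C1) * (y1.var() + y2.var() + C2))

def ssim_greedy_select(D, y, T):
    """Select T atoms one by one; at each step, keep the atom whose fitted
    linear combination with the previously selected atoms maximizes SSIM."""
    support, best_c, best_s = [], None, -np.inf
    for _ in range(T):
        best_j, best_s = None, -np.inf
        for j in range(D.shape[1]):
            if j in support:
                continue
            cols = support + [j]
            coef, *_ = np.linalg.lstsq(D[:, cols], y, rcond=None)
            s = ssim(y, D[:, cols] @ coef)
            if s > best_s:
                best_j, best_s, best_c = j, s, coef
        support.append(best_j)
    x = np.zeros(D.shape[1])
    x[support] = best_c
    return x, best_s

rng = np.random.default_rng(5)
D = rng.standard_normal((12, 24))
D /= np.linalg.norm(D, axis=0)
y = 128.0 + 50.0 * (D[:, [2, 7, 11]] @ np.array([1.0, 2.0, -1.0]))
x_hat, s_val = ssim_greedy_select(D, y, T=3)
```

As in the text, exactly one atom is committed per iteration, and the returned coefficient vector respects the sparsity constraint.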

The details of the *t* th (*t* = 1,2,…,*T*) optimal atom selection are shown below.

In the *t* th optimal atom selection for **y**_{ i }, the following vector is first defined:

$${\mathbf{y}}_{i,j}^{(t)}={\mathbf{D}}_{i,j}^{(t)}{\mathbf{x}}_{i,j}^{(t)},$$

where ${\mathbf{D}}_{i,j}^{(t)}=[{\mathbf{D}}_{i}^{(t-1)},{\mathbf{d}}_{j}]$, and ${\mathbf{D}}_{i}^{(t-1)}$ is a *wh* × (*t* - 1) matrix containing the *t* - 1 atoms previously selected from **d**_{ j } (*j* = 1,2,…,*K*) in the first *t* - 1 iterations. In addition,

$${\mathbf{x}}_{i,j}^{(t)}={[{{\mathbf{x}}_{i}^{(t)}}^{\prime},{x}_{j}]}^{\prime}$$

is a coefficient vector for calculating ${\mathbf{y}}_{i,j}^{(t)}$. The vector ${\mathbf{x}}_{i}^{(t)}$ contains representation coefficients that respectively correspond to the atoms in ${\mathbf{D}}_{i}^{(t-1)}$, and *x*_{ j } is that corresponding to **d**_{ j }. Here, we show the specific definitions of ${\mathbf{x}}_{i,j}^{(t)},{\mathbf{y}}_{i,j}^{(t)}$, and ${\mathbf{D}}_{i,j}^{(t)}$. First, ${\mathbf{x}}_{i,j}^{(t)}$ is the sparse representation coefficient vector for representing **y**_{ i } with the atom **d**_{ j } selected to be appended at iteration *t*, and ${\mathbf{y}}_{i,j}^{(t)}$ is the corresponding approximation of **y**_{ i }. Next, ${\mathbf{D}}_{i,j}^{(t)}$ is a matrix including the *t* - 1 atoms previously selected in *t* - 1 iterations and the atom **d**_{ j } at iteration *t*, which are used for representing **y**_{ i }. The proposed method estimates the optimal vector ${\widehat{\mathbf{y}}}_{i,j}^{(t)}$ of ${\mathbf{y}}_{i,j}^{(t)}$ (*j* = 1,2,…,*K*) that provides the optimal representation performance. Then the optimal atom **d**_{ j } is selected to maximize the SSIM index for the representation of **y**_{ i } by itself together with the atoms selected in the previous *t* - 1 iterations.

Next, ${\mu}_{{\mathbf{y}}_{i}}=\frac{1}{wh}{\mathbf{1}}^{\prime}{\mathbf{y}}_{i}$ is the mean of **y**_{ i }, where **1** = [1,1,…,1]^{′} is a *wh* × 1 vector, and the vector/matrix transpose is denoted by the superscript ^{′} in this paper. Similarly, ${\mu}_{{\mathbf{y}}_{i,j}^{(t)}}$ and ${\sigma}_{{\mathbf{y}}_{i,j}^{(t)}}^{2}$ are the mean and variance of ${\mathbf{y}}_{i,j}^{(t)}$, respectively, and are obtained as follows:

$${\mu}_{{\mathbf{y}}_{i,j}^{(t)}}=\frac{1}{wh}{\mathbf{1}}^{\prime}{\mathbf{D}}_{i,j}^{(t)}{\mathbf{x}}_{i,j}^{(t)},$$

$${\sigma}_{{\mathbf{y}}_{i,j}^{(t)}}^{2}=\frac{1}{wh}{||\mathbf{H}{\mathbf{D}}_{i,j}^{(t)}{\mathbf{x}}_{i,j}^{(t)}||}^{2},$$

where $\mathbf{H}=\mathbf{I}-\frac{1}{wh}\mathbf{1}{\mathbf{1}}^{\prime}$ is the centering matrix. Note that **H** = **H**^{′} and **H**^{2} = **H** are satisfied, and **I** is the identity matrix. In addition, ${\sigma}_{{\mathbf{y}}_{i},{\mathbf{y}}_{i,j}^{(t)}}$ is the cross covariance between **y**_{ i } and ${\mathbf{y}}_{i,j}^{(t)}$ and is defined as

$${\sigma}_{{\mathbf{y}}_{i},{\mathbf{y}}_{i,j}^{(t)}}=\frac{1}{wh}{\left(\mathbf{H}{\mathbf{y}}_{i}\right)}^{\prime}\mathbf{H}{\mathbf{D}}_{i,j}^{(t)}{\mathbf{x}}_{i,j}^{(t)}.$$

It should be noted that the criterion in Equation 23 is a nonconvex function of${\mathbf{x}}_{i,j}^{(t)}$, and it is difficult to obtain the global optimal solution. Thus, we introduce the calculation scheme used in[53] into the estimation of the optimal vector${\widehat{\mathbf{x}}}_{i,j}^{(t)}$. Specifically, the nonconvex problem is transformed into a quasi-convex formulation. The main idea of this scheme is shown as follows. By fixing the mean of${\mathbf{y}}_{i,j}^{(t)}\phantom{\rule{0.3em}{0ex}}(={{\mathit{\mu}}_{{\mathbf{D}}_{i,j}^{(t)}}}^{\prime}{\mathbf{x}}_{i,j}^{(t)})$, we can focus only on the second term in Equation 23. Therefore, the maximization problem can be simplified.

Therefore, it can be seen that the first term of the above equation can be fixed by fixing${\rho}_{i,j}^{(t)}$ since${\mu}_{{\mathbf{y}}_{i}}$ is a constant.

Thus, the cost function becomes simpler, i.e., we can focus only on the second term of the SSIM index under the constraint fixing${{\mathit{\mu}}_{{\mathbf{D}}_{i,j}^{(t)}}}^{\prime}{\mathbf{x}}_{i,j}^{(t)}={\rho}_{i,j}^{(t)}$.

The resulting quasi-convex problem is solved for *τ* using a standard bisection procedure, and the optimal vectors${\widehat{\mathbf{x}}}_{i,j}^{(t)}\left({\rho}_{i,j}^{(t)}\right)$ are calculated for several values of${\rho}_{i,j}^{(t)}(={\mu}_{{\mathbf{y}}_{i}}-R\delta ,\dots ,{\mu}_{{\mathbf{y}}_{i}}-2\delta ,{\mu}_{{\mathbf{y}}_{i}}-\delta ,{\mu}_{{\mathbf{y}}_{i}},{\mu}_{{\mathbf{y}}_{i}}+\delta ,{\mu}_{{\mathbf{y}}_{i}}+2\delta ,\dots ,{\mu}_{{\mathbf{y}}_{i}}+R\delta )$ to select${\widehat{\mathbf{x}}}_{i,j}^{(t)}$ maximizing Equation 15. Note that *δ* is the search interval and *R* determines the search range; their specific values are shown in Section 5.1. The detailed procedures for estimating *τ* in the proposed method are as follows:

- (i) An initial value of *τ* (say *τ*_{0}) is determined between zero and one. Furthermore, *U*_{ τ } = 1.0 and *L*_{ τ } = *τ*_{0}, where *U*_{ τ } and *L*_{ τ } respectively represent the upper limit and the lower limit of *τ*. In this paper, we set *τ*_{0} = 0.2.
- (ii) The optimization problem in Equation 28 is solved using *τ*.
- (iii) Two criteria *S*_{ τ } and *D*_{ τ } are calculated as$\begin{array}{l}{S}_{\tau}=\tau \left({\sigma}_{{\mathbf{y}}_{i}}^{2}+{{\mathbf{x}}_{i,j}^{(t)}}^{\prime}{\mathbf{K}}_{i,j}^{(t)}{\mathbf{x}}_{i,j}^{(t)}+{C}_{2}\right)-\left(2{{\mathbf{k}}_{i,j}^{(t)}}^{\prime}{\mathbf{x}}_{i,j}^{(t)}+{C}_{2}\right),\\ {D}_{\tau}={U}_{\tau}-{L}_{\tau}.\end{array}$
- (iv) According to the obtained criteria *S*_{ τ } and *D*_{ τ }, the following steps are performed:
  - (a) If *S*_{ τ } ≥ 0 and *D*_{ τ } < *ε*, the final optimal solution of *τ* is output, where *ε* = 0.05.
  - (b) If *S*_{ τ } ≥ 0 but *D*_{ τ } ≥ *ε*, $\tau =\frac{{U}_{\tau}+{L}_{\tau}}{2}$ and *U*_{ τ } = *τ*.
  - (c) Otherwise, $\tau =\frac{{U}_{\tau}+{L}_{\tau}}{2}$ and *L*_{ τ } = *τ*.
- (v) Procedures (ii) to (iv) are iterated until convergence.
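The bisection procedure in steps (i) to (v) can be sketched generically as follows; `solve_subproblem`, `s_criterion`, and the toy ratio being maximized are illustrative stand-ins, not the paper's actual subproblem:

```python
def bisect_tau(solve_subproblem, s_criterion, tau0=0.2, eps=0.05, max_iter=100):
    """Bisection search for tau following steps (i) to (v) in the text.

    solve_subproblem(tau) -> solution of the fixed-tau problem (step (ii))
    s_criterion(tau, x)   -> the sign criterion S_tau of step (iii)
    """
    U, L = 1.0, tau0                     # step (i): upper and lower limits
    tau, x = tau0, None
    for _ in range(max_iter):
        x = solve_subproblem(tau)        # step (ii)
        S = s_criterion(tau, x)          # step (iii)
        D = U - L
        if S >= 0 and D < eps:           # step (iv-a): converged
            return tau, x
        tau = (U + L) / 2.0
        if S >= 0:                       # step (iv-b): tighten the upper limit
            U = tau
        else:                            # step (iv-c): raise the lower limit
            L = tau
    return tau, x

# Toy quasi-convex problem: the supremum of r(x) = 1.6 x / (x^2 + 1) is 0.8.
# The fixed-tau subproblem min_x tau (x^2 + 1) - 1.6 x has minimizer x = 0.8/tau.
solve_subproblem = lambda tau: 0.8 / tau
s_criterion = lambda tau, x: tau * (x * x + 1.0) - 1.6 * x

tau_hat, _ = bisect_tau(solve_subproblem, s_criterion, eps=0.01)
print(abs(tau_hat - 0.8) < 0.01)   # True: tau_hat is within eps of the supremum
```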

#### 4.1.2 Update of dictionary matrix D

After calculating the optimal coefficient vectors **x**_{ i } (*i* = 1,2,…,*N*), the proposed method updates the dictionary matrix **D**. We update each dictionary element, i.e., each atom, one by one in a greedy fashion. Specifically, we choose one atom and update it in such a way that the representation performance, i.e., the sum of the SSIM index, becomes the highest. We perform the update of each atom **d**_{ j } (*j* = 1,2,…,*K*) by solving the following problem:

where **x**_{ i }(*j*) is the *j*th element of **x**_{ i }.

In the above equation, we try to maximize the approximation performance of **y**_{ i } (*i* = {1,2,…,*N*|**x**_{ i }(*j*) ≠ 0}) by **x**_{ i }(*j*)**d**_{ j }, i.e., by the target atom **d**_{ j } and its corresponding representation coefficient **x**_{ i }(*j*). Note that it is difficult to maximize Equation 28 in the same way as the calculation of the optimal vector **x**_{ i } (*i* = 1,2,…,*N*) since the optimization problem is too complex. Thus, using the well-known steepest ascent algorithm, the proposed method updates each atom **d**_{ j } (*j* = 1,2,…,*K*). Specifically, the proposed method performs an update of the dictionary matrix **D** by the following procedures:

Step 1. Select one atom **d**_{ j } (*j* = 1,2,…,*K*).

Step 2. Update the selected atom **d**_{ j } by iterating the following gradient ascent equation, where *ζ* is a fixed small step-size parameter.

Step 3. Replace the selected atom **d**_{ j } with the vector obtained by step 2. Note that a new dictionary matrix, whose *j* th column, i.e., **d**_{ j }, is only updated, is obtained.

Step 4. Repeat steps 1 to 3 for all atoms **d**_{1},**d**_{2},…,**d**_{ K } within the dictionary matrix **D**.

Using the above procedures, the proposed method can update the dictionary matrix **D**.
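The steepest ascent update of a single atom can be sketched as follows; the finite-difference gradient, the step-halving safeguard on *ζ*, and the toy data are simplifications introduced here, not part of the paper's algorithm:

```python
import numpy as np

def ssim_vec(a, b, C1=6.5025, C2=58.5225):
    """Single-window SSIM between two vectors (constants for 8-bit data)."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    lum = (2 * mu_a * mu_b + C1) / (mu_a ** 2 + mu_b ** 2 + C1)
    struct = (2 * cov + C2) / (a.var() + b.var() + C2)
    return lum * struct

def update_atom(d, ys, coeffs, zeta=0.01, iters=200, h=1e-6):
    """Ascent update of one atom, maximizing the summed SSIM over the
    patches that use it (step 2 of the text, simplified)."""
    d = d / np.linalg.norm(d)
    objective = lambda v: sum(ssim_vec(y, c * v) for y, c in zip(ys, coeffs))
    for _ in range(iters):
        base = objective(d)
        grad = np.zeros_like(d)
        for k in range(d.size):          # forward-difference gradient
            e = np.zeros_like(d)
            e[k] = h
            grad[k] = (objective(d + e) - base) / h
        cand = d + zeta * grad           # steepest ascent step
        cand /= np.linalg.norm(cand)     # keep the atom unit-norm
        if objective(cand) > base:
            d = cand
        else:
            zeta *= 0.5                  # halve the step when it overshoots
            if zeta < 1e-8:
                break
    return d

# Toy data: three patches generated from a hidden atom, plus a perturbed start.
rng = np.random.default_rng(0)
true_atom = rng.uniform(50, 200, size=16)
true_atom /= np.linalg.norm(true_atom)
coeffs = [400.0, 500.0, 450.0]
ys = [c * true_atom for c in coeffs]
d0 = true_atom + 0.1 * rng.normal(size=16)
d0 /= np.linalg.norm(d0)

before = sum(ssim_vec(y, c * d0) for y, c in zip(ys, coeffs))
d1 = update_atom(d0, ys, coeffs)
after = sum(ssim_vec(y, c * d1) for y, c in zip(ys, coeffs))
print(after >= before)   # True: accepted steps never decrease the objective
```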

Finally, we clarify the relationship between the K-SVD algorithm[36] and our SSIM-based algorithm. The biggest difference between the proposed method and the K-SVD algorithm is the quality metric used: the K-SVD algorithm tries to minimize the MSE when performing sparse representation and dictionary generation, whereas the proposed method tries to maximize the SSIM index. Specifically, for the calculation of sparse representation coefficients, we adopt an algorithm similar to the OMP algorithm, but the quality measure is the SSIM index, not the MSE. Therefore, the representation coefficients are obtained so as to maximize the SSIM index, which serves as the measure of representation performance. The optimal solution is then obtained on the basis of the algorithm used in[53], which is quite different from MSE-based algorithms. Furthermore, for generation of the dictionary, the proposed method updates each atom, and this scheme is also similar to that of the K-SVD algorithm. However, the proposed method updates each atom in such a way that the sum of the SSIM index becomes the highest, and thus, SVD is not used for the calculation. Since a closed-form update is too complicated, we simply adopt the steepest ascent algorithm in our method.

### 4.2 Inpainting algorithm

In this subsection, the inpainting algorithm for the missing area Ω in the target patch *f* based on the SSIM index is presented. In the proposed method, the target patch *f* is approximated by a sparse linear combination of the atoms of the dictionary matrix **D** obtained in the previous subsection. In this approach, we adopt the SSIM index as the measure of approximation performance, and the optimal reconstruction results maximizing the SSIM index can then be obtained. Note that to obtain the optimal sparse linear combination maximizing the SSIM index, we again introduce the calculation scheme in[53]. However, different from the previous subsection, we simultaneously estimate the representation coefficients and the missing intensities, and the calculation scheme in[53] is therefore extended. The inpainting of the missing area Ω within the target patch *f* can then be realized based on the SSIM index. The details are shown below.

First, the proposed method defines the vector **y** of the target patch *f* and formulates its estimation as the constrained problem in Equation 31.
Note that **E** ($\in {\mathbf{R}}^{{N}_{\stackrel{\u0304}{\mathrm{\Omega}}}\times \mathit{\text{wh}}}$) is a matrix whose diagonal elements are one or zero, and it extracts only known intensities within **y** to obtain **y**^{∗} ($\in {\mathbf{R}}^{{N}_{\stackrel{\u0304}{\mathrm{\Omega}}}}$), where${N}_{\stackrel{\u0304}{\mathrm{\Omega}}}$ is the number of known pixels in *f*. From Equation 31, the proposed method tries to estimate the unknown vector **y** approximated by the linear combination of the atoms in the dictionary matrix **D** under the constraints that the known intensities in$\stackrel{\u0304}{\mathrm{\Omega}}$ are fixed and the number of the nonzero elements in **x** is *T* or less.
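The role of the extraction matrix **E** can be illustrated with a toy 3 × 3 patch; the mask pattern below is arbitrary:

```python
import numpy as np

wh = 9                                  # w = h = 3 for this toy patch
y = np.arange(wh, dtype=float)          # raster-scanned intensities
known = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1], dtype=bool)  # 0 marks missing pixels

# E keeps only the rows of the identity matrix at known positions, so it has
# shape (number of known pixels) x wh, and E @ y returns the vector y*.
E = np.eye(wh)[known]
y_star = E @ y

print(E.shape)    # (6, 9)
print(y_star)     # [0. 1. 3. 6. 7. 8.]
```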

Next, the proposed method determines the *T* atoms used for approximating **y**. Specifically, the proposed method selects the *T* optimal atoms from **D** by solving the following problem:

where its solution can be obtained on the basis of the same algorithm as the calculation of the optimal vector **x**_{ i } described in the previous subsection. Then, a matrix$\widehat{\mathbf{D}}$ containing atoms whose corresponding coefficients in$\widehat{\mathit{\alpha}}$ are nonzero values is obtained.

In the proposed method, we estimate$\widehat{\mathbf{y}}$ and$\widehat{\mathbf{a}}$ maximizing Equation 34 under the constraint **E** **y** = **y**^{∗}, using the computation scheme in[53] in a similar way to that shown in the previous subsection. Note that we have to estimate the two vectors, and the computation scheme is therefore extended as follows.

The SSIM index in Equation 34 is a function of both **y** and **a**, but its first term is a function only of$\frac{1}{\mathit{\text{wh}}}{\mathbf{1}}^{\prime}\mathbf{y}\left(=\rho \right)$ and${{\mathit{\mu}}_{\widehat{\mathbf{D}}}}^{\prime}\mathbf{a}\left(=\omega \right)$; thus, we rewrite Equation 33 in the same way as that in the previous subsection.

By fixing$\frac{1}{\mathit{\text{wh}}}{\mathbf{1}}^{\prime}\mathbf{y}=\rho $ and${{\mathit{\mu}}_{\widehat{\mathbf{D}}}}^{\prime}\mathbf{a}=\omega $, the first term of the SSIM index shown in Equation 34 can be fixed, and the cost function of Equation 33 can be simplified.

Therefore, the overall problem is to find the highest SSIM index by searching over ranges of *ρ* and *ω* as shown in Figure2. Note that their search ranges are set to${\mu}_{{\mathbf{y}}^{\ast}}-R\delta ,\dots ,{\mu}_{{\mathbf{y}}^{\ast}}-2\delta ,{\mu}_{{\mathbf{y}}^{\ast}}-\delta ,{\mu}_{{\mathbf{y}}^{\ast}},{\mu}_{{\mathbf{y}}^{\ast}}+\delta ,{\mu}_{{\mathbf{y}}^{\ast}}+2\delta ,\dots ,{\mu}_{{\mathbf{y}}^{\ast}}+R\delta $, where${\mu}_{{\mathbf{y}}^{\ast}}$ is the mean of **y**^{∗}. Thus, the solution can be obtained in the same manner as that shown in the previous subsection.
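The two-dimensional search over *ρ* and *ω* can be sketched as follows; `solve_fixed` and `score` are stand-ins for the constrained subproblem and the SSIM evaluation, and the peak location is hypothetical:

```python
import numpy as np

def grid_search(mu_y_star, solve_fixed, score, delta=5.0, R=6):
    """Exhaustively search the (rho, omega) grid described in the text: both
    parameters range over mu_y* - R*delta, ..., mu_y* + R*delta in steps of delta.
    """
    offsets = np.arange(-R, R + 1) * delta
    best, best_score = None, -np.inf
    for rho in mu_y_star + offsets:
        for omega in mu_y_star + offsets:
            sol = solve_fixed(rho, omega)      # subproblem with both means fixed
            s = score(sol)                     # SSIM-style quality of the candidate
            if s > best_score:
                best, best_score = (rho, omega, sol), s
    return best, best_score

mu_y_star = 118.0   # mean of the known intensities y* (illustrative value)

# Stand-ins: the solver just records (rho, omega); the score peaks at (120, 115).
solve_fixed = lambda rho, omega: (rho, omega)
score = lambda sol: -((sol[0] - 120.0) ** 2 + (sol[1] - 115.0) ** 2)

(best_rho, best_omega, _), _ = grid_search(mu_y_star, solve_fixed, score)
print(best_rho, best_omega)   # 118.0 113.0 (the grid points nearest the peak)
```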

Note that the optimal value of *τ* can be obtained as shown in the previous subsection.

Specifically, the proposed method applies the Lagrange multiplier approach and defines the following function of **y** and **a**:

In Equation 40, the first and second terms are from the cost function, and the third, fourth, and fifth terms are from the constraints.

Then, by solving the above problem, the proposed method can calculate the optimal vectors$\widehat{\mathbf{a}}$ and$\widehat{\mathbf{y}}$. Finally, from the obtained result$\widehat{\mathbf{y}}$, the proposed method outputs the estimated intensities in the missing area Ω.

In general, the missing areas within the target image are larger than the target patch *f*. Therefore, the proposed method clips patches including missing areas and performs inpainting to estimate all missing intensities. This means that the proposed method gradually reconstructs missing areas patch by patch, starting from the missing boundary. It should be noted that in order to realize this scheme, we have to determine the order in which patches along the fill-front *∂*Ω of the missing areas are filled. We call this order 'patch priority’. In the proposed method, patch priorities are determined by the method proposed by Criminisi et al.[24]. Specifically, given a patch *f*_{ p } centered at pixel **p** on the fill-front of the missing areas within the target image, its priority *P*(**p**) is defined as *P*(**p**) = *C*(**p**)*D*(**p**), where *C*(**p**) and *D*(**p**) are called the confidence term and the data term, respectively, and they are defined as follows:

In the above equations, *I* and Θ are the whole area of the target image and the whole missing areas, respectively. Furthermore, area(*f*_{ p }) (= *w* × *h*) represents the number of pixels included within the target patch *f*_{ p }. Then, *I*_{max} is a normalization factor (e.g., *I*_{max} = 255 for a typical gray scale image),$\nabla {I}_{\mathbf{p}}^{\perp}$ is the isophote at pixel **p**, and **n**_{ p } is a unit vector orthogonal to the fill-front at pixel **p**. Note that *C*(**p**) is initially set as *C*(**p**) = 0 ∀**p** ∈ Θ and *C*(**p**) = 1 ∀**p** ∈ (*I* - Θ). After performing the inpainting, the obtained confidence values are assigned to the pixels in the inpainted areas for the following inpainting process.

The confidence term *C*(**p**) represents the ratio of known pixels within the target patch *f*_{ p }. Therefore, if the target patch *f*_{ p } contains many known intensities, its value becomes higher. Furthermore, after the inpainting, the reconstructed pixels have values less than one, i.e., the reconstructed pixels have higher reliability than that of the missing pixels but lower reliability than that of the original pixels. In addition, as shown in Figure3, the data term is a function of the strength of the isophotes at the fill-front *∂*Ω[24]. Therefore, by calculating the inner product of the isophote$\nabla {I}_{\mathbf{p}}^{\perp}$ at pixel **p** and the unit vector **n**_{ p } orthogonal to the fill-front at pixel **p**, linear structures can be reconstructed first. In this way, we can restore all of the missing areas within the target image according to the patch priorities in Equation 43.
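Criminisi et al.'s priority *P*(**p**) = *C*(**p**)*D*(**p**) can be sketched as follows; the map layout, the gradient value, and the normal vector in the demo are illustrative, as is the 15 × 15 window:

```python
import numpy as np

def patch_priority(confidence, grad, normal, p, half=7, i_max=255.0):
    """Priority P(p) = C(p) * D(p) at a fill-front pixel p, following
    Criminisi et al.[24].

    confidence : confidence map C (0 inside missing areas, 1 in known areas)
    grad       : image gradient (g_row, g_col) at p; the isophote is its
                 90-degree rotation
    normal     : unit vector n_p orthogonal to the fill-front at p
    """
    r, c = p
    window = confidence[r - half:r + half + 1, c - half:c + half + 1]
    C_p = window.sum() / window.size              # confidence term
    isophote = np.array([-grad[1], grad[0]])      # rotate the gradient by 90 degrees
    D_p = abs(float(isophote @ normal)) / i_max   # data term
    return C_p * D_p

# Toy setting: the right half of a 100x100 image is missing, and p lies on the
# vertical fill front; the gradient strength 25.5 is chosen arbitrarily.
conf_map = np.zeros((100, 100))
conf_map[:, :50] = 1.0
priority = patch_priority(conf_map, grad=(0.0, 25.5),
                          normal=np.array([1.0, 0.0]), p=(50, 49))
print(round(priority, 4))   # 0.0533 = (120/225 known pixels) * (25.5/255)
```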

## 5 Experimental results

In this section, we verify the effectiveness of the proposed method. First, we show results of a subjective evaluation of the proposed method using several test images. Furthermore, results of a quantitative evaluation using the peak signal-to-noise ratio (PSNR), obtained from the MSE, and the SSIM index are shown, and the effectiveness of using the SSIM index is also discussed.

The conditions of the experiments, including the details of the comparative methods, are described in Section 5.1. In Section 5.2, subjective and quantitative results are shown in comparison with those of the existing methods, and the effectiveness of the proposed method is discussed. In Section 5.3, we show some examples of applying the proposed method to test images including larger missing areas.

### 5.1 Conditions of experiments

- 1. Methods based on PCA or KPCA [31, 34, 35]

  These existing methods generate eigenspaces or nonlinear eigenspaces of patches for inpainting based on PCA or KPCA. Since it is well known that eigenspaces provide the least-square approximation of target data, i.e., eigenspaces are the optimal subspaces based on the MSE, the method in [31] is suitable for comparison with the proposed method. Furthermore, the methods in [34] and [35] utilize nonlinear eigenspaces to approximate nonlinear texture features in images, and we therefore used these methods in the experiments.

- 2. Exemplar-based inpainting methods [24, 30]

  Several exemplar-based inpainting methods have been proposed. The method in [24] is a representative method, and its improved version was proposed in [30], both methods being based on least-square error approaches. In the proposed method, we determine the patch priority using the scheme in [24]; thus, the difference between our method and [24] is the algorithm for estimating missing intensities. Therefore, the method in [24] is suitable for confirming the effectiveness of the proposed inpainting algorithm, i.e., the missing intensity estimation algorithm. Furthermore, although the method in [30] improves the speed rather than the inpainting performance, it is reported in their paper that their method improves the performance of [24] in some cases. Therefore, we used these methods as comparative methods in the experiments.

- 3. Sparse representation-based inpainting method [41]

  As described above, the method in [41] adopts a new modeling of patch priority and patch representation, which are two crucial steps for patch propagation in the exemplar-based inpainting approach, based on sparsity. It should be noted that since this method is based on sparse representation but uses MSE-based criteria, it is suitable for comparison.

In this paper, we regard the methods in [35] and [41] as state-of-the-art methods. In particular, the method in [41] improves both patch approximation, based on sparse representation, and patch priority estimation.

In the experiments, we used the above methods as comparative methods for evaluating our method. For inpainting by the proposed method and the existing methods [24, 30, 41], the patch size was fixed to 15 (*w* = *h* = 15). Furthermore, the existing methods in [31, 34] and [35] simply perform inpainting in a raster scanning order; for some test images, target patches then fall entirely within missing areas, and those methods cannot reconstruct such patches. Thus, in the experiments, their patch size was set to 30. Note that much smaller patches were used for some of the existing methods in previous studies, where accurate performance could be achieved; in these experiments, we used such difficult conditions in order to make the difference between the performance of the proposed method and that of the existing methods clearer. Furthermore, in our method, we simply set *T* = 10, *δ* = 5, and *R* = 6.

### 5.2 Subjective and quantitative evaluations

Generally, natural images contain much more power in low-frequency components than in high-frequency components. Low-dimensional subspaces obtained from the MSE-based criteria in the existing methods therefore tend to represent only such low-frequency components. Since it thus becomes difficult to represent high-frequency components, their results suffer from over-smoothness. On the other hand, the SSIM index contains a term comparing components excluding the average components, i.e., variances, as shown in Equation 5, and thus, subspaces used for inpainting tend to successfully represent high-frequency components. Therefore, the proposed method can perform inpainting successfully. Furthermore, the proposed method adopts sparse representation in addition to the SSIM index. This approach enables adaptive selection of the optimal atoms for each target patch including missing areas. This means that the optimal subspace can be provided for each target patch by our method.
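The single-window SSIM index underlying these comparisons can be sketched as follows; the constants *k*₁ = 0.01, *k*₂ = 0.03 and the dynamic range *L* = 255 are the commonly used values, assumed here:

```python
import numpy as np

def ssim_index(a, b, L=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-sized patches a and b."""
    C1, C2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    luminance = (2 * mu_a * mu_b + C1) / (mu_a ** 2 + mu_b ** 2 + C1)
    # Second term: compares mean-removed components (variances and covariance).
    structure = (2 * cov + C2) / (a.var() + b.var() + C2)
    return luminance * structure

rng = np.random.default_rng(0)
a = rng.uniform(0, 255, size=(15, 15))
print(round(ssim_index(a, a), 4))        # 1.0 for identical patches
print(ssim_index(a, a + 40.0) < 1.0)     # True: a mean shift lowers the score
```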

**Performance comparison (PSNR in dB) of the proposed method and existing methods**

| Image number | [31] | [34] | [35] | [24] | [30] | [41] | Proposed method |
|---|---|---|---|---|---|---|---|
| 1 (Figure4) | 18.55 | 18.13 | | 16.85 | 16.72 | 17.51 | 17.22 |
| 2 (Figure5) | 16.51 | 15.98 | | 14.26 | 14.68 | 15.08 | 15.51 |
| 3 (Figure6) | 19.93 | 19.48 | | 18.01 | 17.86 | 18.98 | 18.62 |
| 4 (Figure10, first column) | 15.95 | 16.51 | | 14.97 | 15.19 | 15.97 | 15.85 |
| 5 (Figure10, second column) | 17.06 | 16.86 | | 15.70 | 15.51 | 16.08 | 16.14 |
| 6 (Figure10, third column) | 14.42 | 13.81 | | 11.79 | 12.20 | 12.02 | 13.38 |
| 7 (Figure10, fourth column) | 15.93 | 16.07 | | 15.22 | 15.49 | 15.27 | 14.05 |
| 8 (Figure11, first column) | 12.98 | 12.74 | | 11.34 | 10.92 | 11.81 | 12.04 |
| 9 (Figure11, second column) | 15.59 | 15.57 | | 13.56 | 13.55 | 13.42 | 15.55 |
| 10 (Figure11, third column) | 14.38 | 14.66 | | 13.42 | 13.43 | 13.82 | 13.86 |
| 11 (Figure11, fourth column) | 16.80 | 17.09 | | 15.47 | 15.48 | 15.98 | 16.00 |

**Performance comparison (SSIM) of the proposed method and existing methods**

| Image number | [31] | [34] | [35] | [24] | [30] | [41] | Proposed method |
|---|---|---|---|---|---|---|---|
| 1 (Figure4) | 0.6355 | 0.6090 | 0.6411 | 0.6822 | 0.6773 | 0.7145 | |
| 2 (Figure5) | 0.5130 | 0.5154 | 0.5999 | 0.5077 | 0.5277 | 0.5308 | |
| 3 (Figure6) | 0.6248 | 0.6051 | 0.6538 | 0.7318 | 0.7246 | 0.7569 | |
| 4 (Figure10, first column) | 0.5833 | 0.5762 | 0.6373 | 0.6563 | 0.6708 | 0.7036 | |
| 5 (Figure10, second column) | 0.6419 | 0.6424 | 0.6774 | 0.7298 | 0.7196 | 0.7410 | |
| 6 (Figure10, third column) | 0.6460 | 0.6458 | 0.7346 | 0.6750 | 0.6933 | 0.6756 | |
| 7 (Figure10, fourth column) | 0.6711 | 0.6766 | 0.7134 | 0.7478 | 0.7521 | 0.7402 | |
| 8 (Figure11, first column) | 0.5871 | 0.5522 | 0.6282 | 0.6561 | 0.6394 | 0.6840 | |
| 9 (Figure11, second column) | 0.6501 | 0.6599 | 0.7645 | 0.6852 | 0.6799 | 0.6700 | |
| 10 (Figure11, third column) | 0.6240 | 0.6295 | 0.7185 | 0.6992 | 0.6980 | 0.7060 | |
| 11 (Figure11, fourth column) | 0.6864 | 0.7069 | 0.7685 | 0.7155 | 0.7108 | 0.7352 | |

In recent years, several researchers of image quality assessment have also pointed out that MSE and its variants cannot reflect some degradations[46, 51]. In order to tackle this problem, several criteria for determining image quality have been proposed, the SSIM index being a representative one. The proposed method focuses on this criterion and realizes inpainting that maximizes the SSIM index; it is therefore natural that the proposed method achieves the highest SSIM values. Note that even though the use of the SSIM index for inpainting is effective, a ranking of inpainting performance that perfectly reflects subjective evaluation remains difficult, and further improvement is necessary in future work.

As quantitative evaluation, we have shown the PSNR and SSIM index of the results obtained by our method and the existing methods. Next, we focus on the computation cost of the proposed method. We first compare the computation times of the proposed method and the other multivariate analysis-based methods in [31] and [35]^{ b }. The average computation time for obtaining the results of images 1 to 11 by our method was about 342.1 s. The proposed method is about 0.78 to 3.1 times (1.6 times on average) slower than the method in [31]; note that a ratio smaller than one means that the computation time of the proposed method is shorter. In the method in [31], the inpainting procedure is simple since it only needs the computation of the eigenvector matrix and the calculation of the back projection for lost pixels, so fast computation can be realized. On the other hand, the proposed method is about 1.2 to 4.7 times (2.9 times on average) faster than the method in [35]. In this method, kernel PCA is adopted, and the projection onto the nonlinear subspace must be calculated using the kernel trick, i.e., direct projection cannot be performed. Furthermore, in this approach, classification of the target patch including missing areas is performed, and thus, the inpainting procedures are performed for all clusters, which requires a high computation cost. In addition, the computation time of our method is about 4.3 to 12.5 times (7.4 times on average) longer than that of the exemplar-based method in [24]. Note that the method in [30], used as a comparative method in the experiments, drastically reduces the computation cost of [24] and also introduces a GPU implementation. A CPU version reducing the computation cost of [24] has also been proposed by the same authors [62]; in [62], Kwok et al. reported inpainting that was about 15 to 50 times faster than that of the method in [24].

In addition, compared with the MSE-based inpainting approach, which calculates the optimal sparse representation coefficients based on the MSE, the proposed method requires complex optimization procedures, as shown in the previous section. In the MSE-based approach, it is well known that the normal equation can simply be solved, which is much simpler than our method. It is therefore necessary to improve the speed of computation by introducing alternative approaches into our inpainting method. This topic will be investigated in subsequent studies.

### 5.3 Inpainting of larger missing areas

The above scheme is similar to that of existing methods that simply select only the best matched examples, but there is an important difference. In existing methods using only the best matched examples, the best matched patch is selected by monitoring errors in the known neighboring areas around the missing areas. On the other hand, the proposed method reconstructs the patches based on SSIM-based sparse representation, and the examples that best match the reconstructed patches are then selected using the SSIM index, i.e., the best matched examples are selected from well-approximated reconstructed patches. This is the biggest difference between the existing methods and the proposed method.
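Selecting the best matched example by the SSIM index rather than by an MSE-based distance can be sketched as follows; the `ssim` helper, its constants, and the noisy candidate patches are illustrative:

```python
import numpy as np

def ssim(a, b, C1=6.5025, C2=58.5225):
    """Single-window SSIM between two patches (constants for 8-bit data)."""
    ma, mb = a.mean(), b.mean()
    cov = ((a - ma) * (b - mb)).mean()
    return ((2 * ma * mb + C1) * (2 * cov + C2)) / \
           ((ma * ma + mb * mb + C1) * (a.var() + b.var() + C2))

def best_example(reconstructed, candidates):
    """Pick the candidate patch best matched to the reconstructed patch,
    scoring with the SSIM index instead of an MSE-based distance."""
    scores = [ssim(reconstructed, c) for c in candidates]
    return int(np.argmax(scores))

# Toy check: candidates are the target corrupted by noise of varying strength;
# the least-perturbed one should win.
rng = np.random.default_rng(1)
target = rng.uniform(0, 255, size=(15, 15))
candidates = [target + rng.normal(0, s, size=target.shape) for s in (40.0, 5.0, 80.0)]
print(best_example(target, candidates))   # 1: the sigma = 5 candidate
```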

It should be noted that although the proposed method can perform accurate reconstruction of patches, the obtained results tend to include colors that are not present within the target image. This is because the proposed method does not adopt any specific procedure to avoid spurious colors. Therefore, in the experiments on reconstruction of larger missing areas, we adopted the above scheme to avoid the propagation of spurious colors.

From the obtained results, we can confirm that the proposed method enables successful inpainting of such large missing areas. Note that the images shown in Figures13,14,15, and16 are used as test images in several papers such as [23, 24, 30] and [41]. Furthermore, since the flag images that correspond to Figures13b,14b,15b, and16b are generated separately in each paper, i.e., the positions of the missing areas differ among those papers, we discuss the performance by comparing the results obtained by our method, shown in Figures13,14,15, and16, with the results shown in those papers. From the results shown in these figures, we can see that the proposed method achieves comparable performance or some improvements, though it should be noted that since we do not have ground truth images for these test images, we perform subjective evaluation. Specifically, as shown in Figure15, the proposed method and the methods in [23] and [41] can achieve visually pleasant results. In this test image, since the structural and textural components are simple and the percentage of missing areas is relatively small, it is easier to achieve successful inpainting. Similarly, Figure13 shows that successful inpainting could be achieved by our method and the methods in [24], [30], and [41], and improvement by our method can be confirmed in some areas. However, it should be noted that reconstruction of structural components, i.e., edges, by [41] is more accurate. The biggest difference between [41] and the other works, including our method, is priority estimation; thus, by introducing an improved priority estimation scheme, the performance of the proposed method can be further improved. Furthermore, Figure16 shows that the results obtained by our method are comparable to the results in [24] and [30]. Note that the flag images of this test image differ among these methods, and we found that performance was affected by the generation of the flag images. This was also observed for the image shown in Figure14.

In order to simultaneously reconstruct structure and texture regions, several methods have been proposed [23, 25, 42]. The method in [23] proposed a fragment-based algorithm that can preserve both structures and textures. A confidence map is used to determine which pixels have more surrounding information available. The reconstruction starts from more confident pixels and proceeds in a multiscale fashion from coarse to fine. A similar image fragment is then found and copied to the current unknown location, where a fragment is a circular neighborhood whose radius is adapted to the underlying structure. In contrast to these advantages, it is reported in [24, 41] that this algorithm is extremely slow and may introduce blurring artifacts. The fragment is selected based on the absolute distance, and this tends to cause blurring artifacts similar to those caused when using the MSE-based distance. The method in [42] introduced a sparse representation model representing both structure and texture components to realize their simultaneous reconstruction. On the other hand, this method is based on least-square approximation, and the problems of using the MSE may occur. Therefore, by introducing this simultaneous representation model into the proposed SSIM-based approach, successful reconstruction can be expected. Furthermore, the method in [25] introduced interactive image editing tools to realize highly accurate structure reconstruction. Since the guide for the reconstruction is provided by users, the inpainting performance is improved. Although this approach does not realize perfectly automatic image inpainting, adopting such interactive image editing tools would also improve the performance of the proposed method.

## 6 Conclusions

In this paper, we have presented an inpainting method based on sparse representations optimized with respect to a perceptual metric. Using sparse representation, the proposed method adaptively provides subspaces optimal for reconstructing target patches including missing areas. In this approach, the SSIM-based criterion is introduced into calculation of the dictionary and inpainting algorithm. This enables perceptually optimized inpainting, and successful results can be obtained by the proposed method.

Although the proposed method can reconstruct large missing regions without blurring artifacts, it has higher computational complexity than the other existing approaches and also generates some artifacts in the output image, as shown in Figure14. The computation cost and the artifacts caused by the proposed method should be addressed in future work.

Furthermore, extension of the algorithm to the reconstruction of other types of missing image data is desirable for various applications. These topics will be addressed in future work, and results will be presented in subsequent reports.

## Endnotes

^{a}In this paper, signal-atoms are simply referred to as 'atoms’ hereafter according to[38].

^{b}The experiments were performed on a personal computer using Intel(R) Core(TM) i7 950 CPU 3.06 GHz with 8.0 GB RAM. The implementation was performed using MATLAB.

## Appendix

Since finding the optimal value of *τ* is the same as finding the least upper bound of Equation 25, the first equivalence relationship holds. The second equivalence relationship holds since the denominator in Equation 25 is strictly positive, allowing us to multiply through and rearrange terms. In this way, we can derive Equation 26. Then, *τ* becomes a true upper bound if the objective function optimized in Equation 46 has a nonnegative optimal value, and the optimal vector${\widehat{\mathbf{x}}}_{i,j}^{(t)}\left({\rho}_{i,j}^{(t)}\right)$ in Equation 25 can be obtained. Thus, by applying the Lagrange multiplier approach to the above equation under the constraint${{\mathit{\mu}}_{{\mathbf{D}}_{i,j}^{(t)}}}^{\prime}{\mathbf{x}}_{i,j}^{(t)}={\rho}_{i,j}^{(t)}$, Equation 27 can be obtained.

## Notes

### Acknowledgements

This work was partly supported by Grant-in-Aid for Scientific Research (B) 25280036 and Grant-in-Aid for Young Scientists (B) 22700088, Japan Society for the Promotion of Science (JSPS).


## References

1. Yan R, Shao L, Liu Y: Nonlocal hierarchical dictionary learning using wavelets for image denoising. *IEEE Trans. Image Process.* 2013, 22(12):4689-4698.
2. Shao L, Yan R, Li X, Liu Y: From heuristic optimization to dictionary learning: a review and comprehensive comparison of image denoising algorithms. *IEEE Trans. Cybern.* 2013. doi:10.1109/TCYB.2013.2278548
3. Buades A, Coll B, Morel J: A review of image denoising algorithms, with a new one. *Multiscale Model. Simul.* 2005, 4(2):490-530. doi:10.1137/040616024
4. Shao L, Zhang H, de Haan G: An overview and performance evaluation of classification-based least squares trained filters. *IEEE Trans. Image Process.* 2008, 17(10):1772-1782.
5. Hansen PC, Nagy JG, O'Leary DP: *Deblurring Images: Matrices, Spectra, and Filtering (Fundamentals of Algorithms)*. Philadelphia: Society for Industrial and Applied Mathematics; 2006.
6. Tauber Z, Li ZN, Drew M: Review and preview: disocclusion by inpainting for image-based rendering. *IEEE Trans. Syst. Man Cybern. C Appl. Rev.* 2007, 37(4):527-540.
7. Masnou S, Morel J: Level lines based disocclusion. *Proc. IEEE Int. Conf. Image Process. (ICIP)* 1998, 3:259-263.
8. Bertalmio M, Sapiro G, Caselles V, Ballester C: Image inpainting. In *Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '00)*. New York: ACM; 2000:417-424.
9. Bertalmio M, Bertozzi A, Sapiro G: Navier-Stokes, fluid dynamics, and image and video inpainting. *Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.* 2001, 1:355-362.
10. Chan TF, Shen J: Nontexture inpainting by curvature-driven diffusions. *J. Vis. Commun. Image Representation* 2001, 12(4):436-449. doi:10.1006/jvci.2001.0487
11. Ogawa T, Haseyama M, Kitajima H: Restoration method of missing areas in still images using GMRF model. *IEEE Int. Symp. Circuits Syst. (ISCAS)* 2005, 5:4931-4934.
12. Rares A, Reinders MJT, Biemond J: Edge-based image restoration. *IEEE Trans. Image Process.* 2005, 14(10):1454-1468.
13. Auclair-Fortier M-F, Ziou D: A global approach for solving evolutive heat transfer for image denoising and inpainting. *IEEE Trans. Image Process.* 2006, 15(9):2558-2574.
14. Bertalmio M: Strong-continuation, contrast-invariant inpainting with a third-order optimal PDE. *IEEE Trans. Image Process.* 2006, 15(7):1934-1938.
15. Liu J, Li M, He F: A novel inpainting model for partial differential equation based on curvature function. *J. Multimedia* 2012, 7(3):239-246.
16. Wang M, Yan B, Ngan KN: An efficient framework for image/video inpainting. *Signal Process.: Image Commun.* 2013, 28(7):753-762. doi:10.1016/j.image.2013.03.002
17. Qi F, Han J, Wang P, Shi G, Li F: Structure guided fusion for depth map inpainting. *Pattern Recognit. Lett.* 2013, 34:70-76. doi:10.1016/j.patrec.2012.06.003
18. Ballester C, Bertalmio M, Caselles V, Sapiro G: Filling-in by joint interpolation of vector fields and gray levels. *IEEE Trans. Image Process.* 2001, 10(8):1200-1211. doi:10.1109/83.935036
19. Kokaram A: A statistical framework for picture reconstruction using 2D AR models. *Image Vis. Comput.* 2004, 22(2):165-171.
20. Bertalmio M, Vese L, Sapiro G, Osher S: Simultaneous structure and texture image inpainting. *IEEE Trans. Image Process.* 2003, 12(8):882-889. doi:10.1109/TIP.2003.815261
21. Efros AA, Leung TK: Texture synthesis by nonparametric sampling. *Proceedings of the Seventh IEEE International Conference on Computer Vision* 1999, 2:1033-1038.
22. Wei LY, Levoy M: Fast texture synthesis using tree-structured vector quantization. In *Proceedings of SIGGRAPH 2000*. New York: ACM; 2000:479-488.
23. Drori I, Cohen-Or D, Yeshurun H: Fragment-based image completion. In *Proceedings of SIGGRAPH 2003*. New York: ACM; 2003:303-312.
24. Criminisi A, Perez P, Toyama K: Region filling and object removal by exemplar-based image inpainting. *IEEE Trans. Image Process.* 2004, 13(9):1200-1212. doi:10.1109/TIP.2004.833105
25. Barnes C, Shechtman E, Finkelstein A, Goldman DB: PatchMatch: a randomized correspondence algorithm for structural image editing. *ACM Trans. Graph. (Proc. SIGGRAPH)* 2009. doi:10.1145/1531326.1531330
26. Zhang Q, Lin J: Exemplar-based image inpainting using color distribution analysis. *J. Inf. Sci. Eng.* 2012, 28(4):641-654.
27. Shibata T, Iketani A, Senda S: Fast and structure-preserving image inpainting based on probabilistic structure estimation. *IEICE Trans. Inf. Syst.* 2012, 95(7):1731-1739.
28. Le Meur O, Guillemot C: Super-resolution-based inpainting. In *Computer Vision - ECCV 2012*. Edited by Fitzgibbon A, Lazebnik S, Perona P, Sato Y, Schmid C. 12th European Conference on Computer Vision, Florence, 7-13 October 2012, Proceedings, Part VI. Heidelberg: Springer; 2012:554-567.
29. Le Meur O, Ebdelli M, Guillemot C: Hierarchical super-resolution-based inpainting. *IEEE Trans. Image Process.* 2013, 22(10):3779-3790.
30. Kwok TH, Sheung H, Wang CCL: Fast query for exemplar-based image completion. *IEEE Trans. Image Process.* 2010, 19(12):3106-3115.
31. Amano T, Sato Y: Image interpolation using BPLP method on the eigenspace. *Syst. Comput. Japan* 2007, 38:87-96. doi:10.1002/scj.10319
32. Schölkopf B, Mika S, Burges CJC, Knirsch P, Müller KR, Rätsch G, Smola AJ: Input space versus feature space in kernel-based methods. *IEEE Trans. Neural Netw.* 1999, 10(5):1000-1017. doi:10.1109/72.788641
33. Mika S, Schölkopf B, Smola A, Müller KR, Scholz M, Rätsch G: Kernel PCA and de-noising in feature spaces. *Adv. Neural Inf. Process. Syst.* 1999, 11:536-542.
34. Kim KI, Franz MO, Schölkopf B: Iterative kernel principal component analysis for image modeling. *IEEE Trans. Pattern Anal. Mach. Intell.* 2005, 27(9):1351-1366.
35. Ogawa T, Haseyama M: Missing intensity interpolation using a kernel PCA-based POCS algorithm and its applications. *IEEE Trans. Image Process.* 2011, 20(2):417-432.
36. Aharon M, Elad M, Bruckstein A: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. *IEEE Trans. Signal Process.* 2006, 54(11):4311-4322.
37. Elad M, Aharon M: Image denoising via sparse and redundant representations over learned dictionaries. *IEEE Trans. Image Process.* 2006, 15(12):3736-3745.
38. Mairal J, Elad M, Sapiro G: Sparse representation for color image restoration. *IEEE Trans. Image Process.* 2008, 17:53-69.
39. Wohlberg B: Inpainting with sparse linear combinations of exemplars. In *IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009)*. Piscataway: IEEE; 2009:689-692.
40. Shen B, Hu W, Zhang Y, Zhang YJ: Image inpainting via sparse representation. In *IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009)*. Piscataway: IEEE; 2009:697-700.
41. Xu Z, Sun J: Image inpainting by patch propagation using patch sparsity. *IEEE Trans. Image Process.* 2010, 19(5):1153-1165.
42. Elad M, Starck JL, Querre P, Donoho D: Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). *Appl. Comput. Harmonic Anal.* 2005, 19(3):340-358. doi:10.1016/j.acha.2005.03.005
43. Turkan M, Guillemot C: Locally linear embedding based texture synthesis for image prediction and error concealment. In *19th IEEE International Conference on Image Processing (ICIP)*. Piscataway: IEEE; 2012:3009-3012.
44. Guillemot C, Turkan M, Le Meur O, Ebdelli M: Object removal and loss concealment using neighbor embedding methods. *Signal Process.: Image Commun.* 28(10):1405-1419.
45. Takahashi T, Konishi K, Furukawa T: Structured matrix rank minimization approach to image inpainting. In *IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS)*. Piscataway: IEEE; 2012:860-863.
46. Girod B: What's wrong with mean-squared error? In *Digital Images and Human Vision*. Edited by Watson AB. Cambridge: MIT Press; 1993:207-220.
47. Wang Z, Bovik AC: *Modern Image Quality Assessment*. San Rafael: Morgan & Claypool Publishers; 2006.
48. Damera-Venkata N, Kite TD, Geisler WS, Evans BL, Bovik AC: Image quality assessment based on a degradation model. *IEEE Trans. Image Process.* 2000, 9(4):636-650.
49. Sheikh HR, Bovik AC, de Veciana G: An information fidelity criterion for image quality assessment using natural scene statistics. *IEEE Trans. Image Process.* 2005, 14(12):2117-2128.
50. Sheikh HR, Bovik AC: Image information and visual quality. *IEEE Trans. Image Process.* 2006, 15(2):430-444.
51. Sheikh HR, Sabir MF, Bovik AC: A statistical evaluation of recent full reference image quality assessment algorithms. *IEEE Trans. Image Process.* 2006, 15(11):3440-3451.
52. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP: Image quality assessment: from error visibility to structural similarity. *IEEE Trans. Image Process.* 2004, 13(4):600-612. doi:10.1109/TIP.2003.819861
53. Channappayya SS, Bovik AC, Caramanis C, Heath RW: Design of linear equalizers optimized for the structural similarity index. *IEEE Trans. Image Process.* 2008, 17(6):857-872.
54. Rehman A, Rostami M, Wang Z, Brunet D, Vrscay ER: SSIM-inspired image restoration using sparse representation. *EURASIP J. Adv. Signal Process.* 2012, 2012:16. doi:10.1186/1687-6180-2012-16
55. Davis G, Mallat S, Avellaneda M: Adaptive greedy approximations. *J. Construct. Approx.* 1997, 13:57-98.
56. Mallat S, Zhang Z: Matching pursuits with time-frequency dictionaries. *IEEE Trans. Signal Process.* 1993, 41(12):3397-3415. doi:10.1109/78.258082
57. Chen S, Billings SA, Luo W: Orthogonal least squares methods and their applications to non-linear system identification. *Int. J. Contr.* 1989, 50(5):1873-1896. doi:10.1080/00207178908953472
58. Pati YC, Rezaiifar R, Krishnaprasad PS: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. *Conf. Rec. 27th Asilomar Conf. Signals Syst. Comput.* 1993, 1:40-44.
59. Tropp JA: Greed is good: algorithmic results for sparse approximation. *IEEE Trans. Inf. Theory* 2004, 50(10):2231-2242. doi:10.1109/TIT.2004.834793
60. Chen SS, Donoho DL, Saunders MA: Atomic decomposition by basis pursuit. *SIAM Rev.* 2001, 43:129-159. doi:10.1137/S003614450037906X
61. Gorodnitsky IF, Rao BD: Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. *IEEE Trans. Signal Process.* 1997, 45(3):600-616. doi:10.1109/78.558475
62. Kwok TH, Wang CCL: Interactive image inpainting using DCT based exemplar matching. *Adv. Vis. Comput., Lecture Notes Comput. Sci.* 2009, 5876:709-718.

## Copyright information

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.