# Sparsity of solutions for variational inverse problems with finite-dimensional data


## Abstract

In this paper we characterize sparse solutions for variational problems of the form \(\min _{u\in X} \phi (u) + F(\mathcal {A}u)\), where *X* is a locally convex space, \(\mathcal {A}\) is a linear continuous operator that maps into a finite dimensional Hilbert space and \(\phi \) is a seminorm. More precisely, we prove that there exists a minimizer that is “sparse” in the sense that it is represented as a linear combination of the extremal points of the unit ball associated with the regularizer \(\phi \) (possibly translated by an element in the null space of \(\phi \)). We apply this result to relevant regularizers such as the total variation seminorm and the Radon norm of a scalar linear differential operator. In the first example, we provide a theoretical justification of the so-called staircase effect and in the second one, we recover the result in Unser et al. (SIAM Rev 59(4):769–793, 2017) under weaker hypotheses.

## Mathematics Subject Classification

49J45 49N45 52A05 49N15

## 1 Introduction

*sparse* in a certain sense. In this case, the initial data can often be recovered by solving a minimization problem with a suitable regularizer of the form

*H* finite-dimensional Hilbert space, models the finite number of observations (that is small compared to the dimension of *X*) and \(y \in H\) is noise-free data.

When the domain *X* is finite-dimensional and the regularizer is the \(\ell _1\) norm, the problem falls into the established theory of *compressed sensing* [11, 19], which has seen extensive development in recent years. In this case, sparsity is understood as a *high number of zero coefficients* with respect to a certain basis of *X*.

In an infinite-dimensional setting, where the domain *X* is usually a Banach space, there has been clear evidence that the regularizers promote different notions of sparsity, but there has not been a comprehensive theory explaining this effect.

*sparsity* plays a crucial role in the field of image processing and computer vision: in many cases, the recovered image in a variational model can be interpreted as *sparse* with respect to a notion of *sparsity* that depends on the regularizer. For example, for classical total variation (TV) denoising [32]

*least error method* to recover a sparse solution with a fixed bound on the number of non-zero coefficients. Finally, it has been noted that suitable \(\ell ^1\)-type regularizers enforce sparsity when data are represented in a *wavelet basis* (see for example [3, 20]).

The intrinsic sparsity of infinite-dimensional variational models with finite-dimensional data has been investigated by various authors in specific cases and in different contexts. One of the most important instances can be found in [16]: here, the authors notice that the regularizer is linked to the convex hull of the set of sparse vectors that we aim to recover. This was also noticed in *optimal control* theory (see, for example, [12]) and used in practice for developing efficient algorithms to solve optimization problems that are based on the sparsity of the minimizers [9, 10, 31].

More recently, several authors have deeply investigated the connection between regularizers and *sparsity*. In 2016, Unser et al. [36] studied the case where \(\phi (u) = \Vert Lu\Vert _{{\mathcal {M}}}\), *L* is a *scalar* linear differential operator and \(\Vert \cdot \Vert _\mathcal {M}\) denotes the Radon norm. They showed the existence of a *sparse* solution, namely a linear combination of counterimages of Dirac deltas which can be expressed using a fundamental solution of *L*. The work of Flinth and Weiss [26] is also worth mentioning: they give an alternative proof of the result in [36] under less restrictive hypotheses. In both works, however, the case of a vector-valued differential operator was not treated and therefore problems involving the total variation regularizer were not covered. After this manuscript was finalized, we discovered a recent preprint [5] where the authors study a similar abstract problem and apply it, in particular, to the TV regularizer in order to justify the staircase effect. We remark that [5] and the present paper were developed independently and differ in terms of the proofs as well as the applications.

*sparsity* for minimizers of general linear inverse problems with finite-dimensional data constraints. More precisely, we choose to work with locally convex spaces in order to deal, in particular, with weak* topologies. The latter is necessary in order to treat variational problems with TV regularization or Radon-norm regularization. We consider the following problem:

$$\begin{aligned} \min _{u\in X} \ \phi (u) + F(\mathcal {A}u), \end{aligned}$$(3)

where *X* is a locally convex space, \(\phi : X \rightarrow [0,+\infty ]\) is a lower semi-continuous seminorm, \(\mathcal {A}: X \rightarrow H\) is a linear continuous map with values in a finite-dimensional Hilbert space *H* and *F* is a proper, convex, lower semi-continuous functional. (Notice that this generality allows problems of the type (1) for noise-free data as well as soft constraints in case of noisy data.) Additionally we ask that \(\mathcal {A}(\text{ dom }\, \phi ) = H\) (see Assumption [**H0**] below) and that \(\phi \) is coercive when restricted to the quotient space of *X* with the null-space of \(\phi \) that we denote by \({\mathcal {N}}\) (see Assumption [**H1**] below). Under these hypotheses we prove that there exists a *sparse* minimizer of (3), namely a minimizer that can be written as a linear combination of extremal points of the unit ball associated to \(\phi \) (in the quotient space \(X/{\mathcal {N}}\)). More precisely, we obtain the following result:

### Theorem

Notice that our result completely characterizes the sparse solution \(\overline{u}\) of (3) and relates the notion of *sparsity* with structural properties of the regularizer \(\phi \). Moreover, our hypotheses are minimal for having a well-posed variational problem (3).

The strategy to prove the previous theorem relies on the application of the Krein–Milman theorem and Carathéodory’s theorem in the quotient space of \({\mathcal {A}}(X)\), which allows one to represent any element in the image under \(\mathcal {A}\) of the unit ball of the regularizer as a convex combination of the extremal points (see Theorem 3.3). In order to prove minimality for the element having the desired representation, we derive optimality conditions for Problem (3) (Proposition 2.12). For this purpose, we need to prove a *no gap* property in the quotient space between primal and dual problem. In locally convex vector spaces this is not straightforward and requires the notion of Mackey topology [34].
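The Carathéodory step of this strategy can be illustrated in finite dimensions. The following Python sketch is not taken from the paper: the function names `null_vector` and `caratheodory` are hypothetical, and the setting (finitely many points in \(\mathbb {R}^d\), exact rational arithmetic) is an assumption made for illustration only. It reduces a convex combination of many points to one that uses at most \(d+1\) of them while leaving the represented point unchanged.

```python
from fractions import Fraction


def null_vector(rows, ncols):
    """Return a nonzero rational x with rows @ x = 0 (needs ncols > rank)."""
    m = [[Fraction(v) for v in r] for r in rows]
    pivots = []  # list of (row index, column index) of pivots
    r = 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue  # no pivot in this column, it is a free column
        m[r], m[pr] = m[pr], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append((r, c))
        r += 1
        if r == len(m):
            break
    pivot_cols = {c for _, c in pivots}
    free = next(c for c in range(ncols) if c not in pivot_cols)
    x = [Fraction(0)] * ncols
    x[free] = Fraction(1)
    for row, col in pivots:
        x[col] = -m[row][free]
    return x


def caratheodory(points, weights):
    """Reduce a convex combination in R^d to at most d + 1 active points."""
    d = len(points[0])
    pts = [tuple(Fraction(c) for c in p) for p in points]
    w = [Fraction(x) for x in weights]
    active = [i for i in range(len(pts)) if w[i] > 0]
    while len(active) > d + 1:
        # Affine dependence: sum(mu_i * p_i) = 0 and sum(mu_i) = 0.
        rows = [[pts[i][k] for i in active] for k in range(d)]
        rows.append([Fraction(1)] * len(active))
        mu = null_vector(rows, len(active))
        # Largest step keeping all weights non-negative; one weight hits zero.
        t = min(w[i] / m_i for i, m_i in zip(active, mu) if m_i > 0)
        for i, m_i in zip(active, mu):
            w[i] -= t * m_i
        active = [i for i in active if w[i] > 0]
    return {i: w[i] for i in active}


# Illustrative data: five points in the plane with equal weights.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
weights = [Fraction(1, 5)] * 5
reduced = caratheodory(points, weights)
barycenter = tuple(
    sum(wi * points[i][k] for i, wi in reduced.items()) for k in range(2)
)
```

Each pass removes at least one point from the combination, so the loop terminates; the total weight and the barycenter are preserved because the correction vector sums to zero and is an affine dependence of the active points.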

In the second part of our paper we apply the main result to specific examples of popular regularizers. First of all we recover the well-known result (see for example [35]) that by minimizing the Radon norm of a measure under finite-dimensional data constraints, one recovers a minimizer that is made of delta peaks. Indeed, according to our theory which applies when the space of Radon measures \(\mathcal {M}(\Omega )\) is equipped with the weak* topology, Dirac deltas are extremal points of the unit ball associated with the Radon norm of a measure and our result applies straightforwardly (see Sect. 4.1).

Then, we consider the TV regularizer for BV functions in bounded domains. Also in this case, our result applies when \(BV(\Omega )\) is equipped with the weak* topology. This justifies the usage of locally convex spaces in the general theory. In order to confirm the heuristic observation that *sparse* minimizers show a peculiar staircase effect, we characterize the extremal points of the unit ball associated to the TV norm (in the quotient space \(BV(\Omega )/\mathbb {R}\)). In particular, we extend a result of [1, 24] to the case where \(\Omega \) is a bounded domain. In order to achieve that, we need an alternative notion of simple sets of finite perimeter (see Definition 4.5). We prove the following theorem:

### Theorem

Finally, we apply our main result to the setting considered in [26, 36], i.e., where the regularizer is given by \(\phi (u) = \Vert Lu\Vert _\mathcal {M}\) for a scalar linear differential operator *L*. We remove the hypotheses concerning the structure of the null-space of *L* and we work in the space of finite-order distributions equipped with the weak* topology. This allows us to have a general framework for these inverse problems that does not require additional assumptions on the Banach structure of the minimization domain (see [5, 26] for comparison). It also justifies once more the usage of locally convex spaces in the abstract theory. In this setting, as an application of our main theorem, we are able to recover the same result as in [26, 36].

### Theorem

*s* sufficiently large, depending only on *L* and \(\Omega \)) and \(\phi (u) = \Vert Lu\Vert _{{\mathcal {M}}}\). Then, there exists \({\overline{u}}\) a minimizer of (3) such that

*L* obtained by the Malgrange–Ehrenpreis theorem translated by *x*).

## 2 Setting and preliminary results

### 2.1 Basic assumptions on the functionals

*H* be an *N*-dimensional real Hilbert space and \({\mathcal {A}} : X \rightarrow H\) a linear continuous operator and we denote by \({\mathcal {A}}^* : H \rightarrow X^*\) its continuous adjoint, defined thanks to Riesz’s theorem as

*H* the duality product between *X* and \(X^*\).

*F* and \(\phi \) separately.

– **Assumptions on** *F*:

*H*, which is the standard topology on finite-dimensional spaces.

– **Assumptions on** \(\phi \):

*X*. We make the following additional assumption:

- [**H0**] \({\mathcal {A}}(\text{ dom }\, \phi ) = H\),

*X*, we consider the following quotient space:

*u* in the quotient space by \(\pi _{{\mathcal {N}}}\). Likewise, for \(U \subset X\), we tacitly identify the Minkowski sum \(U + \mathcal {N}\subset X\) with its image under \(\pi _\mathcal {N}\) in \(X_\mathcal {N}\).

- [**H1**] \(\phi _{{\mathcal {N}}}\) is coercive, i.e. the sublevel sets
  $$\begin{aligned} S^-(\phi _{{\mathcal {N}}},\alpha ):=\{u_{\mathcal {N}} \in X_{{\mathcal {N}}} : \phi _{{\mathcal {N}}}(u_{\mathcal {N}})\le \alpha \} \end{aligned}$$
  are compact for every \(\alpha > 0\).

### Remark 2.1

Note that \(\phi _\mathcal {N}\) is lower semi-continuous in \(X_\mathcal {N}\): Indeed, as \(\phi \) is lower semi-continuous, the superlevel-sets \(S^+(\phi , \alpha ) = \{ u \in X: \phi (u) > \alpha \}\) are open in *X* for each \(\alpha \). Now, as \(\phi _\mathcal {N}(u_\mathcal {N}) > \alpha \) if and only if \(\phi (u) > \alpha \), we have \(S^+(\phi _\mathcal {N}, \alpha ) = \pi _\mathcal {N}(S^+(\phi , \alpha ))\). Since \(\pi _\mathcal {N}\) is an open map (*X* is a topological group with respect to addition), each \(S^+(\phi _\mathcal {N}, \alpha )\) is open in \(X_\mathcal {N}\), meaning that \(\phi _\mathcal {N}\) is lower semi-continuous.

As a consequence, in order to obtain [**H1**], it suffices that each \(S^-(\phi _\mathcal {N}, \alpha )\) is contained in a compact set.

From now on we assume that \(\mathcal {A}\), *F* and \(\phi \) satisfy the properties described above.

### 2.2 Existence of minimizers

We state the following minimization problem:

### Problem 2.2

*X*) Given \(\phi \), \({\mathcal {A}}\) and *F* with the assumptions given in the previous section, define for \(u\in X\) the following functional:

In order to prove the existence of minimizers for Problem 2.2 we state an auxiliary minimization problem in the quotient space \(X_{{\mathcal {N}}}\).

### Problem 2.3

*F*, \(\phi \) and \({\mathcal {A}}\) with the assumptions given in the previous section, we define

Note that the functional \({\mathscr {J}}\) is well-defined in \(X_{{\mathcal {N}}}\) as both summands in (11) are constant on \(u + {\mathcal {N}}\) for every \(u \in X\). We aim at proving existence of minimizers for Problem 2.3. To this end, we first prove a lemma about the coercivity of functionals defined in quotient spaces.

### Lemma 2.4

*Y* be a locally convex space and \(f: Y \rightarrow (-\infty , +\infty ]\) be coercive. Given \({\mathcal {M}} \subset Y\) a closed subspace of *Y*, we define \({\widetilde{f}} : Y_\mathcal {M}\rightarrow (-\infty , +\infty ]\) on the space \(Y_\mathcal {M}= Y/\mathcal {M}\) as

### Proof

### Proposition 2.5

There exists a minimizer for Problem 2.3.

### Proof

*F* is proper, using Hypothesis [**H0**] we infer that the infimum of Problem 2.3 is not \(+\infty \). Likewise, since *F* is convex, lower semi-continuous and coercive, it is bounded from below such that the infimum of Problem 2.3 is also not \(-\infty \). Let us show that the proper and convex function \(u \mapsto \inf _{\psi \in \mathcal {N}} F(\mathcal {A}(u + \psi ))\) is lower semi-continuous in *X*. For that purpose, observe that \(\mathcal {A}(\mathcal {N})\) is a subspace of the finite-dimensional space *H* and hence closed. Denote by \(H_\mathcal {N}\) the quotient space \(H/\mathcal {A}(\mathcal {N})\) on which we define \(F_\mathcal {N}: H_\mathcal {N}\rightarrow (-\infty ,+\infty ]\) according to

*F* is assumed to be coercive, applying Lemma 2.4 yields that \(F_\mathcal {N}\) is also coercive and lower semi-continuous in particular. Now,

*u* by \(u + \varphi \), \(\varphi \in \mathcal {N}\) does not change the value of this functional, so by the same argument as in Remark 2.1, we deduce that \(u_\mathcal {N}\mapsto \inf _{\psi \in \mathcal {N}} F(\mathcal {A}(u + \psi ))\) and consequently \({\mathscr {J}}\), is lower semi-continuous.

[**H1**], we infer that \(S^-({\mathscr {J}},\alpha )\) is compact for every \(\alpha \in \mathbb {R}\).

We are now in position to prove the existence of minimizers for Problem 2.2.

### Theorem 2.6

Given \({\overline{u}}_{\mathcal {N}} = {\overline{u}} + {\mathcal {N}}\) a minimizer for Problem 2.3, there exists \({\overline{\psi }} \in {\mathcal {N}}\) such that \({\overline{u}} + {\overline{\psi }}\) is a minimizer for Problem 2.2.

### Proof

*F* is proper, convex, lower semi-continuous and coercive as well as \(\mathcal {A}(\mathcal {N})\) is finite-dimensional and hence closed in *H*, the infimum is realized and finite. Denoting by \({\overline{\eta }}\) a minimizer, we choose \({\overline{\psi }} \in {\mathcal {N}}\) such that \({\mathcal {A}} {\overline{\psi }} = {\overline{\eta }}\). Then, \({\overline{v}} := {\overline{u}} + {\overline{\psi }}\) is a minimizer for Problem 2.2. Indeed,

### Remark 2.7

### 2.3 Optimality conditions

In this section we want to obtain optimality conditions for Problem 2.3 deriving a dual formulation and showing that under our hypotheses we have *no gap* between the primal and the dual problem.

*Y*, define the following family of seminorms on \(Y^*\):

*Mackey topology* and it is denoted by \(\tau (Y^*,Y)\). It is the strongest topology on \(Y^*\) such that *Y* is still the dual of \(Y^*\) (see Theorem 9 in Section A.4 of [4]).

*Fenchel conjugate* functionals which are defined as follows. Given a real locally convex space *Y* and a proper function \(f: Y \rightarrow (-\infty ,+\infty ]\) we denote by \(f^* : Y^* \rightarrow (-\infty ,+\infty ]\) the conjugate of *f* defined as

$$\begin{aligned} f^*(x^*) := \sup _{x \in Y} \, \big \{ \langle x^*, x\rangle - f(x) \big \} . \end{aligned}$$

### Proposition 2.8

*Y* be a real locally convex space. Given a proper, lower semi-continuous, convex function \(f:Y \rightarrow (-\infty ,+\infty ]\), the following statements are equivalent:

- (i)
\(f^*\) is continuous in zero for the Mackey topology \(\tau (Y^*,Y)\).

- (ii)
for every \(\alpha \in \mathbb {R}\), the sublevel-set
$$\begin{aligned} S^-(f,\alpha ) := \{x \in Y: f(x) \le \alpha \} \end{aligned}$$
is compact with respect to the weak topology.

### Remark 2.9

In the next proposition, we will apply this result for \(f=\phi _{\mathcal {N}}\), a proper and lower semi-continuous seminorm. In this case, the proof of Proposition 2.8 is straightforward. Indeed, \(\phi _{\mathcal {N}}^* = I_{\{\rho _S(u^*) \le 1\}}\) where *I* is the indicator function and \(S =\{u: \phi _{\mathcal {N}}(u) \le 1\}\). Hence, if *S* is weakly compact, then thanks to the definition of the Mackey topology, \(\phi ^*_{\mathcal {N}}\) is continuous in zero.

Conversely, if \(\phi _\mathcal {N}^*\) is continuous in zero, then there exist absolutely convex, weakly compact sets \(A_1,\ldots ,A_n \subset X_\mathcal {N}\) and \(\varepsilon _1, \ldots , \varepsilon _n > 0\) such that \(\rho _{A_i}(u^*) \le \varepsilon _i\) for \(i=1,\ldots ,n\) implies \(\rho _S(u^*) \le 1\). This, however, means that \(S \subset \varepsilon _1^{-1} A_1 + \ldots + \varepsilon _n^{-1} A_n\). Indeed, if this were not the case, one could separate a \(u \in S\) from the absolutely convex and weakly compact set \(\varepsilon _1^{-1} A_1 + \ldots + \varepsilon _n^{-1} A_n\) by a \(u^* \in X_\mathcal {N}^*\) such that \(\langle u^*, u\rangle > 1\) as well as \(\langle u^*, \sum _{i=1}^n \varepsilon _i^{-1} u_i \rangle \le 1\) for \(u_i \in A_i\). In particular, \(\rho _{A_i}(u^*) \le \varepsilon _i\) for each \(i=1,\ldots ,n\) leading to the contradiction \(\rho _{S}(u^*) \le 1\). Due to lower semi-continuity of \(\phi _\mathcal {N}\), *S* is a closed convex subset of a weakly compact set and hence weakly compact. By positive homogeneity of \(\phi _\mathcal {N}\), the sets \(S^-(\phi _\mathcal {N}, \alpha )\) are compact for all \(\alpha \in \mathbb {R}\).

### Remark 2.10

We denote by \({\mathcal {A}}_{\mathcal {N}}^*: H_{\mathcal {N}} \rightarrow X_{{\mathcal {N}}}^*\) its adjoint that has finite-dimensional image and is hence continuous for each topology that makes \(X_{{\mathcal {N}}}^*\) a topological vector space. Given \(w\in H\), we denote by \(w_{{\mathcal {N}}} := w + \mathcal {A}({\mathcal {N}})\) an element of \(H_{\mathcal {N}}\).

### Remark 2.11

Notice again that \(F_{\mathcal {N}}\) is proper, convex and, applying Lemma 2.4 with \(f = F\) and \({\mathcal {M}} = {\mathcal {A}}({\mathcal {N}})\), it is also coercive in \(H_{\mathcal {N}}\).

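*Y*, the element \(x^* \in Y^*\) is called a *subgradient* of *f* in \(x \in Y\), if

$$\begin{aligned} f(x) + \langle x^*, y - x\rangle \le f(y) \quad \text {for every } y \in Y. \end{aligned}$$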

### Proposition 2.12

- (i)
\({\mathcal {A}}_{\mathcal {N}}^* \overline{w}_{{\mathcal {N}}} \in \partial \phi _{{\mathcal {N}}} (\overline{u}_{\mathcal {N}})\),

- (ii)
\({\mathcal {A}}_{\mathcal {N}} \overline{u}_{\mathcal {N}} \in \partial F_{\mathcal {N}}^* (-\overline{w}_{{\mathcal {N}}})\).

### Proof

[**H1**] (that implies in particular that the sublevel sets of \(\phi _{{\mathcal {N}}}\) are weakly compact), using Proposition 2.8, we have that \(\phi ^*_{\mathcal {N}}\) is continuous in zero. Hence, applying Theorem III.4.1 in [22], the problem (\({\mathscr {P}}^*\)) has zero gap to its dual which coincides, as the dual space of \(X_\mathcal {N}^*\) is \(X_\mathcal {N}\) and \(\phi _\mathcal {N}^{**} = \phi _\mathcal {N}\) as well as \(F_\mathcal {N}^{**} = F_\mathcal {N}\), with Problem 2.3, i.e.,

The functional \(\phi _{{\mathcal {N}}}^* \circ {\mathcal {A}}_{\mathcal {N}}^* + F_{\mathcal {N}}^* \circ (-{{\,\mathrm{id}\,}})\) is convex, proper and lower semi-continuous. We aim at showing that it is also coercive. It is enough to prove that \(\phi _{{\mathcal {N}}}^* \circ {\mathcal {A}}_{\mathcal {N}}^*\) is the indicator function of a compact convex set as \(F_\mathcal {N}^*\) is proper, convex and lower semi-continuous.

[**H0**] and the definition of \({\mathcal {A}}_{\mathcal {N}}\), we have

[**H1**], \(\phi _\mathcal {N}\) is coercive, so Lemma 2.4 yields that \(\phi _0\) is coercive. As \(\phi _\mathcal {N}\) is a seminorm, \(\phi _0\) is proper and convex. It follows that

*G* is proper, convex and lower semi-continuous. As \(\text{ dom }\, G = H_{\mathcal {N}}\), convexity implies that *G* is continuous everywhere in \(H_\mathcal {N}\) and in particular, in zero. Consequently, \(G^* = \phi _\mathcal {N}^* \circ \mathcal {A}_\mathcal {N}^*\) is coercive. It follows that \(G^*\) is the indicator function of a compact convex set as *G* is one-homogeneous. Hence, applying the direct method of calculus of variations in \(H_{\mathcal {N}}\) we infer that Problem (\({\mathscr {P}}^*\)) has a minimizer that we denote by \(\overline{w}_{{\mathcal {N}}} \in H_{\mathcal {N}}\).

Vice versa, if there exist \(\overline{w}_{{\mathcal {N}}}\) and \({\overline{u}}_{\mathcal {N}}\) that satisfy the optimality conditions (i) and (ii), applying again Proposition III.4.1 in [22] we deduce that \({\overline{u}}_{\mathcal {N}}\) is a minimizer of Problem 2.3 and \(\overline{w}_{{\mathcal {N}}}\) is a minimizer of (\({\mathscr {P}}^*\)). \(\square \)

### Remark 2.13

- (i)
\({\mathcal {A}}_{\mathcal {N}}^* {\overline{w}}_{{\mathcal {N}}} \in {\mathcal {K}}\),

- (ii)
\(\langle {\mathcal {A}}_{\mathcal {N}}^* {\overline{w}}_{{\mathcal {N}}}, {\overline{u}}_{\mathcal {N}}\rangle = \phi _{{\mathcal {N}}}({\overline{u}}_{\mathcal {N}})\).

## 3 Abstract main result: existence of a sparse minimizer

### Definition 3.1

(*Extremal points*) Given a convex set *K* of a locally convex space we define the extremal points of *K* as the points \(k\in K\) such that if there exist \(t \in (0,1)\), \(k_1,k_2 \in K\) such that

$$\begin{aligned} k = t k_1 + (1-t) k_2, \end{aligned}$$

then \(k = k_1 = k_2\).

The set of extremal points of *K* will be denoted by *Ext*(*K*).
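As a finite-dimensional illustration of this definition (not part of the original text), the following Python sketch inspects the closed unit ball of the \(\ell _1\) norm in \(\mathbb {R}^2\). The helper names `grid_ball` and `nontrivial_midpoint_splits` are hypothetical, and the check is restricted to midpoints (\(t = 1/2\)) and to points on a rational grid, so it only gives evidence for extremality rather than a proof.

```python
from fractions import Fraction
from itertools import product


def l1_norm(p):
    return sum(abs(c) for c in p)


def grid_ball(denominator=4):
    """Grid points (spacing 1/denominator) of the closed l1 unit ball in R^2."""
    ticks = [Fraction(i, denominator) for i in range(-denominator, denominator + 1)]
    return [p for p in product(ticks, repeat=2) if l1_norm(p) <= 1]


def nontrivial_midpoint_splits(k, ball):
    """Pairs (k1, k2) of grid points in the ball with k = (k1 + k2) / 2, k1 != k."""
    ball_set = set(ball)
    splits = []
    for k1 in ball:
        # k2 is determined by k1 through the midpoint condition.
        k2 = tuple(2 * kc - c for kc, c in zip(k, k1))
        if k1 != k and k2 in ball_set:
            splits.append((k1, k2))
    return splits


ball = grid_ball()
vertex = (Fraction(1), Fraction(0))
edge_point = (Fraction(1, 2), Fraction(1, 2))
vertex_splits = nontrivial_midpoint_splits(vertex, ball)
edge_splits = nontrivial_midpoint_splits(edge_point, ball)
```

On this grid the vertex \((1,0)\) admits no non-trivial splitting, whereas the boundary point \((1/2,1/2)\) is the midpoint of \((1,0)\) and \((0,1)\) and is therefore not extremal.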

First we need a lemma about the behaviour of extremal points under a linear mapping.

### Lemma 3.2

*K* be a convex set in a locally convex space *X*. Given *Y* a real topological vector space and a linear map \(L:X \rightarrow Y\) the following statements hold:

- (i)
If *L* is continuous and *K* is compact, then \(Ext(LK) \subset LExt(K)\).

- (ii)
If *L* is injective, then \(Ext(LK) = LExt(K)\).
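A small example, added here for illustration and not taken from the paper, shows that the inclusion in statement (i) can be strict. It assumes \(K = [0,1]^2\) and the hypothetical non-injective map \(L(x,y) = x + y\); the name `apply_L` is likewise an assumption. Since *K* is the convex hull of its four vertices, *LK* is the convex hull of their images, which in \(\mathbb {R}\) is a closed interval whose extremal points are its two endpoints.

```python
def apply_L(point):
    """Hypothetical linear map L(x, y) = x + y from R^2 to R."""
    x, y = point
    return x + y


# Extremal points of the unit square K = [0, 1]^2.
square_vertices = [(0, 0), (1, 0), (0, 1), (1, 1)]

# L Ext(K): images of the vertices.
image_of_ext = {apply_L(v) for v in square_vertices}

# LK is the interval [min, max] of these images, so Ext(LK) are its endpoints.
ext_of_image = {min(image_of_ext), max(image_of_ext)}

inclusion_holds = ext_of_image <= image_of_ext
inclusion_is_strict = ext_of_image < image_of_ext
```

Here the vertices \((1,0)\) and \((0,1)\) are mapped to the interior point 1 of \(LK = [0,2]\), which is why injectivity is needed for equality in (ii).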

### Proof

To prove (i) let us consider \(k \in K\) such that *Lk* is an extremal point of *LK*. We want to show that there exists \({\overline{k}} \in Ext(K)\) such that \(Lk = L{\overline{k}}\) which proves the first claim.

*L*), by the Krein–Milman theorem, it admits an extremal point denoted by \({\overline{k}} \in (k + \ker L) \cap K\). In order to conclude the proof we need to prove that \({\overline{k}} \in Ext(K)\). Assume the convex combination

*L* we obtain that

*L* to both sides and using that \(Lk \in Ext(LK)\) we obtain that \(Lk = Lk_1 = Lk_2\). Then the injectivity of *L* implies that \(k=k_1=k_2\), thus \(k \in Ext(K)\).

*L* is injective and using that \(k \in Ext(K)\) we conclude that \(k = k_1= k_2\) and hence \(Lk = Lk_1= Lk_2\). \(\square \)

We are now in the position to prove our main theorem.

### Theorem 3.3

### Proof

We apply Propositions 2.5 and 2.12 to find \({\overline{u}}_{\mathcal {N}} \in X_{\mathcal {N}}\) a minimizer of Problem 2.3 and \({\overline{w}}_{{\mathcal {N}}} \in H_{\mathcal {N}}\) such that properties (i) and (ii) in Proposition 2.12 hold. If \(\phi _{{\mathcal {N}}}({\overline{u}}_{\mathcal {N}}) = 0\), then applying Theorem 2.6, we infer that there exists \({\overline{\psi }} \in {\mathcal {N}}\) such that \({\overline{\psi }}\) is a minimizer of Problem 2.2. Therefore, Eq. (21) holds with \(p=0\). Hence, we suppose without loss of generality that \({\overline{u}}_\mathcal {N}\notin {\mathcal {N}}\), i.e. \(\phi _\mathcal {N}({\overline{u}}_\mathcal {N}) > 0\).

[**H1**] we infer that \(B_{\mathcal {N}}\) is compact and thanks to Remark 2.10 we have that \({\mathcal {A}}_{\mathcal {N}} B_{\mathcal {N}}\) is compact in \(H_{\mathcal {N}}\) as well. As \(\frac{1}{\phi _{\mathcal {N}}({\overline{u}}_{\mathcal {N}})}{\mathcal {A}}_{\mathcal {N}} \left( u_{\mathcal {N}}\right) \in {\mathcal {A}}_{\mathcal {N}} B_{\mathcal {N}} \subset H_{\mathcal {N}}\), by the Krein–Milman theorem and Carathéodory theorem, we have that

*p* is minimal, in the sense that it is the minimal number such that a decomposition like (23) holds.

*p*.

### Remark 3.4

Let us point out similarities and differences to the work [5], where a theorem similar to Theorem 3.3 has been shown. First, instead of seminorms, [5] deals with general convex regularizers. Moreover, in [5], the existence of minimizers for the considered variational inverse problem is assumed a priori, with the goal of disentangling the main result (which is purely geometric) from the topology chosen on *X*. In contrast, we make suitable assumptions that ensure existence of minimizers for the inverse problem and that the set of extremal points of the balls of the regularizer is non-empty. In such a way, we provide an operative result with hypotheses that can be easily checked.

It is worth noticing that both our result and [5] do not provide a sparse representation for *every* minimizer of the variational inverse problem. However, the points of view are complementary. In [5], the authors characterize, with the help of a theorem by Dubins and Klee [21, 28], the minimizers belonging to the finite-dimensional faces of the set of the solutions (we refer to [5] for the definition of the face of a convex set). In particular, when the dimension of a face is zero, i.e., the face is an extremal point, it is possible to obtain a sparse representation of the minimizer in terms of the extremal points and extremal rays of a certain sublevel set of the regularizer (see Section 2 in [5] for the definition of extremal ray). This is still true when the dimension of the face is larger than zero and finite (see Theorem 1 in [5]). Existence of extremal points is then, e.g., obtained by Klee’s extension of the Krein–Milman theorem [27] in case of regularizers whose sublevel sets are closed, convex and locally compact in an appropriate locally convex space. In contrast, our theorem always provides the existence of a minimizer represented as a convex combination of extremal points of the ball of the regularizer. Due to the different techniques used, such a sparse minimizer does not necessarily belong to a finite-dimensional face of the set of the solutions.

## 4 Examples of sparsity for relevant regularizers

In this section we study the structure of the extremal points for relevant regularizers, in order to apply the results of the previous section. The first example is about the Radon norm in the space of measures.

### 4.1 The Radon norm for measures

*H* is a finite-dimensional Hilbert space.

In order to get more information from Theorem 3.3 we need to characterize the extremal points of *B*. This result is well-known, but we go through it for the reader’s convenience.

### Proposition 4.1

*B* defined as above we have that

### Proof

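*u* not supported on a singleton. Then \(\Vert u\Vert _{\mathcal {M}} =1\) and there exists a measurable set \(A \subset \Omega \) such that \(0<|u|(A)<1\). We have

$$\begin{aligned} u = |u|(A) \left( \frac{1}{|u|(A)}\, u \llcorner A\right) + |u|(\Omega {\setminus } A) \left( \frac{1}{|u|(\Omega {\setminus } A)}\, u \llcorner (\Omega {\setminus } A)\right) , \end{aligned}$$

where \(u \llcorner A\) denotes the restriction of *u* to *A*, both measures in parentheses have unit Radon norm and \(|u|(A) + |u|(\Omega {\setminus } A) = 1\), which implies that *u* is not an extremal point. Hence all the extremal points of *B* are of the form \(a\delta _x\) where \(a \in \mathbb {R}\) and \(x\in \Omega \). As the extremal points of *B* have unit Radon norm we deduce immediately that \(|a| = 1\). \(\square \)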
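The splitting used in this proof can be checked by hand on a discrete measure. The following Python sketch is an illustration added under explicit assumptions: a measure is modelled as a dictionary mapping finitely many points to rational weights, and the helper names `radon_norm`, `restrict`, `scale` and `add` are hypothetical. It writes a unit-norm measure supported on two points as a non-trivial convex combination of two signed Dirac deltas.

```python
from fractions import Fraction


def radon_norm(mu):
    """Total variation of a discrete measure given as {point: weight}."""
    return sum(abs(w) for w in mu.values())


def restrict(mu, subset):
    """Restriction of the measure to the given set of points."""
    return {x: w for x, w in mu.items() if x in subset}


def scale(mu, c):
    return {x: c * w for x, w in mu.items()}


def add(mu, nu):
    out = dict(mu)
    for x, w in nu.items():
        out[x] = out.get(x, 0) + w
    return out


# u = (1/3) delta_{1/4} - (2/3) delta_{3/4}, a measure of unit Radon norm.
u = {Fraction(1, 4): Fraction(1, 3), Fraction(3, 4): Fraction(-2, 3)}
A = {Fraction(1, 4)}
complement_of_A = set(u) - A

lam = radon_norm(restrict(u, A))              # |u|(A), strictly between 0 and 1
u1 = scale(restrict(u, A), 1 / lam)           # normalised restriction to A
u2 = scale(restrict(u, complement_of_A), 1 / (1 - lam))  # normalised remainder
recombined = add(scale(u1, lam), scale(u2, 1 - lam))
```

Both pieces have unit norm and differ from `u`, so `u` is not extremal; the pieces themselves are \(\delta _{1/4}\) and \(-\delta _{3/4}\), in line with the statement of Proposition 4.1.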

From Proposition 4.1 we obtain immediately the following theorem:

### Theorem 4.2

*X*, \(\phi \), \({\mathcal {A}}\) and *F*, there exists a minimizer of Problem 2.2 denoted by \({\overline{u}} \in X\) such that

### 4.2 The total variation for BV functions

We equip *X* with the weak* topology for BV functions by interpreting \(BV(\Omega )\) as a dual space (see, for instance [2, Remark 3.12]). As in the previous example we consider a linear, continuous and surjective map \({\mathcal {A}} : BV(\Omega ) \rightarrow H\) and \(F: H \rightarrow (-\infty ,+\infty ]\) that satisfies the assumptions given in Sect. 2.1. Under these choices we want to use Theorem 3.3 to characterize the sparse solutions of Problem 2.2.

Notice that with the chosen topology on *X*, the functional \(\phi (u) = |Du|(\Omega )\) is a lower semi-continuous seminorm and \({\mathcal {A}}\) satisfies assumption [**H0**]. Therefore in order to apply Theorem 3.3 we just need to verify Hypothesis [**H1**] that is the content of the next lemma. Notice that in this specific case, we have \({\mathcal {N}} = \mathbb {R}\) as \(\Omega \) is connected.

### Lemma 4.3

### Proof

We first remark that the metrizability of the space \(X_{\mathcal {N}}\) on bounded sets is not straightforward to show. Therefore we work with *nets* instead of sequences (we refer to Sections 1.3, 1.4, 1.6 in [30] for the basic properties of nets).

*finite perimeter* is a measurable set \(A \subset \Omega \) such that \(\chi _A \in BV(\Omega )\) for the characteristic function \(\chi _A\) of *A*. In this case, we call \(P(A,\Omega ) = |D\chi _A|(\Omega )\) the *perimeter* of *A*.

### Definition 4.4

(*Decomposable set*) A set of finite perimeter \(E \subset \Omega \) is *decomposable* if there exists a partition of *E* in two sets *A*, *B* with \(|A| > 0\) and \(|B| > 0\) such that \(P(E,\Omega ) = P(A,\Omega ) + P(B,\Omega )\). A set of finite perimeter is *indecomposable* if it is not *decomposable*.

In [1], the notion of *saturated* set is introduced that is suitable in the case \(\Omega = \mathbb {R}^d\). In our case of bounded domains, we do not need this requirement, but we ask that both the set and its complement are indecomposable.

### Definition 4.5

(*Simple set*) We say that a set of finite perimeter *E* is *simple* if both *E* and \(\Omega {\setminus } E\) are *indecomposable*.
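Definitions 4.4 and 4.5 can be made concrete in one space dimension. The following Python sketch is an added illustration under several assumptions that are not in the original text: \(\Omega = (0,1)\), sets are finite unions of intervals with pairwise disjoint closures, the perimeter of such a set in \(\Omega \) is the number of its endpoints lying strictly inside \(\Omega \), and such a set is indecomposable exactly when it consists of a single interval. The function names are hypothetical.

```python
from fractions import Fraction


def perimeter(intervals):
    """P(E, Omega) for Omega = (0, 1): endpoints of E strictly inside Omega."""
    return sum(1 for a, b in intervals for e in (a, b) if 0 < e < 1)


def complement(intervals):
    """Complement in (0, 1) of a union of intervals, as a list of intervals."""
    pts = [Fraction(0)]
    for a, b in sorted(intervals):
        pts += [a, b]
    pts.append(Fraction(1))
    gaps = [(pts[i], pts[i + 1]) for i in range(0, len(pts), 2)]
    return [(a, b) for a, b in gaps if a < b]  # drop empty gaps at the boundary


def is_simple(intervals):
    """Assumed 1-D criterion: E and its complement are both single intervals."""
    return len(intervals) == 1 and len(complement(intervals)) == 1


# Two separated intervals: decomposable, the perimeters of the pieces add up.
E = [(Fraction(1, 8), Fraction(1, 4)), (Fraction(1, 2), Fraction(3, 4))]
A, B = E[:1], E[1:]

# An interval touching the boundary of Omega: simple.
half = [(Fraction(0), Fraction(1, 2))]

# An interior interval: indecomposable, but its complement is decomposable.
middle = [(Fraction(1, 4), Fraction(3, 4))]
```

In this one-dimensional model the simple sets are the intervals \((0,a)\) and \((a,1)\), so the corresponding functions \(\chi _E / P(E,\Omega )\) are single jumps; the interior interval fails Definition 4.5 because \(\Omega {\setminus } E\) splits into two pieces whose perimeters add up.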

*E* defined as

*E* is then defined as \(\partial ^* E = \mathbb {R}^d {\setminus } (E^0 \cup E^1)\).

We will also need the following result due to Dolzmann and Müller [18].

### Lemma 4.6

*E*.

With the following theorem we are able to characterize the extremal points of \(B_{\mathcal {N}}\) in a rather straightforward way without relying on indecomposability results for the reduced boundary as in [25].

### Theorem 4.7

### Proof

*u* assumes two values almost everywhere. In order to do that we define

*s*. As \(u_\mathcal {N}\) is an extremal point of \(B_\mathcal {N}\), it follows that \(u_\mathcal {N}= (u_1)_\mathcal {N}= (u_2)_\mathcal {N}\) which means that there exist \(c_1,c_2 \in \mathbb {R}\) such that \(u = u_1 + c_1 = u_2 + c_2\). Now, for \(x \in \Omega \) such that \(u(x) \ge s\), this implies \(u(x) = 2 s + c_1\). Likewise, if \(u(x) \le s\), then \(u(x) = c_2\). Hence,

*u* assumes at most two values almost everywhere. However, since \(|Du|(\Omega ) = 1\), it assumes exactly two values almost everywhere and \(2s + c_1 > c_2\). (Moreover, the set \(\{u(x) = s\}\) must be a null set.)

*u* of \(u_{\mathcal {N}}\), we can suppose that \(u(x) \in \{0,a\}\) almost everywhere, where \(a > 0\). Defining \(E = \{x\in \Omega : u(x) = a\}\) and using the fact that \(|Du|(\Omega ) = 1\) one concludes that \(u= \frac{\chi _E}{P(E,\Omega )}\). Suppose now by contradiction that *E* is decomposable and let *A* and *B* be the sets of finite perimeter given by Definition 4.4. Then, \(P(A,\Omega ) > 0\) and \(P(B,\Omega ) > 0\) and defining

*A* and *B* given by Definition 4.4, Formula (33) is a non-trivial convex combination of *u*.

*A* and *B* its decomposition according to Definition 4.4. Define

*u*.

Thus, *E* must be a simple set and the first inclusion is proven.

*E* and \(\Omega {\setminus } E\) then yields that \(u_i = d_i \chi _E + c_i\) for some \(c_i,d_i \in \mathbb {R}\), \(i=1,2\). By (35), we further deduce \(|Du_1|(\Omega ) = |Du_2|(\Omega ) = 1\) which implies that \(|d_1| = |d_2| = P(E,\Omega )^{-1} > 0\). Clearly, \(d_1\) and \(d_2\) cannot both be negative. Also, \(d_1\) and \(d_2\) cannot have opposite sign as in this case, comparing \(|D\chi _E|(\Omega )/P(E,\Omega )\) and \(|\lambda Du_1 + (1-\lambda )Du_2|(\Omega )\) leads to the contradiction

We have shown the following theorem.

### Theorem 4.8

### 4.3 Radon norm of a scalar differential operator

In this section we consider the case where \(\phi (u) = \Vert Lu\Vert _{{\mathcal {M}}}\), namely the Radon norm of a linear, translation-invariant scalar differential operator *L*. This was already treated in [26, 36] in different settings. Our goal is to show that our theory applies straightforwardly to this case. We start with some useful properties of scalar differential operators that we are going to use. In what follows we denote by \(\alpha = (\alpha _1,\ldots ,\alpha _d) \in \mathbb {N}^{d}\) a multi-index and we employ the standard multi-index notation and conventions.

#### 4.3.1 Some technical lemmas

*G* for *L* is ensured by virtue of the classical Malgrange–Ehrenpreis theorem (see for example Theorem 8.5 in [33]).
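For orientation, a few classical fundamental solutions can be written down explicitly. These are standard textbook facts rather than statements from the present paper; the symbols \(H\) (Heaviside function) and \(x_+ = \max (x,0)\) are introduced here only for illustration.

```latex
% Classical fundamental solutions (illustrative, not from the paper).
\[
\begin{aligned}
d = 1,\quad L &= \frac{d}{dx}: & G(x) &= H(x), & LG &= \delta_0,\\
d = 1,\quad L &= \frac{d^2}{dx^2}: & G(x) &= x_+ , & LG &= \delta_0,\\
d = 3,\quad L &= \Delta: & G(x) &= -\frac{1}{4\pi |x|}, & LG &= \delta_0 .
\end{aligned}
\]
```

A fundamental solution is never unique: adding any element of the null space of *L* gives another one. For instance, \(|x|/2 = x_+ - x/2\) is also a fundamental solution of \(d^2/dx^2\).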

### Theorem 4.9

*L* a non-zero differential operator with constant coefficients according to (37) there exists a distribution \(G \in D(\mathbb {R}^d)^*\) which is a fundamental solution for *L*, namely \(LG = \delta _0\).

*T* is indeed well-defined because \({\widetilde{\mu }}\) is compactly supported on \(\mathbb {R}^d\) and \({\widetilde{\mu }} \star G \in D(\mathbb {R}^d)^*\). Define then \(T_\Omega : {\mathcal {M}}(\Omega ) \rightarrow D(\Omega )^*\) as \(T_\Omega (\mu ) = (T\mu )_{|_{\Omega }}\).

### Remark 4.10

### Lemma 4.11

### Proof

*G* is finite on bounded sets which means that there exists \(s\in \mathbb {N}\) such that

#### 4.3.2 Existence of a sparse minimizer

*L* given in Eq. (37). With

*s* is given by Lemma 4.11, we set \(X = C_0^s(\Omega )^*\), the space of distributions of order *s* equipped with the weak* topology. From now on we consider the weak differential operator *L* mapping \(X \rightarrow C_0^{s+q}(\Omega )^*\). Notice that with this definition, *L* is a continuous operator when *X* and \(C_0^{s+q}(\Omega )^*\) are equipped with the weak* topology. Indeed, the adjoint \(L^*\) according to (38) maps continuously between the spaces \(C_0^{s+q}(\Omega ) \rightarrow C_0^s(\Omega )\) as a classical differential operator. Thus, considering \(u_n {\mathop {\rightharpoonup }\limits ^{*}} u\) in *X* and \(\varphi \in C_0^{s+q}(\Omega )\) we have \(L^*\varphi \in C_0^{s}(\Omega )\) and hence, \(\langle Lu_n, \varphi \rangle = \langle u_n, L^*\varphi \rangle \rightarrow \langle u, L^*\varphi \rangle = \langle Lu, \varphi \rangle \).

### Remark 4.12

*X* (with respect to the weak* topology). Indeed, once again, as \(C_0^s(\Omega )\) is separable we know that weak* lower semi-continuity for *L* is equivalent to weak* sequential lower semi-continuity. Therefore, we consider a sequence \((u_n)_n \subset X\) such that \(u_n {\mathop {\rightharpoonup }\limits ^{*}} u\) in *X* and we suppose without loss of generality that

*L* is weak*-weak* closed we infer that \(v = Lu\) and from the lower semi-continuity of the Radon norm with respect to weak* convergence in \({\mathcal {M}}(\Omega )\) we conclude that \(\Vert Lu\Vert _{{\mathcal {M}}} \le C\).

In order to apply Theorem 3.3, it remains to verify Assumption [**H1**]. This is the content of the next proposition. We recall that \(X_{\mathcal {N}} = X / {\mathcal {N}}\) (equipped with the quotient of the weak* topology of *X*) where \({\mathcal {N}}\) is the null-space of *L* and \(\phi _{\mathcal {N}}(u_{\mathcal {N}}) = \phi _{\mathcal {N}}(u + {\mathcal {N}}) := \phi (u)\) (for notational convenience we denote by *L* the operator acting on \(X_{\mathcal {N}}\) in the natural way).

### Proposition 4.13

### Proof

Similarly to the proof of Lemma 4.3 we employ *nets* since metrizability of the space \(X_{\mathcal {N}}\) does not play a role in this context.

*X* (Theorem 1.6.2 in [30]). As the projection on the quotient is a continuous operation we deduce also that

We are now in a position to apply Theorem 3.3. Consider \({\mathcal {A}} : X \rightarrow H\) a linear continuous operator such that [**H0**] holds and \(F:H \rightarrow (-\infty ,+\infty ]\) satisfying the assumptions in Sect. 2.1.

*G* translated by *x*, i.e., such that \(LG_x = \delta _x\).
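To see what a combination of translated fundamental solutions looks like, consider the following one-dimensional sketch. It assumes \(\Omega = (0,1)\), takes \(G_x = G(\cdot - x)\), and assumes that a sparse minimizer has the form announced in the abstract (a finite linear combination of extremal points plus an element of the null space of *L*). The knots \(x_i\), the weights \(c_i\) and the number \(p\) are hypothetical symbols introduced only for this illustration.

```latex
% Illustrative 1D sketch (assumed setting: \Omega = (0,1), knots x_i \in \Omega).
\[
\begin{aligned}
L = \frac{d}{dt}: \quad
  u(t) &= a + \sum_{i=1}^{p} c_i\, H(t - x_i),
  & Lu &= \sum_{i=1}^{p} c_i\, \delta_{x_i},\\
L = \frac{d^2}{dt^2}: \quad
  u(t) &= a + b\,t + \sum_{i=1}^{p} c_i\, (t - x_i)_+ ,
  & Lu &= \sum_{i=1}^{p} c_i\, \delta_{x_i}.
\end{aligned}
\]
```

In both cases \(\Vert Lu\Vert _{{\mathcal {M}}} = \sum _{i=1}^{p} |c_i|\) when the knots are distinct. The first function is piecewise constant and the second is a piecewise linear spline, which matches the spline structure described in [36].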

### Theorem 4.14

### Proof

*L* on both sides and using Remark 4.10 we deduce that \(\mu = 0\).

## 5 Conclusions and open problems

The abstract main result of this paper, Theorem 3.3, on the structure of a minimizer of a variational problem with finite-dimensional data appears to be widely applicable, thanks to its generality. The applicability of this theorem to concrete problems relies, however, on the characterization of the extremal points of the unit ball associated with the given regularizer. Such a characterization appears to be fundamental for devising suitable algorithms that rely on the structure of the minimizers given by Theorem 3.3.

- The total variation of a function with bounded variation.
- The Radon norm of a scalar differential operator.

*TV* seminorm, leading, e.g., to a better understanding of how higher-order regularizers reduce the staircase effect.

*TV* and \(TV^2\) models. For example, in [8], the so-called *total generalized variation* was introduced, which is defined in the following way:

*k* and \(\alpha = (\alpha _0,\ldots , \alpha _{k-1})\) are positive parameters. The characterization of extremal points of the ball associated with these particular regularizers is, to the best of our knowledge, still not known and would lead to a deeper understanding of the regularization effects in the respective variational models.

## Notes

### Acknowledgements

Open access funding provided by University of Graz. The authors gratefully acknowledge the funding of this work by the Austrian Science Fund (FWF) within the Project P 29192. We also thank Professor Luigi Ambrosio for the useful remarks regarding [1].

## References

- 1. Ambrosio, L., Caselles, V., Masnou, S., Morel, J.-M.: Connected components of sets of finite perimeter and applications to image processing. J. Eur. Math. Soc. **3**(1), 39–92 (2001)
- 2. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, Oxford (2000)
- 3. Antoniadis, A., Fan, J.: Regularization of wavelet approximations. J. Am. Stat. Assoc. **96**(455), 939–967 (2001)
- 4. Aubin, J.-P.: Mathematical Methods of Game and Economic Theory, Studies in Mathematics and Its Applications, vol. 7. North-Holland Publishing Co., Amsterdam (1979)
- 5. Boyer, C., Chambolle, A., Castro, Y., Duval, V., de Gournay, F., Weiss, P.: On representer theorems and convex regularization. SIAM J. Optim. **29**(2), 1260–1281 (2019)
- 6. Bredies, K., Carioni, M., Fanzon, S., Romero, F.: On the extremal points of the ball of the Benamou–Brenier energy (2019). Arxiv preprint arXiv:1907.11589. https://arxiv.org/pdf/1907.11589.pdf
- 7. Bredies, K., Kaltenbacher, B., Resmerita, E.: The least error method for sparse solution reconstruction. Inverse Probl. **32**(9), 094001 (2016)
- 8. Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. **3**(3), 492–526 (2010)
- 9. Bredies, K., Lorenz, D.A.: Iterated hard shrinkage for minimization problems with sparsity constraints. SIAM J. Sci. Comput. **30**(2), 657–683 (2008)
- 10. Bredies, K., Pikkarainen, H.K.: Inverse problems in spaces of measures. ESAIM Control Optim. Calc. Var. **19**(1), 190–218 (2013)
- 11. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory **52**(2), 489–509 (2006)
- 12. Casas, E., Clason, C., Kunisch, K.: Approximation of elliptic control problems in measure spaces with sparse solutions. SIAM J. Control Optim. **50**(4), 1735–1752 (2012)
- 13. Chambolle, A., Duval, V., Peyré, G., Poon, C.: Geometric properties of solutions to the total variation denoising problem. Inverse Probl. **33**(1), 015002 (2017)
- 14. Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems. Numer. Math. **76**(2), 167–188 (1997)
- 15. Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. **66**(5), 1632–1648 (2006)
- 16. Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. **12**(6), 805–849 (2012)
- 17. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. **57**(11), 1413–1457 (2004)
- 18. Dolzmann, G., Müller, S.: Microstructures with finite surface energy: the two-well problem. Arch. Ration. Mech. Anal. **132**(2), 101–141 (1995)
- 19. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory **52**(4), 1289–1306 (2006)
- 20. Donoho, D.L., Johnstone, I.M.: Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. **90**(432), 1200–1224 (1995)
- 21. Dubins, L.E.: On extreme points of convex sets. J. Math. Anal. Appl. **5**, 237–244 (1962)
- 22. Ekeland, I., Témam, R.: Convex Analysis and Variational Problems, Classics in Applied Mathematics, vol. 28, English edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1999)
- 23. Federer, H.: Geometric Measure Theory, vol. 153. Springer, Berlin (1969)
- 24. Fleming, W.H.: Functions with generalized gradient and generalized surfaces. Ann. Mat. Pura Appl. (4) **44**(92), 93–103 (1957)
- 25. Fleming, W.H.: Functions whose partial derivatives are measures. Ill. J. Math. **4**, 452–478 (1960)
- 26. Flinth, A., Weiss, P.: Exact solutions of infinite dimensional total-variation regularized problems. Inf. Inference J. IMA **8**(3), 407–443 (2018)
- 27. Klee, V.L.: Extremal structure of convex sets. Arch. Math. **8**(3), 234–240 (1957)
- 28. Klee, V.L.: On a theorem of Dubins. J. Math. Anal. Appl. **7**, 425–427 (1963)
- 29. Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math. **61**(2), 633–658 (2000)
- 30. Pedersen, G.K.: Analysis Now, Graduate Texts in Mathematics, vol. 118. Springer, New York (1989)
- 31. Pieper, K., Tang, B.Q., Trautmann, P., Walter, D.: Inverse point source location with the Helmholtz equation on a bounded domain (2019). Arxiv preprint arXiv:1805.03310. https://arxiv.org/pdf/1805.03310.pdf
- 32. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D **60**(1–4), 259–268 (1992). (Experimental mathematics: computational issues in nonlinear science [Los Alamos, NM, 1991])
- 33. Rudin, W.: Functional Analysis, International Series in Pure and Applied Mathematics, 2nd edn. McGraw-Hill Inc., New York (1991)
- 34. Schaefer, H.H.: Topological Vector Spaces. Springer, Berlin (1971). (Third printing corrected, Graduate Texts in Mathematics, Vol. 3)
- 35. Shapiro, A.: On duality theory of conic linear problems. In: Semi-infinite Programming (Alicante, 1999), Nonconvex Optim. Appl., vol. 57, pp. 135–165. Springer (2001)
- 36. Unser, M., Fageot, J., Ward, J.P.: Splines are universal solutions of linear inverse problems with generalized TV regularization. SIAM Rev. **59**(4), 769–793 (2017)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.