On monotone and primal-dual active set schemes for \(\ell ^p\)-type problems, \(p \in (0,1]\)
Abstract
Nonsmooth nonconvex optimization problems involving the \(\ell ^p\) quasi-norm, \(p \in (0, 1]\), of a linear map are considered. A monotonically convergent scheme for a regularized version of the original problem is developed, and necessary optimality conditions for the original problem are given in the form of a complementary system amenable to computation. Then an algorithm for solving these necessary optimality conditions is proposed. It is based on a combination of the monotone scheme and a primal-dual active set strategy. The performance of the two algorithms is studied by means of a series of numerical tests in different cases, including optimal control problems, fracture mechanics and microscopy image reconstruction.
Keywords
Nonsmooth nonconvex optimization, Active-set method, Monotone algorithm, Optimal control problems, Image reconstruction, Fracture mechanics
Mathematics Subject Classification
49K99, 49M05, 65K10
1 Introduction
Optimization problems of the form (1.1) arise frequently in applications as an efficient way to extract the essential features of generalized solutions. In particular, many problems in sparse learning and compressed sensing can be written as (1.1) with \({\varLambda }=I\), I being the identity (see e.g. [11, 41] and the references therein). In image analysis, \(\ell ^p\)-regularizers as in (1.1) have recently been proposed as nonconvex extensions of the total generalized variation (TGV) regularizer used to reconstruct piecewise smooth functions (e.g. in [24, 42]). Also, the use of \(\ell ^p\)-functionals with \(p\in (0,1)\) is of particular importance in fracture mechanics (see [43]). Recently, sparsity techniques have also been investigated by the optimal control community, see e.g. [9, 22, 27, 32, 48]. The literature on sparsity optimization problems such as (1.1) is growing rapidly; here we also mention [1, 6, 18, 44].
The nonsmoothness and nonconvexity make the study of problems such as (1.1) both an analytical and a numerical challenge. Many numerical techniques have been developed for the case \({\varLambda }=I\) (e.g. in [20, 27, 29, 30]), and attention has recently been given to the case of more general operators; here we mention e.g. [24, 36, 42] and refer to the end of the introduction for further details. However, the presence of the matrix inside the \(\ell ^p\)-term, combined with the nonconvexity and nonsmoothness, remains a main issue in the development of numerical schemes for (1.1).
In the present work, we first propose a monotone algorithm to solve a regularized version of (1.1). The scheme is based on an iterative procedure solving a modified problem where the singularity at the origin is regularized. The convergence of this algorithm and the monotone decay of the cost during the iterations are proved. Then its performance is successfully tested in four different situations: a time-dependent control problem, a fracture mechanics example for cohesive fracture models, an M-matrix example, and an elliptic control problem.
We also investigate suitable necessary optimality conditions for solving the original problem. Relying on an augmented Lagrangian formulation, optimality conditions of complementary type are derived. For this purpose we consider the case where \({\varLambda }\) is a regular matrix, since in the general case the optimality conditions of complementary type are not readily obtainable. An active set primal-dual strategy which exploits the particular form of these optimality conditions is developed. A new feature of our method is that at each iteration level the monotone scheme is used to solve the nonlinear equation satisfied by the nonzero components. The convergence of the active set primal-dual strategy is proved in the case \({\varLambda }=I\) under a diagonal dominance condition. Finally, the algorithm is tested on the same time-dependent control problem as the one analysed for the monotone scheme, as well as on a microscopy image reconstruction example. In all the above-mentioned examples the matrix inside the \(\ell ^p\)-term appears as a discretized gradient, with very different purposes, e.g. as a regularization term in imaging and for modelling purposes in fracture mechanics.
Similar algorithms were proposed in [27] and [20] for problems such as (1.1) with no matrix inside the \(\ell ^p\)-term and in the infinite dimensional sequence spaces \(\ell ^p\), with \(p \in [0,1]\). In particular, in [27] a primal-dual active set strategy was studied for the case \(p=0\); its convergence was proved under a diagonal dominance condition and its performance was tested in three different situations. In the case \(p \in (0,1]\) a monotonically convergent scheme was proposed, but no numerical tests were made. Then, inspired by [27], the two authors of the present paper developed in [20] a primal-dual active set strategy for \(p\in (0,1]\) and tested its performance in diverse test cases. Note that in [20] the convergence of the primal-dual strategy was not investigated. The monotone and the primal-dual active set monotone algorithms studied in the present paper are inspired by the schemes proposed in [27] and [20], respectively, with the main novelties that we now treat the case of a regular matrix in the \(\ell ^p\)-term and provide diverse numerical tests for both schemes. Moreover, we prove the convergence of the primal-dual active set strategy. Note that the monotone scheme has not been tested in the earlier papers.
Let us recall some further literature concerning \(\ell ^p\), \(p \in (0,1]\), sparse regularizers. Iteratively reweighted least-squares algorithms with suitable smoothing of the singularity at the origin were analysed in [13, 34, 35]. In [37] a unified convergence analysis was given and new variants were also proposed. An iteratively reweighted \(\ell _1\) algorithm ([8]) was developed in [14] for a class of nonconvex \(\ell ^2\)-\(\ell ^p\) problems, with \(p \in (0,1)\). A generalized gradient projection method for a general class of nonsmooth nonconvex functionals and a generalized iterated shrinkage algorithm are analysed in [6] and [52], respectively. Also, in [44] a surrogate functional approach combined with a gradient technique is proposed. However, none of these works investigates the case of a linear operator inside the \(\ell ^p\)-term.
In [42] an iteratively reweighted convex majorization algorithm is proposed for a class of nonconvex problems including the \(\ell ^p\), \(p \in (0,1]\), regularizer acting on a linear map. However, an additional assumption of Lipschitz continuity of the objective functional is required to establish convergence of the whole sequence generated by the algorithm. Nonconvex \(TV^p\)-models with \(p \in (0,1)\) for image restoration are studied in [24] by a Newton-type solution algorithm for a regularized version of the original problem.
We also mention [30], where a primal-dual active set method is studied for problems as in (1.1) with \({\varLambda }=I\) for a large class of penalties including the \(\ell ^p\) penalty with \(p \in [0,1)\). A continuation strategy with respect to the regularization parameter \(\beta \) is proposed, and the convergence of the primal-dual active set strategy coupled with the continuation strategy is proved. However, in [30], differently from the present work, the nonlinear problem arising at each iteration level of the active set scheme is not investigated. Moreover, in [30] the matrix A has normalized column vectors, whereas in the present work A is a general matrix.
Finally, in [36] an alternating direction method of multipliers (ADMM) is studied in the case of a regular matrix inside the \(\ell ^p\)-term; optimality conditions are derived and convergence is proved. We underline that in [36] the linear map inside the \(\ell ^p\)-term has to be surjective, which can be restrictive in applications. Although the ADMM in [36] is also deduced from an augmented Lagrangian formulation, we remark that the optimality conditions of [36] are of a different nature than ours, and hence the two approaches cannot readily be compared. We refer to Remark 4 for a more detailed explanation.
Concerning the general importance of \(\ell ^p\)-functionals with \(p \in (0,1)\), numerical experience has shown that their use can promote sparsity better than the \(\ell ^1\)-norm (see [10, 19, 49]), e.g. possibly allowing a smaller number of measurements in feature selection and compressed sensing (see also [11, 12, 40]). Moreover, many works have demonstrated empirically that nonconvex regularization terms in total variation-based image restoration provide better edge preservation than the \(\ell ^1\)-regularization (see [5, 39, 40, 45]). Also, the use of nonconvex optimization can be motivated by natural image statistics [26], and it appears to be more robust with respect to noise with heavy-tailed distribution (see e.g. [51]).
The paper is organized as follows. In Sect. 2 we present our proposed monotone algorithm and prove its convergence. In Sect. 3 we report our numerical results for the four test cases mentioned above. In Sect. 4 we derive the necessary optimality conditions for (1.1), describe our primal-dual active set strategy and prove its convergence in the case \({\varLambda }=I\). Finally, in Sect. 5 we report the numerical results obtained by testing the active set monotone algorithm in the two situations mentioned above.
2 Existence and monotone algorithm for a regularized problem
Theorem 1
For any \(\beta >0\), there exists a solution to (2.1).
Proof
Remark 1
Notice that by the coercivity of the functional J in (2.1), the coercivity of \(J_\varepsilon \) and hence existence for (2.3) follow as well.
We have the following convergence result.
Theorem 2
For \(\varepsilon >0\), let \(\{x^k\}\) be generated by (2.6). Then \(J_{\varepsilon }(x^k)\) is strictly monotonically decreasing, unless there exists some k such that \(x^k = x^{k+1}\) and \(x^k\) satisfies the necessary optimality condition (2.5). Moreover, every cluster point of \(\{x^k\}\), of which there exists at least one, is a solution of (2.5).
Proof
In the following proposition we establish the convergence of (2.3) to (2.1) as \(\varepsilon \) goes to zero.
Proposition 1
Let \(\{x_\varepsilon \}_{\varepsilon >0}\) be solutions to (2.3). Then any cluster point of \(\{x_\varepsilon \}_{\varepsilon >0}\) as \(\varepsilon \rightarrow 0^{+}\), of which there exists at least one, is a solution of (2.1).
Proof
Then by the uniform boundedness of \(\{x_\varepsilon \}_{\varepsilon }\) there exists a subsequence and \({\bar{x}} \in {{\mathbb {R}}}^d\) such that \(x_{\varepsilon _l} \rightarrow {\bar{x}}\). Since \(\{x_\varepsilon \}_{\varepsilon }\) solves (2.3), by letting \(\varepsilon \rightarrow 0\) and using the definition of \({\varPsi }_\varepsilon \), we easily get that \({\bar{x}}\) is a solution of (2.1). \(\square \)
3 Monotone algorithm: numerical results
The focus of this section is to investigate the performance of the monotone algorithm in practice. For this purpose we choose four problems with matrices A of very different structure: a time-dependent optimal control problem, a fracture mechanics example, an M-matrix example and a stationary optimal control problem.
3.1 The numerical scheme
For further reference it is convenient to recall the algorithm in the following form (see Algorithm 1). Note that a continuation strategy with respect to the parameter \(\varepsilon \) is performed. The initialization and the range of \(\varepsilon \)-values are described for each class of problems below.
In the presentation of our numerical results, the total number of iterations shown in the tables takes into account the continuation strategy with respect to \(\varepsilon \). However, it does not take into account the continuation with respect to \(\beta \). We remark that in all the experiments presented in the following sections, the value of the functional at each iteration was checked to be monotonically decreasing, in accordance with Theorem 2.
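For orientation, the structure of such a scheme can be sketched as follows. This is only a minimal sketch under stated assumptions, not a reproduction of Algorithm 1: we assume a cost of the form \(\frac{1}{2}|Ax-b|_2^2+\beta |{\varLambda } x|_p^p\), a regularization that replaces \(|t|^p\) by a quadratic for \(|t|\le \varepsilon \), and an iteration that freezes the weights \(\beta p/\max \{\varepsilon ^{2-p}, |({\varLambda } x^k)_i|^{2-p}\}\) at the current iterate and solves a linear system. The function names, the stopping rule and the \(\varepsilon \)-schedule are hypothetical.

```python
import numpy as np

def psi_eps(t, p, eps):
    """Assumed regularization of |t|^p: quadratic for |t| <= eps, |t|^p otherwise."""
    a = np.abs(t)
    quad = 0.5 * p * a**2 / eps**(2.0 - p) + (1.0 - 0.5 * p) * eps**p
    return np.where(a > eps, a**p, quad)

def cost_eps(x, A, b, Lam, beta, p, eps):
    """J_eps(x) = 0.5*|Ax - b|_2^2 + beta * sum_i psi_eps((Lam x)_i)."""
    r = A @ x - b
    return 0.5 * (r @ r) + beta * np.sum(psi_eps(Lam @ x, p, eps))

def monotone_scheme(A, b, Lam, beta, p, eps, x0, max_iter=500, tol=1e-12):
    """Freeze the weights at x^k, then solve a linear system for x^{k+1}."""
    x = x0.copy()
    costs = [cost_eps(x, A, b, Lam, beta, p, eps)]
    for _ in range(max_iter):
        y = Lam @ x
        # Weights beta*p / max(eps^(2-p), |y_i|^(2-p)), evaluated at the old iterate.
        w = beta * p / np.maximum(eps**(2.0 - p), np.abs(y)**(2.0 - p))
        H = A.T @ A + Lam.T @ (w[:, None] * Lam)
        x_new = np.linalg.solve(H, A.T @ b)
        costs.append(cost_eps(x_new, A, b, Lam, beta, p, eps))
        converged = np.linalg.norm(x_new - x) <= tol
        x = x_new
        if converged:
            break
    return x, costs

def continuation(A, b, Lam, beta, p, eps_values, x0):
    """Warm-started sweep over a decreasing sequence of eps values."""
    x, total_iterations = x0, 0
    for eps in eps_values:
        x, costs = monotone_scheme(A, b, Lam, beta, p, eps, x)
        total_iterations += len(costs) - 1
    return x, total_iterations
```

Under these assumptions the regularized penalty is concave as a function of \(t^2\), so each linear solve minimizes a quadratic majorant of the regularized cost; this is why the list `costs` returned for a fixed \(\varepsilon \) is expected to be nonincreasing.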
The following notation will be used throughout the rest of the paper. For \(x \in {{\mathbb {R}}}^d\) we denote \(|x|_0=\#\{i\, :\, |x_i|> 10^{-10}\}\), \(|x|_0^c=\#\{i\, :\, |x_i|\le 10^{-10}\}\), and by \(|x|_2\) the Euclidean norm of x.
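As an illustration of this notation, the two counts can be computed as in the following sketch, where the numerical zero threshold is read as \(10^{-10}\). The helper names `count_nonzero` and `count_zero` and the sample vector are hypothetical.

```python
import numpy as np

TOL = 1e-10  # numerical zero threshold used in the notation above

def count_nonzero(x, tol=TOL):
    """Number of components with |x_i| > tol."""
    return int(np.sum(np.abs(x) > tol))

def count_zero(x, tol=TOL):
    """Number of components with |x_i| <= tol."""
    return int(np.sum(np.abs(x) <= tol))

x = np.array([0.0, 3e-11, -2.0, 0.5, -1e-12])
print(count_nonzero(x), count_zero(x))  # prints "2 3"
print(np.linalg.norm(x))                # Euclidean norm of x
```

By construction the two counts always add up to the dimension of x.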
3.2 Time-dependent control problem
In Table 1 we report the results of our tests for \(p=.5\) and \(\beta \) increasing by factors of 10 from \(10^{-3}\) to 1. We report only the values for the second control \(u_2\), since the first control \(u_1\) is always zero. In the third row we see that \(|Du_2|_0^c\) increases with \(\beta \), consistent with our expectation. Note also that the quantity \(|Du_2|_p^p\) decreases as \(\beta \) increases.
For any \(i=1,\ldots , m\), we say that i is a singular component of the vector \(D u_2\) if \(i \in \{i \, :\, |(Du_2)_i|<\varepsilon \}\). In particular, note that the singular components are the ones where the \(\varepsilon \)-regularization is most influential. In the sixth row of Table 1 we show their number at the end of the \(\varepsilon \)-path-following scheme (denoted by Sp), and we observe that it coincides with the quantity \(|Du_2|_0^c\), which supports the validity of our \(\varepsilon \)-strategy.
Table 1 Sparsity in a time-dependent control problem, \(p=.5\), mesh size \(h=\frac{1}{50}\). Results obtained by Algorithm 1
\(\beta \)  \(10^{-3}\)  \(10^{-2}\)  \(10^{-1}\)  1
No. of iterates  630  635  29  19
\(|Du_2|_0^c\)  97  99  100  100
\(|Du_2|^p_p\)  158  16.7  \(6\cdot 10^{-5}\)  \(10^{-4}\)
Residue  \(3\cdot 10^{-3}\)  \(2\cdot 10^{-3}\)  \(1.2\cdot 10^{-3}\)  \(2.5\cdot 10^{-10}\)
Sp  97  99  100  100
3.3 Quasi-static evolution of cohesive fracture models
In this section we focus on a modelling problem for quasi-static evolutions of cohesive fractures. Problems of this kind require the minimization of an energy functional which has two components: the elastic energy and the cohesive fracture energy. The underlying idea is that the fracture energy is released gradually with the growth of the crack opening. The cohesive energy, denoted by \(\theta \), is assumed to be a monotonically nondecreasing function of the jump amplitude of the displacement, denoted by \(\llbracket u \rrbracket \). Cohesive energies were introduced independently by Dugdale [16] and Barenblatt [3]; we refer to [43] for more details on the models. Let us just remark that the two models differ mainly in the evolution of the derivative \(\theta '(\llbracket u\rrbracket )\), that is, the bridging force, across a crack amplitude \(\llbracket u \rrbracket \). In Dugdale’s model this force keeps a constant value up to a critical value of the crack opening and then drops to zero. In Barenblatt’s model, the dependence of the force on \(\llbracket u \rrbracket \) is continuous and decreasing.
In this section we test the \(\ell ^p\)-term, \(0<p<1\), as a model for the cohesive energy. In particular, the cohesive energy is not differentiable at zero, and the bridging force goes to infinity when the jump amplitude goes to zero. Note also that the bridging force goes to zero when the jump amplitude goes to infinity.
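These two limits can be made explicit in the scalar case. As an illustration only, assume that the cohesive energy of a jump of amplitude \(s=|\llbracket u \rrbracket |>0\) is \(\theta (s)=\beta s^p\) with \(0<p<1\); this specific scalar form is an assumption made for this sketch. Then the bridging force satisfies
$$\begin{aligned} \theta '(s)=\beta p\, s^{p-1}, \qquad \lim _{s\rightarrow 0^{+}}\theta '(s)=+\infty , \qquad \lim _{s\rightarrow +\infty }\theta '(s)=0, \end{aligned}$$
since the exponent \(p-1\) is negative. In contrast to Dugdale's model, the force is therefore unbounded near the origin, and, as in Barenblatt's model, it is continuous and decreasing for \(s>0\).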
Our numerical experiments were conducted with a discretization in 2N intervals with \(N=100\) and a prescribed potential crack \({\varGamma }=0.5\). The time step in the time discretization of [0, T] with \(T=3\) is set to \(dt=0.01\). The parameters of the energy functional \(J_h(u_h)\) are set to \(\beta =1, \gamma =50\). The parameter \(\varepsilon \) is decreased from \(10^{-1}\) to \(10^{-12}\).

Pure elastic deformation: in this case the jump amplitude is zero and the gradient of the displacement is constant in \({\varOmega } \backslash {\varGamma }\);

Pre-fracture: the two lips of the fracture do not touch each other, but they are not free to move. The elastic energy is still present;

Fracture: the two parts are free to move. In this final phase the gradient of the displacement (and hence the elastic energy) is zero.
3.4 M-matrix
Table 2 M-matrix example, \({\varLambda }=(d+1)[D_1;D_2], p=.1\), mesh size \(h=\frac{1}{64}\). Results obtained by Algorithm 1
\(\beta \)  \(10^{-4}\)  \(10^{-3}\)  \(10^{-2}\)  \(10^{-1}\)  1  10
No. of iterates  1701  2469  3929  4254  14  7
\(|{\varLambda } x|^c_0\)  16  103  791  5384  7938  7938
\(|{\varLambda } x|^p_p\)  \(6\cdot 10^3\)  \(5.8\cdot 10^3\)  \(5\cdot 10^{3}\)  \(2.4\cdot 10^3\)  584  464
Residue  \(2.7\cdot 10^{-7}\)  \(5.5\cdot 10^{-6}\)  \(9\cdot 10^{-5}\)  \(9\cdot 10^{-4}\)  \(3\cdot 10^{-12}\)  \(2.7\cdot 10^{-12}\)
Sp  247  696  2097  5599  7938  7938
In Table 2 we show the performance of Algorithm 1 for \(p=.1\), mesh size \(h=1/64\), and \(\beta \) increasing by factors of 10 from \(10^{-4}\) to 10. In Fig. 2 we plot the solutions for different values of \(\beta \) between .01 and .3, where most of the changes in the plots occur.
We observe significant differences in the results for different values of \(\beta \). Consistent with our expectations, \(|{\varLambda } x|^c_0\) increases with \(\beta \) (see the third row of Table 2). For example, for \(\beta =1, 10\), we have \(|{\varLambda } x|^c_0=7938\), or equivalently, \(|{\varLambda } x|_0=0\), that is, the solution to (3.13) is constant. Moreover, the fourth row shows that \(|{\varLambda } x|^p_p\) decreases when \(\beta \) increases.
The fifth row shows the \(\ell ^\infty \) norm of the residue, which is at most \(O(10^{-4})\) for all the considered \(\beta \). We remark that the number of iterations is sensitive with respect to \(\beta \); in particular, it increases when \(\beta \) increases from \(10^{-4}\) to \(10^{-1}\), and then it decreases significantly for \(\beta =1,10\).
The algorithm was also tested for different values of p. The results obtained show dependence on p; in particular, \(|{\varLambda } x|^c_0\) decreases as p increases. For example, for \(p=.5\) and \(\beta =.1\) we have \(|{\varLambda } x|^c_0=188, |{\varLambda } x|^p_p=528\).
In the sixth row of Table 2 we show the number of singular components of the vector \({\varLambda } x\) at the end of the \(\varepsilon \)-path-following scheme, that is, \(Sp:=\#\{ i\, :\, |({\varLambda } x)_i|<\varepsilon \}\). For most values of \(\beta \), we note that Sp is comparable to \(|{\varLambda } x|_0^c\). This again confirms that the \(\varepsilon \)-strategy is effective.
Remark 2
The algorithm was also tested in the following two particular cases: \({\varLambda }=I\), where I is the identity matrix of size \(d^2\), and \({\varLambda }=(d+1)D_1\), where \(D_1\) is as in (3.16).
In the case \({\varLambda }=I\) the variational problem (3.13) for \(\beta >0\) gives a sparsity-enhancing solution for the elliptic equation (3.14), that is, the displacement y will be 0 when the forcing f is small. Indeed, in this case the sparsity of the solution increases with \(\beta \). Also, the residue is \(O(10^{-8})\) and the number of iterations is considerably smaller than in the case where \({\varLambda }\) is as in (3.15).
For \( {\varLambda }=(d+1)D_1\) we show the plots in Fig. 3. Comparing the graphs for \(\beta =.3\) in Figs. 2 and 3, we can find subdomains where the solution is only unidirectionally piecewise constant in Fig. 3 and piecewise constant in Fig. 2. The number of iterations, \(|{\varLambda } x|_0^c\), \(|{\varLambda } x|_p^p\) and the residue are comparable to the ones of Table 2.
3.5 Elliptic control problem
Table 3 Sparsity in an elliptic control problem, \(p=.1\), mesh size \(h=\frac{1}{64}\). Results obtained by Algorithm 1
\(\beta \)  \(10^{-3}\)  \(10^{-2}\)  \(10^{-1}\)  1
No. of iterates  102  119  5204  10440
\(|{\varLambda } u|_0^c\)  799  1486  1673  2376
\(|{\varLambda } u|^p_p\)  \(3.2\cdot 10^{4}\)  \(2.6\cdot 10^4\)  \(2.6\cdot 10^4\)  \(1.2\cdot 10^4\)
Residue  \(1.6\cdot 10^{-5}\)  \(2.4\cdot 10^{-4}\)  \(2\cdot 10^{-3}\)  \(7\cdot 10^{-3}\)
From our tests we conclude that the monotone algorithm reliably finds a solution of the \(\varepsilon \)-regularized optimality condition (2.5) for a diverse spectrum of problems. It is also stable with respect to the choice of initial conditions. According to the last rows of Tables 1, 2 and 3, \(\#\{i\, :\, |({\varLambda } x)_i|\le 10^{-10}\}\) is typically very close to the number of singular components at the end of the \(\varepsilon \)-path-following scheme. Depending on the choice of \(\beta \), the algorithm requires on the order of \(O(10^2)\) to \(O(10^3)\) iterations to reach convergence. In the following sections we analyse an alternative algorithm for which the iteration number is smaller, although its convergence can be proved only in special cases.
4 The active set monotone algorithm for the optimality conditions
First, necessary optimality conditions for problem (4.1) in the form of a complementary system are derived, and a sufficient condition for the uniqueness of solutions to this system is established. Subsequently, an active-set strategy relying on this form of the optimality conditions is proposed.
4.1 Necessary optimality conditions
For any matrix \(A \in {\mathbb {M}}^{m\times d}\), we denote by \(A_i\) the i-th column of A. We have the following necessary optimality conditions for a global minimizer of (4.1).
Theorem 3
Proof
Remark 3
We remark that Theorem 3 still holds when considering (4.1) in the infinite dimensional sequence spaces \(\ell ^p\) in the case \({\varLambda }=I\).
From Theorem 3 we can deduce a lower bound for the nonzero components of \({\varLambda } x\), following the arguments in [27], Corollary 2.1. The bound in Corollary 1 is of the same order with respect to \(\beta \), p, and A as the existing lower bound for \(\ell ^2\)-\(\ell ^p\) problems derived in the literature, see e.g. [15]. However, the proof in [15] is different from the one presented in [27] and relies mainly on second order necessary optimality conditions.
Corollary 1
If \(({\varLambda }{\bar{x}})_i \ne 0\), then \(|({\varLambda }{\bar{x}})_i| \ge \left( \frac{2 \beta (1-p)}{|(A{\varLambda }^{-1})_i|_2^2}\right) ^{\frac{1}{2-p}}.\)
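To give a feeling for the size of this threshold, the following sketch evaluates the bound componentwise, reading it as \(\left( 2\beta (1-p)/|(A{\varLambda }^{-1})_i|_2^2\right) ^{1/(2-p)}\) with \((A{\varLambda }^{-1})_i\) the i-th column. The function name and the small test matrices are hypothetical, and \({\varLambda }\) is assumed to be square and invertible.

```python
import numpy as np

def nonzero_lower_bound(A, Lam, beta, p):
    """Componentwise bound (2*beta*(1-p) / |(A Lam^{-1})_i|_2^2)^(1/(2-p))."""
    M = A @ np.linalg.inv(Lam)        # column i of M is (A Lam^{-1})_i
    col_sq = np.sum(M**2, axis=0)     # squared Euclidean norms of the columns
    return (2.0 * beta * (1.0 - p) / col_sq) ** (1.0 / (2.0 - p))

A = np.eye(2)
print(nonzero_lower_bound(A, np.eye(2), beta=0.5, p=0.5))        # both entries equal 0.5**(2/3)
print(nonzero_lower_bound(A, 2.0 * np.eye(2), beta=0.5, p=0.5))  # larger bound: columns of A Lam^{-1} are shorter
print(nonzero_lower_bound(A, np.eye(2), beta=0.5, p=1.0))        # the bound degenerates to 0 for p = 1
```

The last call illustrates that the gap between zero and nonzero components closes as \(p\rightarrow 1\), and the formula shows that it widens as \(\beta \) increases.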
4.2 The augmented Lagrangian formulation and the primal-dual active set strategy
System (4.15)–(4.17) in Algorithm 2 is implicit. It can be solved, for example, by an iterative procedure based on Algorithm 1. The final scheme resulting from the combination of Algorithms 1 and 2 is described in Algorithm 3 in the following section.
Remark 4
Remark 5
For our problem (4.18), from (4.24) and further easy computations we see that the triple \(x=0, y=0, \lambda =b\) is a solution of (4.25), whereas, as already remarked, it does not satisfy (4.22). This is closely connected to the fact that (4.22) is a primal-dual optimality condition of complementary type, which is equivalent to saying that (4.22) is a necessary condition for global minimizers. In contrast, (4.25) is a necessary condition for local ones.
Finally, we mention that in [36] an alternating direction method of multipliers is proposed to find stationary points of the type (4.25) (see Eq. 4 of [36]). The method is derived from an augmented Lagrangian formulation where, however, differently from the current paper, the penalization term is chosen "large enough".
4.3 Uniqueness and convergence
The goals of this subsection are to provide sufficient conditions for uniqueness of solutions to (4.14) and convergence of Algorithm 2. This algorithm proved to be very efficient in our numerical tests which we shall report on further below. Its analysis, however, is quite challenging and will require additional assumptions. Before specifying them we introduce some additional notation.
Let us make some remarks on these specifications. If Q is a diagonal matrix, then \(Q-B=0\) and (4.28) is trivially satisfied. We observe that for \(p\rightarrow 0\) we have \(\alpha =\gamma =\frac{1}{2}\). In particular, when \({\varLambda }=I\), (4.28) coincides with the diagonal dominance condition considered in [27] to prove the convergence of the primal-dual active set strategy in the case \(p=0\) and \({\varLambda }=I\). We note that the admissible range of \(\varepsilon \) in (4.31) decreases with \(\beta \) and \(p< 1\).
We first prove the following lemma, which is a key ingredient for the proof of Theorem 4 and will be used also in the proof of the convergence result Theorem 5.
Lemma 1
 (i) Let \(y \in {{\mathbb {R}}}\), \(y\ne 0\), and \(\lambda =\frac{\beta p y}{\max \{\varepsilon ^{2-p}, |y|^{2-p}\}}\). Then for any \(i=1,\ldots , d\) it holds
$$\begin{aligned} B_i^{-\alpha } |\lambda | \le \beta ^{\gamma }C_2. \end{aligned}$$(4.33)
 (ii) Let \(y^1, y^2 \in {{\mathbb {R}}}^d\) be such that for all \(i=1,\ldots , d\) we have \(y^1_i\ne 0\), \(y^2_i \ne 0\), and
$$\begin{aligned}&B_i^{\gamma }(y_i^1-y_i^2)+B_i^{-\alpha }\left( \frac{\beta p y_i^1}{\max (\varepsilon ^{2-p}, |y_i^1|^{2-p})}-\frac{\beta p y_i^2}{\max (\varepsilon ^{2-p}, |y_i^2|^{2-p})}\right) \nonumber \\&\quad =\left( B^{-\alpha }(B-Q)B^{-\gamma }B^{\gamma }(y^1-y^2)\right) _i. \end{aligned}$$(4.34)
Then \(y^1=y^2\).
Proof
Theorem 4
(Uniqueness) Assume that (4.2), (4.28), and (4.31) hold. Then there exists at most one solution to (4.14) satisfying (4.29) with \(\delta > \frac{2\rho C_3}{(1-\rho )C_1}.\) An analogous statement holds with (4.29) replaced by (4.30).
Proof
Case 3 The only remaining case consists in \(y_i={\hat{y}}_i=0\) for all i. But then \(\lambda =\hat{\lambda }\) by (4.14) and \(x={{\hat{x}}}\) by the invertibility of \({\varLambda }\) as desired. This concludes the proof under assumption (4.29).
4.3.1 Convergence
Here we give a sufficient condition for the convergence of the primal-dual active set method. As in [27] we utilize the diagonal dominance condition (4.28) and consider a solution \(x, \lambda \) to (4.14) which satisfies the strict complementary conditions. As such it is unique according to Theorem 4.
Theorem 5
Note that \({\mathcal {S}}^n\) is the set of all indices which are inactive both for \({\bar{x}}, {\bar{\lambda }}\) and for \(x^n, \lambda ^n\), and analogously \({\mathcal {T}}^n\) is the set of indices which are active for both pairs. Note also that, due to the finite dimensionality, the case \({\mathcal {T}}^n={\mathcal {T}}^{n+1}\) for some n must occur.
Proof
We divide the proof into three steps. In Step (i) we verify a bound on \(x^n-{\bar{x}}\) which will be used throughout the rest of the proof, in Step (ii) we prove the claimed properties of \({\mathcal {S}}^n\) and \({\mathcal {T}}^n\), and in Step (iii) we conclude the proof of convergence.
Step (iii) Assume now that \({\mathcal {S}}^n={\mathcal {S}}^{n+1} \) and \({\mathcal {T}}^n={\mathcal {T}}^{n+1}\subset {\mathcal {A}}({\bar{x}}, {\bar{\lambda }})\). Then \({\mathcal {S}}^n={\mathcal {I}}({\bar{x}}, {\bar{\lambda }})\). This is proved in Step (ii) for the case \(n\ge 2\), and in the case \(n=1\) we have \({\mathcal {S}}^1= {\mathcal {S}}^2= {\mathcal {I}}({\bar{x}}, {\bar{\lambda }})\).
It follows that \({\mathcal {S}}^n={\mathcal {I}}({\bar{x}}, {\bar{\lambda }})\) and \({\mathcal {T}}^n={\mathcal {A}}({\bar{x}}, {\bar{\lambda }})\). Comparing (4.14), which is satisfied by \({\bar{x}}, {\bar{\lambda }}\), and (4.15)–(4.17), which holds with \(y_i^n=y_i^{n+1}, \lambda _i^n=\lambda _i^{n+1}\), we conclude that \(x^{n+1}={\bar{x}}, \lambda ^{n+1}={\bar{\lambda }}\). \(\square \)
Remark 6
If \(\bar{x}_i \ne 0\) for all i, we have \({\mathcal {A}}({\bar{x}},{\bar{\lambda }})=\emptyset \). Then by the definition of \({\mathcal {T}}^n\) and Step (ii) of the proof of Theorem 5 we obtain \({\mathcal {T}}^1={\mathcal {T}}^2=\emptyset \), and therefore Algorithm 2 is a two-step algorithm in this case.
5 Active set monotone algorithm: numerical results
Here we describe the active set monotone scheme (see Algorithm 3) and discuss the numerical results for two different test cases. The first one is the time-dependent control problem from Sect. 3.2, the second one is an example in microscopy image reconstruction. Typically the active set monotone scheme requires fewer iterations and achieves a lower residue than the monotone scheme of Sect. 2.
5.1 The numerical scheme
The proposed active set monotone algorithm consists of an outer loop based on the primal-dual active set strategy and an inner loop which uses the monotone algorithm to solve the nonlinear part of the optimality condition.
Remark 7
Note that the system matrix associated to (5.3) is symmetric.
The algorithm stops when the residue of (5.2) and (4.3) (for the inner and the outer cycle, respectively) is \(O(10^{-12})\) in the control problem and \(O(10^{-8})\) in the microscopy image example.
5.2 Sparsity in a time-dependent control problem
We test the active set monotone algorithm on the time-dependent control problem described in Sect. 3.2, with the same discretization in space and time (\({\varDelta } x= {\varDelta } t=\frac{1}{50}\)) and the same target function b. The initialization of x and the \(\varepsilon \)-range are also the same. In Table 4 we report the results of our tests for \(p=.1\) and \(\beta \) increasing by factors of 10 from \(10^{-3}\) to 1. We report only the values for the second control \(u_2\), since the first control \(u_1\) is always zero. As expected, \(|Du_2|^c_0\) increases and \(|Du_2|^p_p\) decreases when \(\beta \) increases. Note that the numbers of iterations of the inner and outer cycles are both small.
The algorithm was also tested for the same p as in Sect. 3.2, that is \(p=.5\), and for the same range of \(\beta \) as in Table 4. Compared to the results achieved by Algorithm 1, we obtained the same values for the \(\ell ^0\)-term for corresponding values of \(\beta \) and a considerably smaller residue with significantly fewer inner iterations.
Table 4 Sparsity in a time-dependent control problem, \(p=.1\), mesh size \(h=\frac{1}{50}\). Results obtained by Algorithm 3
\(\beta \)  \(10^{-3}\)  \(10^{-2}\)  \(10^{-1}\)  1
No. of outer iter.  1  1  4  1
No. of inner iter.  20  20  30  20
\(|Du_2|^c_0\)  95  95  98  100
\(|Du_2|^p_p\)  18  17  14  0
Residue  \(10^{-15}\)  \(10^{-15}\)  \(10^{-14}\)  \(10^{-16}\)
5.3 Compressed sensing approach for microscopy image reconstruction
In this subsection we present an application of the active set monotone scheme to compressed sensing for microscopy image reconstruction. We focus on the STORM (stochastic optical reconstruction microscopy) method, which is based on stochastic switching and high-precision detection of single molecules to achieve an image resolution beyond the diffraction limit. The literature on STORM has been growing rapidly, see e.g. [4, 23, 25, 46]. The STORM reconstruction process consists of a series of imaging cycles. In each cycle only a fraction of the fluorophores in the field of view are switched on (stochastically), such that each of the active fluorophores is optically resolvable from the rest, allowing the position of these fluorophores to be determined with high accuracy. Despite the advantage of obtaining sub-diffraction-limit spatial resolution, in single-molecule detection-based techniques such as STORM the time to acquire a super-resolution image is limited by the maximum density of fluorescent emitters that can be accurately localized per imaging frame, see e.g. [31, 38, 47]. In order to obtain at the same time better resolution and higher emitter density per imaging frame, compressive sensing methods based on \(l^1\) techniques have recently been applied, see e.g. [2, 21, 50] and the references therein. In the following, we propose a similar approach based on our \(l^p\) methods with \(p<1\). We mention that \(l^p\) techniques with \(0<p\le 1\) based on a concave-convex regularizing procedure, and hence different from ours, are used in [33].
First we tested the procedure for images of the same resolution; in particular, the conventional and the true images are both \(128\times 128\) pixel images. Then the algorithm was tested in the case of a \(16\times 16\) pixel conventional image and a \(128 \times 128\) true image. The values for the impulse response A and the measured data b were chosen according to the literature; in particular, A was taken as the Gaussian PSF matrix with variance \(\sigma =8\) and size \(3\times \sigma =24\), and b was simulated by convolving the impulse response A with a random 0-1 mask over the image and adding white random noise so that the signal-to-noise ratio is .01.
We carried out several tests with the same data for different values of \(p,\beta \). We report only our results for \(p=0.1\) and \(\beta =10^{6}, \beta =10^{9}\) for the same-resolution and the different-resolution case respectively, since for these values the best reconstructions were achieved. The number of single-frame reconstructions carried out to get the full reconstruction was 5 in the same-resolution case and 10 in the different-resolution case.
Table 5 Number of iterations and residue for the cross image (different res.), \(p=0.1, \beta =10^{9}\). Results obtained by Algorithm 3
Frame             1            2            3            4            5            6            7            8            9            10
Outer iterations  100          98           100          100          100          100          100          85           100          100
Inner iterations  147          190          144          184          145          186          146          187          145          165
Residue           \(10^{-8}\)  \(10^{-8}\)  \(10^{-8}\)  \(10^{-8}\)  \(10^{-9}\)  \(10^{-8}\)  \(10^{-8}\)  \(10^{-8}\)  \(10^{-8}\)  \(10^{-8}\)
Table 6 Number of iterations and residue for the phantom (same res.), \(p=0.1, \beta =10^{6}\). Results obtained by Algorithm 3
Frame             1             2             3             4             5
Outer iterations  6             11            7             6             6
Inner iterations  9             14            12            7             7
Residue           \(10^{-8}\)   \(10^{-10}\)  \(10^{-12}\)  \(10^{-8}\)   \(10^{-8}\)
A second test on a non-sparse standard phantom image was carried out. In Fig. 8 we show the reconstruction in the case of same-resolution images. Note that a high percentage of emitters is correctly localized and the boundaries of the image are well recovered. Also in this case the location and exact recoveries show a linear decay with respect to the noise.
In Tables 5 and 6 we report the number of iterations needed for each single-frame reconstruction. For the cross image in the different-resolution case (Table 5), the average number of iterations is about 98 for the outer cycle and 164 for the inner cycle. For the phantom in the same-resolution case (Table 6) the number of iterations is lower, on average 7.2 for the outer cycle and 9.8 for the inner cycle. The numbers of iterations for the cross image in the same-resolution case are comparable to those of Table 5. As shown in the third row of each table, the residue is always less than or equal to \(10^{-8}\).
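The averages quoted for Tables 5 and 6 can be recomputed directly from the tabulated per-frame counts. The short sketch below does this; the variable names are illustrative and the lists are transcribed from the two tables.

```python
# Per-frame iteration counts transcribed from Tables 5 and 6.
cross_outer = [100, 98, 100, 100, 100, 100, 100, 85, 100, 100]
cross_inner = [147, 190, 144, 184, 145, 186, 146, 187, 145, 165]
phantom_outer = [6, 11, 7, 6, 6]
phantom_inner = [9, 14, 12, 7, 7]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


averages = {
    "cross, outer": mean(cross_outer),
    "cross, inner": mean(cross_inner),
    "phantom, outer": mean(phantom_outer),
    "phantom, inner": mean(phantom_inner),
}

for name, value in averages.items():
    print(f"{name}: {value:.1f}")
# cross, outer: 98.3
# cross, inner: 163.9
# phantom, outer: 7.2
# phantom, inner: 9.8
```

The outer-cycle average for the cross image is close to 100, which appears to be an iteration cap since eight of the ten frames stop exactly there; this reading is an inference from the table, not a statement in the text.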
Notes
Acknowledgements
Open access funding provided by University of Graz. We thank the referees for very thoughtful suggestions and remarks, which helped to improve our results.
References
1. Artina, M., Fornasier, M., Solombrino, F.: Linearly constrained nonsmooth and nonconvex minimization. SIAM J. Optim. 23, 1904–1937 (2013)
2. Babcock, H.P., Moffitt, J.R., Cao, Y., Zhuang, X.: Fast compressed sensing analysis for super-resolution imaging using L1-homotopy. Opt. Express 21, 28583–28596 (2013)
3. Barenblatt, G.I.: The mathematical theory of equilibrium cracks in brittle fracture. Adv. Appl. Math. Mech. 7, 55–129 (1962)
4. Betzig, E., Patterson, G.H., Sougrat, R., Lindwasser, O.W., Olenych, S., Bonifacino, J.S., Davidson, M.W., Lippincott-Schwartz, J., Hess, H.F.: Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006)
5. Black, M.J., Rangarajan, A.: On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. Int. J. Comput. Vis. 19, 57–91 (1996)
6. Bredies, K., Lorenz, D.A., Reiterer, S.: Minimization of nonsmooth, nonconvex functionals by iterative thresholding. J. Optim. Theory Appl. 165, 78–112 (2015)
7. Candes, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59, 1207–1223 (2006)
8. Candes, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)
9. Casas, E., Clason, C., Kunisch, K.: Approximation of elliptic control problems in measure spaces with sparse solutions. SIAM J. Control Optim. 50, 1735–1752 (2012)
10. Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14, 707–710 (2007)
11. Chartrand, R.: Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data. In: IEEE International Symposium on Biomedical Imaging: From Nano to Macro (2009)
12. Chartrand, R., Staneva, V.: Restricted isometry properties and nonconvex compressive sensing. Inverse Probl. 24, 035020 (2008)
13. Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2008)
14. Chen, X., Zhou, W.: Convergence of the reweighted \(\ell _1\) minimization algorithm for \(\ell _2\)-\(\ell _p\) minimization. Comput. Optim. Appl. 59, 47–61 (2014)
15. Chen, X., Xu, F., Ye, Y.: Lower bound theory of nonzero entries in solutions of \(\ell ^2\)-\(\ell ^p\) minimization. SIAM J. Sci. Comput. 32, 2832–2852 (2010)
16. Dugdale, D.S.: Yielding of steel sheets containing slits. J. Mech. Phys. Solids 8, 100–104 (1960)
17. Duval, V., Peyré, G.: Exact support recovery for sparse spikes deconvolution. Found. Comput. Math. 15, 1315–1355 (2015)
18. Fornasier, M., Ward, R.: Iterative thresholding meets free-discontinuity problems. Found. Comput. Math. 10, 527–567 (2010)
19. Foucart, S., Lai, M.J.: Sparsest solutions of underdetermined linear systems via \(\ell _q\)-minimization for \(0<q\le 1\). Appl. Comput. Harmon. Anal. 26, 395–407 (2009)
20. Ghilli, D., Kunisch, K.: A monotone scheme for sparsity optimization in \(\ell ^p\) with \(p\in (0,1]\). In: IFAC WC Proceedings (2017)
21. Gu, L., Sheng, Y., Chen, Y., Chang, H., Zhang, Y., Lv, P., Ji, W., Xu, T.: High-density 3D single molecular analysis based on compressed sensing. Biophys. J. 106, 2443–2449 (2014)
22. Herzog, R., Stadler, G., Wachsmuth, G.: Directional sparsity in optimal control of partial differential equations. SIAM J. Control Optim. 50, 943–963 (2012)
23. Hess, S.T., Girirajan, T.P., Mason, M.D.: Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006)
24. Hintermüller, M., Wu, T.: Nonconvex \(TV^q\)-models in image restoration: analysis and a trust-region regularization-based superlinearly convergent solver. SIAM J. Imaging Sci. 6, 1385–1415 (2013)
25. Huang, B., Babcock, H.P., Zhuang, X.: Breaking the diffraction barrier: super-resolution imaging of cells. Cell 143, 1047–1058 (2010)
26. Huang, J., Mumford, D.: Statistics of natural images and models. In: International Conference on Computer Vision and Pattern Recognition (CVPR), Fort Collins, pp. 541–547 (1999)
27. Ito, K., Kunisch, K.: A variational approach to sparsity optimization based on Lagrange multiplier theory. Inverse Probl. 30, 015001 (2014)
28. Ito, K., Kunisch, K.: Lagrange multiplier approach to variational problems and applications. In: Advances in Design and Control 15. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2008)
29. Jiao, Y., Jin, B., Lu, X.: A primal dual active set with continuation algorithm for the \(\ell ^0\)-regularized optimization problem. Appl. Comput. Harmon. Anal. 39, 400–426 (2015)
30. Jiao, Y., Jin, B., Lu, X., Ren, W.: A primal dual active set algorithm for a class of nonconvex sparsity optimization (2013) (preprint)
31. Jones, S.A., Shim, S.H., He, J., Zhuang, X.: Fast, three-dimensional super-resolution imaging of live cells. Nat. Methods 8, 499–505 (2011)
32. Kalise, D., Kunisch, K., Rao, Z.: Infinite horizon sparse optimal control. J. Optim. Theory Appl. 172, 481–517 (2017)
33. Kim, K., Min, J., Carlini, L., Unser, M., Manley, S., Jeon, D., Ye, J.C.: Fast maximum likelihood high-density low-SNR super-resolution localization microscopy. In: International Conference on Sampling Theory and Applications, Bremen, Federal Republic of Germany, pp. 285–288 (2013)
34. Lai, M.J., Wang, J.: An unconstrained \(\ell _q\) minimization with \(0<q\le 1\) for sparse solution of underdetermined linear systems. SIAM J. Optim. 21, 82–101 (2011)
35. Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell _q\) minimization. SIAM J. Numer. Anal. 51, 927–957 (2013)
36. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25, 2434–2460 (2015)
37. Lu, Z.: Iterative reweighted minimization methods for \(\ell _p\) regularized unconstrained nonlinear programming. Math. Program. Ser. A 147, 277–307 (2014)
38. Nieuwenhuizen, R.P.J., Lidke, K.A., Bates, M., Puig, D.L., Grünwald, D., Stallinga, S., Rieger, B.: Measuring image resolution in optical nanoscopy. Nat. Methods 10, 557–562 (2013)
39. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data-fidelity terms: applications to the processing of outliers. SIAM J. Numer. Anal. 40, 965–994 (2002)
40. Nikolova, M., Ng, M.K., Tam, C.P.: Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction. IEEE Trans. Image Process. 19, 3073–3088 (2010)
41. Nikolova, M., Ng, M.K., Zhang, S., Ching, W.K.: Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 1, 2–25 (2008)
42. Ochs, P., Dosovitskiy, A., Brox, T., Pock, T.: On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision. SIAM J. Imaging Sci. 8, 331–372 (2015)
43. Del Piero, G.: A variational approach to fracture and other inelastic phenomena. J. Elast. 112, 3–77 (2013)
44. Ramlau, R., Zarzer, C.: On the minimization of a Tikhonov functional with nonconvex sparsity constraints. Electron. Trans. Numer. Anal. 39, 476–507 (2012)
45. Roth, S., Black, M.J.: Fields of experts. Int. J. Comput. Vis. 82, 205–229 (2009)
46. Rust, M., Bates, M., Zhuang, X.: Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–796 (2006)
47. Shroff, H., Galbraith, C.G., Galbraith, J.A., Betzig, E.: Live-cell photoactivated localization microscopy of nanoscale adhesion dynamics. Nat. Methods 5, 417–423 (2008)
48. Stadler, G.: Elliptic optimal control problems with L1-control cost and applications for the placement of control devices. Comput. Optim. Appl. 44, 159–181 (2009)
49. Sun, Q.: Recovery of sparsest signals via \(\ell ^q\)-minimization. Appl. Comput. Harmon. Anal. 32, 329–341 (2012)
50. Zhu, L., Zhang, W., Elnatan, D., Huang, B.: Faster STORM using compressed sensing. Nat. Methods 9, 721–723 (2012)
51. Zoubir, A., Koivunen, V., Chakhchoukh, Y., Muma, M.: Robust estimation in signal processing: a tutorial-style treatment of fundamental concepts. IEEE Signal Process. Mag. 29, 61–80 (2012)
52. Zuo, W., Meng, D., Zhang, L., Feng, X., Zhang, D.: A generalized iterated shrinkage algorithm for nonconvex sparse coding. In: IEEE International Conference on Computer Vision, pp. 217–224 (2013)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.