On a Monotone Scheme for Nonconvex Nonsmooth Optimization with Applications to Fracture Mechanics
Abstract
A general class of nonconvex optimization problems is considered, where the penalty is the composition of a linear operator with a nonsmooth nonconvex mapping, which is concave on the positive real line. The necessary optimality condition of a regularized version of the original problem is solved by means of a monotonically convergent scheme. Such problems arise in continuum mechanics, as for instance cohesive fractures, where singular behaviour is usually modelled by nonsmooth nonconvex energies. The proposed algorithm is successfully tested for fracture mechanics problems. Its performance is also compared to two alternative algorithms for nonsmooth nonconvex optimization arising in optimal control and mathematical imaging.
Keywords
Nonsmooth nonconvex optimization · Monotone algorithm · Fracture mechanics · Sparse recovery

Mathematics Subject Classification
49M05 · 49Kxx · 65Kxx · 74Rxx · 74Pxx · 49J53 · 49K99

1 Introduction
In this paper, we investigate a class of nonconvex and nonsmooth optimization problems, where the penalty is the composition of a nonsmooth nonconvex mapping with a linear operator and the smooth part is a least-squares-type term.
Similar optimization problems, in the case where the operator inside the penalty coincides with the identity matrix, have attracted increasing attention due to their applications to sparsity of solutions, feature selection, and many related fields such as compressed sensing, signal processing, and machine learning (see, e.g. [1, 2]). The convex nonsmooth case of the \(\ell ^1\) norm has gained great popularity and has been thoroughly studied. The convexity makes it possible to formulate efficient and globally convergent algorithms to find a numerical solution. Here, we mention [3, 4], where the basis pursuit and the Lasso problems were introduced to solve \(\ell ^1\) minimization problems.
Recently, increased interest has arisen towards nonconvex and nonsmooth penalties, such as the \(\ell ^\tau \) quasinorm with \(0\le \tau <1\) (see, e.g. [5, 6, 7, 8, 9, 10]), the smoothly clipped absolute deviation (SCAD) [11, 12], and the minimax concave penalty (MCP) [12, 13]. Nonconvexity has been shown to provide advantages over convex models. For example, it makes it possible to recover the solution exactly from less data (see, e.g. [14, 15, 16]), and it tends to produce unbiased estimates for large coefficients [11, 17, 18]. Note that all the previously mentioned works deal with the particular case where the operator coincides with the identity.
Nonconvex optimization problems of the type we consider, where the operator inside the penalty differs from the identity, also arise in the modelling of cohesive fractures in continuum mechanics, where the concavity of the penalty is crucial to model the release of the fracture energy with the growth of the crack opening. Here, the operator models the jump of the displacement between the two lips of the fracture. We refer to [19, 20, 21, 22] and Sect. 3.1 for more details.
The study of these problems for nonconvex penalties, including the \(\ell ^\tau \) functional with \(0<\tau <1\), the SCAD, and the MCP, and for linear operators not necessarily coinciding with the identity, is also motivated by applications beyond fracture mechanics. For example, in imaging, the \(\ell ^\tau \) quasinorm, with \(0<\tau <1\), of the numerical gradient of the solution has been proposed as a nonconvex extension of the total variation (TV) regularizer (see, e.g. [6, 10]) in order to reconstruct piecewise smooth solutions. The SCAD and the MCP penalties have been used for high-dimensional regression and variable selection methods in high-throughput biomedical studies [23]. We mention also that the SCAD has been proposed as a nonconvex penalty in network estimation to attenuate the bias problem [24].
The main difficulties in the analysis of these problems come from the interplay between the nonsmoothness, the nonconvexity, and the coupling between coordinates which is described by the operator inside the penalty. Since standard algorithms are not readily available, the resolution of these problems requires the development of new analytical and numerical techniques.
In the present paper, we propose a monotonically convergent algorithm to solve this kind of problems. This is an iterative procedure which solves the necessary optimality condition of a regularized version of the original problem. A remarkable property of our scheme is the strict monotonicity of the functional along the sequence of iterates. The convergence of the iteration procedure is proved under the same assumptions that guarantee the existence of solutions.
The performance of the scheme is successfully tested by simulating the evolution of cohesive fractures for several different test configurations. Then, we turn to an issue of high relevance, namely the comparison with two alternative algorithms: GIST, the “General Iterative Shrinkage and Thresholding” algorithm for \(\ell ^\tau \) minimization with \(0<\tau <1\), and FISTA, the “Fast Iterative Shrinkage-Thresholding Algorithm” for \(\ell ^1\) minimization. The comparison is carried out with respect to the infimal value reached by the iteration procedure and with respect to computing time. Our results show that the monotone algorithm reaches a smaller value of the objective functional than GIST. Note that, differently from GIST, the monotone scheme solves a system of nonlinear equations at each iteration level. We remark that in [25], GIST was compared with IRLS, the “iterative reweighted least squares” algorithm, another popular scheme for \(\ell ^\tau \) minimization with \(0<\tau <1\). The results of [25] show that GIST and IRLS have nearly the same performance, the only significant difference being speed, with GIST the faster of the two.
An analogous procedure to the one proposed in the present paper was developed in [20] to solve similar problems where the nonconvex penalty coincides with \(\ell ^\tau \) quasinorm, with \(\tau \) strictly positive and less than or equal to 1. With respect to [20], in the present paper, we deal with more general concave penalties. Moreover, we carry out several numerical experiments for diverse situations in cohesive fracture mechanics, comparing the behaviours for different concave penalties such as the SCAD, the MCP, and the \(\ell ^\tau \) penalty, with \(\tau \) strictly positive and less than 1. Finally, in the present paper, we compare the performance of the scheme with that of GIST.
Let us recall some further literature concerning nonconvex nonsmooth optimization of the type investigated in the present paper. In [12, 26], a primal-dual active-set-type algorithm has been developed for the case where the operator inside the penalty coincides with the identity. For more references in this case, we refer to [20]. Concerning \(\ell ^\tau \) minimization, with \(0\le \tau \le 1\), when the operator is not the identity, other techniques have recently been investigated. Here, we mention iteratively reweighted convex majorization algorithms [10], the alternating direction method of multipliers (ADMM) [9], and a Newton-type solution algorithm for a regularized version of the original problem [6]. Finally, we recall the paper [21], where a novel algorithm for nonsmooth nonconvex optimization with linear constraints is proposed, consisting of a generalization of the well-known nonstationary augmented Lagrangian method for convex optimization. Convergence to critical points is proved, and several tests were made for free-discontinuity variational models, such as the Mumford–Shah functional. The nonsmoothness considered in [21] does not allow singular behaviour of the type exhibited by the \(\ell ^\tau \) term with \(0\le \tau <1\).
The paper is structured as follows. In Sect. 2.1, we state the precise assumptions; in Sect. 2.2, we prove existence for the problem under consideration; in Sect. 2.3, we propose the monotone scheme to solve a regularized version of the original problem and prove its convergence; finally, in Sect. 2.4, we study the asymptotic behaviour as the concavity and regularization parameters go to zero. In Sect. 3, we present the precise form of our scheme. In Sect. 3.1, we discuss our numerical experience for the quasi-static evolution of cohesive fractures, and in Sect. 3.2, we compare the performance of our scheme to that of GIST for three different test cases: an academic M-matrix example, an optimal control problem, and a microscopy imaging example.
2 Existence and Monotone Algorithm
2.1 Assumptions
 (a)
 (i) \(\phi (t)\) is constant for \(t\ge t_0\), for some \(t_0>0\);
 (ii) A is coercive, i.e. \(\text{ rank }(A)=n\).
 (b)
 (i) for some \(\gamma >0\), it holds that \(\phi (at)=a^\gamma \phi (t)\) for all \(t \in {\mathbb {R}}\) and \(a \in {\mathbb {R}}^+\);
 (ii) \(\text{ Ker }(A)\cap \text{ Ker }(\varLambda )=\{0\}\).
 \({{\varvec{\ell }}^\tau }\) (\(\tau \in ]0,1]\), \(\lambda >0\), satisfying (b)(i)):
$$\begin{aligned} \phi (t)=\lambda t^\tau , \end{aligned}$$(2)
 SCAD (\(\tau >1\), \(\lambda >0\), satisfying (a)(i)):
$$\begin{aligned} \phi (t)=\left\{ \begin{array}{ll} \frac{\lambda ^2(\tau +1)}{2}, &{}\quad t\ge \lambda \tau ,\\ \frac{\lambda \tau t-\frac{1}{2}(t^2+\lambda ^2)}{\tau -1}, &{}\quad \lambda < t\le \lambda \tau ,\\ \lambda t, &{}\quad t\le \lambda , \end{array} \right. \end{aligned}$$(3)
 MCP (\(\tau >1\), \(\lambda >0\), satisfying (a)(i)):
$$\begin{aligned} \phi (t)=\left\{ \begin{array}{ll} \lambda \left( t-\frac{t^2}{2\lambda \tau }\right) , &{}\quad t< \lambda \tau ,\\ \frac{\lambda ^2\tau }{2}, &{}\quad t\ge \lambda \tau , \end{array} \right. \end{aligned}$$(4)
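For reference, the three penalties (2)–(4) can be evaluated as follows. This is an illustrative sketch using the standard branch formulas of the SCAD and MCP penalties; the function names and default parameter values are ours, not the paper's.

```python
def phi_lq(t, lam=1.0, tau=0.5):
    """l^tau penalty (2): phi(t) = lam * t**tau for t >= 0."""
    return lam * abs(t) ** tau

def phi_scad(t, lam=1.0, tau=3.7):
    """SCAD penalty (3): linear near zero, quadratic transition, constant plateau."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= lam * tau:
        return (lam * tau * t - 0.5 * (t * t + lam * lam)) / (tau - 1)
    return lam * lam * (tau + 1) / 2  # constant for t >= lam*tau, matching (a)(i)

def phi_mcp(t, lam=1.0, tau=2.0):
    """MCP penalty (4): concave quadratic up to lam*tau, then constant."""
    t = abs(t)
    if t < lam * tau:
        return lam * (t - t * t / (2 * lam * tau))
    return lam * lam * tau / 2
```

All three functions are continuous; SCAD and MCP become constant for \(t\ge \lambda \tau \), matching assumption (a)(i), while \(\phi (t)=\lambda t^\tau \) is positively homogeneous of degree \(\tau \), matching (b)(i).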
Remark 2.1
The singularity at the origin of the three penalties leads to sparsity of the solution. For the SCAD and the MCP, the derivative vanishes for large values of the argument, which ensures unbiasedness.
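To make the remark concrete, differentiating the standard branch formulas of the SCAD and MCP penalties for \(t>0\) gives

$$\begin{aligned} \phi '_{\mathrm{SCAD}}(t)=\left\{ \begin{array}{ll} \lambda , &{}\quad t\le \lambda ,\\ \frac{\lambda \tau -t}{\tau -1}, &{}\quad \lambda< t\le \lambda \tau ,\\ 0, &{}\quad t> \lambda \tau , \end{array} \right. \qquad \phi '_{\mathrm{MCP}}(t)=\left\{ \begin{array}{ll} \lambda -\frac{t}{\tau }, &{}\quad t< \lambda \tau ,\\ 0, &{}\quad t\ge \lambda \tau , \end{array} \right. \end{aligned}$$

so both derivatives vanish for \(t\ge \lambda \tau \), whereas for the \(\ell ^\tau \) penalty \(\phi '(t)=\lambda \tau t^{\tau -1}\rightarrow \infty \) as \(t\searrow 0\), which produces the singularity at the origin.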
Problems of the form (1) with \(\phi \) given by the \(\ell ^\tau \) quasinorm with \(\tau \in ]0,1[\) were studied in [20]. For more details on the statistical properties of the \(\ell ^\tau \) quasinorm, such as variable selection and the oracle property, we refer to [14, 15, 27, 28].
2.2 Existence
First, we prove coercivity of the functional J in (1) under assumptions (a) or (b).
Lemma 2.1
Let assumptions (H) and either (a) or (b) hold. Then, the functional J in (1) is coercive.
Proof
In the following theorem, we state the existence of at least one minimizer to (1) under either (a) or (b). We omit the proof, since it follows directly from the continuity and coercivity of the functional in (1).
Theorem 2.1
Let assumptions (H) and either (a) or (b) hold. Then, there exists at least one minimizer to problem (1).
Remark 2.2
We remark that when assumption (a)(i) holds but A is not coercive, existence can still be proven if \(\varLambda \in {\mathbb {R}}^{n\times n}\) is invertible. Indeed, by the invertibility of \(\varLambda \), one can define \(\bar{y}=\varLambda ^{-1}{\bar{x}}\), where \({\bar{x}}\) is a minimizer of \(\bar{J}(x)=\frac{1}{2}\Vert (A\varLambda ^{-1})x-b\Vert _2^2+\varPhi (x)\), and prove that \(\bar{y}\) is a minimizer of (1). The existence of a minimizer of the functional \(\bar{J}\) was proven in [26, Theorem 2.1].
However, in our analysis, we cover the two cases (a) and (b), since when (a)(ii) is replaced by the invertibility of \(\varLambda \), we cannot prove the coercivity of J, which is a key element for the convergence of the algorithm that we analyse (see the following section).
2.3 A Monotone Convergent Algorithm
Theorem 2.2
Assume (H) and either (a) or (b). For \(\varepsilon >0\), let \(\{x^k\}\) be generated by (10). Then, \(J_{\varepsilon }(x^k)\) is strictly monotonically decreasing, unless there exists some k such that \(x^k = x^{k+1}\), in which case \(x^k\) satisfies the necessary optimality condition (9). Moreover, every cluster point of \(\{x^k\}\), of which there exists at least one, is a solution of (9).
Proof
The proof relies strongly on the coercivity of the functional J and follows arguments similar to those of [7, Theorem 4.1].
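The precise iteration (10) is not restated here; as an illustration of how a monotone scheme of this flavour can operate for the \(\varepsilon \)-regularized \(\ell ^\tau \) penalty, the following sketch implements a majorize–minimize (reweighting) step, where each iteration solves a linear system. This is a simplified stand-in, under our own assumptions on the smoothed objective, not the exact nonlinear system solved by the scheme in the paper.

```python
import numpy as np

def monotone_sketch(A, b, Lam, lam=0.1, tau=0.5, eps=1e-6, max_iter=200, tol=1e-8):
    """Majorize-minimize iteration for the smoothed model
        J_eps(x) = 0.5*||A x - b||^2 + lam * sum_i ((Lam x)_i^2 + eps)**(tau/2).
    The concave penalty is majorized by a quadratic at the current iterate,
    so J_eps decreases monotonically along the iterates."""
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        # weights: derivative of u -> lam*(u + eps)**(tau/2) at u = (Lam x)_i^2
        w = lam * tau * ((Lam @ x) ** 2 + eps) ** (tau / 2 - 1)
        H = A.T @ A + Lam.T @ (w[:, None] * Lam)
        x_new = np.linalg.solve(H, A.T @ b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Each step minimizes the quadratic majorizer exactly, which yields the monotone decrease of \(J_\varepsilon \) along the iterates, mirroring the property established in Theorem 2.2 for the actual scheme.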
In the following proposition, we establish the convergence of (5) to (1) as \(\varepsilon \) goes to zero.
Proposition 2.1
Assume (H) and either (a) or (b). Denote by \(\{x_\varepsilon \}_{\varepsilon >0}\) a solution to (5). Then any cluster point of \(\{x_\varepsilon \}_{\varepsilon >0}\), of which there exists at least one, is a solution of (1).
Proof
From the coercivity of \(J_\varepsilon \), we have that \(\{x_\varepsilon \}_{\varepsilon }\) is bounded for \(\varepsilon \) small. Hence, there exist a subsequence \(\{x_{\varepsilon _l}\}\) and \({\bar{x}} \in {\mathbb {R}}^n\) such that \(x_{\varepsilon _l} \rightarrow {\bar{x}}\) as \(l \rightarrow \infty \).
2.4 Asymptotic Behaviour as \(\lambda \searrow 0\) and \(\tau \searrow 0\) for the Power Law
Theorem 2.3
Assume that \(\text{ Ker }(A) \cap \text{ Ker }(\varLambda )=\{0\}\) and let \(\tau >0\) be fixed. For each \(\lambda >0\), let \(x_{\lambda }\) be a minimizer of (21). Then, every cluster point of \(\{x_\lambda \}\), of which there exists at least one, is a solution to (22).
Proof
Theorem 2.4
Assume that \(\text{ rank }(A)=n\), and that \(\varLambda \in {\mathbb {R}}^{n \times n}\) is regular, and let \(\lambda >0\) be fixed. Then, every cluster point (of which there exists at least one) of solutions \(\{x_{\tau }\}\) to (21) converges as \(\tau \searrow 0\) to a solution of (25).
Proof
For any fixed \(i \in \{1,\ldots , r\}\), denote \(y_{\tau }=(\varLambda x_{\tau })_i\) and \({\bar{y}}=(\varLambda {\bar{x}})_i\) and notice that \(y_{\tau } \rightarrow \bar{y}\) as \(\tau \rightarrow 0\). If \({\bar{y}}>0\), we have \(\log (y_\tau ^\tau )=\tau \log (y_\tau ) \rightarrow 0\) as \(\tau \rightarrow 0\) and thus \( y_\tau ^\tau \rightarrow 1\) as \(\tau \rightarrow 0.\)
Remark 2.3
The assumption on \(\text{ Ker }(A)\) is needed because the \(\Vert \cdot \Vert _0\) functional is not radially unbounded on \(\mathbb {R}^n\). Since Theorem 2.4 provides an existence result for (25), such an assumption is natural. To the best of our knowledge, existence of minimizers of (25) has only been addressed under assumption (a)(ii) (or assumptions implying it). In the context of algorithm development, a further \(\ell ^1\) or \(\ell ^2\) regularization is often added; we refer to [12, 29, 30].
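The pointwise limit underlying the proof of Theorem 2.4 — \(t^\tau \rightarrow 1\) for \(t>0\) while \(0^\tau =0\), so that \(\sum _i |t_i|^\tau \) approaches the counting functional \(\Vert t\Vert _0\) as \(\tau \searrow 0\) — can be checked numerically with a throwaway snippet (illustrative only; names are ours):

```python
def power_law(t, tau):
    """|t|**tau; as tau -> 0 this tends to 1 for t != 0 and stays 0 at t = 0."""
    return abs(t) ** tau

def l0_like(v, tau):
    """sum of |t_i|**tau: approaches the number of nonzero entries as tau shrinks."""
    return sum(power_law(t, tau) for t in v)
```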
3 Algorithm and Numerical Results
Remark 3.1
Note that an \(\varepsilon \)-continuation strategy is performed: the procedure is run for an initial value \(\varepsilon ^0\), and \(\varepsilon \) is then decreased down to a prescribed value. More specifically, in all our experiments, \(\varepsilon \) is initialized with \(10^{-1}\) and decreased down to \(10^{-12}\).
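In pseudocode form, such a continuation strategy amounts to the following loop. This is a sketch; the inner solver `solve_for_eps`, the reduction factor, and the warm starting are our assumptions, not details given in the paper.

```python
def solve_with_continuation(solve_for_eps, x0, eps0=1e-1, eps_min=1e-12, factor=1e-1):
    """eps-continuation: solve the regularized problem for a decreasing
    sequence of eps values, warm-starting each solve from the previous one."""
    x, eps = x0, eps0
    while eps >= eps_min:
        x = solve_for_eps(x, eps)  # e.g. one run of the monotone scheme at this eps
        eps *= factor
    return x
```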
Remark 3.2
The stopping criterion is based on the \(\ell ^\infty \)-norm of the residual of Eq. (9), and the tolerance is set to \(10^{-3}\) in all the following examples, except for the fracture problem, where it is of the order of \(10^{-15}\).
In the following subsection, we present our numerical results in cohesive fracture mechanics. Then, in Sect. 3.2, the performance of our algorithm is compared to two other schemes for nonconvex and nonsmooth optimization problems.
3.1 Application to QuasiStatic Evolution of Cohesive Fracture Models
In this section, we focus on the numerical realization of quasi-static evolutions of cohesive fractures. These kinds of problems require the minimization of an energy functional with two components: the elastic energy and the cohesive fracture energy. The underlying idea is that the fracture energy is released gradually with the growth of the crack opening. The cohesive energy, denoted by \(\theta \), is assumed to be a monotone nondecreasing function of the jump amplitude of the displacement, denoted by \(\llbracket u \rrbracket \). Cohesive energies were introduced independently by Dugdale [31] and Barenblatt [32]; we refer to [19] for more details on the models. Among the vast existing literature on fracture mechanics, we also point out [33, 34, 35] as some of the references most significant to us. Let us just remark that the two models differ mainly in the evolution of the derivative \(\theta '(\llbracket u\rrbracket )\), that is, the bridging force, across a crack of amplitude \(\llbracket u \rrbracket \). In Dugdale's model, this force keeps a constant value up to a critical value of the crack opening and then drops to zero. In Barenblatt's model, the dependence of the force on \(\llbracket u \rrbracket \) is continuous and decreasing.
In this section, we test the \(\ell ^\tau \) term, \(0<\tau <1\), as a model for the cohesive energy. In particular, the cohesive energy is not differentiable at zero, and the bridging force tends to infinity as the jump amplitude goes to zero. Note also that the bridging force goes to zero as the jump amplitude goes to infinity.
In our experiments, we consider three different types of cohesive energy: the \(\ell ^\tau \) (\(\tau \in ]0,1[\)), SCAD, and MCP penalties, as defined in (2), (3), and (4), respectively.
In Sects. 3.1.1 and 3.1.2, we show our results for onedimensional and twodimensional experiments, respectively.
3.1.1 OneDimensional Experiments
We remark that in the following experiments, the material function a(x) was always chosen as the identity. For tests with more general a(x), we refer to the two-dimensional experiments reported in the following subsection. In Figs. 1 and 2, we report the results obtained by Algorithm 1 for the \(\ell ^\tau \) and SCAD models, respectively. In each figure, we show time frames representing the evolution of the crack for different values of the parameter \(\tau \). Each time frame consists of three different time steps \((t_1, t_2, t_3)\), where \(t_2, t_3\) are chosen as the first instants where the prefracture and the fracture appear. Three phases can be distinguished:
- Pure elastic deformation: the jump amplitude is zero and the gradient of the displacement is constant in \(\varOmega \backslash \varGamma \);
- Prefracture: the two lips of the fracture do not touch each other, but they are not free to move; the elastic energy is still present;
- Fracture: the two parts are free to move. In this final phase, the gradient of the displacement (and hence the elastic energy) is zero.
We remark that the crack forms earlier for smaller values of \(\tau \). As we see in Fig. 1, for \(\tau =0.01\), prefracture and fracture are reached at \(t=0.3\) and \(t=1.5\), respectively. As \(\tau \) is increased to \(\tau =0.1\), prefracture and fracture occur at \(t=1\) and \(t=3\), respectively. We observe the same phenomenon for the SCAD (see Fig. 2).
We also tested our algorithm for the MCP model, where no prefracture phase can be observed; that is, the displacement breaks almost instantaneously to complete fracture.
Finally, we remark that in our experiments, the residue is \(O(10^{-16})\) and the number of iterations is small, e.g. 12 and 15 iterations for \(\tau =0.01\) and \(\tau =0.1\), respectively.
3.1.2 TwoDimensional Experiments
Our numerical experiments were conducted with a discretization in 2N intervals with \(N=80\). The time step, in the time discretization of [0, T] with \(T=3\), is set to \(\mathrm{d}t=0.01\). The parameters of the energy functional \(J_h(u_h)\) are set to \(\lambda =1, \gamma =50\). We perform two different series of experiments, with boundary data resulting from evaluating \(g_1, g_2\) on \(\partial \varOmega \), where \(g_1(t)(x)=(2x_1-0.5)t\) for every \(t \in [0,1], x=(x_1,x_2) \in \varOmega \), and \(g_2(t)(x)=2t \cos (4(x_2-0.5))(x_1-0.5)\) for every \(t \in [0,1], x=(x_1,x_2) \in \varOmega \). In Figs. 3, 4, 5, and 6, we show the results obtained with boundary datum \(g_1\) for each of the considered models, that is, \(\ell ^\tau \), SCAD, and MCP, and in Fig. 7, the ones with boundary datum \(g_2\) for the \(\ell ^\tau \) model. In the case of boundary datum \(g_2\), we also tested our algorithm on the SCAD and MCP models, obtaining results similar to those shown in Fig. 7. In these first experiments, the diagonal operators \(R_1, R_2\) are taken as the identity; that is, we assume a homogeneous material.
As expected from a cohesive fracture model, we observe the three phases of pure elastic deformation, prefracture, and fracture (see Sect. 3.1.1 for an explanation of the model and the three phases).
Also, prefracture and fracture are reached at different times for different values of \(\tau \); typically, they occur earlier for smaller values of \(\tau \).
3.2 Comparison with GIST
In this section, we present the results of experiments comparing the performance of Algorithm 1 with the following two other algorithms for nonconvex and nonsmooth minimization. We first compare with GIST, the “General Iterative Shrinkage and Thresholding” algorithm for \(\ell ^\tau \), \(\tau <1\), minimization. We took advantage of the fact that for GIST\(^{1}\) an open-source toolbox is available, which facilitated an unbiased comparison. Moreover, in [25], several tests were made comparing GIST and IRLS, the “Iteratively Reweighted Least Squares” algorithm, showing that the two algorithms have nearly the same performance, the only significant difference being speed, with GIST the faster of the two.
Concerning \(\ell ^1\)-minimization-based algorithms, we compared our algorithm with FISTA, the “Fast Iterative Shrinkage-Thresholding Algorithm”; see Sect. 3.2.3.
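FISTA itself is standard (Beck–Teboulle); for completeness, a compact implementation of the \(\ell ^1\) version used for comparison might look as follows. This is a generic sketch, not the exact code used in our experiments.

```python
import numpy as np

def fista_l1(A, b, lam, n_iter=500):
    """FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of x -> A^T(Ax - b)
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        g = y - (A.T @ (A @ y - b)) / L  # gradient step on the smooth part
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```

The soft-thresholding step is the proximal map of \(\lambda \Vert \cdot \Vert _1\); replacing it with the proximal map of a nonconvex penalty is essentially the idea behind GIST.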
We remark that the results of [25] show no particular differences in the performance of the algorithm for different values of \(\tau \), except that the speed becomes much worse for \(\tau \) near 1, say \(\tau =0.9\). Motivated also by this observation, the comparisons described in the following were made for one fixed value of \(\tau \).
The comparison is carried out through the following three examples: an academic M-matrix problem, an optimal control problem, and a microscopy imaging reconstruction example.
The monotone algorithm is stopped when the \(\ell ^\infty \)-residue of the optimality condition (9) is of the order of \(10^{-3}\) in the M-matrix and optimal control problems, and of the order of \(10^{-8}\) in the imaging example. GIST is terminated if the relative change of two consecutive objective function values is less than \(10^{-5}\) or the number of iterations exceeds 1000. We remark that no significant changes were observed when setting a tolerance lower than \(10^{-5}\) or a larger maximal number of iterations for GIST.
Since both GIST and the FISTA solve the problem (1) when the operator \(\varLambda \) coincides with the identity, we also make this choice in the following subsections. Finally, we remark that the three examples were analysed already in [20] with different aims.
3.2.1 MMatrix Example
We remark that in [37] and [20], the algorithm was also tested in the same situation for different values of \(\tau \) and \(\lambda \), showing in particular, consistent with our expectations, that the sparsity of the solution increases with \(\lambda \).
Here, we focus on the comparison between the performance of Algorithm 1 and that of GIST. In order to compare the two schemes, we focus on the value of the unregularized functional J in (34) reached by both algorithms, the time needed to reach it, and the number of iterations. Our tests were conducted for \(\tau =0.5\) and \(\lambda \) incrementally increasing from \(10^{-3}\) to 0.3. The parameter \(\varepsilon \) was decreased from \(10^{-1}\) to \(10^{-6}\). We report the values in Table 1 for \(\lambda =0.05\), since for the other values of \(\lambda \), the results are comparable.
We observe that Algorithm 1 always achieves lower values of the functional J, but in a longer time. The number of iterations needed by Algorithm 1 is smaller than that of GIST for small values of \(\lambda \), more precisely for \(\lambda <0.1\). This suggests, consistent with our expectations, that the monotone scheme is slower than GIST mainly because it solves a nonlinear equation at each iteration.
Table 1: M-matrix example, \(\lambda =0.05\). (a) Comparison of the value of the functional, time, and iterations between Algorithm 1 and GIST. (b) Iteration count and time at which Algorithm 1 overtakes GIST's value of the functional.

(a)
\(J_{\mathrm{GIST}}\) = 264.232,  \(J_{\mathrm{mon}}\) = 263.92
\(\text{Time}_{\mathrm{GIST}}\) = 0.701,  \(\text{Time}_{\mathrm{mon}}\) = 26.142
\(\text{Iterations}_{\mathrm{GIST}}\) = 384,  \(\text{Iterations}_{\mathrm{mon}}\) = 361

(b)
\(J_{\mathrm{GIST}}\) = 264.232,  \(\text{Time}_{\mathrm{mon}}\) = 0.39
\(\text{Iter}_{\mathrm{mon}}\) = 5,  \(\text{Time}_{\mathrm{GIST}}\) = 0.701
3.2.2 Optimal Control Problem
Similarly to the previous subsection, we compare the values of the functional, the time, and the number of iterations. The experiments are carried out for \(\tau =0.5\) and \(\lambda \) in the interval \([10^{-3}, 0.2]\). We report only the values for the second control \(u_2\), since the first control \(u_1\) is always zero (as expected).
Optimal control problem. (a) and (b) Comparison of the value of J, time, and iterations for GIST and Algorithm 1. (c) Value of J, iteration count, and time at which Algorithm 1 overtakes GIST's value of the functional.

\(\lambda \) = 0.001, 0.01:

(a) GIST:
\(J_{\mathrm{GIST}}\): 0.073, 0.599
\(\text{Time}_{\mathrm{GIST}}\): 0.047, 0.04
\(\text{Iterations}_{\mathrm{GIST}}\): 157, 3

(b) Algorithm 1:
\(J_{\mathrm{mon}}\): 0.068, 0.185
\(\text{Time}_{\mathrm{mon}}\): 15.140, 14.866
\(\text{Iterations}_{\mathrm{mon}}\): 28, 32

(c) Overtake point:
\(J_{\mathrm{mon}}\): 0.071, 0.185
\(\text{Iter}_{\mathrm{mon}}\): 1, 5
\(\text{Time}_{\mathrm{mon}}\): 0.1, 0.39
\(\text{Time}_{\mathrm{GIST}}\): 0.047, 0.04
3.2.3 Compressed Sensing Approach for Microscopy Image Reconstruction
We compare Algorithm 1 and GIST on a microscopy imaging problem; in particular, we focus on the STORM (stochastic optical reconstruction microscopy) method, which is based on stochastic switching and high-precision detection of single molecules to achieve an image resolution beyond the diffraction limit. The literature on STORM has been growing intensively; see, e.g. [38, 39, 40, 41]. We refer in particular to [20] for a detailed description of the method and for more references.
The results show that with GIST the Error− is always 197, whereas with Algorithm 1 it is always under 53, and even smaller for small values of the noise. On the other hand, the Error\(+\) with GIST is always 0, while with Algorithm 1 it is zero for small values of the noise and then monotonically increasing, reaching 175 when the noise equals 0.1. Consistent with expectations, the plots for Algorithm 1 show a linear decay with respect to the noise, differently from the behaviour shown by GIST. Moreover, the results found by Algorithm 1 lead to a more accurate recovery, in the sense that fewer emitters are missed, whereas GIST seems to lead to sparser solutions (since its Error\(+\) is 0).
Finally, we remark that in the case of the cross image, GIST is faster than our algorithm, consistent with the results presented in the previous subsection and as expected, since our algorithm solves a nonlinear equation for each minimization problem. On the other hand, in the case of the standard phantom image, GIST turns out to be far slower than Algorithm 1.
In Fig. 12, we report the results obtained in the same situation by FISTA, the “Fast Iterative Shrinkage-Thresholding Algorithm” for \(\ell ^1\) minimization. We remark that with FISTA the Error\(+\) is always above 400, whereas with Algorithm 1 it is zero for small values of the noise. This shows that Algorithm 1 leads to sparser solutions than FISTA, consistent with our expectation, since FISTA is based on \(\ell ^1\) minimization.
4 Perspectives and Open Problems
An open problem of interest to us is the study of problems like (1) for the case where the linear mapping A is replaced by a nonlinear, smooth operator \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^m\). One of the motivations arises from the control of nonlinear dynamical systems. We could proceed by iteratively applying the monotone scheme of Sect. 2.3 to auxiliary problems arising from the linearization of f at the current iterate, updating the sequence thus obtained in an outer loop.
We are particularly interested in the case in which the nonlinear operator f is nonconvex. From a fracture mechanics point of view, this would mean considering not only small-strain energies as in the current paper, but possibly polyconvex strain energies as in [44]. Considering a nonconvex energy would be more consistent from a mechanical point of view, and in particular in line with the Coleman–Noll theorem.
5 Conclusions
We have developed a monotonically convergent algorithm for a class of nonconvex nonsmooth optimization problems arising in the modelling of fracture mechanics and in imaging reconstruction, including the \(\ell ^\tau \) penalty, \(\tau \in ]0,1]\), the smoothly clipped absolute deviation (SCAD), and the minimax concave penalty (MCP). Theoretically, we established the existence of a minimizer of the original problem under assumptions implying coercivity of the functional. Then, we derived necessary optimality conditions for a regularized version of the original problem. The optimality conditions for the regularized problem were solved through a monotonically convergent iterative scheme. We proved the convergence of the iteration procedure under the same assumptions that guarantee existence. A remarkable result is the strict monotonicity of the functional along the sequence of iterates generated by the scheme. Moreover, we proved the convergence of the regularized problem to the original one as the regularization parameter goes to zero.
The efficiency and accuracy of the procedure were verified by numerical tests simulating the evolution of cohesive fractures and microscopy imaging. An issue of high relevance to us was the comparison of the scheme with two alternative algorithms: GIST, the “General Iterative Shrinkage and Thresholding” algorithm for \(\ell ^\tau \) minimization with \(0<\tau <1\), and FISTA, the “Fast Iterative Shrinkage-Thresholding Algorithm” for \(\ell ^1\) minimization. We first compared with GIST, focusing on the infimal value reached by the iteration procedure and on the computing time. Our results showed that the monotone algorithm reaches a smaller value of the objective functional than GIST, therefore yielding better accuracy. Finally, we compared our scheme with FISTA in sparse recovery related to microscopy imaging. The results showed that the monotone scheme leads to sparser solutions than FISTA, as expected, since FISTA concerns \(\ell ^1\) minimization.
Footnotes
 1.
The reference paper is [36]; the toolbox can be found at https://github.com/rflamary/nonconvexoptimization.
Acknowledgements
Open access funding provided by University of Graz. This paper was supported by the ERC Advanced Grant 668998 (OCLOC) under the EU's H2020 research programme.
References
 1. Candes, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inform. Theory 51(12), 4203–4215 (2005)
 2. Donoho, D.L.: Compressed sensing. IEEE Trans. Inform. Theory 52(4), 1289–1306 (2006)
 3. Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1998)
 4. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
 5. Bredies, K., Lorenz, D.A., Reiterer, S.: Minimization of non-smooth, non-convex functionals by iterative thresholding. J. Optim. Theory Appl. 165, 78–112 (2015)
 6. Hintermüller, M., Wu, T.: Nonconvex \(TV^q\)-models in image restoration: analysis and a trust-region regularization-based superlinearly convergent solver. SIAM J. Imaging Sci. 6, 1385–1415 (2013)
 7. Ito, K., Kunisch, K.: A variational approach to sparsity optimization based on Lagrange multiplier theory. Inverse Probl. 30, 015001 (2014)
 8. Kalise, D., Kunisch, K., Rao, Z.: Infinite horizon sparse optimal control. J. Optim. Theory Appl. 172, 481–517 (2017)
 9. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25, 2434–2460 (2015)
 10. Ochs, P., Dosovitskiy, A., Brox, T., Pock, T.: On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision. SIAM J. Imaging Sci. 8, 331–372 (2015)
 11. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
 12. Jiao, Y., Jin, B., Lu, X., Ren, W.: A primal-dual active set algorithm for a class of nonconvex sparsity optimization. Preprint (2013)
 13. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)
 14. Chartrand, R., Staneva, V.: Restricted isometry properties and nonconvex compressive sensing. Inverse Probl. 24(3), 035020 (2008)
 15. Foucart, S., Lai, M.J.: Sparsest solutions of underdetermined linear systems via \(\ell _q\)-minimization for \(0<q\le 1\). Appl. Comput. Harmon. Anal. 26(3), 395–407 (2009)
 16. Sun, Q.: Recovery of sparsest signals via \(\ell ^q\)-minimization. Appl. Comput. Harmon. Anal. 32(3), 329–341 (2012)
 17. Zhang, C.H., Huang, J.: The sparsity and bias of the LASSO selection in high-dimensional sparse estimation problems. Stat. Sci. 27(4), 576–593 (2012)
 18. Fan, J., Peng, H.: Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 32(3), 928–961 (2004)
 19. Del Piero, G.: A variational approach to fracture and other inelastic phenomena. J. Elast. 112, 3–77 (2013)
 20. Ghilli, D., Kunisch, K.: On monotone and primal-dual active set schemes for sparsity optimization in \(\ell ^p\) with \(p\in ]0,1[\). Comput. Optim. Appl. 72(1), 45–85 (2018)
 21. Artina, M., Fornasier, M., Solombrino, F.: Linearly constrained nonsmooth and nonconvex minimization. SIAM J. Optim. 23, 1904–1937 (2013)
 22. Artina, M., Cagnetti, F., Fornasier, M., Solombrino, F.: Linearly constrained evolution of critical points and an application to cohesive fractures. Math. Models Methods Appl. Sci. 27(02), 231–290 (2017)
 23. Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)
 24. Fan, J., Feng, Y., Wu, Y.: Network exploration via the adaptive LASSO and SCAD penalties. Ann. Appl. Stat. 3(2), 521–541 (2009)
 25. Lyu, Q., Lin, Z., She, Y., Zhang, C.: A comparison of typical \(\ell ^p\) minimization algorithms. Neurocomputing 119, 413–424 (2013)
 26. Jiao, Y., Jin, B., Lu, X.: A primal-dual active set with continuation algorithm for the \(\ell ^0\)-regularized optimization problem. Appl. Comput. Harmon. Anal. 39, 927–957 (2015)
 27. Huang, J., Horowitz, J.L., Ma, S.: Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Stat. 36(2), 587–613 (2008)
 28. Knight, K., Fu, W.: Asymptotics for lasso-type estimators. Ann. Stat. 28(5), 1356–1378 (2000)
 29. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data-fidelity terms. Applications to the processing of outliers. SIAM J. Numer. Anal. 40, 965–994 (2002)
 30. Nikolova, M., Ng, M.K., Zhang, S., Ching, W.K.: Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 1, 2–25 (2008)
 31. Dugdale, D.S.: Yielding of steel sheets containing slits. J. Mech. Phys. Solids 8, 100–104 (1960)
 32. Barenblatt, G.I.: The mathematical theory of equilibrium cracks in brittle fracture. Adv. Appl. Mech. 7, 55–129 (1962)
 33. Freund, L.B.: Dynamic Fracture Mechanics. Cambridge University Press, Cambridge (2009)
 34. Morozov, N., Petrov, Y.: Dynamics of Fracture. Foundations of Engineering Mechanics. Springer, Berlin (2000)
 35. Klepaczko, J.R.: Crack Dynamics in Metallic Materials. CISM International Centre for Mechanical Sciences, Udine (1990)
 36. Tuia, D., Flamary, R., Barlaud, M.: Nonconvex regularization in remote sensing. IEEE Trans. Geosci. Remote Sens. (2016)
 37. Ghilli, D., Kunisch, K.: A monotone scheme for sparsity optimization in \(\ell ^p\) with \(p\in ]0,1[\). IFAC World Congress Proceedings (2017)
 38. Rust, M., Bates, M., Zhuang, X.: Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–796 (2006)
 39. Betzig, E., Patterson, G.H., Sougrat, R., Lindwasser, O.W., Olenych, S., Bonifacino, J.S., Davidson, M.W., Lippincott-Schwartz, J., Hess, H.F.: Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006)
 40. Hess, S.T., Girirajan, T.P., Mason, M.D.: Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006)
 41. Huang, B., Babcock, H.P., Zhuang, X.: Breaking the diffraction barrier: super-resolution imaging of cells. Cell 143, 1047–1058 (2010)
 42. Duval, V., Peyré, G.: Exact support recovery for sparse spikes deconvolution. Found. Comput. Math. 15, 1315–1355 (2015)
 43. Candes, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59, 1207–1223 (2006)
 44. Francfort, G.A., Marigo, J.J.: Revisiting brittle fracture as an energy minimization problem. J. Mech. Phys. Solids 46, 1319–1342 (1998)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.