1 Introduction

Filter line-search strategies have proven effective at solving large-scale, constrained nonlinear programs (NLPs) [6, 12, 16, 30, 37]. This algorithmic approach is implemented in the widely used IPOPT package [38]. In a filter line-search setting, it is essential to detect the presence of negative curvature and to regularize the Hessian of the Lagrangian when it is present. This ensures that the computed step is a descent direction when the constraint violation is sufficiently small, which in turn prevents the filter from accepting iteration subsequences that only improve the constraint violation.

The ability to handle negative curvature is essential when dealing with highly nonlinear and inherently ill-conditioned problems [43]. In a constrained NLP setting, the presence of negative curvature is detected using inertia information of the augmented matrix. Inertia information is provided by symmetric indefinite factorization routines such as MA27, MA57, MUMPS, and PARDISO [2, 20, 21, 35]. An inertia-revealing preconditioning strategy based on incomplete factorizations has also been proposed that enables the use of iterative linear algebra strategies such as QMR [36]. Unfortunately, many modern and efficient linear algebra strategies and libraries are not capable of providing inertia information. Examples include iterative techniques such as multigrid (geometric and algebraic) and Lagrange–Newton–Krylov [7, 8]; parallel libraries for graphics processing units (GPUs) and distributed-memory systems such as MAGMA, ELEMENTAL, Trilinos, and PETSc [1, 4, 29]; and decomposition strategies for stochastic optimization, optimal control, and network problems that are widely used in convex optimization [22, 26, 32–34, 41, 45]. Consequently, the need for inertia information hinders modularity, application scope, and scalability. Byrd et al. recently proposed a line-search exact penalty framework that does not require inertia information [11]. In their approach, termination tests are included to guarantee that the search step provides sufficient progress in the merit function. This approach can also deal with inexact linear algebra and has been extended to deal with rank-deficient Jacobians [17]. The strategy has also proven to be effective when used within an interior-point framework [18].

Trust-region strategies provide a natural mechanism to detect negative curvature. Modern trust-region implementations for constrained NLPs decompose the search step into tangential and normal components. The tangential step is computed by approximately solving a trust-region subproblem [10, 23, 28]. This is typically done by using a projected conjugate gradient (PCG) scheme that detects negative curvature at the inner iterates. In the presence of a direction of negative curvature, the PCG path is continued along this direction until it reaches the trust-region boundary. This approach is guaranteed to be globally convergent because it can always improve the Cauchy step (or revert to it) [15]. It is well known, however, that the quality of the step can be poor when the PCG procedure is terminated prematurely, and this can result in excessive shrinking of the trust region and slow progress. A Lanczos approach can be used to improve the quality of the PCG step when it hits the trust-region boundary, but this requires a more expensive procedure [24].

In a trust-region setting, one is limited in the linear algebra techniques that can be used to compute the search step. In particular, one requires schemes that are compatible with the globalization approach (i.e., that improve the progress of the Cauchy step). Linear algebra strategies such as direct factorizations, GMRES, and QMR are not compatible in this sense. This is important because some of these linear algebra strategies might handle difficult linear systems more efficiently than PCG. This observation has in fact motivated the implementation of a hybrid trust-region and line-search strategy in the widely used KNITRO package [40]. In addition, trust-region settings require tailored preconditioners that project (exactly or inexactly) the iterates onto the null space of the constraint Jacobian [23, 28]. This can be a limitation when the preconditioner does not have the required projection properties or when it involves an iterative scheme such as multigrid. A line-search setting has the practical advantage that any linear algebra scheme can be used to compute the step (as long as the computed step is a descent direction), and this provides computational flexibility. Such flexibility motivates our interest in line-search approaches. The price to pay in a line-search setting is that regularization of the Hessian matrix (convexification) might be needed, which tends to decrease the quality of the computed step and increase the number of trial step computations. Trust-region settings, on the other hand, have the advantage that no explicit regularization is needed (this is done implicitly through the trust-region constraint), and this can enable more efficient handling of ill-posed problems such as those arising in parameter estimation [3]. Motivated by this property, we are interested in deriving step acceptance tests for line-search settings that limit the amount of regularization needed and that can better handle ill-posed problems.

In this work, we present an inertia-free filter line-search strategy for nonconvex NLPs. Motivated by curvature tests used in trust-region settings, the approach performs a curvature test along the tangential component of the computed step. The curvature test guarantees descent when the constraint violation is sufficiently small; when the test is not fulfilled, it triggers regularization of the Hessian matrix. We prove that global convergence of the algorithm is guaranteed and that the norm of the step is a consistent criticality metric if the computed step satisfies the curvature test and the iteration matrix is nonsingular at each iteration. These requirements are significantly less restrictive than the positive-definiteness assumption on the reduced Hessian used in the standard filter line-search algorithm. We implement our developments in an interior-point framework and perform extensive numerical tests that include CUTE and Schittkowski problems as well as large-scale and highly nonlinear problems arising from power grid and natural gas networks. We demonstrate that two variants of the inertia-free approach are as efficient as the standard inertia detection strategy based on symmetric indefinite \(LBL^T\) factorizations in terms of iteration counts. We also demonstrate that the inertia-free variants can reduce the number of trial factorizations and solution times because of their increased flexibility, and that this flexibility enables us to handle ill-posed problems more efficiently.

The paper is structured as follows. Section 2 presents the filter line-search algorithm of Wächter and Biegler [38, 39] in an interior-point framework and discusses assumptions needed to guarantee global convergence. Section 3 reviews the standard inertia-based regularization strategy. Section 4 presents the new inertia-free strategies and establishes global convergence. Section 5 compares the numerical performance of both strategies. Section 6 presents concluding remarks.

2 Interior-point framework

Consider the NLP of the form

$$\begin{aligned} \min _{x\in \mathfrak {R}^{n}}&\; \quad f(x) \end{aligned}$$
(1a)
$$\begin{aligned} {\text {s.t.}}&\; c(x)=0 \end{aligned}$$
(1b)
$$\begin{aligned}&\;\,\, \quad \; x\ge 0. \end{aligned}$$
(1c)

Here, \(x\in \mathfrak {R}^{n}\) are primal variables and the objective and constraint functions are \(f:\mathfrak {R}^{n}\rightarrow \mathfrak {R}\) and \(c:\mathfrak {R}^{n }\rightarrow \mathfrak {R}^m\), respectively. We use a logarithmic barrier framework with subproblems of the form

$$\begin{aligned} \mathop {\min }_{x\in \mathfrak {R}^{n}}&\; \varphi ^{\mu } (x) := f(x) - \mu \displaystyle \sum _{j=1}^{n}\ln x^{(j)} \end{aligned}$$
(2a)
$$\begin{aligned} \text{ s.t. }&\; c(x) = 0 \end{aligned}$$
(2b)

where \(\mu >0\) is the barrier parameter and \(x^{(j)}\) is the jth entry of vector x. We consider a framework that solves a sequence of barrier problems (2) and drives the barrier parameter \(\mu \) monotonically to zero.

To approximately solve each barrier problem, we apply Newton’s method to its optimality conditions:

$$\begin{aligned} \nabla _x \varphi ^{\mu }(x) + \nabla _x c(x) \lambda&= 0 \end{aligned}$$
(3a)
$$\begin{aligned} c(x)&=0 \end{aligned}$$
(3b)

while enforcing \(x\ge 0\) along the search. Here, \(\lambda \in \mathfrak {R}^{m}\) are multipliers for equality constraints. The primal variables and multipliers at iteration k are denoted as \((x_k,\lambda _k)\). Their corresponding search directions \((d_k, \lambda _k^{+}-\lambda _k)\) can be obtained by solving the linear system

$$\begin{aligned} \left[ \begin{array}{ll} W_k &{}\quad {J}_k^T\\ {J}_k &{}\quad 0 \end{array} \right] \left[ \begin{array}{c} d_k \\ \lambda _k^+ \end{array} \right] = - \left[ \begin{array}{c} g_k\\ c_k \end{array} \right] . \end{aligned}$$
(4)

We refer to this system as the augmented system. Here, \(c_k:=c(x_k)\), \(J_k := \nabla _x c(x_k)^T\in \mathfrak {R}^{m\times n}\), \({g_k} := \nabla _x \varphi ^{\mu }(x_k) \), \(W_k:=H_k+\varSigma _k\), \({H}_k:=\nabla _{xx}{\mathcal {L}}(x_k,\lambda _k)\in \mathfrak {R}^{n\times n}\), \({\mathcal {L}}(x_k, \lambda _k) := f(x_k) + \lambda ^T_kc(x_k)\), and \(\varSigma _k:= \mu {X}^{-2}_k\) with \(X_k:={\text {diag}}(x_k)\). One can show that the primal-dual approximation \(\varSigma _k\approx X_k^{-1}V_k\), where \(V_k:={\text {diag}}(\nu _k)\) and \(\nu _k\) are multiplier estimates for the bounds (1c), can be used as long as the products \(x_k^{(j)}\nu _k^{(j)}\) remain proportional to \(\mu \) [18, 39]. To enable compact notation, we define the augmented matrix

$$\begin{aligned} M_k :=\left[ \begin{array}{cc}W_k&{} J_k^T\\ J_k &{}0 \end{array}\right] . \end{aligned}$$
(5)

We can also consider the computation of the search directions \(d_k\) using the decomposition

$$\begin{aligned} d_k=n_k+t_k. \end{aligned}$$
(6)

Here, \(n_k\) is computed from

$$\begin{aligned} \left[ \begin{array}{ll} W_k &{}\quad {J}_k^T\\ {J}_k &{}\quad 0 \end{array} \right] \left[ \begin{array}{c} n_k \\ \cdot \end{array} \right] = - \left[ \begin{array}{c} 0\\ c_k \end{array} \right] , \end{aligned}$$
(7)

and \(t_k\) is computed from

$$\begin{aligned} \left[ \begin{array}{ll} W_k &{} \quad {J}_k^T\\ {J}_k &{}\quad 0 \end{array} \right] \left[ \begin{array}{c} t_k \\ \lambda _k^+ \end{array} \right] = - \left[ \begin{array}{c} g_k+W_kn_k\\ 0 \end{array} \right] , \end{aligned}$$
(8)

where \(\lambda _k^+\) is the multiplier update.
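
As an illustration (not part of the original algorithm), the following NumPy sketch on hypothetical random data verifies that the decomposition (6)–(8) reproduces the full step from (4) and that the tangential component lies in the null space of \(J_k\):

```python
# Sketch (hypothetical random data): the full step from (4) equals the sum
# of the normal and tangential components from (7)-(8), and J t = 0.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
W = rng.standard_normal((n, n))
W = 0.5 * (W + W.T)                        # symmetric W_k (possibly indefinite)
J = rng.standard_normal((m, n))            # Jacobian J_k, full row rank
g = rng.standard_normal(n)                 # barrier gradient g_k
c = rng.standard_normal(m)                 # constraint values c_k

M = np.block([[W, J.T], [J, np.zeros((m, m))]])   # augmented matrix (5)

d = np.linalg.solve(M, -np.concatenate([g, c]))[:n]                # full step (4)
nstep = np.linalg.solve(M, -np.concatenate([np.zeros(n), c]))[:n]  # normal step (7)
t = np.linalg.solve(M, -np.concatenate([g + W @ nstep, np.zeros(m)]))[:n]  # tangential step (8)

print(np.allclose(d, nstep + t))           # True: d_k = n_k + t_k
print(np.allclose(J @ t, np.zeros(m)))     # True: t_k lies in the null space of J_k
```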

We define a two-dimensional filter of the form \({\mathcal {F}}:=\{(\theta (x), \varphi (x))\}\) with \(\theta (x)=\Vert c(x)\Vert \), \(\varphi (x):=\varphi ^{\mu }(x)\) for a fixed barrier parameter \(\mu \), where \(\Vert \cdot \Vert \) is the Euclidean norm. At each value of \(\mu \), the filter is initialized as

$$\begin{aligned} {\mathcal {F}}_0:=\{(\theta , \varphi ) ~|~ \theta \ge \theta ^{max}\} \end{aligned}$$
(9)

with a given parameter \(\theta ^{max}>0\). The filter at iteration k is denoted as \({\mathcal {F}}_k\). Given a search step \(d_k\), a line search is started with counter \(\ell \leftarrow 0\) and \(\alpha _{k,0}=\alpha _k^{max}\le 1\) to define trial iterates \(x_k(\alpha _{k,\ell }):=x_k+\alpha _{k,\ell }d_k\).

We define

$$\begin{aligned} m_k(\alpha ) := \alpha g_k^T d_k \end{aligned}$$
(10)

as the linear model of \(\varphi (x_k+\alpha d_k)-\varphi (x_k)\). We note that \(d_k\) is a descent direction if \(m_k(\alpha )<0\) for \(\alpha >0\). Given constants \(\kappa _{\theta } > 0\), \(s_{\theta } > 1\), \(s_{\varphi } \ge 1\), \(\gamma _\theta \in (0,1)\), \(\gamma _\varphi \in (0,1)\), and \(\eta _{\varphi }\in (0,1)\), we consider the following conditions to check whether a trial iterate should be accepted.

  • Filter Condition FC

    $$\begin{aligned} (\theta (x_k(\alpha _{k,\ell })),\varphi (x_k(\alpha _{k,\ell }))) \notin {\mathcal {F}}_k \end{aligned}$$
  • Switching Condition SC

    $$\begin{aligned} -m_k(\alpha _{k,\ell })>0 \text{ and } [-m_k(\alpha _{k,\ell })]^{s_{\varphi }} [\alpha _{k,\ell }]^{1-s_{\varphi }} > \kappa _{\theta } [\theta (x_k )]^{s_\theta } \end{aligned}$$
  • Armijo Condition AC

    $$\begin{aligned} \varphi (x_k (\alpha _{k,\ell })) \le \varphi (x_k ) + \eta _{\varphi }m_k (\alpha _{k,\ell } ). \end{aligned}$$
  • Sufficient Decrease Condition SDC

    $$\begin{aligned} \theta (x_k(\alpha _{k,\ell } )) \le (1 - \gamma _\theta )\theta (x_k ) \text{ or } \varphi (x_k (\alpha _{k,\ell } )) \le \varphi (x_k ) - \gamma _{\varphi }\theta (x_k ). \end{aligned}$$

The filter condition FC is the first requirement for accepting a trial iterate \(x_k(\alpha _{k,\ell })\). If the pair \((\theta (x_k(\alpha _{k,\ell })),\varphi (x_k(\alpha _{k,\ell })))\in {\mathcal {F}}_k\) (i.e., the trial iterate is contained in the filter), the step is rejected and we decrease the stepsize. If the trial iterate is not contained in the filter, we continue testing additional conditions. We have two possible cases:

  • If SC holds, the step \(d_k\) is a descent direction, and we check whether AC holds. If AC holds, we accept the trial point \(x_k(\alpha _{k,\ell })\). If not, we decrease the stepsize.

  • If SC does not hold and SDC holds, we accept the trial iterate \(x_k(\alpha _{k,\ell })\). If not, we decrease the stepsize.

If the trial iterate \(x_k(\alpha _{k,\ell })\) is accepted in the second case, the filter is augmented as

$$\begin{aligned} {\mathcal {F}}_{k+1}\leftarrow {\mathcal {F}}_{k}\cup \{(\theta , \varphi ) ~|~ \varphi \ge \varphi (x_k)-\gamma _\varphi \theta (x_k), ~~ \theta \ge (1-\gamma _\theta ) \theta (x_k) \} \end{aligned}$$
(11)

with \(\gamma _\varphi \in (0,1)\) as introduced above; otherwise, we leave the filter unchanged (i.e., \({\mathcal {F}}_{k+1}\leftarrow {\mathcal {F}}_k\)). If the trial stepsize \(\alpha _{k,\ell }\) becomes smaller than \(\alpha ^{min}_k\) and the step has not been accepted in either case, we revert to feasibility restoration and the filter is augmented. A strategy to obtain \(\alpha ^{min}_k\) is proposed in [39]. We define \({\mathcal {R}}_{inc}\) as the set of iteration counters k at which feasibility restoration is called.
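
To make the acceptance logic concrete, the following minimal Python sketch (our illustration, not code from the implementation; the helper names fc, sc, ac, sdc, accept and all parameter values are placeholders) evaluates the four conditions for a trial point. The filter is stored as a list of corner pairs, each marking a region \(\theta \ge \bar{\theta }_j\), \(\varphi \ge \bar{\varphi }_j\) generated by (9) and (11).

```python
# Minimal sketch of the acceptance tests (hypothetical helper names and
# placeholder parameter values). The filter `filt` is a list of corner pairs
# (theta_j, phi_j), each marking the region theta >= theta_j and phi >= phi_j
# added by (9) or (11); the initial filter F_0 is [(theta_max, float('-inf'))].

def fc(theta_t, phi_t, filt):
    # FC: the trial pair must not fall in any filter region
    return not any(theta_t >= th and phi_t >= ph for th, ph in filt)

def sc(m, alpha, theta, kappa_theta=1e-4, s_theta=1.1, s_phi=2.3):
    # SC: descent (-m > 0) plus the switching inequality
    return -m > 0 and (-m) ** s_phi * alpha ** (1 - s_phi) > kappa_theta * theta ** s_theta

def ac(phi_t, phi, m, eta_phi=1e-4):
    # AC: Armijo condition on the barrier function; m = alpha * g'd from (10)
    return phi_t <= phi + eta_phi * m

def sdc(theta_t, phi_t, theta, phi, gamma_theta=1e-5, gamma_phi=1e-5):
    # SDC: sufficient decrease in theta (SDCC) or in phi (SDCO)
    return theta_t <= (1 - gamma_theta) * theta or phi_t <= phi - gamma_phi * theta

def accept(theta_t, phi_t, theta, phi, m, alpha, filt):
    if not fc(theta_t, phi_t, filt):
        return False                            # contained in the filter: reject
    if sc(m, alpha, theta):
        return ac(phi_t, phi, m)                # f-iterate candidate: require AC
    return sdc(theta_t, phi_t, theta, phi)      # otherwise require SDC
```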

We refer to the first condition of SDC as SDCC to emphasize that this condition accepts the trial iterate if it improves the constraint violation. Similarly, we refer to the second condition as SDCO to emphasize that this condition accepts the trial iterate if it improves the objective function. We refer to successful iterates in which the filter is not augmented (iterates in which the switching condition SC holds) as f-iterates. The filter line-search algorithm is summarized below.

Filter line-search algorithm

Step 0. Given starting point \(x_0,\lambda _0\), constants \(\theta _{max} \in (\theta (x_0), \infty ]\); \(\gamma _\theta , \gamma _{\varphi } \in (0,1)\); \(\eta _{\varphi }\in (0,1)\); \(\kappa _{\theta } > 0\); \(s_\theta > 1\); \(s_{\varphi }\ge 1\); and \(0 < \tau _1 \le \tau _2 < 1\).

Step 1. Initialize. Set filter \({\mathcal {F}}_0:=\{(\theta ,\varphi )\,:\,\theta \ge \theta _{max}\}\) and iteration counter \(k\leftarrow 0\).

Step 2. Check Convergence. Stop if \(x_k\) is a stationary point (i.e., satisfies (3)).

Step 3. Compute Search Direction. Compute step \((d_k,\lambda _k^+-\lambda _k)\) from (6)–(8).

Step 4. Backtracking Line Search.

  (a) Initialize. Set \(\alpha _{k,0}\leftarrow \alpha _{k}^{max}\) and counter \(\ell \leftarrow 0\).

  (b) Compute Trial Point. If \(\alpha _{k,\ell }\le \alpha _{k}^{min}\), revert to feasibility restoration in Step 8. Otherwise, set trial point \(x_{k}(\alpha _{k,\ell })\leftarrow x_k+\alpha _{k,\ell }d_k\).

  (c) Check Acceptability to the Filter. If FC does not hold, reject trial point \(x_k(\alpha _{k,\ell })\) and go to Step 4e.

  (d) Check Sufficient Progress.

    i. If SC and AC hold, accept trial point \(x_k(\alpha _{k,\ell })\) and go to Step 5.

    ii. If SC does not hold and SDC holds, accept trial point \(x_k(\alpha _{k,\ell })\) and go to Step 5. Otherwise, go to Step 4e.

  (e) New Trial Stepsize. Choose \(\alpha _{k,\ell +1}\in [\tau _1\alpha _{k,\ell },\tau _2\alpha _{k,\ell }]\), set \(\ell \leftarrow \ell +1\), and go to Step 4b.

Step 5. Accept Trial Point. Set \(\alpha _k\leftarrow \alpha _{k,\ell }\) and \(x_{k+1}\leftarrow x_k(\alpha _{k,\ell })\).

Step 6. Augment Filter. If SC is not satisfied, augment filter using (11). Otherwise, leave filter unchanged.

Step 7. Next Iteration. Increase iteration counter \(k\leftarrow k+1\) and go to Step 2.

Step 8. Feasibility Restoration. Compute an iterate \(x_{k+1}\) that satisfies FC and SDC. Augment filter using (11) and go to Step 7.

The above algorithm seeks to solve the barrier subproblem (2) approximately for a fixed \(\mu \). Once this problem is solved, we decrease the barrier parameter \(\mu \) and reset the filter (9). This outer loop is repeated until we find a stationary point for the original NLP (1). We highlight that the global convergence analysis of the filter line-search algorithm discussed in the next section does not require a particular stepsize rule for the multipliers (it only requires boundedness of the multiplier updates \(\lambda _k^+\)). Consequently, we do not provide an explicit stepsize rule in the above algorithm. This is left as an implementation issue and is discussed in Sect. 5.

3 Inertia-based strategy

The global convergence analysis of the filter line-search algorithm provided in [39] assumes a step decomposition of the form

$$\begin{aligned} d_k&= Y_k\bar{q}_k+Z_k\bar{p}_k \end{aligned}$$
(12a)
$$\begin{aligned} \bar{q}_k&:=-(J_kY_k)^{-1}c_k \end{aligned}$$
(12b)
$$\begin{aligned} \bar{p}_k&:=-(Z_k^TW_kZ_k)^{-1}Z_k^T(g_k+W_kY_k\bar{q}_k), \end{aligned}$$
(12c)

where \(Y_k\in \mathfrak {R}^{n\times m}\) and \(Z_k\in \mathfrak {R}^{n\times (n-m)}\) are matrices such that the columns of \([Y_k\; Z_k]\) form an orthonormal basis for \(\mathfrak {R}^n\) and the columns of \(Z_k\) span the null space of \(J_k\) (i.e., \(J_kZ_k=0\)). The convergence analysis also relies on the criticality measure

$$\begin{aligned} \chi _k := \Vert Z_k\bar{p}_k\Vert =\Vert \bar{p}_k\Vert , \end{aligned}$$
(13)

where the last identity follows from the orthonormality of \(Z_k\) (i.e., \(Z_k^TZ_k=I\)). The analysis requires assumptions (G) in [39]. To facilitate the discussion, we state the relevant assumptions here:

Assumptions (G)

(G1) There exists an open set \(\varOmega \subseteq \mathfrak {R}^n\) with \([x_k,x_k+d_k]\subseteq \varOmega \) for all \(k\notin {\mathcal {R}}_{inc}\) in which \(\varphi (\cdot )\) and \(c(\cdot )\) are twice differentiable and their values and derivatives are bounded.

(G2) The matrices \(W_k\) are uniformly bounded for all \(k \notin {\mathcal {R}}_{inc}\).

(G3) The matrices \(W_k\) are uniformly positive definite on the null space of the Jacobian \(J_k\).

(G4) There exists a constant \(c_A > 0\) such that for all \(k\notin {\mathcal {R}}_{inc}\), the smallest singular value of \(J_k\) is bounded below by \(c_{A}\).

(G5) There exists a constant \(\theta _{inc}>0\) such that \(k\notin {\mathcal {R}}_{inc}\) whenever \(\theta (x_k)\le \theta _{inc}\).

Assumptions (G3) and (G4) are needed to guarantee that \(\chi _k\) is a valid criticality measure in the sense that it converges to zero as we approach a first-order stationary point. To see this, consider a subsequence \(\{x_{k_i}\}\) with \({\text {lim}}_{i\rightarrow \infty }\chi _{k_i}=0\) and \({\text {lim}}_{i\rightarrow \infty } x_{k_i}=x^*\) for some feasible point \(x^*\). If the Jacobian is of full row rank (implied by assumption (G4)) and (12b) holds, we have that \({\text {lim}}_{i\rightarrow \infty }\bar{q}_{k_i}=0\) as \({\text {lim}}_{i\rightarrow \infty } x_{k_i}=x^*\). From \({\text {lim}}_{i\rightarrow \infty }\chi _{k_i}=0\), (12c), and (13), and because (G3) guarantees nonsingularity of the reduced Hessian, we have that \({\text {lim}}_{i\rightarrow \infty }\Vert Z_{k_i}^Tg_{k_i}\Vert =0\). In Sect. 4 we will observe that a decomposition of the form (12) can always be obtained when the Jacobian is of full row rank and the augmented matrix is nonsingular (these are less restrictive assumptions). Consequently, the step computed from the augmented system (4) is equivalent to that obtained with (12).

Positive definiteness of the reduced Hessian (G3) also guarantees that the search direction is a descent direction when the constraint violation is sufficiently small and the criticality measure is nonzero (see Lemma 2 in [39]). As we will discuss in Sect. 4, this descent lemma is essential for establishing global convergence. Assumption (G3) can be enforced in a practical setting by monitoring the inertia (number of positive, negative, and zero eigenvalues) of the augmented matrix \(M_k\) and correcting it (if necessary) by regularizing the Hessian matrix as \(W_k\leftarrow W_k+\delta I\) for \(\delta \ge 0\). This convexification approach is justified by the results of Gould [25], which show that the reduced Hessian is positive definite if and only if the augmented matrix \(M_k\) has n positive, m negative, and no zero eigenvalues. We state this condition formally as

$$\begin{aligned} {\text {Inertia}}(M_k) = \{n,m,0\}. \end{aligned}$$
(14)

The augmented matrix \(M_k\) can be decomposed as \(LBL^T\) by using symmetric indefinite factorizations, where L is a unit lower triangular matrix and B is a block diagonal matrix composed of \(1\times 1\) and \(2\times 2\) diagonal blocks. From Sylvester's law of inertia we know that the inertia of \(M_k\) equals the inertia of B. Furthermore, because each \(2\times 2\) block has one positive and one negative eigenvalue, the inertia of \(M_k\) can be obtained directly from the blocks of B [9].
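
As an illustration, the following sketch computes the inertia from an \(LBL^T\) factorization; scipy.linalg.ldl is a stand-in for MA27/MA57, the data are hypothetical, and the helper name inertia is ours:

```python
# Sketch: inertia of M from an LBL^T factorization; scipy.linalg.ldl stands
# in for MA27/MA57, and the data are hypothetical.
import numpy as np
from scipy.linalg import ldl

def inertia(M, tol=1e-12):
    # M = L B L^T with B block diagonal (1x1 and 2x2 blocks); by Sylvester's
    # law the inertia of M equals the inertia of B
    _, B, _ = ldl(M)
    e = np.linalg.eigvalsh(B)              # cheap since B is block diagonal
    return (int((e > tol).sum()), int((e < -tol).sum()),
            int((abs(e) <= tol).sum()))

rng = np.random.default_rng(1)
n, m = 4, 2
W = rng.standard_normal((n, n))
W = W @ W.T + np.eye(n)                    # positive definite Hessian block
J = rng.standard_normal((m, n))
M = np.block([[W, J.T], [J, np.zeros((m, m))]])
print(inertia(M))                          # (n, m, 0) = (4, 2, 0) here
```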

We can enforce assumption (G3) using the following procedure. The matrix \(M_k\) is decomposed as \(LBL^T\) for \(\delta =0\) and the search direction \(d_k\) is computed using this decomposition. If the inertia is correct (i.e., condition (14) is satisfied), the search direction \(d_k\) is used as trial step in the line-search procedure. If the inertia is not correct, the regularization parameter \(\delta \) is increased, the augmented matrix is refactorized and a new direction \(d_k\) is obtained. The procedure is repeated until the matrix \(M_k\) has the correct inertia. Heuristics are incorporated to accelerate the rate of increase/decrease of \(\delta \) in order to ensure that the number of trial factorizations is not too large (because each factorization is computationally expensive). The inertia-based regularization strategy implemented in the current version of IPOPT [38] is shown below.

Inertia-based regularization (IBR)

IBR-1. Factorize \(M_k\) with \(\delta =0\). If (14) holds, compute \(d_k\) and stop.

IBR-2. If \(\delta ^{last}=0\), set \(\delta \leftarrow \bar{\delta }^0\); otherwise set \(\delta \leftarrow {\text {max}}\{\bar{\delta }^{min},\kappa ^-\delta ^{last}\}\).

IBR-3. Factorize \(M_k\) with current \(\delta \). If (14) holds, set \(\delta ^{last}\leftarrow \delta \), compute \(d_k\), and stop.

IBR-4. If \(\delta ^{last}=0\), set \(\delta \leftarrow \hat{\kappa }^+\delta \); otherwise set \(\delta \leftarrow \kappa ^+\delta \). Go to IBR-3.

Here, \(0<\bar{\delta }^{min}<\bar{\delta }^0\), \(0<\kappa ^-<1<\kappa ^+<\hat{\kappa }^+\) are given constants.
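
A minimal sketch of the IBR loop is given below; assemble_M, solve_step, and inertia_ok are hypothetical callbacks standing in for the factorization, the backsolve, and test (14), and the numeric constants are placeholders patterned after the IPOPT defaults:

```python
# Sketch of the IBR loop (hypothetical callbacks; constants are placeholders
# patterned after IPOPT defaults). `state` carries delta_last across calls.
def ibr_step(assemble_M, solve_step, inertia_ok, state,
             delta0=1e-4, delta_min=1e-20, delta_max=1e12,
             kminus=1.0 / 3.0, kplus=8.0, kplus_hat=100.0):
    # IBR-1: try the unregularized system first
    fact = assemble_M(0.0)                 # factorize M_k with delta = 0
    if inertia_ok(fact):                   # test (14)
        return solve_step(fact)
    # IBR-2: first trial value of delta, based on the last successful one
    last = state.get('delta_last', 0.0)
    delta = delta0 if last == 0.0 else max(delta_min, kminus * last)
    while delta <= delta_max:
        # IBR-3: refactorize with W_k + delta * I
        fact = assemble_M(delta)
        if inertia_ok(fact):
            state['delta_last'] = delta
            return solve_step(fact)
        # IBR-4: increase delta (aggressively if no successful delta yet)
        delta *= kplus_hat if last == 0.0 else kplus
    raise RuntimeError('regularization limit reached')
```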

4 Inertia-free strategies

Estimating the inertia of \(M_k\) can be complicated or impossible when a decomposition of the form \(LBL^T\) is not available. As we noted in the introduction, this situation limits our options for computing the search step and motivates us to consider inertia-free strategies. If we take a step back, we realize that the primary practical intention of inertia correction is to guarantee that the direction \(d_k\) is a descent direction when the constraint violation is sufficiently small. This approach, however, introduces a disconnect between the regularization procedure (IBR) and the filter line-search globalization procedure. In particular, the inertia test (14) is based solely on the structural properties of \(M_k\) and not on the computed direction \(d_k\). Hence, the regularization procedure IBR can implicitly discard productive descent directions in attempting to enforce a correct inertia. We thus consider another route to enforce descent.

We first discuss an inertia-free strategy that uses the step decomposition (6), systems (7)–(8), and the criticality measure

$$\begin{aligned} \varPsi _{k}^t:=\Vert t_k\Vert . \end{aligned}$$
(15)

As we will argue in Sect. 4.1, a step decomposition is not strictly necessary, but it is advantageous for the analysis and can be used to enable the use of PCG strategies [23].

For our analysis we make the following assumptions.

Assumptions (RG)

(RG1) There exists an open set \(\varOmega \subseteq \mathfrak {R}^n\) with \([x_k,x_k+d_k]\subseteq \varOmega \) for all \(k\notin {\mathcal {R}}_{inc}\) in which \(\varphi (\cdot )\) and \(c(\cdot )\) are twice differentiable and their values and derivatives are bounded.

(RG2) The matrices \(W_k\) are uniformly bounded for all \(k \notin {\mathcal {R}}_{inc}\).

(RG3) There exists a constant \(\alpha _t>0\) such that the step components \(n_k,t_k\) satisfy the following curvature condition (RG3a):

$$\begin{aligned} t^T_kW_kt_k + {\text {max}}\{t_k^TW_kn_k-g_k^Tn_k,0\}\ge \alpha _tt_k^Tt_k. \end{aligned}$$
(16)

Furthermore, the augmented matrix \(M_k\) is nonsingular and its inverse is bounded for all \(k\notin {\mathcal {R}}_{inc}\) (RG3b).

(RG4) There exists a constant \(c_A > 0\) such that for all \(k\notin {\mathcal {R}}_{inc}\), the smallest singular value of \(J_k\) is bounded below by \(c_{A}\).

(RG5) There exists a constant \(\theta _{inc}>0\) such that \(k\notin {\mathcal {R}}_{inc}\) whenever \(\theta (x_k)\le \theta _{inc}\).

The key difference between assumptions (RG) and assumptions (G) used in the standard filter line-search algorithm is that we do not require positive definiteness of the reduced Hessian (G3). This requirement is replaced by assumptions (RG3) and we now show that these are less restrictive and sufficient to guarantee global convergence. We begin by showing that conditions (RG3b) and (RG4) are sufficient to guarantee that the reduced Hessian is nonsingular and that (15) is a valid criticality measure.

Lemma 1

Let (RG3b) and (RG4) hold and define \(M=M_k\), \(W=W_k\), \(Z=Z_k\), and \(J=J_k\). Then (i) the reduced Hessian \(Z^TWZ\) is nonsingular and (ii) the inverse of M admits a decomposition of the form

$$\begin{aligned} M_{inv}=\left[ \begin{array}{ll}W&{}\quad J^T\\ J &{}\quad 0\end{array}\right] ^{-1} = \left[ \begin{array}{ll}P ~&{} \quad (I-PW)J^TV^{-1}\\ V^{-1}J(I-WP) ~&{}\quad - V^{-1}J(W-WPW)J^TV^{-1}\end{array}\right] \end{aligned}$$
(17)

with

$$\begin{aligned} P=Z(Z^TWZ)^{-1}Z^T \end{aligned}$$
(18)

and \(V=JJ^T\).

Proof

Part (i) follows from Lemmas 3.2 and 3.4 in [25]. These results establish that if the Jacobian is of full row rank (RG4) there exists a nonsingular matrix R such that

$$\begin{aligned} RMR^T=\left[ \begin{array}{ccc}&{}&{}I\\ &{}Z^TWZ\\ I\end{array}\right] . \end{aligned}$$
(19)

From Sylvester’s law of inertia and the structure of \(RMR^T\) we have that

$$\begin{aligned} {\text {Inertia}}(M)={\text {Inertia}}(Z^TWZ) + \left\{ m,m,0\right\} . \end{aligned}$$
(20)

Consequently, the number of zero eigenvalues of M is equal to the number of zero eigenvalues of \(Z^TWZ\). By assumption (RG3b) we know that M is nonsingular and thus we know that it does not have zero eigenvalues. Consequently, \(Z^TWZ\) does not have zero eigenvalues either and therefore is nonsingular.

To prove (ii), we first note that (RG3b) and (RG4) guarantee that P exists. We denote the blocks of \(MM_{inv}\) as \(A_{11}\), \(A_{12}\), \(A_{21}\), and \(A_{22}\), and we seek to prove that \(MM_{inv}=I\). The block \(A_{21}=JP\) vanishes directly because \(JZ=0\). By direct calculation and by noticing that \(J^T(JJ^T)^{-1}J=I-ZZ^T\) [5, p. 20] and \(JZ=0\), we obtain

$$\begin{aligned} A_{11}&=WP+J^TV^{-1}J(I-WP)\nonumber \\&=WP+(I-ZZ^T)(I-WP)\nonumber \\&=I-ZZ^T+ZZ^TWZ(Z^TWZ)^{-1}Z^T\nonumber \\&=I \end{aligned}$$
(21a)
$$\begin{aligned} A_{12}&=W(I-PW)J^T(JJ^T)^{-1}-J^T(JJ^T)^{-1}J(W-WPW)J^T(JJ^T)^{-1}\nonumber \\&=ZZ^T(W-WPW)J^T(JJ^T)^{-1}\nonumber \\&=\left( ZZ^TW-ZZ^TWZ(Z^TWZ)^{-1}Z^TW\right) J^T(JJ^T)^{-1}\nonumber \\&=0 \end{aligned}$$
(21b)
$$\begin{aligned} A_{22}&=J(I-PW)J^T(JJ^T)^{-1}\nonumber \\&=J(I-Z(Z^TWZ)^{-1}Z^TW)J^T(JJ^T)^{-1}\nonumber \\&=JJ^T(JJ^T)^{-1}-JZ(Z^TWZ)^{-1}Z^TWJ^T(JJ^T)^{-1}\nonumber \\&=I. \end{aligned}$$
(21c)

The proof is complete. \(\square \)
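
The inverse formula (17) can also be checked numerically; the following sketch (hypothetical random data) builds \(M_{inv}\) from (17)–(18) and verifies \(M_{inv}M=I\):

```python
# Numerical check of the inverse formula (17) on hypothetical random data.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)
n, m = 6, 2
W = rng.standard_normal((n, n))
W = 0.5 * (W + W.T)                        # symmetric, possibly indefinite
J = rng.standard_normal((m, n))            # full row rank (almost surely)
Z = null_space(J)                          # orthonormal basis with J Z = 0

P = Z @ np.linalg.inv(Z.T @ W @ Z) @ Z.T   # projection operator (18)
Vinv = np.linalg.inv(J @ J.T)              # V = J J^T
I = np.eye(n)
Minv = np.block([
    [P,                       (I - P @ W) @ J.T @ Vinv],
    [Vinv @ J @ (I - W @ P), -Vinv @ J @ (W - W @ P @ W) @ J.T @ Vinv],
])
M = np.block([[W, J.T], [J, np.zeros((m, m))]])
print(np.allclose(Minv @ M, np.eye(n + m)))   # True
```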

From (7), (8), and the explicit form of the inverse of \(M_k\) in (17), we have

$$\begin{aligned} n_k&=-(I-Z_k(Z_k^TW_kZ_k)^{-1}Z^T_kW_k)J^T_k(J_kJ^T_k)^{-1}c_k, \end{aligned}$$
(22)

and

$$\begin{aligned} t_k&=-Z_k(Z_k^TW_kZ_k)^{-1}Z^T_k(g_k+W_kn_k). \end{aligned}$$
(23)

We thus see that the tangential step is equivalent to the tangential component \(Z_k\bar{p}_k\) obtained from the decomposition (12). Consequently, Lemma 1 implies that an expression for the tangential step of the form in (12) can always be obtained if the Jacobian is of full row rank and the augmented matrix is nonsingular. We also note that the expression for the normal component in (12) is not equivalent to that in (22), but equivalence is not necessary as long as \(n_k\) obtained from (7) is bounded and decays to zero as we approach a feasible point (equivalence can be achieved by replacing \(W_k\) with I in (7)). These observations allow us to prove that (15) is a valid criticality measure under assumptions (RG), as stated in the next result.

Theorem 1

Consider a subsequence \(\{x_{k_i}\}\) with \({\text {lim}}_{i\rightarrow \infty } x_{k_i}=x^*\) for a feasible \(x^*\), let (RG3b) and (RG4) hold, let \(n_{k_i}\) solve (7), and let \(t_{k_i}\) solve (8). Then

$$\begin{aligned} \mathop {{\text {lim}}}_{i\rightarrow \infty } \varPsi _{k_i}^t=0 \implies \mathop {{\text {lim}}}_{i\rightarrow \infty } \left\| Z_{k_i}^Tg_{k_i}\right\| =0 \end{aligned}$$

for \(Z_{k_i}\) spanning the null space of \(J_{k_i}\).

Proof

Define \(M:=M_{k_i}\), \(W:=W_{k_i}\), \(J:=J_{k_i}\), and \(Z:=Z_{k_i}\). Boundedness of \(n_{k_i}\) follows from (RG1), which guarantees that the right-hand side of the augmented system (7) is bounded, and from (RG3b), which guarantees that the inverse of \(M\) is bounded. Boundedness of \(n_{k_i}\) together with (RG1) and (RG2) guarantees that the right-hand side of (8) is bounded. This fact, together with (RG3b), guarantees that \(t_{k_i}\) is bounded. We thus have that the condition \({\text {lim}}_{i \rightarrow \infty } x_{k_i}=x^*\) for feasible \(x^*\) ensures \({\text {lim}}_{i\rightarrow \infty }n_{k_i}=0\). From Lemma 1 we know that \(Z^TWZ\) is nonsingular and therefore the projection operator \(P_{k_i}\) exists and is bounded. From (15), (23), and \({\text {lim}}_{i\rightarrow \infty } n_{k_i}=0\) as \({\text {lim}}_{i\rightarrow \infty } x_{k_i}=x^*\), we obtain the result. \(\square \)

We now show that the curvature condition of assumption (RG3a) is sufficient to ensure that the step \(d_k\) is a descent direction.

Lemma 2

Let Assumptions (RG1)–(RG5) hold. If \(\{x_{k_i}\}\) is a subsequence of iterates for which \(\varPsi _{k_i}^t\ge \epsilon \) holds with a constant \(\epsilon >0\) independent of i, then there exist positive constants \(\epsilon _1,\epsilon _2\) such that

$$\begin{aligned} \theta _{k_i}\le \epsilon _1 \implies \frac{m_{k_i}(\alpha )}{\alpha }\le -\epsilon _2. \end{aligned}$$

Proof

Define \(W:=W_{k_i}\), \(J:=J_{k_i}\), \(g:=g_{k_i}\), \(d:=d_{k_i}\), \(\varPsi :=\varPsi _{k_i}^t\), \(\theta :=\theta _{k_i}\), \(n:=n_{{k_i}}\), and \(t:=t_{k_i}\). Multiplying the first row of (8) by \(t^T\) and recalling that \(Jt=0\), we obtain

$$\begin{aligned} t^TWt&= -g^Tt-t^TWn. \end{aligned}$$

We know that \(g^Td=g^Tn+g^Tt\). Thus, combining terms, we obtain

$$\begin{aligned} -g^Td&= t^TWt +t^TWn - g^Tn. \end{aligned}$$

We consider two cases. In the first case we have that \(t^TWn - g^Tn<0\), and the curvature condition (16) guarantees that \(t^TWt\ge \alpha _tt^Tt\) and \(-g^Td\ge \alpha _tt^Tt +t^TWn - g^Tn\). From (RG1) we can obtain the bounds \(|t^TWn|\le \bar{c}_1\varPsi \theta \) and \(|g^Tn|\le \bar{c}_2\theta \) for \(\bar{c}_1,\bar{c}_2>0\). From (15) we have that \(\Vert t\Vert =\varPsi \), and we thus have

$$\begin{aligned} g^Td&\le -\alpha _t\varPsi ^2 +\bar{c}_1\varPsi \theta + \bar{c}_2\theta \\&\le \varPsi \left( -\alpha _t\epsilon +\bar{c}_1\theta +\frac{\bar{c}_2}{\epsilon }\theta \right) . \end{aligned}$$

Defining \(\epsilon _1:={\text {min}}\left\{ \theta _{inc},\frac{\epsilon ^2\alpha _t}{2(\bar{c}_1\epsilon +\bar{c}_2)}\right\} \) with \(\theta _{inc}\) from (RG5), it follows that for all \(\theta \le \epsilon _1\) we have \(m(\alpha )\le -\alpha \epsilon _2\) with \(\epsilon _2:=\frac{\epsilon ^2\alpha _t}{2}\). In the second case, we have that \(t^TWn - g^Tn\ge 0\) and the curvature condition guarantees that \(t^TWt+t^TWn - g^Tn\ge \alpha _tt^Tt\) and \(-g^Td\ge \alpha _tt^Tt\). In this case the result follows with \(\epsilon _1:=\theta _{inc}\) because \(\bar{c}_1=\bar{c}_2=0\) and for \(\epsilon _2\) defined previously. \(\square \)

The descent lemma guarantees that the objective function will be improved at a subsequence of nonstationary iterates (i.e., those with \(\varPsi _{k_i}^t\ge \epsilon \)) having a sufficiently small constraint violation \(\theta _{k_i}\). This implies that f-iterates will eventually be accepted and the filter is eventually not augmented. This in turn implies that an infinite subsequence of nonstationary iterates cannot exist. We now prove that assumptions (RG) guarantee global convergence of the filter line-search algorithm.

Theorem 2

Let Assumptions (RG) hold. The filter line-search algorithm delivers a sequence \(\{x_k\}\) satisfying

$$\begin{aligned} \mathop {{\text {lim}}}_{k\rightarrow \infty } \; \theta (x_k)&=0 \end{aligned}$$
(25a)
$$\begin{aligned} \mathop {{\text {lim inf}}}_{k\rightarrow \infty } \; \varPsi ^t(x_k)&=0. \end{aligned}$$
(25b)

Proof

We go through the results leading to the proof of Theorem 2 in [39] and argue that our assumptions (RG) are sufficient. Unless otherwise stated, all lemmas refer to those in [39]. Boundedness of \(d_k\) follows from boundedness of its components \(n_k\) and \(t_k\), which in turn follows from (RG1), (RG2), and (RG3b), as was argued in the proof of Theorem 1. Similar arguments show that boundedness of \(\lambda _k^+\) follows from boundedness of \(n_k\) and from (RG1), (RG2), and (RG3b). Boundedness of \(|m_k(\alpha )|\) follows from (RG1). Lemma 2 is replaced by the descent Lemma 2 of this work. Lemma 3 contains standard bounding results that follow from Taylor's theorem and require only (RG1). Lemma 4 follows from the descent Lemma 2 of this work. Lemma 6 requires only (RG1). Lemma 8 requires the descent Lemma 2 of this work. Lemma 10 establishes that for a subsequence of nonstationary iterates the filter is eventually not augmented; this requires the descent Lemma 2 of this work. The result follows. \(\square \)

The curvature condition (16) of assumption (RG3a) can hold even if the reduced Hessian matrix is not positive definite, as required by assumption (G3) in the standard filter line-search algorithm. Consequently, the curvature condition is less restrictive. If the curvature condition does not hold for the computed step, we can enforce it by regularizing the Hessian matrix \(W_k\leftarrow W_k+\delta I\). Specifically, the curvature condition (16) can always be satisfied for sufficiently large \(\delta \) and any \(\alpha _t\) satisfying \(\alpha _t\le \lambda _{min}(Z^T_kW_kZ_k)\). The reason is that \(t_k\) lies in the null space of \(J_k\) and, consequently, can always be expressed as \(t_k=Z_ku\) for some vector u; the curvature condition then reduces to \(u^TZ^T_kW_kZ_ku\ge \alpha _tu^Tu\). We also know that an appropriate \(\alpha _t>0\) exists for sufficiently large \(\delta \) because \(\lambda _{min}(Z^T_kW_k(\delta )Z_k)\) is an increasing function of \(\delta \). Note also that the term \({\text {max}}\{t_k^TW_kn_k-g_k^Tn_k,0\}\) does not affect these properties: if the argument is negative, the term is zero; and, if the argument is positive, it provides additional flexibility to satisfy the curvature condition.

We can enforce nonsingularity of \(M_k\) and boundedness of its inverse, as required by assumption (RG3b), by regularizing the Hessian \(W_k\) as well while making sure that (RG2) holds. This is necessary to guarantee that both \(n_k\) and \(t_k\) are bounded. Note also that nonsingularity of \(M_k\) can be monitored using a linear solver and boundedness of \(n_k\) and \(t_k\) can be monitored directly. These observations lead to the following inertia-free regularization (IFR) procedure:

Inertia-free regularization (IFR)

IFR-1. Given constant \(\alpha _t>0\), factorize \(M_k\) with \(\delta =0\) and compute \(n_k\), \(t_k\) from (7) and (8). If \(t_k\) satisfies the curvature condition (16) and \(M_k\) satisfies (RG3b), set \(d_k\leftarrow n_k+t_k\) and terminate.

IFR-2. If \(\delta ^{last}=0\), set \(\delta \leftarrow \bar{\delta }^0\); otherwise set \(\delta \leftarrow {\text {max}}\{\bar{\delta }^{min},\kappa ^-\delta ^{last}\}\).

IFR-3. Factorize \(M_k\) with current \(\delta \) and compute \(n_k\), \(t_k\) from (7) and (8). If \(t_k\) satisfies (16) and \(M_k\) satisfies (RG3b), set \(\delta ^{last}\leftarrow \delta \), \(d_k\leftarrow n_k+t_k\), and terminate.

IFR-4. If \(\delta ^{last}=0\), set \(\delta \leftarrow \hat{\kappa }^+\delta \); otherwise set \(\delta \leftarrow \kappa ^+\delta \). Go to IFR-3.

Here, the constants are the same as in IBR.
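
The IFR loop thus reuses the \(\delta \) update logic of IBR (steps IFR-2 and IFR-4); only the acceptance test changes. A minimal sketch of the curvature test (16), with the default \(\alpha _t=10^{-12}\) used in Sect. 5 and a hypothetical helper name, is:

```python
# Sketch of the curvature test (16); helper name hypothetical, alpha_t as
# in the default settings of Sect. 5.
import numpy as np

def curvature_ok(t, n, W, g, alpha_t=1e-12):
    # t'W t + max{t'W n - g'n, 0} >= alpha_t * t't
    return t @ W @ t + max(t @ (W @ n) - g @ n, 0.0) >= alpha_t * (t @ t)
```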

We make the following remarks:

  • The curvature condition (16) is enforced at every iteration. In principle, however, one can enforce it only at iterations in which the constraint violation is less than a certain small threshold value \(\theta _{sml}\). This approach is consistent with the observation that the switching condition SC needs to be checked only at iterations with small constraint violation [39]. In either case, however, we might need to regularize the Hessian in order to enforce nonsingularity of \(M_k\) at every iteration.

  • A potential caveat of the inertia-free strategy is that it cannot guarantee that the step computed via the augmented system is a minimizer of the associated quadratic program (the inertia-based approach guarantees this). Consequently, while enabling global convergence with enhanced flexibility is a great benefit of the inertia-free strategy, the potential price to pay is the possibility of having a larger proportion of steps that are accepted because of improvements in the constraint violation rather than in the objective function. This situation might ultimately manifest as a tendency to be attracted to first-order stationary points with larger objective values than those obtained with the inertia-based strategy. We provide numerical results in Sect. 5 to discuss this issue further.

  • As seen in Lemma 2, the term (\({\text {max}}\{t_k^TW_kn_k-g_k^Tn_k,0\}\)) in the curvature condition is harmless and is included only to provide additional flexibility. Because of this, one might also consider the simpler test \(t_k^TW_kt_k\ge \alpha _tt_k^Tt_k\) and still guarantee convergence.

  • The normal component \(n_k\) can also be computed from system (7) by replacing \(W_k\) with I. This, in fact, is a more natural choice than our approach because the normal and tangential components then correspond to the decomposition (12) with \(Y_k=J_k^T\). Moreover, this approach naturally guarantees that the corresponding augmented matrix is bounded and that the normal step is bounded. This approach, however, would require the solution of a linear system with a different coefficient matrix than the one used to compute the tangential step.

  • Assumption (RG3b) is needed to guarantee that the tangential step is bounded. In practice, however, this condition might be unnecessarily strong if the normal step is guaranteed to be bounded. This is because, in principle, one could still compute a productive tangential step that satisfies (RG3a) even if (RG3b) does not hold. Establishing convergence in this case, however, would require a more sophisticated analysis.

  • Assumption (RG1) can be guaranteed to hold only if all iterates \(x_k\) remain strictly in the interior of the nonnegative orthant. This condition guarantees that the barrier function \(\varphi ^{\mu }(x_k)\) and its derivatives are bounded. One can show that Theorem 3 in [39] holds under assumptions (RG). This result establishes that the iterates \(x_k\) remain in the strict interior of the feasible region if the maximum stepsize \(\alpha _k^{max}\) is determined by using the following fraction-to-the-boundary rule

    $$\begin{aligned} \alpha _k^{max}:={\text {max}}\{\alpha \in (0,1] \,:\,x_k+\alpha d_k\ge (1-\tau )x_k\}, \end{aligned}$$
    (26)

    for a fixed parameter \(\tau \in (0,1)\). The full row rank assumption on \(J_k\) (RG4), together with the assumption that its rows and the gradients of the active bounds of \(x_k\) are linearly independent, as well as the nonsingularity of the reduced Hessian (which follows from (RG3b) and Lemma 1), is sufficient for establishing the result. A componentwise evaluation of the rule (26) is sketched after this list.
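
The rule (26) reduces to a ratio test over the components with negative step entries; a minimal sketch (our illustration; the helper name alpha_max and the value of \(\tau \) are placeholders):

```python
# Componentwise evaluation of the fraction-to-the-boundary rule (26);
# inputs hypothetical, tau a placeholder value.
import numpy as np

def alpha_max(x, d, tau=0.995):
    # x + alpha*d >= (1 - tau)*x  <=>  alpha <= -tau*x_i/d_i for all d_i < 0
    neg = d < 0
    if not np.any(neg):
        return 1.0
    return float(min(1.0, np.min(-tau * x[neg] / d[neg])))
```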

4.1 Alternative inertia-free tests

Computing the normal and tangential components of the step separately can be beneficial in certain situations. For instance, the use of a PCG scheme provides a mechanism to perform the curvature test

$$\begin{aligned} t_{k,j}^TW_kt_{k,j}\ge \alpha _tt_{{k,j}}^Tt_{k,j} \end{aligned}$$
(27)

on the fly at each PCG iteration j and thus terminate early and save some work if the test does not hold for the current regularization parameter \(\delta \). This approach can also be beneficial because more test directions are used to identify negative curvature. This approach, however, requires a preconditioner that projects the iterates onto the null space of the Jacobian, which might not be available in certain applications.
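
A simplified sketch of this idea is shown below: an unpreconditioned CG on the reduced system \((Z_k^TW_kZ_k)u=-Z_k^T(g_k+W_kn_k)\) that checks curvature on the fly and returns a failure flag so that the caller can increase \(\delta \) and retry. Here the test is applied to the CG directions; applying it to the iterates \(t_{k,j}\) as in (27) is analogous. A production PCG would instead work in the full space with a projection preconditioner.

```python
# Simplified sketch (our illustration, hypothetical data): unpreconditioned
# CG on A u = b with A = Z'WZ and b = -Z'(g + Wn), with an on-the-fly
# curvature test. Returns (u, ok); ok = False signals negative curvature.
import numpy as np

def cg_with_curvature_test(A, b, alpha_t=1e-12, tol=1e-10, maxit=200):
    u = np.zeros_like(b)
    r = b.copy()                           # residual b - A u for u = 0
    p = r.copy()
    for _ in range(maxit):
        Ap = A @ p
        if p @ Ap < alpha_t * (p @ p):     # curvature test on direction p
            return u, False                # caller regularizes and retries
        a = (r @ r) / (p @ Ap)
        u = u + a * p
        r_new = r - a * Ap
        if np.linalg.norm(r_new) <= tol:
            return u, True
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return u, True
```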

In some applications it might be desirable to operate directly with the full step \(d_k\). In this case, we can impose a curvature condition of the form

$$\begin{aligned} d_k^TW_kd_k + {\text {max}}\{-(\lambda _k^+)^Tc_k,0\}\ge \alpha _dd_k^Td_k, \end{aligned}$$
(28)

with \(d_k\) computed from (4).

To argue that test (28) is consistent, we use the criticality measure

$$\begin{aligned} \varPsi ^d_k:=\Vert d_k\Vert . \end{aligned}$$
(29)

If (RG3b) and (RG4) hold, we have from Lemma 1 and equation (4) that

$$\begin{aligned} d_k&=-P_kg_k-(I-P_kW_k)J^T_k(J_kJ^T_k)^{-1}c_k\nonumber \\&=-Z_k(Z^T_kW_kZ_k)^{-1}Z^T_k(g_k-W_kJ^T_k(J_kJ^T_k)^{-1}c_k)-J^T_k(J_kJ^T_k)^{-1}c_k. \end{aligned}$$
(30)

If we set \(Y_k=J_k^T\), we have that \(d_k\) has the same structure as the step decomposition in (12). Moreover, we have that

$$\begin{aligned} \varPsi _k^d=\varPsi _k^t+O(\Vert c_k\Vert ). \end{aligned}$$
(31)

Consequently, the results of Theorem 1 still hold and \(\varPsi _k^d\) is a valid criticality measure. If \(-(\lambda _k^+)^Tc_k<0\), from (28), (RG1), (RG4), and (RG3b) we have that \(\Vert \lambda _k^+\Vert \le \kappa \) for some \(\kappa >0\) and, therefore,

$$\begin{aligned} g^T_kd_k&=-d_k^TW_kd_k-d^T_kJ^T_k\lambda ^+_k\nonumber \\&=-d^T_kW_kd_k+c^T_k\lambda ^+_k\nonumber \\&\le -\alpha _d(\varPsi _k^d)^2+ \kappa \theta _k. \end{aligned}$$
(32)

Consequently, the results of Lemma 2 hold with appropriate constants. If \(-(\lambda _k^+)^Tc_k\ge 0\), the result follows with \(\kappa =0\).

The curvature condition (28) holds for any \(\alpha _d\le \lambda _{min}(W_k)\), and we note that the term \({\text {max}}\{-(\lambda _k^+)^Tc_k,0\}\) is harmless and is used only to enhance flexibility.
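
A sketch of test (28), with \(\lambda _k^+\) taken from the solution of (4) (helper name hypothetical):

```python
# Sketch of the full-step test (28); lam is the multiplier update from (4).
import numpy as np

def full_step_curvature_ok(d, W, lam, c, alpha_d=1e-12):
    # d'W d + max{-lam'c, 0} >= alpha_d * d'd
    return d @ W @ d + max(-(lam @ c), 0.0) >= alpha_d * (d @ d)
```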

5 Numerical results

In this section we benchmark the inertia-based and inertia-free strategies using the PIPS-NLP interior-point framework [14]. We first describe the implementation used to perform the benchmarks. We then present results for small-scale problems and large-scale problems arising from different applications.

5.1 Implementation

PIPS-NLP is an object-oriented interior-point framework implemented in C++ that facilitates the communication of model structures and the development of linear algebra strategies. These capabilities enable us to solve large-scale problems on parallel computing architectures. The algorithmic framework of PIPS-NLP follows along the lines of that of IPOPT [38], but we do not implement a feasibility restoration phase or watchdog heuristics. If the restoration phase is reached, we terminate the algorithm. We use this approach to isolate the effects of inertia correction on performance. As in IPOPT [38], we allow steps that are very small in norm \(\Vert d_k\Vert \) to be accepted even if they do not satisfy SDC or SAC (the case in which both SC and AC hold; see Sect. 5.2), we use the same stepsize for the primal variables and the equality-constraint multipliers, use a primal-dual Hessian, and use a different stepsize for the bound multipliers. We have validated the performance of our implementation by comparing it with that of IPOPT, and we have obtained nearly identical results.

We compare three regularization strategies: (1) IBR: inertia-based regularization, (2) IFRt: inertia-free regularization with the curvature test (16), and (3) IFRd: inertia-free regularization with the curvature test (28). We use the same parameters for IBR and IFR to increase/decrease the regularization parameter \(\delta \). For IFRt and IFRd we set \(\alpha _t\) and \(\alpha _d\) to \(10^{-12}\) by default. We also scale these parameters using the barrier parameter \(\mu \), as suggested in [18]. Strategy IFRd has been implemented in the latest version of IPOPT (https://projects.coin-or.org/Ipopt).

For the IFRt strategy, we compute the normal step by solving the linear system (7) by factorizing \(M_k\) using MA57 [20], and we reuse the factorization to compute the tangential component from (8). For IBR we estimate the inertia of the augmented matrix using MA57.

We use the optimality error described in [38, Sect. 2] with a convergence tolerance of \(10^{-6}\), we perform iterative refinement for the augmented system with a tolerance of \(10^{-12}\), and we set the maximum number of iterations to 1000. If the line search cannot find an acceptable point within 50 trials, the last trial stepsize is used in the next iteration. We use a pivoting tolerance of \(10^{-4}\) for MA57.

Of the entire set of test problems studied in Sect. 5.2, 13 % have Jacobians that are nearly rank-deficient. To deal with these instances, we regularize the (2,2) block of the augmented matrix as in IPOPT whenever we detect the augmented matrix to be singular [38, Sect. 3.1]. We emphasize that the convergence theory of this work and the supporting theory in [39] do not hold in this case. In particular, the tangential step computed in IFRt no longer lies exactly in the null space of the Jacobian, and the inverse expression of the augmented matrix of Lemma 1 does not hold. The regularization parameter of the (2,2) block is, however, very small (\(10^{-8}\)–\(10^{-11}\)), and this is often sufficient to avoid singularity issues. The numerical results obtained indicate that the performance of IFRd and IFRt is not strongly affected by this regularization.

5.2 Small-scale tests

We consider 929 test problems: 738 CUTE instances http://orfe.princeton.edu/~rvdb/ampl/nlmodels/cute/index.html, 188 Schittkowski instances http://orfe.princeton.edu/~rvdb/ampl/nlmodels/cute/index.html, and three additional optimal power flow instances based on IEEE data. The power flow models are accessible at http://zavalab.engr.wisc.edu/data/testinertia. Out of the 929 test problems, PIPS-NLP can solve 828 problems with at least one of the regularization strategies implemented (89 % of all the tests). This demonstrates the robustness of our implementation. The rest of the problems cannot be solved with any of the strategies considered because the solver either (1) reaches the limit of iterations, (2) requires feasibility restoration, (3) encounters an evaluation error in the problem functions, or (4) requires regularization that reaches the maximum limit of \(1\times 10^{12}\). We only compare the performance of inertia-based and inertia-free strategies for the 828 problems for which at least one strategy was successful.

The performance of the three strategies is presented in Table A1 in the Appendix. Here, we report the optimal objective (OBJ), the number of iterations (Iter), number of regularizations (Reg), and solution time in seconds. The total number of factorizations equals the number of iterations plus the number of regularizations. We use “-” to denote the tests that cannot be solved within the limit of iterations. The last three instances in the table are the energy problems.

From the results in Table A1 we have that 802, 796, and 794 test problems can be solved with IBR, IFRd, and IFRt, respectively. Moreover, several instances can only be solved with the inertia-free strategies (e.g., fltcher and steenbrf). We can thus conclude that the inertia-free strategies are competitive. In Figs. 1 and 2 we present Dolan–Moré profiles [19] for the total number of iterations and the average number of factorizations per iteration. In Fig. 1 we can see that IBR requires fewer iterations to converge, but the differences are negligible. In Fig. 2 we can see that IFRd and IFRt significantly outperform IBR in terms of the average number of factorizations required per iteration. From the tests reported in Table A1 we also estimate the average number of regularizations per iteration: IBR requires 0.61 regularizations per iteration, while IFRd and IFRt require 0.37 and 0.43 regularizations per iteration, respectively. We thus conclude that the IFR strategies significantly reduce the amount of regularization needed.

Fig. 1 Number of iterations

Fig. 2 Number of factorizations per iteration

Table 1 Performance of inertia-based and inertia-free strategies for IEEE_162 under different pivoting tolerances

The inertia-free strategies yield the same final objective value as IBR in 85 % of the instances. The performance in this respect is rather surprising. IBR yields better final objective values than IFRd and IFRt in 81 instances (e.g., instances hatfldd and hatflde), while IFRd and IFRt yield better objective values than IBR in 40 and 41 instances (e.g., instances s295 and s296), respectively. We could, in principle, attribute the tendency of IBR to reach better final objective values to the fact that the solutions of the augmented system are actual minimizers of the associated quadratic program at each iteration, whereas the solutions obtained with the inertia-free strategies are not. Consequently, we would expect steps computed with IBR to yield improvements in the objective more often. Interestingly, we have found this not necessarily to be the case.

To demonstrate this, we compared the percentages of steps accepted by the filter for the three strategies as a result of improvements in the objective function and in the constraint violation. We recall that a trial step can be accepted under three cases: (1) SAC: both SC and AC hold, (2) SDCO: SDC holds because of sufficient decrease in the objective, and (3) SDCC: SDC holds because of sufficient reduction in the constraint violation. In Table A2 we present the percentage of successful trial steps obtained for each case. We note that the percentages do not add up to 100 % in some cases because we allow the line search to accept very small steps and because we round the percentages to the nearest integer. To perform this comparison, we consider only problems in which all strategies are successful, and we consider only problems with constraints (the unconstrained instances have a percentage of acceptance for SAC of 100 %). The last row presents the average percentages for all problems, from which we can see that IBR accepts 46 % of the steps due to SAC. The corresponding percentages are 49 and 51 % for IFRd and IFRt, respectively. If we add the total percentages in which the steps are accepted because of improvements in the objective (i.e., add SAC and SDCO), we obtain 76, 75, and 79 % for IBR, IFRd, and IFRt, respectively. This result indicates that both inertia-free strategies are competitive with the inertia-based strategy in achieving productive steps for the objective value. Moreover, we recall that IFRd and IFRt yield better final objective values than IBR in 40 and 41 instances. Because of this, we cannot draw general conclusions on the tendency of the strategies to provide better final objective values. This can be attributed to the presence of multiple local minima.

We elaborate on the behavior of the strategies in instance IEEE_162, which is a highly nonlinear optimal power flow problem. From Table 1 we can see that IFRt and IFRd do not require regularization and converge in 23 iterations, whereas IBR requires 145 iterations and 183 regularizations. This instance is an ill-posed problem that does not seem to have an isolated local minimum (the inertia is not correct at the solution). In this instance, significant regularization is observed for IBR during the entire search. This degrades the quality of the steps and results in slow convergence. From the behavior of IFR, on the other hand, we can see that productive steps can be achieved without regularizing the system. We observed similar behavior in other instances (e.g., static3, s368, and s389). For the IEEE_162 instance we performed an additional experiment in which we changed the pivoting tolerance of MA57. In Table 1 we compare the performance of IFRd, IFRt, and IBR. We note that the number of iterations and regularizations for IBR varies quite drastically as we change the pivoting tolerance. This is an indication that the inertia estimates provided by MA57 become unreliable. This parasitic behavior is confirmed when we compare against the performance of the inertia-free approaches, which require the same number of iterations in all cases. As can be seen, inertia-free strategies can be beneficial in solving ill-posed problems. In this respect, inertia-free strategies inherit some of the desirable features of trust-region settings.

We highlight that IFRt requires two backsolves to compute the search step (one for the normal component and one for the tangential component), while IFRd requires only one; the factorization of the augmented matrix is reused. The overhead of the additional backsolve is often one to two orders of magnitude smaller than the factorization overhead. We can see this from Table A1 if we consider large instances that do not require regularization and take the same number of iterations for IFRt and IFRd. For example, cvxqp3 requires one more second for IFRt than for IFRd out of 371 seconds of total time. For cvxqp2, IFRt requires 12 more seconds than IFRd out of a total time of 123 seconds.

5.3 Large-scale tests

We now demonstrate that the inertia-free strategies remain efficient in large-scale problems. We use large-scale stochastic optimization problems arising from security-constrained optimal power flow and stochastic optimal control of natural gas networks [13, 42]. Because of the large dimensionality of these instances we solve them using a distributed-memory Schur decomposition strategy implemented in PIPS-NLP.

5.3.1 Structured NLPs and Schur decomposition

We are interested in solving NLPs with the following structure:

$$\begin{aligned} {\text {min}}&\;\; f_0(x_0)+\sum _{\omega \in \varOmega }f_{\omega }(x_{\omega },x_0)\end{aligned}$$
(33a)
$$\begin{aligned} {\text {s.t.}}&\,\;\qquad c_0(x_0)= 0&(\lambda _0)\end{aligned}$$
(33b)
$$\begin{aligned}&\;\;\;c_{\omega }(x_{\omega },x_0)= 0, \; \omega \in \varOmega&(\lambda _{\omega })\end{aligned}$$
(33c)
$$\begin{aligned}&\;\qquad \quad \; x_0\ge 0&(\nu _0)\end{aligned}$$
(33d)
$$\begin{aligned}&\;\; \quad \;\;\;\quad \, x_{\omega }\ge 0, \; \omega \in \varOmega&(\nu _{\omega }) \end{aligned}$$
(33e)

The augmented matrix (4) of the structured NLP (33) can be permuted into the following block-bordered diagonal form:

$$\begin{aligned} \hat{M}= \left[ \begin{array}{ccccc} \hat{M}_1 &{} &{} &{} &{}{B}_1^T\\ &{}\hat{M}_2 &{} &{} &{}{B}_2^T\\ &{} &{}\ddots &{} &{}\vdots \\ &{} &{} &{}\hat{M}_{|\varOmega |} &{}{B}_{|\varOmega |}^T\\ {B}_1 &{}{B}_2 &{}\cdots &{}{B}_{|\varOmega |} &{}\hat{M}_0 \\ \end{array} \right] , \end{aligned}$$
(34)

where \(\varOmega :=\{1,2,\dots ,|\varOmega |\}\) is the scenario set [22, 32, 44] and the index 0 corresponds to the coupling variables; we refer to this coupling block as the zero scenario, so that sums over \(\omega \in \varOmega \) exclude it. We use \(\hat{M}\) to denote the permuted form of the augmented matrix \(M_k\). The matrices

$$\begin{aligned} \hat{M}_\omega = \left[ \begin{array}{cc} W_\omega &{}J_{\omega }^T\\ J_{\omega } &{}0 \end{array} \right] \end{aligned}$$
(35)

for \(\omega \in \varOmega \cup \{0\}\) have a saddle-point structure. Here, \(W_\omega \) and \(J_\omega \) are the Hessian and Jacobian contributions of each scenario, and the border matrices \({B}_\omega \) define the coupling between the scenarios and the zero scenario.

The permuted augmented system can be represented as

$$\begin{aligned} \hat{M} {\hat{w}} = \hat{r}, \end{aligned}$$
(36)

where \({\hat{w}} = ({\hat{w}}_1,\dots ,{\hat{w}}_{|\varOmega |},{\hat{w}}_0)\) collects the permuted search directions for the primal variables and multipliers, and \(\hat{r} = (\hat{r}_1,\dots ,\hat{r}_{|\varOmega |},\hat{r}_0)\) collects the permuted right-hand sides. To solve the structured augmented system (36) in parallel, we use Schur decomposition. The solution of (36) can be obtained from

$$\begin{aligned} \hat{{z}}_\omega= & {} \hat{M}_\omega ^{-1}\hat{r}_\omega \quad \omega \in \varOmega , \end{aligned}$$
(37a)
$$\begin{aligned} \hat{{w}}_0= & {} {C}( \delta )^{-1} \left( \hat{r}_0-\sum _{\omega \in \varOmega }{B}_\omega {\hat{z}}_\omega \right) , \end{aligned}$$
(37b)
$$\begin{aligned} \hat{{w}}_\omega= & {} {\hat{z}}_\omega -\hat{M}_\omega ^{-1}{B}_\omega ^T{\hat{w}}_0, \quad \omega \in \varOmega , \end{aligned}$$
(37c)

where

$$\begin{aligned} {C} = \hat{M}_0 - \sum _{\omega \in \varOmega }{B}_\omega \hat{M}_\omega ^{-1}{B}_\omega ^T, \end{aligned}$$
(38)

is the Schur complement. Each worker processor is assigned the data of certain blocks \(\omega \) and performs step (37a) by factorizing its local blocks \(\hat{M}_\omega \) in parallel. A master processor gathers the contributions of the workers to assemble the Schur complement (38) and computes the step for the coupling variables using (37b). Given the coupling step \(\hat{w}_0\), the workers compute the local steps \(\hat{{w}}_\omega \) in parallel using (37c).
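A dense, serial sketch of steps (37a)–(37c) and (38) is given below; the block matrices and right-hand sides are assumed to be available as NumPy arrays, whereas in PIPS-NLP these operations are distributed across MPI ranks and MA57 replaces the LU factorizations.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def schur_solve(M0, Mw, B, r0, rw):
    """Solve the permuted system (36) via steps (37a)-(37c) and (38)."""
    facs = [lu_factor(Mi) for Mi in Mw]   # one factorization per scenario block
    z = [lu_solve(f, ri) for f, ri in zip(facs, rw)]                    # (37a)
    C = M0 - sum(Bi @ lu_solve(f, Bi.T) for f, Bi in zip(facs, B))      # (38)
    w0 = np.linalg.solve(C, r0 - sum(Bi @ zi for Bi, zi in zip(B, z)))  # (37b)
    w = [zi - lu_solve(f, Bi.T @ w0) for f, zi, Bi in zip(facs, z, B)]  # (37c)
    return w0, w
```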

Because an \(LBL^T\) factorization of the entire matrix \(\hat{M}\) is not available, its inertia can be inferred by using Haynsworth’s inertia additivity formula [27]:

$$\begin{aligned} {\text {Inertia}}(\hat{M}) = {\text {Inertia}}(C) + \sum _{\omega \in \varOmega } {\text {Inertia}}(\hat{M}_\omega ). \end{aligned}$$
(39)

We perform the factorization of the subblocks and of the Schur complement and check that the sum of their inertias satisfies the inertia condition \({\text {Inertia}}(\hat{M})=\{n,m,0\}\). If this is not the case, all Hessian blocks \(W_\omega \) are regularized with a common parameter \(\delta \) (hence the notation \(C(\delta )\) in (37b)) until \(\hat{M}\) has the correct inertia. We note that obtaining the inertia of the Schur complement C by factorizing it with a sparse symmetric indefinite routine such as MA57 is not efficient, because this matrix tends to be dense. More efficient parallel codes such as MAGMA or ELEMENTAL can be used, but these are based on dense factorization schemes that do not provide inertia information [1, 31]. This situation illustrates a practical complication that can be encountered when inertia information is required and motivates the development of inertia-free strategies. In our implementation we use MA57 to factorize the Schur complement (even though this is not the most efficient choice) because we seek to compare the performance of IFR with that of IBR.
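The following sketch implements this check, using scipy.linalg.ldl as a dense stand-in for the sparse \(LBL^T\) factorization; by Sylvester's law of inertia, the inertia of each block can be read off from the block-diagonal factor D.

```python
import numpy as np
from scipy.linalg import ldl

def inertia(A, tol=1e-12):
    """Return (n_pos, n_neg, n_zero) for a symmetric matrix A."""
    _, D, _ = ldl(A)                 # D is block diagonal with 1x1/2x2 blocks
    eig = np.linalg.eigvalsh(D)      # eigenvalues of D carry the inertia of A
    return ((eig > tol).sum(), (eig < -tol).sum(), (np.abs(eig) <= tol).sum())

def inertia_is_correct(C, scenario_blocks, n, m):
    """Check Inertia(Mhat) = {n, m, 0} via the additivity formula (39)."""
    pos, neg, zero = inertia(C)
    for Mw in scenario_blocks:
        p, q, z = inertia(Mw)
        pos, neg, zero = pos + p, neg + q, zero + z
    return (pos, neg, zero) == (n, m, 0)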

We note that the serial bottleneck of this approach is the formation and factorization of the Schur complement. Because the Schur complement must be re-assembled and re-factorized whenever the regularization parameter \(\delta \) is adjusted, regularization not only increases the total work per iteration but also degrades parallel performance.
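The cost implication can be seen in the following sketch of the inertia-correction loop; the helpers assemble_schur and scenario_blocks are hypothetical, and the update rule for \(\delta \) is a simplified stand-in for the actual heuristic. Every trial value of \(\delta \) triggers a re-assembly and re-factorization of the Schur complement.

```python
# Hypothetical helpers: assemble_schur(delta) forms C(delta) as in (38) and
# scenario_blocks(delta) returns the regularized blocks; inertia_is_correct
# is the function sketched above. The delta update is a simplified stand-in.
delta = 0.0
while True:
    C = assemble_schur(delta)                        # re-assemble C(delta)
    if inertia_is_correct(C, scenario_blocks(delta), n, m):
        break                                        # inertia is {n, m, 0}
    delta = 1e-4 if delta == 0.0 else 10.0 * delta   # enlarge regularization
```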

5.3.2 Natural gas and power grid problems

The stochastic optimal control problem for gas networks has the form presented in (40). This problem determines compressor policies \(\varDelta \theta _{\omega ,\ell }(\tau )\) up to a preparation time \(T_d\) that build up enough gas in the network to sustain demand profiles \(d^{target}_{\omega ,j}(\tau )\) under multiple scenarios \(\omega \in \varOmega \). The objective function is the expected cost of compression plus an error term between the actual delivered demands and the targets. The PDEs (40b)–(40d) are transport equations that describe flow, pressure, and temperature variations inside the pipelines comprising the network. These equations also capture the dynamic response of the gas stored inside the pipelines when gas is added or withdrawn at network nodes. The boundary conditions and the network constraints (40f)–(40i) couple the pipelines. Constraint (40j) computes the power consumed by each compressor station, and constraints (40l) bound the discharge and suction pressures at network nodes. Constraint (40k) enforces periodicity in the average flow of the system. Constraints (40m) are the so-called non-anticipativity constraints, which require the compressor policies to be the same for each scenario over \([0,T_d]\). For more details on the problem formulation, the reader is referred to [42]. We discretize the PDEs in space and time using finite differences and an implicit Euler scheme, respectively (a sketch of the resulting stencil is given below). After discretization, we obtain an NLP of the form (33) with 128 scenarios, 1,024,651 variables, and 1,023,104 constraints. We refer to this problem instance as STOCH_GAS.
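As an illustration of the discretization (a sketch; the actual stencil may differ), applying implicit Euler in time and a first-order backward difference in space to the mass balance (40b) on a uniform grid with steps \(\varDelta \tau \) and \(\varDelta x\) gives

$$\begin{aligned} \frac{p_{\omega ,\ell ,i}^{t+1}-p_{\omega ,\ell ,i}^{t}}{\varDelta \tau } + \frac{ZRT_{\omega ,\ell ,i}^{t+1}}{A_{\ell }}\,\frac{f_{\omega ,\ell ,i}^{t+1}-f_{\omega ,\ell ,i-1}^{t+1}}{\varDelta x} = 0, \end{aligned}$$

where \(i\) indexes the spatial grid points and \(t\) the time steps; all unknowns are evaluated at the new time level, which makes the scheme implicit.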

The security-constrained optimal power flow (SCOPF) problem has the form shown in (41). The SCOPF problem seeks to minimize the power generation cost under a base condition (zero scenario) while guaranteeing that the operation of the network is feasible under predetermined contingencies caused by the loss of different transmission lines. Here, \(\varOmega \) is the set of contingencies (scenarios), and each contingency \(\omega \in \varOmega \) gives rise to a different network topology (reflected in the sets of links \({\mathcal {L}}_{\omega }\)). The voltage angle differences between nodes i and j are defined as \(\delta _{\omega ,ij}:=\delta _{\omega ,i}-\delta _{\omega ,j}\). Constraints (41b)–(41e) are derived from Kirchhoff's laws, which establish the conservation of energy in the power network. Constraints (41b) and (41c) assert that the sum of incoming power flows and power generation at any particular node (bus) must equal the sum of outgoing power flows and power consumption. Constraints (41d) and (41e) define the real and reactive power flows across each transmission line as functions of the voltage magnitudes and angle differences, with separate expressions for tapped lines. Constraint (41f) forces the phase angle at a reference bus \(b_0\) to be zero. Constraints (41g)–(41j) are physical limits of the system. Constraints (41k) and (41l) are non-anticipativity constraints and state that the voltage level and real power generation at the PV buses (control buses) should remain at their base condition. The only corrective action after a failure is the adjustment of the real power generation at the reference bus, which compensates for the transmission losses incurred in each contingency. We use the IEEE 300 bus system with 239 contingencies as a case study (we refer to the resulting problem as IEEE_300). This gives a structured NLP of the form (33) with 878,650 variables and 734,406 constraints. The gas and power flow instances are accessible at http://zavalab.engr.wisc.edu/data/testinertia.

All parallel numerical tests were performed on the Fusion computing cluster at Argonne National Laboratory. Fusion contains 320 computing nodes, each with two quad-core 2.6 GHz Nehalem CPUs. The results are presented in Table 2, where #MPI denotes the number of MPI processes used for parallelization. All strategies converge to the same objective values; consequently, we report only one value. For these problem instances we used the parameter values \(\alpha _t=10^{-10}\) and \(\alpha _d= 10^{-10}\) (scaled by \(\mu \)) because the default values of \(10^{-12}\) resulted in high variability in the number of iterations for STOCH_GAS. We have observed that increasing \(\alpha _t,\alpha _d\) enhances efficiency; as expected, however, this comes at the expense of additional regularizations.

$$\begin{aligned}&\min \; {\mathbb {E}}\left[ \int _{0}^T \left( \sum _{\ell \in {\mathcal {L}}_a}\alpha _{\ell }^{P} P_{\omega ,\ell }(\tau )+\sum _{j\in {\mathcal {D}}}\alpha _{j}^{d} \left( d_{\omega ,j}(\tau )-{d}_{\omega ,j}^{target}(\tau )\right) ^2\right) d\tau \right] \end{aligned}$$
(40a)
$$\begin{aligned}&{\text {s.t.}}\nonumber \\&\frac{\partial p^{}_{\omega ,\ell } (x,\tau )}{\partial \tau } + \frac{ZRT_{\omega ,\ell }(x,\tau )}{A_{\ell }}\frac{\partial f^{}_{\omega ,\ell }(x,\tau )}{\partial x} = 0,\;\ell \in {\mathcal {L}},\omega \in \varOmega , x\in {\mathcal {X}}_\ell ,\tau \in {\mathcal {T}} \end{aligned}$$
(40b)
$$\begin{aligned}&\frac{1}{A_{\ell }}\frac{\partial f^{}_{\omega ,\ell }(x,\tau )}{\partial \tau } + \frac{\partial p^{}_{\omega ,\ell }(x,\tau )}{\partial x} + \frac{8\lambda _{\ell }}{\pi ^2 D_{\ell }^5}\frac{f^{}_{\omega ,\ell }(x,\tau )|f^{}_{\omega ,\ell }(x,\tau )|}{\rho _{\omega ,\ell }^{}(x,\tau )}=0,\nonumber \\&\quad \;\ell \in {\mathcal {L}},\omega \in \varOmega , x\in {\mathcal {X}}_\ell ,\tau \in {\mathcal {T}} \end{aligned}$$
(40c)
$$\begin{aligned}&\rho _{\omega ,\ell }(x,\tau )\, c_p\,\left( \frac{\partial T_{\omega ,\ell }(x,\tau )}{\partial \tau } +\nu _{\omega ,\ell }(x,\tau )\frac{\partial T_{\omega ,\ell }(x,\tau )}{\partial x}\right) \nonumber \\&\qquad \qquad \;-\left( \frac{\partial p_{\omega ,\ell }(x,\tau )}{\partial \tau }+\nu _{\omega ,\ell }(x,\tau )\frac{\partial p_{\omega ,\ell }(x,\tau )}{\partial x}\right) \nonumber \\&\qquad \qquad + \frac{\pi D_{\ell } U_{\ell }}{A_\ell }\left( T_{\omega ,\ell }(x,\tau )-T^{amb}_{\omega }(x,\tau )\right) =0,\;\ell \in {\mathcal {L}},\omega \in \varOmega , x\in {\mathcal {X}}_\ell ,\tau \in {\mathcal {T}} \end{aligned}$$
(40d)
$$\begin{aligned}&\frac{p_{\omega ,\ell }(x,\tau )}{\rho _{\omega ,\ell }(x,\tau )}=ZRT_{\omega ,\ell }(x,\tau ),\;\ell \in {\mathcal {L}},\omega \in \varOmega , x\in {\mathcal {X}}_\ell ,\tau \in {\mathcal {T}} \end{aligned}$$
(40e)
$$\begin{aligned}&\;p_{\omega ,\ell }(L_{\ell },\tau )=\theta _{\omega ,rec(\ell )}(\tau ),\;\ell \in {\mathcal {L}},\omega \in \varOmega , \tau \in {\mathcal {T}} \end{aligned}$$
(40f)
$$\begin{aligned}&\;p_{\omega ,\ell }(0,\tau )=\theta _{\omega ,snd(\ell )}(\tau ),\; \ell \in {{\mathcal {L}}_p},\omega \in \varOmega , \tau \in {\mathcal {T}} \end{aligned}$$
(40g)
$$\begin{aligned}&\;p_{\omega ,\ell }(0,\tau )=\theta _{\omega ,snd(\ell )}(\tau )+\varDelta \theta _{\omega ,\ell }(\tau ),\; \ell \in {{\mathcal {L}}_a},\omega \in \varOmega , \tau \in {\mathcal {T}} \end{aligned}$$
(40h)
$$\begin{aligned}&\;\sum _{\ell \in {\mathcal {L}}_n^{rec}}f_{\omega ,\ell }(L_{\ell },\tau )-\sum _{\ell \in {\mathcal {L}}_n^{snd}}f_{\omega ,\ell }(0,\tau )\nonumber \\&\qquad \qquad \qquad \qquad \qquad + \sum _{i\in {\mathcal {S}}_n}s_{\omega ,i}(\tau ) - \sum _{j\in {\mathcal {D}}_n}d_{\omega ,j}(\tau ) = 0,\; n \in {\mathcal {N}},\omega \in \varOmega , \tau \in {\mathcal {T}} \end{aligned}$$
(40i)
$$\begin{aligned}&\;P_{\omega ,\ell }(\tau )= c_p\cdot T_{\omega ,\ell }(0,\tau ) \cdot f_{\omega ,\ell }(0,\tau )\left( \left( \frac{\theta _{\omega ,snd(\ell )}(\tau )+\varDelta \theta _{\omega ,\ell }(\tau ) }{\theta _{\omega ,snd(\ell )}(\tau )}\right) ^{\frac{\gamma -1}{\gamma }}-1\right) ,\;\nonumber \\&\quad \ell \in {\mathcal {L}}_a,\omega \in \varOmega , \tau \in {\mathcal {T}} \end{aligned}$$
(40j)
$$\begin{aligned}&\;\int _0^{L_{\ell }} f_{\omega ,\ell }(x,T)dx\ge \int _0^{L_{\ell }} f_{\omega ,\ell }(x,0)dx,\quad \ell \in {\mathcal {L}}_a,\omega \in \varOmega \end{aligned}$$
(40k)
$$\begin{aligned}&\; \underline{\theta }_n\le \theta _{\omega ,n}(\tau )\le \overline{\theta }_n,\; n\in {\mathcal {N}},\omega \in \varOmega , \tau \in {\mathcal {T}} \end{aligned}$$
(40l)
$$\begin{aligned}&\; \varDelta \theta _{\omega ,\ell }(\tau )=\varDelta \theta _{0,\ell }(\tau ),\; \ell \in {\mathcal {L}}_a,\; \omega \in \varOmega \setminus \{0\}, \tau \in [0,T_d]. \end{aligned}$$
(40m)

We can see that, despite the high nonlinearity of the large-scale instances, the inertia-free approaches IFRd and IFRt converge in all cases. This provides evidence that the tests can scale. In general, IFRd and IFRt require more iterations than IBR does, but the number of factorizations is reduced, resulting in faster solutions. These problem instances are highly ill-conditioned, particularly STOCH_GAS, as is evident from the variability in the number of iterations as we increase the number of MPI processes; this variability is the result of numerical errors in the linear system solves introduced by the Schur decomposition. The performance, however, is satisfactory in all cases. For IEEE_300 we note that the number of iterations for IBR does not vary as we add MPI processes, whereas those of IFRd and IFRt do. While it is difficult to isolate a specific source of this behavior, we attribute it to the stabilizing effect that the additional regularizations of IBR have on the linear system.

$$\begin{aligned}&\min \; \sum _{g \in G} \big (c_{0g} + c_{1g}p_{base,g}+c_{2g}p_{base,g}^2\big ) \end{aligned}$$
(41a)
$$\begin{aligned}&{\text {s.t.}}\nonumber \\&\; \sum _{g|o_g=b}p_{\omega ,g} - d_b^P = \sum _{(i,j) \in {\mathcal {L}}_{\omega }}f_{\omega ,(i,j)}^{P}, \; b \in {\mathcal {B}}, \omega \in \varOmega \end{aligned}$$
(41b)
$$\begin{aligned}&\; \sum _{g|o_g=b}q_{\omega ,g} - d_b^Q = \sum _{(i,j) \in {\mathcal {L}}_{\omega }} f_{\omega ,(i,j)}^{Q} - \frac{(V_{b})^2}{2}\sum _{(i,j) \in {\mathcal {L}}_{\omega }}\gamma _{(i,j)}, \; b \in {\mathcal {B}}, \omega \in \varOmega \end{aligned}$$
(41c)
$$\begin{aligned}&\; f^P_{\omega ,(i,j)} = {\left\{ \begin{array}{ll} \displaystyle \alpha _l V_{\omega ,i}^2-V_{\omega ,i}V_{\omega ,j}[\alpha _l \cos (\delta _{\omega ,ij}) \\ \quad +\, \beta _l \sin (\delta _{\omega ,ij})] , &{}\quad {\text { otherwise}}\\ \displaystyle \alpha _l \left( \frac{V_{\omega ,i}}{\tau _t}\right) ^2-\frac{V_{\omega ,i}V_{\omega ,j}}{\tau _t} \\ \quad [\alpha _t \cos (\delta _{\omega ,ij})+ \beta _t \sin (\delta _{\omega ,ij})] , &{}\quad i\text { tapped}\\ \displaystyle \alpha _l V_{\omega ,i}^2-\frac{V_{\omega ,i}V_{\omega ,j}}{\tau _t}[\alpha _t \cos (\delta _{\omega ,ij}) \\ \quad +\, \beta _t \sin (\delta _{\omega ,ij})] , &{}\quad j\text { tapped} \end{array}\right. }, \; (i,j) \in {\mathcal {L}}_{\omega }, \omega \in \varOmega \end{aligned}$$
(41d)
$$\begin{aligned}&\; f^Q_{\omega ,(i,j)} = {\left\{ \begin{array}{ll} \displaystyle -\beta _l V_{\omega ,i}^2-V_{\omega ,i}V_{\omega ,j}[\alpha _l \sin (\delta _{\omega ,ij})\\ \quad - \,\beta _l \cos (\delta _{\omega ,ij})] , &{}\quad \text { otherwise}\\ \displaystyle -\beta _l \left( \frac{V_{\omega ,i}}{\tau _t}\right) ^2-\frac{V_{\omega ,i}V_{\omega ,j}}{\tau _t} \\ \quad [\alpha _l \sin (\delta _{\omega ,ij})- \beta _l \cos (\delta _{\omega ,ij})] , &{}\quad i\text { tapped}\\ \displaystyle -\beta _l V_{\omega ,i}^2-\frac{V_{\omega ,i}V_{\omega ,j}}{\tau _t}[\alpha _l \sin (\delta _{\omega ,ij}) \\ \quad - \beta _l \cos (\delta _{\omega ,ij})] , &{}\quad j\text { tapped} \end{array}\right. }, \; (i,j) \in {\mathcal {L}}_{\omega }, \omega \in \varOmega \end{aligned}$$
(41e)
$$\begin{aligned}&\; \delta _{\omega ,b_0}=0, \omega \in \varOmega \end{aligned}$$
(41f)
$$\begin{aligned}&\; {(f^P_{\omega ,(i,j)})}^2+{(f^{Q}_{\omega ,(i,j)})}^2 \le {(f_{\omega ,(i,j)}^+)}^2,\; \; (i,j) \in {\mathcal {L}}, \omega \in \varOmega \end{aligned}$$
(41g)
$$\begin{aligned}&\; p_{\omega ,g}^{-} \le p_{\omega ,g}\le p_{\omega ,g}^{+}, \; g \in {\mathcal {G}}, \omega \in \varOmega \end{aligned}$$
(41h)
$$\begin{aligned}&\; q_{\omega ,g}^{-} \le q_{\omega ,g}\le q_{\omega ,g}^{+}, \; g \in {\mathcal {G}}, \omega \in \varOmega \end{aligned}$$
(41i)
$$\begin{aligned}&\; V_{\omega ,b}^- \le V_{\omega ,b} \le V_{\omega ,b}^+, \; b \in {\mathcal {B}}, \omega \in \varOmega \end{aligned}$$
(41j)
$$\begin{aligned}&\; V_{\omega ,b} = V_{base,b}, \; b\in {\mathcal {B_{PV}}}, \omega \in \varOmega \setminus \{base\} \end{aligned}$$
(41k)
$$\begin{aligned}&\; p_{\omega ,g} = p_{base,g}, \; g \in \{g|o_g \in {\mathcal {B_{PV}}}\setminus {\{b_0\}} \}. \end{aligned}$$
(41l)
Table 2 Performance of inertia-based and inertia-free strategies on large-scale instances

6 Conclusions

We have presented a new filter line-search algorithm that does not require inertia information. This inertia-free approach performs curvature tests along the computed directions to guarantee descent when the constraint violation is sufficiently small. We proved that the approach is globally convergent and demonstrated that it is competitive with a standard inertia-based strategy that relies on symmetric indefinite factorizations. Moreover, we demonstrated that the inertia-free approach can significantly reduce the amount of regularization needed. The availability of inertia-free strategies opens the possibility of using different types of linear algebra strategies and libraries and can thus enhance the modularity of implementations. We demonstrated this capability in a distributed-memory Schur decomposition setting.