Abstract
Clique tree conversion solves large-scale semidefinite programs by splitting an \(n\times n\) matrix variable into up to \(n\) smaller matrix variables, each representing a principal submatrix of size up to \(\omega \times \omega \). Its fundamental weakness is the need to introduce overlap constraints that enforce agreement between different matrix variables, because these constraints can result in dense coupling. In this paper, we show that by dualizing the clique tree conversion, the coupling due to the overlap constraints is guaranteed to be sparse over dense blocks, with a block sparsity pattern that coincides with the adjacency matrix of a tree. We consider two classes of semidefinite programs with favorable sparsity patterns that encompass the MAXCUT and MAX k-CUT relaxations, the Lovasz Theta problem, and the AC optimal power flow relaxation. Assuming that \(\omega \ll n\), we prove that the per-iteration cost of an interior-point method is linear \(O(n)\) time and memory, so an \(\epsilon \)-accurate and \(\epsilon \)-feasible iterate is obtained after \(O(\sqrt{n}\log (1/\epsilon ))\) iterations in near-linear \(O(n^{1.5}\log (1/\epsilon ))\) time. We confirm our theoretical insights with numerical results on semidefinite programs as large as \(n=13{,}659\).
1 Introduction
Given \(n\times n\) real symmetric matrices \(C,A_{1},\ldots ,A_{m}\) and real scalars \(b_{1},\ldots ,b_{m}\), we consider the standard form semidefinite program
over the \(n\times n\) real symmetric matrix variable X. Here, \(A_{i}\bullet X=\mathrm {tr}\,A_{i}X\) refers to the usual matrix inner product, and \(X\succeq 0\) restricts X to be symmetric positive semidefinite. Instances of (SDP) arise as some of the best convex relaxations to nonconvex problems like graph optimization [1, 2], integer programming [3,4,5,6], and polynomial optimization [7, 8].
Interior-point methods are the most reliable approach for solving small- and medium-scale instances of (SDP), but become prohibitively time- and memory-intensive for large-scale instances. A fundamental difficulty is the constraint \(X\succeq 0\), which densely couples all \(O(n^{2})\) elements within the matrix variable X to each other. The linear system solved at each iteration, known as the normal equation or the Schur complement equation, is usually fully-dense, irrespective of sparsity in the data matrices \(C,A_{1},\ldots ,A_{m}\). With a naïve implementation, the per-iteration cost of an interior-point method is roughly the same for highly sparse semidefinite programs as it is for fully-dense ones of the same dimensions: at least cubic \((n+m)^{3}\) time and quadratic \((n+m)^{2}\) memory. (See e.g. Nesterov [9, Section 4.3.3] for a derivation.)
Much larger instances of (SDP) can be solved using the clique tree conversion technique of Fukuda et al. [10]. The main idea is to use an interior-point method to solve a reformulation whose matrix variables \(X_{1},\ldots ,X_{n}\succeq 0\) represent principal submatrices of the original matrix variable \(X\succeq 0\), as in
where \(J_{1},J_{2},\dots ,J_{n}\subseteq \{1,2,\dots ,n\}\) denote row/column indices, and to use its solution to recover a solution to the original problem in closed-form. Here, different \(X_{i}\) and \(X_{j}\) interact only through the linear constraints
and the need for their overlapping elements to agree,
As a consequence, the normal equation associated with the reformulation is often block sparse—sparse over fully-dense blocks. When the maximum order of the submatrices
is significantly smaller than n, the number of linearly independent constraints is bounded by \(m\le \omega n\), and the per-iteration cost of an interior-point method scales as low as linearly with respect to \(n+m\). This is a remarkable speed-up over a direct interior-point solution of (SDP), particularly in view of the fact that the original matrix variable \(X\succeq 0\) already contains more than \(n^{2}/2\) degrees of freedom on its own.
In practice, clique tree conversion has successfully solved large-scale instances of (SDP) with n as large as tens of thousands [11,12,13,14]. Where applicable, the empirical time complexity is often as low as linear \(O(n+m)\). However, this speed-up is not guaranteed, not even on highly sparse instances of (SDP). We give an example in Sect. 4 whose data matrices \(A_{1},\ldots ,A_{m}\) each contain just a single nonzero element, and show that it nevertheless requires at least \((n+m)^{3}\) time and \((n+m)^{2}\) memory to solve using clique tree conversion.
The core issue, and indeed the main weakness of clique tree conversion, is the overlap constraints (3), which are imposed in addition to the constraints (2) already present in the original problem [15, Section 14.2]. These overlap constraints can significantly increase the size of the normal equation solved at each interior-point iteration, thereby offsetting the benefits of increased sparsity [16]. In fact, they may contribute more nonzeros to the normal matrix of the converted problem than contained in the fully-dense normal matrix of the original problem. In [17], omitting some of the overlap constraints made the converted problem easier to solve, but at the cost of also making the reformulation from (SDP) inexact.
1.1 Contributions
In this paper, we show that it is possible to fully address the density of the overlap constraints using the dualization technique of Löfberg [18]. By dualizing the reformulation generated via clique tree conversion, the overlap constraints are guaranteed to contribute \(O(\omega ^{4}n)\) nonzero elements to the normal matrix. Moreover, these nonzero elements appear with a block sparsity pattern that coincides with the adjacency matrix of a tree. Under suitable assumptions on the original constraints (2), this favorable block sparsity pattern allows us to guarantee an interior-point method per-iteration cost of \(O(\omega ^{6}n)\) time and memory, by using a specific fill-reducing permutation in computing the Cholesky factor of the normal matrix. After \(O(\sqrt{\omega n}\log (1/\epsilon ))\) iterations, we arrive at an \(\epsilon \)-accurate solution of (SDP) in near-linear \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time.
Our first main result guarantees these complexity figures for a class of semidefinite programs that we call partially separable semidefinite programs. Our notion is an extension of the partially separable cones introduced by Sun, Andersen, and Vandenberghe [16], based in turn on the notion of partial separability due to Griewank and Toint [19]. We show that if an instance of (SDP) is partially separable, then an optimally sparse clique tree conversion reformulation can be constructed in \(O(\omega ^{3}n)\) time, and then solved using an interior-point method to \(\epsilon \)-accuracy in \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time. Afterwards, a corresponding \(\epsilon \)-accurate solution to (SDP) is recovered in \(O(\omega ^{3}n)\) time, for a complete end-to-end cost of \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time.
Semidefinite programs that are not partially separable can be systematically “separated” by introducing auxiliary variables, at the cost of increasing the number of variables that must be optimized. For a class of semidefinite programs that we call network flow semidefinite programs, the number of auxiliary variables can be bounded in closed-form. This insight allows us to prove our second main result, which guarantees the near-linear time figure for network flow semidefinite programs on graphs with small degrees and treewidth.
1.2 Comparisons to prior work
At the time of writing, clique tree conversion is primarily used as a preprocessor for an off-the-shelf interior-point method, such as SeDuMi and MOSEK. It is often implemented using a parser such as CVX [20] or YALMIP [21] that converts mathematical expressions into a compatible data format for the solver, but this process is very slow, and usually destroys the inherent structure in the problem. Solver-specific implementations of clique tree conversion like SparseColo [22, 23] and OPFSDR [24] are much faster while also preserving the structure of the problem for the solver. Nevertheless, the off-the-shelf solver is itself structure-agnostic, so an improved complexity figure cannot be guaranteed.
In the existing literature, solvers designed specifically for clique tree conversion are generally first-order methods [16, 25, 26]. While their per-iteration cost is often linear time and memory, they require up to \(O(1/\epsilon )\) iterations to achieve \(\epsilon \)-accuracy, which is exponentially worse than the \(O(\log (1/\epsilon ))\) figure of interior-point methods. It is possible to incorporate a first-order method within an outer interior-point iteration [27,28,29], but this does not improve upon the \(O(1/\epsilon )\) iteration bound, because the first-order method solves an increasingly ill-conditioned subproblem, with a condition number that scales as \(O(1/\epsilon ^{2})\) for \(\epsilon \)-accuracy.
Andersen, Dahl, and Vandenberghe [30] describe an interior-point method that exploits the same chordal sparsity structure that underlies clique tree conversion, with a per-iteration cost of \(O(\omega ^{3}nm+\omega m^{2}n+m^{3})\) time. The algorithm solves instances of (SDP) with a small number of constraints \(m=O(1)\) in near-linear \(O(\omega ^{3}n^{1.5}\log (1/\epsilon ))\) time. However, substituting \(m\le \omega n\) yields a general time complexity figure of \(O(\omega ^{3}n^{3.5}\log (1/\epsilon ))\), which is comparable to the cubic time complexity of a direct interior-point solution of (SDP).
In this paper, we show that off-the-shelf interior-point methods can be modified to exploit the structure of clique tree conversion, by forcing a specific choice of fill-reducing permutation in factorizing the normal equation. For partially separable semidefinite programs, the resulting modified solver achieves a guaranteed per-iteration cost of \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) memory on the dualized version of the clique tree conversion.
Our complexity guarantees are independent of the actual algorithm used to factorize the normal equation. Most off-the-shelf interior-point methods use a standard implementation of the multifrontal method [31, 32], but further efficiency can be gained by adopting a parallel and/or distributed implementation. For example, the interior-point method of Khoshfetrat Pakazad et al. [33, 34] factorizes the normal equation using a message passing algorithm, which can be understood as a distributed implementation of the multifrontal method. Of course, distributed algorithms are most efficient when the workload is evenly distributed, and when communication is minimized. It remains an important future work to understand these issues in the context of the sparsity patterns analyzed within this paper.
Finally, a reviewer noted that if the original problem (SDP) has a low-rank solution, then the interior-point method iterates approach a low-dimensional face of the semidefinite cone, which could present conditioning issues. In contrast, the clique tree conversion might expect solutions strictly in the interior of the semidefinite cone, which may be better conditioned. It remains an interesting future direction to understand the relationship in complementarity, uniqueness, and conditioning [35] between (SDP) and its clique tree conversion.
2 Main results
2.1 Assumptions
To guarantee an exact reformulation, clique tree conversion chooses the index sets \(J_{1},\ldots ,J_{\ell }\) in (1) as the bags of a tree decomposition for the sparsity graph of the data matrices \(C,A_{1},\ldots ,A_{m}\). Accordingly, the parameter \(\omega \) in (4) can only be small if the sparsity graph has a small treewidth. Below, we define a graph G by its vertex set \(V(G)\subseteq \{1,2,\ldots ,n\}\) and its edge set \(E(G)\subseteq V(G)\times V(G)\).
Definition 1
(Sparsity graph) The \(n\times n\) matrix M (resp. the set of \(n\times n\) matrices \(\{M_{1},\ldots ,M_{m}\}\)) is said to have sparsity graph G if G is an undirected simple graph on n vertices \(V(G)=\{1,\ldots ,n\}\) such that \((i,j)\in E(G)\) if \(M[i,j]\ne 0\) (resp. if there exists \(M\in \{M_{1},\ldots ,M_{m}\}\) such that \(M[i,j]\ne 0\)).
Definition 2
(Tree decomposition) A tree decomposition \({\mathcal {T}}\) of a graph G is a pair \(({\mathcal {J}},T)\), where each bag of vertices \(J_{j}\in {\mathcal {J}}\) is a subset of V(G), and T is a tree on \(|{\mathcal {J}}|\le n\) vertices, such that:
1. (Vertex cover) For every \(v\in V(G)\), there exists \(J_{k}\in {\mathcal {J}}\) such that \(v\in J_{k}\);
2. (Edge cover) For every \((u,v)\in E(G)\), there exists \(J_{k}\in {\mathcal {J}}\) such that \(u\in J_{k}\) and \(v\in J_{k}\); and
3. (Running intersection) If \(v\in J_{i}\) and \(v\in J_{j}\), then we also have \(v\in J_{k}\) for every k that lies on the path from i to j in the tree T.
The width \(\mathrm {wid}({\mathcal {T}})\) of the tree decomposition \({\mathcal {T}}=({\mathcal {J}},T)\) is the size of its largest bag minus one, as in \(\max \{|J_{k}|:J_{k}\in {\mathcal {J}}\}-1.\) The treewidth \(\mathrm {tw}(G)\) of the graph G is the minimum width amongst all tree decompositions \({\mathcal {T}}\).
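The three properties of Definition 2 can be checked mechanically. The sketch below is illustrative only: the graph, bags, and tree edges are hypothetical examples, and `is_tree_decomposition` is a helper name introduced here, not from the paper.

```python
# Illustrative checker for the three properties of Definition 2.
# The graph, bags, and tree edges below are hypothetical examples.

def is_tree_decomposition(V, E, bags, tree_edges):
    # 1. Vertex cover: every vertex appears in some bag.
    if not all(any(v in J for J in bags) for v in V):
        return False
    # 2. Edge cover: both endpoints of every edge share a bag.
    if not all(any(u in J and v in J for J in bags) for (u, v) in E):
        return False
    # 3. Running intersection: the bags containing any fixed vertex v
    # must induce a connected subtree of T.
    for v in V:
        nodes = {k for k, J in enumerate(bags) if v in J}
        comp, frontier = set(), [next(iter(nodes))]
        while frontier:
            k = frontier.pop()
            comp.add(k)
            for (i, j) in tree_edges:
                for a, b in ((i, j), (j, i)):
                    if a == k and b in nodes and b not in comp:
                        frontier.append(b)
        if comp != nodes:
            return False
    return True

# A 4-cycle with one chord, covered by two bags of size 3.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)]
bags = [{1, 2, 3}, {1, 3, 4}]
tree_edges = [(0, 1)]                     # T is a single edge on the bags
print(is_tree_decomposition(V, E, bags, tree_edges))  # True
width = max(len(J) for J in bags) - 1     # width 2, per Definition 2
```

Note that dropping the chordal bag structure (e.g. bags \(\{1,2\}\) and \(\{3,4\}\)) violates the edge cover property, so the checker rejects it.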
Throughout this paper, we make the implicit assumption that a tree decomposition with small width is known a priori for the sparsity graph. In practice, such a tree decomposition can usually be found using fill-reducing heuristics for sparse linear algebra; see Sect. 3.
We also make two explicit assumptions, which are standard in the literature on interior-point methods.
Assumption 1
(Linear independence) We have \(\sum _{i=1}^{m}y_{i}A_{i}=0\) if and only if \(y=0\).
The assumption is without loss of generality: linearly dependent constraints \(A_{i}\bullet X=b_{i}\) can either be eliminated without changing the feasible set, or else they are mutually inconsistent and the problem is infeasible. Under Assumption 1, the total number of constraints is bounded by \(m\le \omega n\) (due to the fact that \(|E(G)|\le n\cdot \mathrm {tw}(G)\) [36]).
Assumption 2
(Primal-dual Slater’s condition) There exist \(X\succ 0,\) y, and \(S\succ 0\) satisfying \(A_{i}\bullet X=b_{i}\) for all \(i\in \{1,\ldots ,m\}\) and \(\sum _{i=1}^{m}y_{i}A_{i}+S=C\).
In fact, our proofs solve the homogeneous self-dual embedding [37], so our conclusions can be extended with few modifications to a much larger array of problems that mostly do not satisfy Assumption 2; see de Klerk et al. [38] and Permenter et al. [39]. Nevertheless, we adopt Assumption 2 to simplify our discussions, by focusing our attention towards the computational aspects of the interior-point method, and away from the theoretical intricacies of the self-dual embedding.
2.2 Partially separable
We define the class of partially separable semidefinite programs based on the partially separable cones introduced by Sun, Andersen, and Vandenberghe [16]. The general concept of partial separability is due to Griewank and Toint [19].
Definition 3
(Partially separable) Let \({\mathcal {T}}=({\mathcal {J}},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\). The matrix \(A_{i}\) is said to be partially separable on \({\mathcal {T}}\) if there exist \(J_{j}\in {\mathcal {J}}\) and some choice of \(A_{i,j}\) such that
for all \(n\times n\) matrices X. We say that (SDP) is partially separable on \({\mathcal {T}}\) if every constraint matrix \(A_{1},\ldots ,A_{m}\) is partially separable on \({\mathcal {T}}\).
Due to the edge cover property of the tree decomposition, any \(A_{i}\) that indexes a single element of X (i.e., can be written as \(A_{i}\bullet X=X[j,k]\) for suitable j, k) is automatically partially separable on any valid tree decomposition \({\mathcal {T}}\). In this way, many of the classic semidefinite relaxations for NP-hard combinatorial optimization problems can be shown to be partially separable.
Example 1
(MAXCUT and MAX k-CUT) Let C be the (weighted) Laplacian matrix for a graph G with n vertices. Frieze and Jerrum [40] proposed a randomized algorithm to solve MAX k-CUT with an approximation ratio of \(1-1/k\) based on solving
The classic Goemans–Williamson 0.878 algorithm [2] for MAXCUT is recovered by setting \(k=2\) and removing the redundant constraint \(X[i,j]\ge -1\). In both the MAXCUT relaxation and the MAX k-CUT relaxation, observe that each constraint affects a single matrix element in X, so the problem is partially separable on any tree decomposition. \(\square \)
Example 2
(Lovasz Theta) The Lovasz number \(\vartheta (G)\) of a graph G [1] is the optimal value to the following dual semidefinite program
over \(\lambda \in {\mathbb {R}}\) and \(y_{i,j}\in {\mathbb {R}}\) for \((i,j)\in E(G)\). Here, \(e_{j}\) is the j-th column of the \(n\times n\) identity matrix and \({\mathbf {1}}\) is the length-n vector-of-ones. Problem (LT) is not partially separable. However, given that \(\vartheta (G)\ge 1\) holds for all graphs G, we may divide the linear matrix inequality through by \(\lambda \), redefine \(y\leftarrow y/\lambda \), apply the Schur complement lemma, and take the Lagrangian dual to yield a sparse formulation
Each constraint affects a single matrix element in X, so (\(\hbox {LT}'\)) is again partially separable on any tree decomposition. \(\square \)
We remark that instances of the MAXCUT, MAX k-CUT, and Lovasz Theta problems constitute a significant part of the DIMACS [41] and the SDPLIB [42] test libraries. In Sect. 5, we prove that partially separable semidefinite programs like these admit a clique tree conversion reformulation that can be dualized and then solved using an interior-point method in \(O(n^{1.5}\log (1/\epsilon ))\) time, under the assumption that the parameter \(\omega \) in (4) is significantly smaller than n. Moreover, we prove in Sect. 6 that this reformulation can be efficiently found using an algorithm based on the running intersection property of the tree decomposition. Combining these results with an efficient low-rank matrix completion algorithm [43, Algorithm 2] yields the following.
Theorem 1
Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\). If (SDP) is partially separable on \({\mathcal {T}}\), then under Assumptions 1 and 2, there exists an algorithm that computes \(U\in {\mathbb {R}}^{n\times \omega }\), \(y\in {\mathbb {R}}^{m}\), and \(S\succeq 0\) satisfying
in \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) space, where \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\) and \(\Vert M\Vert _{F}=\sqrt{M\bullet M}\) denotes the Frobenius norm.
The proof of Theorem 1 is given at the end of Sect. 6.
2.3 Network flow
Problems that are not partially separable can be systematically separated by introducing auxiliary variables. The complexity of solving the resulting problem then becomes parameterized by the number of additional auxiliary variables. In a class of graph-based semidefinite programs that we call network flow semidefinite programs, the number of auxiliary variables can be bounded using properties of the tree decomposition.
Definition 4
(Network flow) Given a graph \(G=(V,E)\) on n vertices \(V=\{1,\ldots ,n\}\), we say that the linear constraint \(A\bullet X=b\) is a network flow constraint (at vertex k) if the \(n\times n\) constraint matrix A can be rewritten
in which \(e_{k}\) is the k-th column of the identity matrix and \(\{\alpha _{j}\}\) are scalars. We say that an instance of (SDP) is a network flow semidefinite program if every constraint matrix \(A_{1},\ldots ,A_{m}\) is a network flow constraint, and G is the sparsity graph for the objective matrix C.
Such problems frequently arise on physical networks subject to Kirchhoff’s conservation laws, such as electrical circuits and hydraulic networks.
Example 3
(Optimal power flow) The AC optimal power flow (ACOPF) problem is a nonlinear, nonconvex optimization that plays a vital role in the operations of an electric power system. Let G be a graph representation of the power system. Then, ACOPF has a well-known semidefinite relaxation
over a Hermitian matrix variable X, subject to
Here, each \(a_{i,j}\) and \(c_{i,j}\) is a complex vector, each \(b_{i}\) and \(d_{i,j}\) is a real vector, and \(W\subseteq V(G)\) is a subset of vertices. If a rank-1 solution \(X^{\star }\) is found, then the relaxation (OPF) is exact, and a globally-optimal solution to the original NP-hard problem can be extracted. Clearly, each constraint in (OPF) is a network flow constraint, so the overall problem is also a network flow semidefinite program. \(\square \)
In Sect. 7, we prove that network flow semidefinite programs can be reformulated in closed-form, dualized, and efficiently solved using an interior-point method.
Theorem 2
Let (SDP) be a network flow semidefinite program on a graph G on n vertices, and let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for G. Then, under Assumptions 1 and 2, there exists an algorithm that computes \(U\in {\mathbb {R}}^{n\times \omega }\), \(y\in {\mathbb {R}}^{m}\), and \(S\succeq 0\) satisfying
in
where:
- \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\),
- \(d_{\max }\) is the maximum degree of the tree T, and
- \(m_{k}\) is the maximum number of network flow constraints at any vertex \(k\in V(G)\).
3 Preliminaries
3.1 Notation
The sets \({\mathbb {R}}^{n}\) and \({\mathbb {R}}^{m\times n}\) are the length-n real vectors and \(m\times n\) real matrices. We use “MATLAB notation” in concatenating vectors and matrices:
and the following short-hand to construct them:
The notation X[i, j] refers to the element of X in the i-th row and j-th column, and X[I, J] refers to the submatrix of X formed from the rows in \(I\subseteq \{1,\ldots ,m\}\) and columns in \(J\subseteq \{1,\ldots ,n\}\). The Frobenius inner product is \(X\bullet Y=\mathrm {tr}\,(X^{T}Y)\), and the Frobenius norm is \(\Vert X\Vert _{F}=\sqrt{X\bullet X}\). We use \(\mathrm {nnz}\,(X)\) to denote the number of nonzero elements in X.
The sets \({\mathbb {S}}^{n}\subseteq {\mathbb {R}}^{n\times n},\) \({\mathbb {S}}_{+}^{n}\subset {\mathbb {S}}^{n},\) and \({\mathbb {S}}_{++}^{n}\subset {\mathbb {S}}_{+}^{n}\) are the \(n\times n\) real symmetric matrices, positive semidefinite matrices, and positive definite matrices, respectively. We write \(X\succeq Y\) to mean \(X-Y\in {\mathbb {S}}_{+}^{n}\) and \(X\succ Y\) to mean \(X-Y\in {\mathbb {S}}_{++}^{n}\). The (symmetric) vectorization
outputs the lower-triangular part of a symmetric matrix as a vector, with factors of \(\sqrt{2}\) added so that \(\mathrm {svec}\,(X)^{T}\mathrm {svec}\,(Y)=X\bullet Y\).
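A minimal implementation of \(\mathrm {svec}\) in this spirit (the ordering of the stacked entries is an arbitrary choice here; only the \(\sqrt{2}\) scaling matters for the inner-product identity):

```python
import numpy as np

def svec(X):
    """Stack the lower-triangular entries of a symmetric X, scaling the
    off-diagonal ones by sqrt(2) so that svec(X) @ svec(Y) = X . Y."""
    i, j = np.tril_indices(X.shape[0])
    v = X[i, j].astype(float).copy()
    v[i != j] *= np.sqrt(2.0)
    return v

# Check the inner-product identity on random symmetric matrices:
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = A + A.T
B = rng.standard_normal((4, 4)); B = B + B.T
print(np.isclose(svec(A) @ svec(B), np.trace(A @ B)))  # True
```

For an \(n\times n\) symmetric matrix the output has length \(n(n+1)/2\), which is the vectorized dimension used throughout Sect. 3.3.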
A graph G is defined by its vertex set \(V(G)\subseteq \{1,2,3,\ldots \}\) and its edge set \(E(G)\subseteq V(G)\times V(G)\). The graph T is a tree if it is connected and does not contain any cycles; we refer to its vertices V(T) as its nodes. Designating a special node \(r\in V(T)\) as the root of the tree allows us to define the parent p(v) of each node \(v\ne r\) as the first node encountered on the path from v to r, and \(p(r)=r\) for consistency. The set of children is defined \(\mathrm {ch}(v)=\{u\in V(T)\backslash v:p(u)=v\}\). Note that the edges E(T) are fully determined by the parent pointer p as \(\{v,p(v)\}\) for all \(v\ne r\).
The set \({\mathbb {S}}_{G}^{n}\subseteq {\mathbb {S}}^{n}\) is the set of \(n\times n\) real symmetric matrices with sparsity graph G. We denote \(P_{G}(X)=\min _{Y\in {\mathbb {S}}_{G}^{n}}\Vert X-Y\Vert _{F}\) as the Euclidean projection of \(X\in {\mathbb {S}}^{n}\) onto \({\mathbb {S}}_{G}^{n}\).
3.2 Tree decomposition via the elimination tree
The standard procedure for solving \(Sx=b\) with \(S\succ 0\) consists of a factorization step, where S is decomposed into the unique Cholesky factor L satisfying
and a substitution step, where the two triangular systems \(Lu=b\) and \(L^{T}x=u\) are solved by forward and backward substitution to yield x.
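In code, the two steps read as follows (a dense numpy sketch; sparse libraries expose the same factor/solve split):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
S = M @ M.T + 5 * np.eye(5)        # a positive definite S
b = rng.standard_normal(5)

L = np.linalg.cholesky(S)          # factorization step: S = L L^T
u = np.linalg.solve(L, b)          # forward substitution:  L u = b
x = np.linalg.solve(L.T, u)        # backward substitution: L^T x = u
print(np.allclose(S @ x, b))       # True
```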
In the case that S is sparse, the location of nonzero elements in L encodes a tree decomposition for the sparsity graph of S known as the elimination tree [44]. Specifically, define the index sets \(J_{1},\ldots ,J_{n}\subseteq \{1,\ldots ,n\}\) as in
and the tree T via the parent pointers
Then, ignoring perfect numerical cancellation, \({\mathcal {T}}=(\{J_{1},\ldots ,J_{n}\},T)\) is a tree decomposition for the sparsity graph of S.
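The parent pointers of the elimination tree can be computed directly from the sparsity pattern of S, without forming L, using Liu's classical algorithm; a compact sketch (the helper name `etree` and the test matrices are illustrative):

```python
import numpy as np

def etree(S):
    """Elimination-tree parents from the pattern of S (Liu's algorithm):
    for each row k, walk up from every j < k with S[k, j] != 0, path-
    compressing with an 'ancestor' array, and attach the root found to k."""
    n = S.shape[0]
    parent, ancestor = [-1] * n, [-1] * n
    for k in range(n):
        for j in np.flatnonzero(S[k, :k]):
            r = int(j)
            while ancestor[r] not in (-1, k):
                nxt = ancestor[r]
                ancestor[r] = k          # path compression
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = k
                parent[r] = k
    return parent

# Tridiagonal pattern (a chain): every column's first subdiagonal
# nonzero is directly below the diagonal, so p(j) = j + 1.
S_chain = np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)
print(etree(S_chain))  # [1, 2, 3, 4, -1] (the last node is the root)
```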
Elimination trees with reduced widths can be obtained by reordering the rows and columns of S using a fill-reducing permutation \(\varPi \), because the sparsity graph of \(\varPi S\varPi ^{T}\) is just the sparsity graph of S with its vertices reordered. The minimum width of an elimination tree over all permutations \(\varPi \) is precisely the treewidth of the sparsity graph of S; see Bodlaender et al. [45] and the references therein. Computing this minimum is NP-complete in general [36], but polynomial-time approximation algorithms exist that solve the problem to within a logarithmic factor [45,46,47]. In practice, heuristics such as minimum degree [48] and nested dissection [49] are considerably faster while still producing high-quality choices of \(\varPi \).
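The star graph makes the effect of the permutation concrete. In the sketch below (a hypothetical \(6\times 6\) instance), eliminating the hub first produces a completely dense factor, while eliminating it last preserves the original sparsity:

```python
import numpy as np

# Ordering matters: for a star graph, eliminating the hub first makes
# the Cholesky factor fully dense; eliminating it last leaves only the
# n-1 original off-diagonal entries.
n = 6
S = n * np.eye(n)
S[0, 1:] = S[1:, 0] = 1.0               # star graph with hub at vertex 0
L_bad = np.linalg.cholesky(S)           # hub eliminated first
perm = list(range(1, n)) + [0]          # fill-reducing order: hub last
P = np.eye(n)[perm]
L_good = np.linalg.cholesky(P @ S @ P.T)
nnz = lambda L: int(np.sum(np.abs(np.tril(L, -1)) > 1e-12))
print(nnz(L_bad), nnz(L_good))          # 15 5
```

Here 15 is the fully dense strict lower triangle \(n(n-1)/2\), while the permuted matrix factors with exactly the \(n-1\) nonzeros it started with.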
Note that the sparsity pattern of L is completely determined by the sparsity pattern of S, and not by its numerical values. The former can be computed from the latter using a symbolic Cholesky factorization algorithm, a standard routine in most sparse linear algebra libraries, in time linear in the number of nonzeros in L; see [50, Section 5] and [51, Theorem 5.4.4], and also the discussion in [49].
3.3 Clique tree conversion
Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition with small width for the sparsity graph G of the data matrices \(C,A_{1},\ldots ,A_{m}\). We define the graph \(F\supseteq G\) by taking each index set \(J_{j}\) of \({\mathcal {T}}\) and interconnecting all pairs of vertices \(u,v\in J_{j}\), as in
The following fundamental result was first established by Grone et al. [52]. Constructive proofs allow us to recover all elements in \(X\succeq 0\) from only the elements in \(P_{F}(X)\) using a closed-form formula.
Theorem 3
(Grone et al. [52]) Given \(Z\in {\mathbb {S}}_{F}^{n}\), there exists an \(X\succeq 0\) satisfying \(P_{F}(X)=Z\) if and only if \(Z[J_{j},J_{j}]\succeq 0\) for all \(j\in \{1,2,\ldots ,\ell \}\).
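A minimal numerical instance of Theorem 3, with bags \(J_{1}=\{1,2\}\) and \(J_{2}=\{2,3\}\) (so F is a path, missing only the (1, 3) entry; indices are 0-based in the code): both principal submatrices are positive semidefinite, and filling the one missing entry with \(Z[1,2]\,Z[2,3]/Z[2,2]\) yields one valid completion. This simple formula is specific to this tiny example and assumes the shared diagonal entry is strictly positive.

```python
import numpy as np

Z = np.array([[2.0, 1.0, 0.0],      # Z[0, 2] lies outside F: the stored
              [1.0, 2.0, 1.0],      # 0.0 stands in for an unknown entry
              [0.0, 1.0, 2.0]])
assert np.all(np.linalg.eigvalsh(Z[:2, :2]) >= 0)   # Z[J1, J1] >= 0
assert np.all(np.linalg.eigvalsh(Z[1:, 1:]) >= 0)   # Z[J2, J2] >= 0

X = Z.copy()
X[0, 2] = X[2, 0] = Z[0, 1] * Z[1, 2] / Z[1, 1]     # = 0.5
print(np.all(np.linalg.eigvalsh(X) >= -1e-12))      # True: X >= 0
```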
We can use Theorem 3 to reformulate (SDP) into a reduced-complexity form. The key is to view (SDP) as an optimization over \(P_{F}(X)\), since
and similarly \(A_{i}\bullet X=A_{i}\bullet P_{F}(X)\). Theorem 3 allows us to account for \(X\succeq 0\) implicitly, by optimizing over \(Z=P_{F}(X)\) in the following
Next, we split the principal submatrices into distinct matrix variables, coupled by the need for their overlapping elements to agree. Define the overlap operator \({\mathcal {N}}_{i,j}(\cdot )\), which takes the submatrix \(X_{j}\) as input and outputs its elements that overlap with the submatrix \(X_{i}\):
The running intersection property of the tree decomposition allows us to enforce this agreement using \(\ell -1\) pairwise block comparisons.
Theorem 4
(Fukuda et al. [10]) Given \(X_{1},X_{2},\ldots ,X_{\ell }\) for \(X_{j}\in {\mathbb {S}}^{|J_{j}|}\), there exists Z satisfying \(Z[J_{j},J_{j}]=X_{j}\) for all \(j\in \{1,2,\ldots ,\ell \}\) if and only if \({\mathcal {N}}_{i,j}(X_{j})={\mathcal {N}}_{j,i}(X_{i})\) for all \((i,j)\in E(T)\).
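In code, the overlap operator amounts to indexing a submatrix by the shared vertices. The sketch below (bags and the helper name `overlap` are illustrative) confirms that submatrices extracted from a common Z automatically satisfy the agreement condition of Theorem 4:

```python
import numpy as np

def overlap(J_i, J_j, X_j):
    """Extract from X_j = Z[J_j, J_j] the block indexed by J_i and J_j's
    shared vertices, in the order they appear in J_j."""
    pos = [k for k, v in enumerate(J_j) if v in set(J_i)]
    return X_j[np.ix_(pos, pos)]

J1, J2 = [0, 1, 2], [1, 2, 3]            # bags overlapping on {1, 2}
Z = np.arange(16.0).reshape(4, 4)
Z = (Z + Z.T) / 2                        # a symmetric "completed" matrix
X1 = Z[np.ix_(J1, J1)]
X2 = Z[np.ix_(J2, J2)]
# Submatrices taken from the same Z agree on their overlap:
print(np.array_equal(overlap(J2, J1, X1), overlap(J1, J2, X2)))  # True
```

Clique tree conversion imposes this equality as an explicit linear constraint precisely because the converted problem no longer has a single Z to extract from.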
Splitting the objective C and constraint matrices \(A_{1},\ldots ,A_{m}\) into \(C_{1},\ldots ,C_{\ell }\) and \(A_{1,1},\ldots ,A_{m,\ell }\) to satisfy
and applying Theorem 4 yields the following
which vectorizes into a linear conic program in standard form
over the Cartesian product of \(\ell \le n\) smaller semidefinite cones
Here, \({\mathbf {A}}=[\mathrm {svec}\,(A_{i,j})^{T}]_{i,j=1}^{m,\ell }\) and \(c=[\mathrm {svec}\,(C_{j})]_{j=1}^{\ell }\) correspond to (10), and the overlap constraints matrix \({\mathbf {N}}=[{\mathbf {N}}_{i,j}]_{i,j=1}^{\ell ,\ell }\) is implicitly defined by the relation
for every non-root node i on T. (To avoid all-zero rows in \({\mathbf {N}}\), we define \({\mathbf {N}}_{i,j}\,\mathrm {svec}\,(X_{j})\) as the empty length-zero vector \({\mathbb {R}}^{0}\) if i is the root node.)
The converted problem (CTC) inherits the standard regularity assumptions from (SDP). Accordingly, an interior-point method is well-behaved in solving (11). (Proofs for the following statements are deferred to “Appendix A”.)
Lemma 1
(Linear independence) There exists \([u;v]\ne 0\) such that \({\mathbf {A}}^{T}u+{\mathbf {N}}^{T}v=0\) if and only if there exists \(y\ne 0\) such that \(\sum _{i}y_{i}A_{i}=0\).
Lemma 2
(Primal Slater) There exists \(x\in \mathrm {Int}({\mathcal {K}})\) satisfying \({\mathbf {A}}x=b\) and \({\mathbf {N}}x=0\) if and only if there exists an \(X\succ 0\) satisfying \(A_{i}\bullet X=b_{i}\) for all \(i\in \{1,\ldots ,m\}\).
Lemma 3
(Dual Slater) There exists u, v satisfying \(c-{\mathbf {A}}^{T}u-{\mathbf {N}}^{T}v\in \mathrm {Int}({\mathcal {K}}_{*})\) if and only if there exists y satisfying \(C-\sum _{i}y_{i}A_{i}\succ 0\).
After an \(\epsilon \)-accurate solution \(X_{1}^{\star },\ldots ,X_{\ell }^{\star }\) to (CTC) is found, we recover, in closed-form, a positive semidefinite completion \(X^{\star }\succeq 0\) satisfying \(X^{\star }[J_{j},J_{j}]=X_{j}^{\star }\), which in turn serves as an \(\epsilon \)-accurate solution to (SDP). Of all possible choices of \(X^{\star }\), a particularly convenient one is the low-rank completion \(X^{\star }=UU^{T}\), in which U is a dense matrix with n rows and at most \(\omega =\max _{j}|J_{j}|\) columns. While the existence of the low-rank completion has been known since Dancis [53] (see also Laurent and Varvitsiotis [54] and Madani et al. [55]), Sun [43, Algorithm 2] gave an explicit algorithm to compute U from \(X_{1}^{\star },\ldots ,X_{\ell }^{\star }\) in \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory. The practical effectiveness of Sun’s algorithm was later validated on a large array of power systems problems by Jiang [56].
4 Cost of an interior-point iteration on (CTC)
When the vectorized version (11) of the converted problem (CTC) is solved using an interior-point method, the cost of each iteration is dominated by the cost of forming and solving the normal equation (also known as the Schur complement equation)
where the scaling matrix \({\mathbf {D}}_{s}\) is block-diagonal with fully-dense blocks
Typically, each dense block in \({\mathbf {D}}_{s}\) is the Hessian of a log-det penalty, as in \({\mathbf {D}}_{s,j}=\nabla ^{2}[-\log \det (X_{j})]\). The submatrix \({\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {A}}^{T}\) is often sparse [16], with a sparsity pattern that coincides with the correlative sparsity [57] of the problem.
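Concretely, the Hessian of the log-det barrier \(-\log \det X\) maps a symmetric direction V to \(X^{-1}VX^{-1}\), which is what each dense block encodes. A finite-difference check of this standard fact (sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
X = M @ M.T + 4 * np.eye(4)            # a positive definite iterate
V = rng.standard_normal((4, 4))
V = V + V.T                            # a symmetric direction
inv = np.linalg.inv

# gradient of -log det is -X^{-1}; difference it along V:
t = 1e-6
fd = (inv(X) - inv(X + t * V)) / t
ok = np.allclose(fd, inv(X) @ V @ inv(X), rtol=1e-3, atol=1e-4)
print(ok)  # True
```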
Unfortunately, \({\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {N}}^{T}\) can be fully-dense, even when \({\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {A}}^{T}\) is sparse or even diagonal. To explain, observe from (13) that the block sparsity pattern of \({\mathbf {N}}=[{\mathbf {N}}_{i,j}]_{i,j=1}^{\ell ,\ell }\) coincides with the incidence matrix of the tree T of the tree decomposition. Specifically, for every i with parent p(i), the block \({\mathbf {N}}_{i,j}\) is nonzero if and only if \(j\in \{i,p(i)\}\). As an immediate corollary, the block sparsity pattern of \({\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {N}}^{T}\) coincides with the adjacency matrix of the line graph of T:
The line graph of a tree is not necessarily sparse. If T were the star graph on n vertices, then its associated line graph \({\mathcal {L}}(T)\) would be the complete graph on \(n-1\) vertices. Indeed, consider the following example.
Example 4
(Star graph) Given \(b\in {\mathbb {R}}^{n}\), embed \(\max \{b^{T}y:\Vert y\Vert \le 1\}\) into the order-\((n+1)\) semidefinite program:
The associated sparsity graph G is the star graph on \(n+1\) nodes, and its elimination tree \({\mathcal {T}}=(\{J_{1},\ldots ,J_{n}\},T)\) has index sets \(J_{j}=\{j,n+1\}\) and parent pointer \(p(j)=n\). Applying clique tree conversion and vectorizing yields an instance of (11) with
where \(e_{j}\) is the j-th column of the \(3\times 3\) identity matrix. It is straightforward to verify that \({\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {A}}^{T}\) is \(n\times n\) diagonal but \({\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {N}}^{T}\) is \((n-1)\times (n-1)\) fully dense for the \({\mathbf {D}}_{s}\) in (15). The cost of solving the corresponding normal equation (14) must include the cost of factoring this fully dense submatrix, which is at least \((n-1)^{3}/3\) operations and \((n-1)^{2}/2\) units of memory. \(\square \)
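The contrast in Example 4 is easy to check numerically. The sketch below uses scalar (\(1\times 1\)) blocks in place of the true overlap blocks, so the \(\pm 1\) entries and the overlap_matrix helper are illustrative stand-ins for the actual construction of \({\mathbf {N}}\) in (13), not a reproduction of it:

```python
import numpy as np

def overlap_matrix(parent, n):
    """Scalar-block stand-in for the overlap matrix N: one row per
    non-root node i, with +1 at column i and -1 at its parent p(i)."""
    rows = [i for i in range(n) if parent[i] is not None]
    N = np.zeros((len(rows), n))
    for r, i in enumerate(rows):
        N[r, i] = 1.0
        N[r, parent[i]] = -1.0
    return N

n = 8
star = [n - 1] * (n - 1) + [None]    # every node hangs off a single hub
path = list(range(1, n)) + [None]    # chain 0 -> 1 -> ... -> n-1

for parent in (star, path):
    N = overlap_matrix(parent, n)
    NNt = (N @ N.T) != 0             # pattern of N N^T: line graph of T
    NtN = (N.T @ N) != 0             # pattern of N^T N: the tree T itself
    print("N N^T density:", NNt.mean(), "  N^T N density:", NtN.mean())
```

For the star, every pair of rows of \({\mathbf {N}}\) shares the hub column, so \({\mathbf {N}}{\mathbf {N}}^{T}\) comes out fully dense, while for the path both products remain tridiagonal.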
On the other hand, observe that the block sparsity graph of \({\mathbf {N}}^{T}{\mathbf {N}}\) coincides with the tree graph T
Such a matrix is guaranteed to be block sparse: sparse over dense blocks. More importantly, after a topological block permutation \(\varPi \), the matrix \(\varPi ({\mathbf {N}}^{T}{\mathbf {N}})\varPi ^{T}\) factors into \({\mathbf {L}}{\mathbf {L}}^{T}\) with no block fill.
Definition 5
(Topological ordering) An ordering \(\pi :\{1,2,\ldots ,n\}\rightarrow V(T)\) on the tree graph T with n nodes is said to be topological [15, p. 10] if, by designating \(\pi (n)\) as the root of T, each node is indexed before its parent:
where \(\pi ^{-1}(v)\) denotes the index associated with the node v.
Lemma 4
(No block fill) Let \(J_{1},\ldots ,J_{n}\) satisfy \(\bigcup _{j=1}^{n}J_{j}=\{1,\ldots ,d\}\) and \(J_{i}\cap J_{j}=\emptyset \) for all \(i\ne j\), and let \(H\succ 0\) be a \(d\times d\) matrix satisfying
for a tree graph T on n nodes. If \(\pi \) is a topological ordering on T and \(\varPi \) is a permutation matrix satisfying
then \(\varPi H\varPi ^{T}\) factors into \(LL^{T}\) where the Cholesky factor L satisfies
Therefore, sparse Cholesky factorization solves \(Hx=b\) for x by: (i) factoring \(\varPi H\varPi ^{T}\) into \(LL^{T}\) in \(O(\beta ^{3}n)\) operations and \(O(\beta ^{2}n)\) memory where \(\beta =\max _{j}|J_{j}|\), and (ii) solving \(Ly=\varPi b\) and \(L^{T}z=y\) and \(x=\varPi ^{T}z\) in \(O(\beta ^{2}n)\) operations and memory.
This is a simple block-wise extension of the tree elimination result originally due to Parter [58]; see also George and Liu [51, Lemma 6.3.1]. In practice, a topological ordering can be found by assigning indices \(n,n-1,n-2,\ldots \) in decreasing order during a depth-first search traversal of the tree. In fact, the minimum degree heuristic is guaranteed to generate a topological ordering [48].
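As a concrete sketch of Lemma 4 and this ordering heuristic, the snippet below (with scalar blocks and a small hypothetical tree, both of our own choosing) builds a topological ordering by assigning indices in decreasing order during a depth-first traversal, and verifies that the permuted Cholesky factor incurs no fill:

```python
import numpy as np

def topological_order(parent, n, root):
    """Assign indices in decreasing order during a depth-first traversal,
    so that every node is indexed before its parent (Definition 5)."""
    children = {v: [] for v in range(n)}
    for v in range(n):
        if parent[v] is not None:
            children[parent[v]].append(v)
    order, stack = [], [root]
    while stack:                      # preorder depth-first traversal
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    order.reverse()                   # the root receives the last index
    return order                      # order[k] = pi(k)

# A tree-structured SPD matrix with scalar blocks: H[i,j] != 0 only when
# i == j or (i,j) is an edge of the tree. The natural ordering is NOT
# topological here, because the root carries the smallest index.
parent = [None, 0, 0, 1, 1]
n = 5
H = 4.0 * np.eye(n)
for v in range(n):
    if parent[v] is not None:
        H[v, parent[v]] = H[parent[v], v] = 1.0

pi = topological_order(parent, n, root=0)
P = np.eye(n)[pi]                     # (P H P^T)[k, l] = H[pi(k), pi(l)]
L = np.linalg.cholesky(P @ H @ P.T)

# Parter's no-fill property: L is nonzero only within the pattern of P H P^T.
fill = (np.abs(L) > 1e-12) & ~((P @ H @ P.T) != 0)
print("fill entries after topological permutation:", fill.sum())
```

Factoring H in the natural ordering instead eliminates the root first and fills in an entry outside the tree pattern, which is exactly the behavior the topological permutation avoids.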
One way of exploiting the favorable block sparsity of \({\mathbf {N}}^{T}{\mathbf {N}}\) is to view the normal equation (14) as the Schur complement equation to an augmented system with \(\epsilon =0\):
Instead, we can solve the dual Schur complement equation for \(\epsilon >0\)
and recover an approximate solution. Under suitable sparsity assumptions on \({\mathbf {A}}^{T}{\mathbf {A}}\), the block sparsity graph of the matrix in (19) coincides with that of \({\mathbf {N}}^{T}{\mathbf {N}}\), which is itself the tree graph T. Using sparse Cholesky factorization with a topological block permutation, we solve (19) in linear time and back substitute to obtain a solution to (18) in linear time. In principle, a sufficiently small \(\epsilon >0\) will approximate the exact case at \(\epsilon =0\) to arbitrary accuracy, and this is all we need for the outer interior-point method to converge in polynomial time.
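The regularization argument can be illustrated with generic placeholder matrices. The snippet below is only a sketch of the \(\epsilon \)-regularization idea, using a random SPD stand-in for the scaling matrix and a random stand-in for the stacked constraint matrix; it does not reproduce the exact blocks of (18)-(19):

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 8, 3
M = rng.standard_normal((d, d))
Ds = M @ M.T + d * np.eye(d)          # SPD stand-in for the scaling matrix
B = rng.standard_normal((p, d))       # stand-in for the stacked constraints
r1, r2 = rng.standard_normal(d), rng.standard_normal(p)

# Exact augmented (saddle-point) system at eps = 0.
KKT = np.block([[Ds, B.T], [B, np.zeros((p, p))]])
x_exact = np.linalg.solve(KKT, np.concatenate([r1, r2]))[:d]

# Regularized dual Schur complement with sigma = 1/eps: as eps shrinks,
# the penalized solve approaches the exact augmented solution.
for eps in (1e-2, 1e-4, 1e-6):
    sigma = 1.0 / eps
    x_eps = np.linalg.solve(Ds + sigma * B.T @ B, r1 + sigma * B.T @ r2)
    print(eps, np.linalg.norm(x_eps - x_exact))
```

The printed errors shrink roughly in proportion to \(\epsilon \), matching the claim that a sufficiently small \(\epsilon >0\) approximates the exact case to arbitrary accuracy.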
A more subtle way to exploit the block sparsity of \({\mathbf {N}}^{T}{\mathbf {N}}\) is to reformulate (CTC) into a form whose normal equation is exactly (19). As we show in the next section, this is achieved by a simple technique known as dualization.
5 Dualized clique tree conversion
The dualization technique of Löfberg [18] swaps the roles played by the primal and the dual problems in a linear conic program, by rewriting a primal standard form problem into dual standard form, and vice versa. Applying dualization to (11) yields the following
where we use f to denote the number of equality constraints in (CTC). Observe that the dual problem in (20) is identical to the primal problem in (11), so that a dual solution \(y^{\star }\) to (20) immediately serves as a primal solution to (11), and hence also (CTC).
Modern interior-point methods solve (20) by embedding the free variable \(x_{1}\in {\mathbb {R}}^{f}\) and fixed variable \(s_{1}\in \{0\}^{f}\) into a second-order cone (see Sturm [59] and Andersen [60]):
When (21) is solved using an interior-point method, the normal equation solved at each iteration takes the form
where \({\mathbf {D}}_{s}\) is a scaling matrix comparable to that in (15), and
is the rank-1 perturbation of a scaled identity matrix. The standard procedure, as implemented in SeDuMi [59, 61] and MOSEK [62], is to form the sparse matrix \({\mathbf {H}}\) and dense vector \({\mathbf {q}}\), defined
and then solve (22) using a rank-1 update
at a cost comparable to the solution of \({\mathbf {H}}u=r\) for two right-hand sides. (In “Appendix B”, we repeat these derivations for the version of (11) in which \({\mathbf {A}}x=b\) is replaced by the inequality \({\mathbf {A}}x\le b\).)
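The rank-1 update is an instance of the Sherman-Morrison identity, and its two-solve structure can be sketched as follows. The generic SPD matrix and the dense solve below are stand-ins for \({\mathbf {H}}\) of (24) and its sparse Cholesky factorization:

```python
import numpy as np

def solve_rank1_update(solve_H, q, r):
    """Solve (H + q q^T) u = r via the Sherman-Morrison identity, at the
    cost of solving H u = r for two right-hand sides."""
    Hinv_r = solve_H(r)
    Hinv_q = solve_H(q)
    return Hinv_r - Hinv_q * (q @ Hinv_r) / (1.0 + q @ Hinv_q)

rng = np.random.default_rng(1)
m = 6
M = rng.standard_normal((m, m))
H = M @ M.T + m * np.eye(m)           # generic SPD stand-in for H in (24)
q = rng.standard_normal(m)
r = rng.standard_normal(m)

# In practice solve_H would reuse the sparse Cholesky factor of Pi H Pi^T;
# a dense solve stands in for it here.
u = solve_rank1_update(lambda v: np.linalg.solve(H, v), q, r)
print(np.allclose((H + np.outer(q, q)) @ u, r))
```

Because both right-hand sides reuse the same factorization of \({\mathbf {H}}\), the update adds only linear-time vector arithmetic on top of the two triangular solves.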
The matrix \({\mathbf {H}}\) is exactly the dual Schur complement derived in (19) with \(\sigma =1/\epsilon \). If \({\mathbf {A}}^{T}{\mathbf {A}}\) shares its block sparsity pattern with \({\mathbf {N}}^{T}{\mathbf {N}}\), then the block sparsity graph of \({\mathbf {H}}\) coincides with the tree graph T, and \({\mathbf {H}}u=r\) can be solved in linear time. The cost of making the rank-1 update is also linear time, so the cost of solving the normal equation is linear time.
Lemma 5
(Linear-time normal equation) Let there exist \(v_{i}\in V(T)\) for each \(i\in \{1,\ldots ,m\}\) such that
Define \({\mathbf {H}}\) and \({\mathbf {q}}\) according to (24). Then, under Assumption 1:
1. (Forming) It costs \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) space to form \({\mathbf {H}}\) and \({\mathbf {q}}\), where \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\).
2. (Factoring) Let \(\pi \) be a topological ordering on T, and define the associated block topological permutation \(\varPi \) as in Lemma 4. Then, it costs \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) space to factor \(\varPi {\mathbf {H}}\varPi ^{T}\) into \({\mathbf {L}}{\mathbf {L}}^{T}\).
3. (Solving) Given r, \({\mathbf {q}}\), \(\varPi \), and the Cholesky factor \({\mathbf {L}}\) satisfying \({\mathbf {L}}{\mathbf {L}}^{T}=\varPi {\mathbf {H}}\varPi ^{T}\), it costs \(O(\omega ^{4}n)\) time and space to solve \(({\mathbf {H}}+{\mathbf {q}}{\mathbf {q}}^{T})u=r\) for u.
Proof
For an instance of (CTC), denote \(\ell =|{\mathcal {J}}|\le n\) as its number of conic constraints, and \(d=\frac{1}{2}\sum _{j=1}^{\ell }|J_{j}|(|J_{j}|+1)\le \omega ^{2}\ell \) as its total number of variables. Under linear independence (Assumption 1), the constraint matrix \([{\mathbf {A}};{\mathbf {N}}]\) associated with (CTC) has d columns and at most d rows (Lemma 1). Write \(\xi _{i}^{T}\) as the i-th row of \([{\mathbf {A}};{\mathbf {N}}]\), and assume without loss of generality that \([{\mathbf {A}};{\mathbf {N}}]\) has exactly d rows. Observe that \(\mathrm {nnz}\,(\xi _{i})\le \omega (\omega +1)\) by the definition of \({\mathbf {N}}\) (13) and the hypothesis on \({\mathbf {A}}\) via (26), so \(\mathrm {nnz}\,([{\mathbf {A}};{\mathbf {N}}])\le 2\omega ^{4}\ell \).
(i) We form \({\mathbf {H}}\) by setting \({\mathbf {H}}\leftarrow {\mathbf {D}}_{s}\) and then adding \({\mathbf {H}}\leftarrow {\mathbf {H}}+\sigma \xi _{i}\xi _{i}^{T}\) one at a time, for \(i\in \{1,2,\ldots ,d\}\). The first step forms \({\mathbf {D}}_{s}=\mathrm {diag}({\mathbf {D}}_{s}^{(1)},{\mathbf {D}}_{s}^{(2)},\ldots ,{\mathbf {D}}_{s}^{(\ell )})\) where \({\mathbf {D}}_{s}^{(j)}=W_{j}\otimes W_{j}\). Each \(\mathrm {nnz}\,({\mathbf {D}}_{s}^{(j)})=\mathrm {nnz}\,(W_{j})^{2}=|J_{j}|^{2}(|J_{j}|+1)^{2}/4\le \omega ^{4}\) for \(j\in \{1,2,\ldots ,\ell \}\), so the total cost is \(O(\omega ^{4}n)\) time and space. The second step adds \(\mathrm {nnz}\,(\xi _{i}\xi _{i}^{T})=\mathrm {nnz}\,(\xi _{i})^{2}\le \omega ^{2}(\omega +1)^{2}\) nonzeros per constraint over d total constraints, for a total cost of \(O(\omega ^{6}n)\) time and, apparently, \(O(\omega ^{6}n)\) space. However, by the definition of \({\mathbf {N}}\) (13) and the hypothesis on \({\mathbf {A}}\) via (26), the (j, k)-th off-diagonal block of \(\xi _{i}\xi _{i}^{T}\) is nonzero only if (j, k) is an edge of the tree T, as in
Hence, adding \({\mathbf {H}}\leftarrow {\mathbf {H}}+\sigma \xi _{i}\xi _{i}^{T}\) one at a time results in at most \(|V(T)|+|E(T)|\) dense blocks, each of size at most \(\frac{1}{2}\omega (\omega +1)\times \frac{1}{2}\omega (\omega +1)\), for a total memory cost of \(O(\omega ^{4}n)\).
(ii) We form \({\mathbf {q}}=[{\mathbf {A}}^{T},{\mathbf {N}}^{T}]w_{1}\) using a sparse matrix-vector product. Given that \(\mathrm {nnz}\,(w_{1})\le d\) and \(\mathrm {nnz}\,([{\mathbf {A}};{\mathbf {N}}])\le 2\omega ^{4}\ell \), this step costs \(O(\omega ^{4}n)\) time and space.
(iii) We partition \({\mathbf {H}}\) into \([{\mathbf {H}}_{i,j}]_{i,j=1}^{\ell }\) to reveal a block sparsity pattern that coincides with the adjacency matrix of T:
where \(a_{q,i}=\mathrm {svec}\,(A_{q,i})\). According to Lemma 4, the permuted matrix \(\varPi {\mathbf {H}}\varPi ^{T}\) factors into \({\mathbf {L}}{\mathbf {L}}^{T}\) with no block fill in \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) space, because each block \({\mathbf {H}}_{i,j}\) for \(i,j\in \{1,2,\ldots ,\ell \}\) is at most order \(\frac{1}{2}\omega (\omega +1)\).
(iv) Using the rank-1 update formula (25), the cost of solving \(({\mathbf {H}}+{\mathbf {q}}{\mathbf {q}}^{T})u=r\) is the same as the cost of solving \({\mathbf {H}}u=r\) for two right-hand sides, plus algebraic manipulations in \(O(d)=O(\omega ^{2}n)\) time. Applying Lemma 4 shows that the cost of solving \({\mathbf {H}}u=r\) for each right-hand side is \(O(\omega ^{4}n)\) time and space. \(\square \)
Incorporating the block topological permutation of Lemma 5 within any off-the-shelf interior-point method yields a fast interior-point method with near-linear time complexity.
Theorem 5
(Near-linear time) Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\in {\mathbb {S}}^{n}\). In the corresponding instance of (CTC), let each constraint be written
Under Assumptions 1 and 2, there exists an algorithm that computes an iterate \((x,y,s)\in {\mathcal {K}}\times {\mathbb {R}}^{p}\times {\mathcal {K}}_{*}\) satisfying
in \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) space, where \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\).
For completeness, we give a proof of Theorem 5 in “Appendix C”, based on the primal-dual interior-point method found in SeDuMi [59, 61]. Our proof amounts to replacing the fill-reducing permutation—usually a minimum degree ordering—by the block topological permutation of Lemma 5. In practice, the minimum degree ordering is often approximately block topological, and as such, Theorem 5 is often attained by off-the-shelf implementations without modification.
The complete end-to-end procedure for solving (SDP) using dualized clique tree conversion is summarized as Algorithm 1. Before we can use Algorithm 1 to prove our main results, however, we must first address the cost of the pre-processing involved in Step 1. Indeed, naively converting (SDP) into (CTC) by comparing each nonzero element of \(A_{i}\) against each index set \(J_{j}\) would result in \(\ell m=O(n^{2})\) comparisons, and this would cause Step 1 to become the overall bottleneck of the algorithm.
In the next section, we show that if (SDP) is partially separated, then the cost of Step 1 is no more than \(O(\omega ^{3}n)\) time and memory. This is the final piece in the proof of Theorem 1.
6 Optimal constraint splitting
A key step in clique tree conversion is the splitting of a given \(M\in {\mathbb {S}}_{F}^{n}\) into \(M_{1},\ldots ,M_{\ell }\) that satisfy
The choice is not unique, but has a significant impact on the complexity of an interior-point solution. The problem of choosing the sparsest splitting, with the fewest nonzero \(M_{j}\) matrices, can be written
where \({\mathcal {M}}=\{(i,j):M[i,j]\ne 0\}\) are the nonzero matrix elements to be covered. Problem (30) is an instance of SET COVER, one of Karp’s 21 NP-complete problems, but becomes solvable in polynomial time given a tree decomposition (with small width) for the covering sets [64].
In this section, we describe an algorithm that computes the sparsest splitting for each M in \(O(\mathrm {nnz}\,(M))\) time and space, after a precomputation step taking \(O(\omega n)\) time and memory. Using this algorithm, we convert a partially separable instance of (SDP) into (CTC) in \(O(\omega ^{3}n)\) time and memory. Then, we give a complete proof of Theorem 1 by using this algorithm to convert (SDP) into (CTC) in Step 1 of Algorithm 1.
Our algorithm is adapted from the leaf-pruning algorithm of Guo and Niedermeier [64], but appears to be new within the context of clique tree conversion. Observe that the covering sets inherit the edge cover and running intersection properties of \({\mathcal {T}}\):
For every leaf node j with parent node p(j) on T, property (32) implies that the subset \((J_{j}\times J_{j})\backslash (J_{p(j)}\times J_{p(j)})\) contains elements unique to \(J_{j}\times J_{j}\), because p(j) lies on the path from j to all other nodes in T. If \({\mathcal {M}}\) contains an element from this subset, then j must be included in the cover set \({\mathcal {S}}\), so we set \({\mathcal {S}}\leftarrow {\mathcal {S}}\cup \{j\}\) and \({\mathcal {M}}\leftarrow {\mathcal {M}}\backslash (J_{j}\times J_{j})\); otherwise, we do nothing. Pruning the leaf node reveals new leaf nodes, and we repeat this process until the tree T is exhausted of nodes. Then, property (31) guarantees that \({\mathcal {M}}\) will eventually be covered.
Algorithm 2 is an adaptation of the leaf-pruning algorithm described above, with three important simplifications. First, it uses a topological traversal (Definition 5) to simulate the process of leaf pruning without explicitly deleting nodes from the tree. Second, it notes that the unique subset \((J_{j}\times J_{j})\backslash (J_{p(j)}\times J_{p(j)})\) can be written in terms of another unique set \(U_{j}\):
Third, it notes that the unique set \(U_{j}\) defined above is a partitioning of \(\{1,\ldots ,n\}\), and has a well-defined inverse map. The following is taken from [65, 66], where \(U_{j}\) is denoted \({\mathrm {new}}(J_{j})\) and referred to as the “new set” of \(J_{j}\); see also [67].
Lemma 6
(Unique partition) Define \(U_{j}=J_{j}\backslash J_{p(j)}\) for all nodes j with parent p(j), and \(U_{r}=J_{r}\) for the root node r. Then: (i) \(\bigcup _{j=1}^{\ell }U_{j}=\{1,\ldots ,n\}\); and (ii) \(U_{i}\cap U_{j}=\emptyset \) for all \(i\ne j\).
In the case that \({\mathcal {M}}\) contains just O(1) items to be covered, we may use the inverse map associated with \(U_{j}\) to directly identify covering sets whose unique sets contain elements from \({\mathcal {M}}\), without exhaustively iterating through all O(n) covering sets. This final simplification reduces the cost of processing each \(M_{i}\) from linear O(n) time to \(O(\mathrm {nnz}\,(M_{i}))\) time, after setting up the inverse map in \(O(\omega n)\) time and space.
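A minimal sketch of this inverse-map idea follows, under stated assumptions: the sparsest_splitting helper, the path-graph index sets \(J_{j}=\{j,j+1\}\), and the routing rule are our illustrative rendering, not the paper's Algorithm 2. Each nonzero (r, c) is routed to the deeper of the two subtree roots node_of[r] and node_of[c], which necessarily contains both indices:

```python
def sparsest_splitting(J, parent, topo_index, nonzeros):
    """Route each nonzero (r, c) of M to a single index set J_j, using the
    inverse map of the unique sets U_j = J_j minus J_p(j) (Lemma 6).

    J:          dict node -> index set J_j
    parent:     dict node -> parent node (None at the root)
    topo_index: dict node -> position in a topological ordering
    """
    # Inverse map: node_of[k] is the unique node j with k in U_j, i.e. the
    # node of the induced subtree T_k that is closest to the root.
    node_of = {}
    for j, Jj in J.items():
        Pj = J[parent[j]] if parent[j] is not None else set()
        for k in Jj - Pj:
            node_of[k] = j
    # Each nonzero (r, c) is covered by the deeper of the two subtree roots,
    # i.e. the one that comes earlier in the topological ordering.
    cover = {}
    for (r, c) in nonzeros:
        j = min(node_of[r], node_of[c], key=topo_index.get)
        cover.setdefault(j, []).append((r, c))
    return cover

# A path-graph decomposition with index sets J_j = {j, j+1} and p(j) = j+1.
J = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}, 4: {4, 5}}
parent = {1: 2, 2: 3, 3: 4, 4: None}
topo_index = {1: 0, 2: 1, 3: 2, 4: 3}     # the identity order is topological
cover = sparsest_splitting(J, parent, topo_index,
                           [(1, 2), (2, 3), (3, 3), (4, 5)])
print(cover)
```

Because node_of is built once in \(O(\omega n)\) time, each matrix is then split by touching only its own nonzeros, matching the per-matrix \(O(\mathrm {nnz}\,(M))\) cost.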
Theorem 6
Algorithm 2 has complexity
where \(\omega \equiv 1+\mathrm {wid}({\mathcal {T}})\).
For partially separable instances of (SDP), the sparsest instance of (CTC) contains exactly one nonzero split matrix \(A_{i,j}\ne 0\) for each i, and Algorithm 2 is guaranteed to find it. Using Algorithm 2 to convert (SDP) into (CTC) in Step 1 of Algorithm 1 yields the complexity figures quoted in Theorem 1.
Proof of Theorem 1
By hypothesis, \({\mathcal {T}}=\{J_{1},\ldots ,J_{\ell }\}\) is a tree decomposition for the sparsity graph of the data matrices \(C,A_{1},\ldots ,A_{m}\), and (SDP) is partially separable on \({\mathcal {T}}\). We proceed to solve (SDP) using Algorithm 1, while performing the splitting into \(C_{j}\) and \(A_{i,j}\) using Algorithm 2. Below, we show that each step of the algorithm costs no more than \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) memory:
Step 1 (Matrix \({\mathbf {A}}\) and vector \({\mathbf {c}}\)). We have \(\dim ({\mathbb {S}}_{G}^{n})=|V(G)|+|E(G)|\le n+n\cdot \mathrm {wid}({\mathcal {T}})\le \omega n\), and hence \(\mathrm {nnz}\,(C)\le \omega n\). Under partial separability (Definition 3), we also have \(\mathrm {nnz}\,(A_{i})\le \omega ^{2}\). Assuming linear independence (Assumption 1) yields \(m\le \dim ({\mathbb {S}}_{G}^{n})\le \omega n\), and this implies that \(\mathrm {nnz}\,(C)+\sum _{i}\mathrm {nnz}\,(A_{i})=O(\omega ^{3}n)\), so the cost of forming \({\mathbf {A}}\) and \({\mathbf {c}}\) using Algorithm 1 is \(O(\omega ^{3}n)\) time and memory via Theorem 6.
Step 1 (Matrix \({\mathbf {N}}\)). For \({\mathbf {N}}=[{\mathbf {N}}_{i,j}]_{i,j=1}^{\ell }\), we note that each block \({\mathbf {N}}_{i,j}\) is diagonal, and hence \(\mathrm {nnz}\,({\mathbf {N}}_{i,j})\le \omega ^{2}\). The overall \({\mathbf {N}}\) contains \(\ell \) block-rows, with 2 nonzero blocks per block-row, for a total of \(2\ell \) nonzero blocks. Therefore, the cost of forming \({\mathbf {N}}\) is \(\mathrm {nnz}\,({\mathbf {N}})=O(\omega ^{2}n)\) time and memory.
Step 2. We dualize by forming the matrix \({\mathbf {M}}=[0,-{\mathbf {A}}^{T},{\mathbf {N}}^{T},+I]\) and the vectors \({\mathbf {c}}^{T}=[0,b^{T},0,0]\) and \({\mathbf {b}}=-c\) in \(O(\mathrm {nnz}\,({\mathbf {A}})+\mathrm {nnz}\,({\mathbf {N}}))=O(\omega ^{3}n)\) time and memory.
Step 3. The resulting instance of (CTC) satisfies the assumptions of Theorem 5 and therefore costs \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) memory to solve.
Step 4. The low-rank matrix completion algorithm [43, Algorithm 2] makes \(\ell \le n\) iterations, where each iteration performs O(1) matrix-matrix operations over \(\omega \times \omega \) dense matrices. Its cost is therefore \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory. \(\square \)
7 Dualized clique tree conversion with auxiliary variables
Theorem 5 bounds the cost of solving instances of (CTC) that satisfy the sparsity assumption (27) as near-linear time and linear memory. Instances of (CTC) that do not satisfy the sparsity assumption can be systematically transformed into ones that do by introducing auxiliary variables. Let us illustrate this idea with an example.
Example 5
(Path graph) Given \((n+1)\times (n+1)\) symmetric tridiagonal matrices \(A\succ 0\) and C with \(A[i,j]=C[i,j]=0\) for all \(|i-j|>1\), consider the Rayleigh quotient problem
The associated sparsity graph is the path graph on \(n+1\) nodes, and its elimination tree decomposition \({\mathcal {T}}=(\{J_{1},\ldots ,J_{n}\},T)\) has index sets \(J_{j}=\{j,j+1\}\) and parent pointer \(p(j)=j+1\). Applying clique tree conversion and vectorizing yields an instance of (11) with
where \(e_{j}\) is the j-th column of the \(3\times 3\) identity matrix, and \(a_{1},\ldots ,a_{n}\in {\mathbb {R}}^{3}\) are appropriately chosen vectors. The dualized Schur complement \({\mathbf {H}}={\mathbf {D}}_{s}+\sigma {\mathbf {A}}^{T}{\mathbf {A}}+\sigma {\mathbf {N}}^{T}{\mathbf {N}}\) is fully dense, so dualized clique tree conversion (Algorithm 1) would have a complexity of at least cubic \(n^{3}\) time and quadratic \(n^{2}\) memory. Instead, introducing \(n-1\) auxiliary variables \(u_{1},\ldots ,u_{n-1}\) yields the following problem
which does indeed satisfy the sparsity assumption (27) of Theorem 5. In turn, solving (34) using Steps 2-3 of Algorithm 1 recovers an \(\epsilon \)-accurate solution in \(O(n^{1.5}\log \epsilon ^{-1})\) time and O(n) memory. \(\square \)
For a constraint \(A_{i}\bullet X=b_{i}\) in (SDP), we assume without loss of generalityFootnote 4 that the corresponding constraint in (CTC) is split over a connected subtree of T induced by a subset of vertices \(W\subseteq V(T)\), as in
Then, the coupled constraint (35) can be decoupled into |W| constraints, by introducing \(|W|-1\) auxiliary variables, one for each edge of the connected subtree \(T_{W}\):
It is easy to see that (35) and (36) are equivalent by applying Gaussian elimination on the auxiliary variables.
Lemma 7
The matrix X satisfies (35) if and only if there exists \(\{u_{j}\}\) such that X satisfies (36).
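The Gaussian-elimination argument behind Lemma 7 can be sketched with scalar stand-ins: writing \(c_{j}\) for the contribution of node j, each auxiliary \(u_{j}\) carries its subtree total upward, and eliminating the auxiliaries recovers the coupled constraint. The decouple helper and the per-node equations below are an illustrative rendering, not the paper's exact construction of (36):

```python
def decouple(c, children, root, b):
    """Sketch of the auxiliary-variable splitting: the coupled constraint
    sum_j c_j == b becomes one equation per node, with one auxiliary u_j
    per tree edge. Here c_j is a scalar stand-in for A_{i,j} . X_j.

    Non-root j:  c_j + sum_{k in ch(j)} u_k - u_j == 0
    Root:        c_root + sum_{k in ch(root)} u_k == b
    """
    u = {}
    def subtree_sum(j):              # Gaussian elimination of the auxiliaries
        s = c[j] + sum(subtree_sum(k) for k in children.get(j, []))
        if j != root:
            u[j] = s                 # u_j carries the subtree total upward
        return s
    total = subtree_sum(root)        # eliminating every u_j leaves sum(c) == b
    residuals = [c[j] + sum(u[k] for k in children.get(j, [])) - u[j]
                 for j in c if j != root]
    residuals.append(c[root] + sum(u[k] for k in children.get(root, [])) - b)
    return u, residuals, total

children = {4: [2, 3], 2: [0, 1]}    # a small tree rooted at node 4
c = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}
u, residuals, total = decouple(c, children, root=4, b=15.0)
print(u, residuals)                  # residuals vanish exactly when sum(c) == b
```

Each decoupled equation touches only one node and its children, which is what confines the coupling to the edges of the subtree.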
Repeating the splitting procedure for every constraint in (CTC) yields a problem of the form
where \(W_{i}\) induces the connected subtree associated with the i-th constraint, and \(\gamma _{j}\) is the total number of auxiliary variables added to the j-th variable block. When (37) is dualized and solved using an interior-point method, the matrix \({\mathbf {H}}=[{\mathbf {H}}_{i,j}]_{i,j=1}^{\ell }\) satisfies \({\mathbf {H}}_{i,j}=0\) for every \((i,j)\notin E(T)\), so by repeating the proof of Lemma 5, the cost of solving the normal equation is again linear time. Incorporating this within any off-the-shelf interior-point method again yields a fast interior-point method.
Lemma 8
Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\in {\mathbb {S}}^{n}\), and convert the corresponding instance of (CTC) into (37). Under Assumptions 1 and 2, there exists an algorithm that computes an iterate \((x,y,s)\in {\mathcal {K}}\times {\mathbb {R}}^{p}\times {\mathcal {K}}_{*}\) satisfying (28) in
where \(\omega =1+\mathrm {wid}({\mathcal {T}})\) and \(\gamma _{\max }=\max _{j}\gamma _{j}\) is the maximum number of auxiliary variables added to a single variable block.
Proof
We repeat the proof of Theorem 5, but slightly modify the linear time normal equation result in Lemma 5. Specifically, we repeat the proof of Lemma 5, but note that each block \({\mathbf {H}}_{i,j}\) of \({\mathbf {H}}\) is now order \(\frac{1}{2}\omega (\omega +1)+\gamma _{\max }\), so that factoring in (ii) now costs \(O((\omega ^{2}+\gamma _{\max })^{3}n)\) time and \(O((\omega ^{2}+\gamma _{\max })^{2}n)\) memory, and substituting in (iii) costs \(O((\omega ^{2}+\gamma _{\max })^{2}n)\) time and memory. After \(O(\sqrt{\omega n}\log \epsilon ^{-1})\) interior-point iterations, we again arrive at an \(\epsilon \)-accurate and \(\epsilon \)-feasible solution to (CTC). \(\square \)
The complete end-to-end procedure for solving (SDP) using the auxiliary variables method is summarized as Algorithm 3. In the case of network flow semidefinite programs, the splitting in Step 2 can be performed in closed-form using the induced subtree property of the tree decomposition [68].
Definition 6
(Induced subtrees) Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition. We define \(T_{k}\) as the connected subtree of T induced by the nodes that contain the element k, as in
Lemma 9
Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the graph G. For every \(i\in V(G)\) and
there exists \(A_{j}\) for \(j\in V(T_{i})\) such that
Proof
We give an explicit construction. Iterate j over the neighbors \({\mathrm {nei}}(i)=\{j:(i,j)\in E(G)\}\) of i. By the edge cover property of the tree decomposition, there exists \(k\in \{1,\ldots ,\ell \}\) satisfying \(i,j\in J_{k}\). Moreover, \(k\in V(T_{i})\) because \(i\in J_{k}\). Define \(A_{k}\) to satisfy
where \(\deg _{i}=|{\mathrm {nei}}(i)|\). \(\square \)
If each network flow constraint is split according to Lemma 9, then the number of auxiliary variables needed to decouple the problem can be bounded. This results in a proof of Theorem 2.
Proof of Theorem 2
By hypothesis, \({\mathcal {T}}=\{J_{1},\ldots ,J_{\ell }\}\) is a tree decomposition for the sparsity graph of the data matrices \(C,A_{1},\ldots ,A_{m}\), and each \(A_{i}\) can be split according to Lemma 9 onto a connected subtree of T. We proceed to solve (SDP) using Algorithm 3, performing Step 1 in closed-form by splitting each \(A_{i}\) according to Lemma 9. The cost of Steps 2 and 3 is then bounded by \(\mathrm {nnz}\,({\mathbf {A}})+\mathrm {nnz}\,({\mathbf {N}})=O(\omega ^{3}n)\) time and memory. The cost of Step 5 is also \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory, using the same reasoning as the proof of Theorem 1.
To quantify the cost of Step 4, we must show that under the conditions stated in the theorem, the maximum number of auxiliary variables added to each variable block is bounded as \(\gamma _{j}\le m_{k}\cdot \omega \cdot d_{\max }\). We do this via the following line of reasoning:
- A single network flow constraint at vertex k contributes \(|\mathrm {ch}(j)|\le d_{\max }\) auxiliary variables to every j-th index set \(J_{j}\) satisfying \(j\in V(T_{k})\).
- Having one network flow constraint at every \(k\in \{1,\ldots ,\ell \}\) contributes at most \(\omega \cdot d_{\max }\) auxiliary variables to every j-th clique \(J_{j}\). This is because the set of indices k for which \(j\in V(T_{k})\) is exactly \(J_{j}=\{k\in \{1,\ldots ,n\}:j\in V(T_{k})\}\), and \(|J_{j}|\le \omega \) by definition.
- Having \(m_{k}\) network flow constraints at each \(k\in \{1,\ldots ,\ell \}\) contributes at most \(m_{k}\cdot \omega \cdot d_{\max }\) auxiliary variables to every j-th clique \(J_{j}\).
Finally, applying \(\gamma _{j}\le m_{k}\cdot \omega \cdot d_{\max }\) to Lemma 8 yields the desired complexity figure, which dominates the cost of the entire algorithm. \(\square \)
8 Numerical experiments
Using the techniques described in this paper, we solve sparse semidefinite programs posed on the 40 power system test cases in the MATPOWER suite [69], each with number of constraints m comparable to n. The largest two cases have \(n=9241\) and \(n=13659\), and are designed to accurately represent the size and complexity of the European high voltage electricity transmission network [70]. In all of our trials below, the accuracy of a primal-dual iterate (X, y, S) is measured using the DIMACS feasibility and duality gap metrics [71] and stated as the number of accurate decimal digits:
where \({\mathcal {A}}(X)=[A_{i}\bullet X]_{i=1}^{m}\) and \({\mathcal {A}}^{T}(y)=\sum _{i=1}^{m}y_{i}A_{i}\). We will frequently measure the overall number of accurate digits as \(L=\min \{{\mathrm {gap}},{\mathrm {pinf}},{\mathrm {dinf}}\}\).
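These digit counts can be sketched in code. Since we do not reproduce the paper's exact display here, the accurate_digits helper below uses one common DIMACS-style normalization; the helper name and its denominators are our assumptions, not the paper's definitions:

```python
import numpy as np

def accurate_digits(C, A_list, b, X, y, S):
    """DIMACS-style accuracy of a primal-dual iterate (X, y, S), stated as
    decimal digits. The normalizations below follow one common convention
    and stand in for the exact definitions used in the text."""
    AX = np.array([np.tensordot(Ai, X) for Ai in A_list])      # A(X)
    ATy = sum(yi * Ai for yi, Ai in zip(y, A_list))            # A^T(y)
    pobj, dobj = np.tensordot(C, X), b @ y
    gap = -np.log10(abs(pobj - dobj) / (1 + abs(pobj) + abs(dobj)))
    pinf = -np.log10(np.linalg.norm(AX - b) / (1 + np.abs(b).max()))
    dinf = -np.log10(np.linalg.norm(ATy + S - C) / (1 + np.abs(C).max()))
    return min(gap, pinf, dinf)

# A tiny hand-built iterate for a 2x2 problem with a single trace constraint.
C = np.eye(2)
A_list = [np.eye(2)]
b = np.array([1.0])
X = 0.5000001 * np.eye(2)            # nearly primal feasible
y = np.array([0.9])
S = 0.10000001 * np.eye(2)           # nearly dual feasible
print("L =", accurate_digits(C, A_list, b, X, y, S))
```

Here the duality gap is the binding term, so the overall digit count L is governed by gap rather than by the tiny feasibility residuals.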
In our trials, we implement Algorithm 1 and Algorithm 3 in MATLAB using a version of SeDuMi v1.32 [59] that is modified to force a specific fill-reducing permutation during symbolic factorization. The actual block topological ordering that we force SeDuMi to use is a simple postordering of the elimination tree. For comparison, we also implement both algorithms using the standard off-the-shelf version of MOSEK v8.0.0.53 [72], without forcing a specific fill-reducing permutation. The experiments are performed on a Xeon 3.3 GHz quad-core CPU with 16 GB of RAM.
8.1 Elimination trees with small widths
We begin by computing tree decompositions using MATLAB’s internal approximate minimum degree heuristic (due to Amestoy, Davis and Duff [73]). A simplified version of our code is shown as the snippet in Fig. 1. (Our actual code uses Algorithm 4.1 in [15] to reduce the computed elimination tree to the supernodal elimination tree, for a slight reduction in the number of index sets \(\ell \).) Table 1 gives the details and timings for the 40 power system graphs from the MATPOWER suite [69]. As shown, we compute tree decompositions with \(\mathrm {wid}({\mathcal {T}})\le 34\) in less than 2 s. In practice, the bottleneck of the preprocessing step is not the tree decomposition, but the constraint splitting step in Algorithm 2.
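The elimination-tree construction behind the Fig. 1 snippet can be sketched in Python. The elimination_tree_decomposition helper below is our illustrative rendering: it performs a naive quadratic-time symbolic elimination under a given fill-reducing ordering, unlike the near-linear routines used in practice:

```python
def elimination_tree_decomposition(adj, order):
    """Tree decomposition from a fill-reducing ordering via naive symbolic
    elimination: J_v collects v and its higher-ordered neighbors in the
    filled graph, and p(v) is the lowest-ordered member of J_v after v."""
    pos = {v: k for k, v in enumerate(order)}
    nbr = {v: set(adj[v]) for v in adj}
    J, parent = {}, {}
    for v in order:
        higher = {w for w in nbr[v] if pos[w] > pos[v]}
        J[v] = {v} | higher
        for w in higher:              # eliminating v turns its higher
            nbr[w] |= higher - {w}    # neighbors into a clique (fill-in)
        parent[v] = min(higher, key=pos.get) if higher else None
    width = max(len(Jv) for Jv in J.values()) - 1
    return J, parent, width

# The 5-cycle, eliminated in natural order: width 2, as for any cycle.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
J, parent, width = elimination_tree_decomposition(adj, [0, 1, 2, 3, 4])
print(J, parent, "wid =", width)
```

The index sets \(J_{v}\) and parent pointers returned here are precisely the elimination tree decomposition used throughout the paper, before the optional reduction to the supernodal elimination tree.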
8.2 MAX 3-CUT and Lovasz Theta
We first consider the MAX 3-CUT and Lovasz Theta problems, which are partially separable by default, and hence have solution complexities of \(O(n^{1.5})\) time and O(n) memory. For each of the 40 test cases, we use the MATPOWER function makeYbus to generate the bus admittance matrix \(Y_{bus}=[Y_{i,j}]_{i,j=1}^{n},\) and symmetrize to yield \(Y_{abs}=\frac{1}{2}[|Y_{i,j}|+|Y_{j,i}|]_{i,j=1}^{n}\). We view this matrix as the weighted adjacency matrix for the system graph. For MAX 3-CUT, we define the weighted Laplacian matrix \(C=\mathrm {diag}(Y_{abs}{\mathbf {1}})-Y_{abs}\), and set up problem (MkC). For Lovasz Theta, we extract the location of the graph edges from \(Y_{abs}\) and set up (\(\hbox {LT}'\)).
First, we use Algorithm 1 with the modified version of SeDuMi to solve the 80 instances of (SDP). Of the 80 instances considered, 79 solved to \(L\ge 5\) digits in \(k\le 23\) iterations and \(T\le 306\) s; the largest instance solved to \(L=4.48\). Table 2 shows the accuracy and timing details for the 20 largest problems solved. Figure 2a plots T/k, the mean time taken per-iteration. As guaranteed in Lemma 5, the per-iteration time is linear with respect to n. A log-log regression yields \(T/k=10^{-3}n\), with \(R^{2}=0.9636\). Figure 2b plots k/L, the number of iterations to a factor-of-ten error reduction. We see that SeDuMi’s guaranteed iteration complexity \(k=O(\sqrt{n}\log \epsilon ^{-1})=O(\sqrt{n}L)\) is a significant over-estimate; a log-log regression yields \(k/L=0.929n^{0.123}\approx n^{1/8}\), with \(R^{2}=0.5432\). Combined, the data suggests an actual time complexity of \(T\approx 10^{-3}n^{1.1}L\).
Next, we use Algorithm 1 alongside the off-the-shelf version of MOSEK to solve the same 80 instances. It turns out that MOSEK is both more accurate than SeDuMi and a factor of 5-10 faster. It manages to solve all 80 instances to \(L\ge 6\) digits in \(k\le 21\) iterations and \(T\le 24\) s. Table 3 shows the accuracy and timing details for the 20 largest problems solved. Figure 3a plots T/k, the mean time taken per-iteration. Despite not forcing the use of a block topological ordering, MOSEK nevertheless attains an approximately linear per-iteration cost. Figure 3b plots k/L, the number of iterations to a factor-of-ten error reduction. Again, we see that MOSEK’s guaranteed iteration complexity \(k=O(\sqrt{n}\log \epsilon ^{-1})=O(\sqrt{n}L)\) is a significant over-estimate. A log-log regression yields an empirical time complexity of \(T\approx 10^{-4}n^{1.12}L\), which is very close to being linear-time.
8.3 Optimal power flow
We now solve instances of the OPF posed on the same 40 power systems as mentioned above. Here, we use the MATPOWER function makeYbus to generate the bus admittance matrix \(Y_{bus}\), and then manually generate each constraint matrix \(A_{i}\) from \(Y_{bus}\) using the recipes described in [74]. Specifically, we formulate each OPF problem given the power flow case as follows:
- Minimize the cost of generation. This is the sum of real-power injection at each generator times $1 per MW.
- Constrain all bus voltages to be from 95 to 105% of their nominal values.
- Constrain all load bus real-power and reactive-power values to be from 95 to 105% of their nominal values.
- Constrain all generator bus real-power and reactive-power values within their power curve. The actual minimum and maximum real and reactive power limits are obtained from the case description.
We use three different algorithms to solve the resulting semidefinite program: (1) the original clique tree conversion of Fukuda and Nakata et al. [10, 75] in Sect. 3.3; (2) dualized clique tree conversion in Algorithm 1; (3) dualized clique tree conversion with auxiliary variables in Algorithm 3. We solved all 40 problems using the three algorithms with MOSEK as the internal interior-point solver. Table 4 shows the accuracy and timing details for the 20 largest problems solved. All three algorithms achieved near-linear time performance, solving each problem instance to 7 digits of accuracy within 6 minutes. Upon closer examination, we see that the two dualized algorithms are both about a factor of two faster than the basic CTC method. Figure 4 plots T/k, the mean time taken per-iteration, and k/L, the number of iterations for a factor-of-ten error reduction, and their respective log-log regressions. The data suggests an empirical time complexity of \(T\approx 2.3\times 10^{-4}n^{1.3}L\) over the three algorithms.
9 Conclusion
Clique tree conversion splits a large \(n\times n\) semidefinite variable \(X\succeq 0\) into up to n smaller semidefinite variables \(X_{j}\succeq 0\), coupled by a large number of overlap constraints. These overlap constraints are a fundamental weakness of clique tree conversion, and can cause highly sparse semidefinite programs to be solved in as much as cubic time and quadratic memory.
In this paper, we apply dualization to clique tree conversion. Under a partially separable sparsity assumption, we show that the resulting normal equations have a block-sparsity pattern that coincides with the adjacency matrix of a tree graph, so the per-iteration time and memory complexity of an interior-point method is guaranteed to be linear with respect to n, the order of the matrix variable X. Problems that do not satisfy the partial separability assumption can be systematically separated by introducing auxiliary variables. In the case of network flow semidefinite programs, the number of auxiliary variables can be bounded, so an interior-point method again has a per-iteration time and memory complexity that is linear with respect to n.
Using these insights, we prove that the MAXCUT and MAX k-CUT relaxations, the Lovasz Theta problem, and the AC optimal power flow relaxation can all be solved with a guaranteed time and memory complexity that is near-linear with respect to n, assuming that a tree decomposition with small width for the sparsity graph is known. Our numerical results confirm an empirical time complexity that is linear with respect to n on the MAX 3-CUT and Lovasz Theta relaxations.
Notes
Throughout this paper, we denote the (i, j)-th element of the matrix X as X[i, j], and the submatrix of X formed by the rows in I and columns in J as X[I, J].
The symmetric matrices \(A_{1},A_{2},\dots ,A_{m}\) share an aggregate sparsity pattern E that contains at most \(\omega n\) nonzero elements (in the lower-triangular part). The set of symmetric matrices with sparsity pattern E is a linear subspace of \({\mathbb {R}}^{n\times n}\), with dimension at most \(\omega n\). Therefore, the number of linearly independent \(A_{1},A_{2},\dots ,A_{m}\) is at most \(\omega n\).
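This dimension count is easy to verify numerically; a small sketch with a hypothetical tridiagonal pattern, where the rank of a stack of vectorized random samples saturates at the number of free entries:

```python
import numpy as np

# Symmetric matrices with a fixed sparsity pattern E form a subspace whose
# dimension equals the number of entries in E (diagonal plus one triangle).
rng = np.random.default_rng(1)
n = 5
E = [(0,0),(1,1),(2,2),(3,3),(4,4),(1,0),(2,1),(3,2),(4,3)]  # tridiagonal

def rand_sym_with_pattern():
    A = np.zeros((n, n))
    for (i, j) in E:
        A[i, j] = A[j, i] = rng.standard_normal()
    return A

# Stack many vectorized samples; their rank cannot exceed len(E) = 9,
# so at most 9 such matrices can be linearly independent.
V = np.array([rand_sym_with_pattern().ravel() for _ in range(30)])
rank = np.linalg.matrix_rank(V)
```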
To keep our derivations simple, we perform the rank-1 update using the Sherman–Morrison–Woodbury (SMW) formula. In practice, the product-form Cholesky factorization (PFCF) of Goldfarb and Scheinberg [63] is more numerically stable and more widely used [59, 61]. Our complexity results remain valid in either case because the PFCF is a constant factor of approximately two times slower than the SMW [63].
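For concreteness, the SMW rank-1 update can be sketched as follows, with a random positive definite matrix standing in for the factored sparse block:

```python
import numpy as np

# SMW rank-1 update: given a Cholesky factorization of H, solve
# (H + u u^T) x = b with two triangular solves and no refactorization.
rng = np.random.default_rng(0)
n = 6
G = rng.standard_normal((n, n))
H = G @ G.T + n * np.eye(n)          # random SPD stand-in for the sparse block
u = rng.standard_normal(n)
b = rng.standard_normal(n)

L = np.linalg.cholesky(H)
solveH = lambda r: np.linalg.solve(L.T, np.linalg.solve(L, r))

Hinv_b, Hinv_u = solveH(b), solveH(u)
# x = H^{-1}b - H^{-1}u (u^T H^{-1} b) / (1 + u^T H^{-1} u)
x = Hinv_b - Hinv_u * (u @ Hinv_b) / (1.0 + u @ Hinv_u)
```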
Since T is connected, we can always find a connected subset \(W'\) satisfying \(W\subseteq W'\subseteq V(T)\) and replace W by \(W'\).
References
Lovász, L.: On the Shannon capacity of a graph. IEEE Trans. Inf. Theory 25(1), 1–7 (1979)
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115–1145 (1995)
Sherali, H.D., Adams, W.P.: A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Discrete Math. 3(3), 411–430 (1990)
Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0–1 optimization. SIAM J. Optim. 1(2), 166–190 (1991)
Lasserre, J.B.: An explicit exact SDP relaxation for nonlinear 0-1 programs. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 293–303, Springer, Berlin (2001)
Laurent, M.: A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre relaxations for 0–1 programming. Math. Oper. Res. 28(3), 470–496 (2003)
Lasserre, J.B.: Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11(3), 796–817 (2001)
Parrilo, P.A.: Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. Ph.D. thesis, California Institute of Technology (2000)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Berlin (2013)
Fukuda, M., Kojima, M., Murota, K., Nakata, K.: Exploiting sparsity in semidefinite programming via matrix completion I: general framework. SIAM J. Optim. 11(3), 647–674 (2001)
Molzahn, D.K., Holzer, J.T., Lesieutre, B.C., DeMarco, C.L.: Implementation of a large-scale optimal power flow solver based on semidefinite programming. IEEE Trans. Power Syst. 28(4), 3987–3998 (2013)
Madani, R., Sojoudi, S., Lavaei, J.: Convex relaxation for optimal power flow problem: Mesh networks. IEEE Trans. Power Syst. 30(1), 199–211 (2015)
Madani, R., Ashraphijuo, M., Lavaei, J.: Promises of conic relaxation for contingency-constrained optimal power flow problem. IEEE Trans. Power Syst. 31(2), 1297–1307 (2016)
Eltved, A., Dahl, J., Andersen, M.S.: On the robustness and scalability of semidefinite relaxation for optimal power flow problems. Optim. Eng. 21, 375–392 (2020). https://doi.org/10.1007/s11081-019-09427-4
Vandenberghe, L., Andersen, M.S., et al.: Chordal graphs and semidefinite optimization. Found. Trends Optim. 1(4), 241–433 (2015)
Sun, Y., Andersen, M.S., Vandenberghe, L.: Decomposition in conic optimization with partially separable structure. SIAM J. Optim. 24(2), 873–897 (2014)
Andersen, M.S., Hansson, A., Vandenberghe, L.: Reduced-complexity semidefinite relaxations of optimal power flow problems. IEEE Trans. Power Syst. 29(4), 1855–1863 (2014)
Löfberg, J.: Dualize it: software for automatic primal and dual conversions of conic programs. Optim. Methods Softw. 24(3), 313–325 (2009)
Griewank, A., Toint, P.L.: Partitioned variable metric updates for large structured optimization problems. Numerische Mathematik 39(1), 119–137 (1982)
Andersen, M., Dahl, J., Vandenberghe, L.: CVXOPT: A Python package for convex optimization. abel.ee.ucla.edu/cvxopt. (2013)
Löfberg, J.: Yalmip: A toolbox for modeling and optimization in matlab. In: Proceedings of the CACSD Conference, 3. Taipei, Taiwan (2004)
Fujisawa, K., Kim, S., Kojima, M., Okamoto, Y., Yamashita, M.: User’s manual for SparseCoLO: Conversion methods for sparse conic-form linear optimization problems. Technical report, Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, (2009). Research Report B-453
Kim, S., Kojima, M., Mevissen, M., Yamashita, M.: Exploiting sparsity in linear and nonlinear matrix inequalities via positive semidefinite matrix completion. Math. Program. 129(1), 33–68 (2011)
Andersen, M.S.: Opfsdr v0.2.3 (2018)
Madani, R., Kalbat, A., Lavaei, J.: ADMM for sparse semidefinite programming with applications to optimal power flow problem. In: IEEE 54th Annual Conference on Decision and Control (CDC), pp. 5932–5939. IEEE (2015)
Zheng, Y., Fantuzzi, G., Papachristodoulou, A., Goulart, P., Wynn, A.: Chordal decomposition in operator-splitting methods for sparse semidefinite programs. Math. Program. 180, 489–532 (2020). https://doi.org/10.1007/s10107-019-01366-3
Annergren, M., Pakazad, S.K., Hansson, A., Wahlberg, B.: A distributed primal-dual interior-point method for loosely coupled problems using ADMM. arXiv preprint arXiv:1406.2192 (2014)
Khoshfetrat Pakazad, S., Hansson, A., Andersen, M.S.: Distributed interior-point method for loosely coupled problems. IFAC Proc. Volumes 47(3), 9587–9592 (2014)
Zhang, R.Y., White, J.K.: Gmres-accelerated ADMM for quadratic objectives. SIAM J. Optim. 28(4), 3025–3056 (2018)
Andersen, M.S., Dahl, J., Vandenberghe, L.: Implementation of nonsymmetric interior-point methods for linear optimization over sparse matrix cones. Math. Program. Comput. 2(3), 167–201 (2010)
Duff, I.S., Reid, J.K.: The multifrontal solution of indefinite sparse symmetric linear. ACM Trans. Math. Softw. (TOMS) 9(3), 302–325 (1983)
Liu, J.W.: The multifrontal method for sparse matrix solution: theory and practice. SIAM Rev. 34(1), 82–109 (1992)
Khoshfetrat Pakazad, S., Hansson, A., Andersen, M.S., Nielsen, I.: Distributed primal-dual interior-point methods for solving tree-structured coupled convex problems using message-passing. Optim. Methods Softw. 32(3), 401–435 (2017)
Khoshfetrat Pakazad, S., Hansson, A., Andersen, M.S., Rantzer, A.: Distributed semidefinite programming with application to large-scale system analysis. IEEE Trans. Autom. Control 63(4), 1045–1058 (2017)
Alizadeh, F., Haeberly, J.-P.A., Overton, M.L.: Complementarity and nondegeneracy in semidefinite programming. Math. Program. 77(1), 111–128 (1997)
Arnborg, S., Corneil, D.G., Proskurowski, A.: Complexity of finding embeddings in a k-tree. SIAM J. Algebraic Discrete Methods 8(2), 277–284 (1987)
Ye, Y., Todd, M.J., Mizuno, S.: An \(O(\sqrt{nL})\)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1), 53–67 (1994)
Wolkowicz, H., Saigal, R., Vandenberghe, L. (eds.): Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Springer Science & Business Media (2012)
Permenter, F., Friberg, H.A., Andersen, E.D.: Solving conic optimization problems via self-dual embedding and facial reduction: a unified approach. SIAM J. Optim. 27(3), 1257–1282 (2017)
Frieze, A., Jerrum, M.: Improved approximation algorithms for MAX k-CUT and MAX BISECTION. Algorithmica 18(1), 67–81 (1997)
Pataki, G., Schmieta, S.: The DIMACS library of semidefinite-quadratic-linear programs. Technical Report. Preliminary draft, Computational Optimization Research Center, Columbia University, New York (2002)
Borchers, B.: Sdplib 1.2, a library of semidefinite programming test problems. Optim. Methods Softw. 11(1–4), 683–690 (1999)
Sun, Y.: Decomposition methods for semidefinite optimization. Ph.D. thesis, UCLA (2015)
Liu, J.W.: The role of elimination trees in sparse factorization. SIAM J. Matrix Anal. Appl. 11(1), 134–172 (1990)
Bodlaender, H.L., Gilbert, J.R., Hafsteinsson, H., Kloks, T.: Approximating treewidth, pathwidth, frontsize, and shortest elimination tree. J. Algorithms 18(2), 238–255 (1995)
Leighton, T., Rao, S.: An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In: Proceedings of the 29th Annual Symposium on Foundations of Computer Science, pp. 422–431. IEEE (1988)
Klein, P., Stein, C., Tardos, E.: Leighton-Rao might be practical: faster approximation algorithms for concurrent flow with uniform capacities. In: Proceedings of the twenty-second annual ACM symposium on Theory of computing, pp. 310–321. ACM (1990)
George, A., Liu, J.W.: The evolution of the minimum degree ordering algorithm. SIAM Rev. 31(1), 1–19 (1989)
Lipton, R.J., Rose, D.J., Tarjan, R.E.: Generalized nested dissection. SIAM J. Numer. Anal. 16(2), 346–358 (1979)
Rose, D.J., Tarjan, R.E., Lueker, G.S.: Algorithmic aspects of vertex elimination on graphs. SIAM J. Comput. 5(2), 266–283 (1976)
George, A., Liu, J.W.: Computer Solution of Large Sparse Positive Definite Systems. Prentice Hall, Englewood Cliffs, NJ (1981)
Grone, R., Johnson, C.R., Sá, E.M., Wolkowicz, H.: Positive definite completions of partial Hermitian matrices. Linear Algebra Appl. 58, 109–124 (1984)
Dancis, J.: Positive semidefinite completions of partial hermitian matrices. Linear Algebra Appl. 175, 97–114 (1992)
Laurent, M., Varvitsiotis, A.: A new graph parameter related to bounded rank positive semidefinite matrix completions. Math. Program. 145(1–2), 291–325 (2014)
Madani, R., Sojoudi, S., Fazelnia, G., Lavaei, J.: Finding low-rank solutions of sparse linear matrix inequalities using convex optimization. SIAM J. Optim. 27(2), 725–758 (2017)
Jiang, X.: Minimum rank positive semidefinite matrix completion with chordal sparsity pattern. Master’s thesis, UCLA (2017)
Kobayashi, K., Kim, S., Kojima, M.: Correlative sparsity in primal-dual interior-point methods for LP, SDP, and SOCP. Appl. Math. Optim. 58(1), 69–88 (2008)
Parter, S.: The use of linear graphs in Gauss elimination. SIAM Rev. 3(2), 119–130 (1961)
Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11(1–4), 625–653 (1999)
Andersen, E.D.: Handling free variables in primal-dual interior-point methods using a quadratic cone. In: Proceedings of the SIAM Conference on Optimization, Toronto (2002)
Sturm, J.F.: Implementation of interior point methods for mixed semidefinite and second order cone optimization problems. Optim. Methods Softw. 17(6), 1105–1154 (2002)
Andersen, E.D., Roos, C., Terlaky, T.: On implementing a primal-dual interior-point method for conic quadratic optimization. Math. Program. 95(2), 249–277 (2003)
Goldfarb, D., Scheinberg, K.: Product-form Cholesky factorization in interior point methods for second-order cone programming. Math. Program. 103(1), 153–179 (2005)
Guo, J., Niedermeier, R.: Exact algorithms and applications for Tree-like Weighted Set Cover. J. Discrete Algorithms 4(4), 608–622 (2006)
Lewis, J.G., Peyton, B.W., Pothen, A.: A fast algorithm for reordering sparse matrices for parallel factorization. SIAM J. Sci. Stat. Comput. 10(6), 1146–1173 (1989)
Pothen, A., Sun, C.: Compact clique tree data structures in sparse matrix factorizations. In: Coleman, T.F., Li, Y. (eds.) Large-Scale Numerical Optimization, pp. 180–204. SIAM (1990)
Andersen, M.S., Dahl, J., Vandenberghe, L.: Logarithmic barriers for sparse matrix cones. Optim. Methods Softw. 28(3), 396–423 (2013)
George, A., Gilbert, J.R., Liu, J.W.H. (eds.): Graph Theory and Sparse Matrix Computation. Springer Science & Business Media (2012)
Zimmerman, R.D., Murillo-Sánchez, C.E., Thomas, R.J.: MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst. 26(1), 12–19 (2011)
Josz, C., Fliscounakis, S., Maeght, J., Panciatici, P.: AC power flow data in MATPOWER and QCQP format: iTesla, RTE snapshots, and PEGASE. arXiv preprint arXiv:1603.01533 (2016)
Mittelmann, H.D.: An independent benchmarking of SDP and SOCP solvers. Math. Program. 95(2), 407–430 (2003)
Frenk, H., Roos, K., Terlaky, T., Zhang, S. (eds.): High Performance Optimization. Springer Science & Business Media (2013)
Amestoy, P.R., Davis, T.A., Duff, I.S.: Algorithm 837: AMD, an approximate minimum degree ordering algorithm. ACM Trans. Math. Softw. (TOMS) 30(3), 381–388 (2004)
Lavaei, J., Low, S.H.: Zero duality gap in optimal power flow problem. IEEE Trans. Power Syst. 27(1), 92 (2012)
Nakata, K., Fujisawa, K., Fukuda, M., Kojima, M., Murota, K.: Exploiting sparsity in semidefinite programming via matrix completion II: implementation and numerical results. Math. Program. 95(2), 303–327 (2003)
Agler, J., Helton, W., McCullough, S., Rodman, L.: Positive semidefinite matrices with a given sparsity pattern. Linear Algebra Appl. 107, 101–149 (1988)
Vanderbei, R.J.: Linear Programming: Foundations and Extensions. Springer, Berlin (2015)
Nesterov, Y.E., Todd, M.J.: Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim. 8(2), 324–364 (1998)
Sturm, J.F., Zhang, S.: Symmetric primal-dual path-following algorithms for semidefinite programming. Appl. Numer. Math. 29(3), 301–315 (1999)
Sturm, J.F., Zhang, S.: On a wide region of centers and primal-dual interior point algorithms for linear programming. Math. Oper. Res. 22(2), 408–431 (1997)
Todd, M.J., Toh, K.-C., Tütüncü, R.H.: On the nesterov-todd direction in semidefinite programming. SIAM J. Optim. 8(3), 769–796 (1998)
Alizadeh, F., Haeberly, J.-P.A., Overton, M.L.: Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results. SIAM J. Optim. 8(3), 746–768 (1998)
Acknowledgements
The authors are grateful to Daniel Bienstock, Salar Fattahi, Cédric Josz, and Yi Ouyang for insightful discussions and helpful comments on earlier versions of this manuscript. We thank Frank Permenter for clarifications on various aspects of the homogeneous self-dual embedding for SDPs. Finally, we thank the Associate Editor and Reviewer 2 for meticulous and detailed comments that led to a significantly improved paper.
This work was supported by the ONR YIP Award, DARPA YFA Award, AFOSR YIP Award, NSF CAREER Award, and ONR N000141712933.
Appendices
A Linear independence and Slater’s conditions
In this section, we prove that (CTC) inherits the assumptions of linear independence and Slater’s conditions from (SDP). We begin with two important technical lemmas.
Lemma 10
The matrix \({\mathbf {N}}\) in (13) has full row rank, that is \(\det ({\mathbf {N}}{\mathbf {N}}^{T})\ne 0\).
Proof
We make \({\mathbf {N}}=[{\mathbf {N}}_{i,j}]_{i,j=1}^{\ell }\) upper-triangular by ordering its blocks topologically on T: each nonempty block row \({\mathbf {N}}_{j}\) contains a nonzero block at \({\mathbf {N}}_{j,j}\) and a nonzero block at \({\mathbf {N}}_{j,p(j)}\) where the parent node \(p(j)>j\) is ordered after j. Then, the claim follows because each diagonal block \({\mathbf {N}}_{j,j}\) implements a surjection and must therefore have full row-rank. \(\square \)
Lemma 11
(Orthogonal complement) Let \(d=\frac{1}{2}\sum _{j=1}^{\ell }|J_{j}|(|J_{j}|+1)\). Implicitly define the \(d\times \frac{1}{2}n(n+1)\) matrix \({\mathbf {P}}\) to satisfy
Then, (i) \({\mathbf {N}}{\mathbf {P}}=0\); (ii) every \(x\in {\mathbb {R}}^{d}\) can be decomposed as \(x={\mathbf {N}}^{T}u+{\mathbf {P}}v\).
Proof
For each \(x=[\mathrm {svec}\,(X_{j})]_{j=1}^{\ell }\in {\mathbb {R}}^{d}\), Theorem 4 says that there exists a Z satisfying \({\mathbf {P}}\,\mathrm {svec}\,(Z)=x\) if and only if \({\mathbf {N}}x=0\). Equivalently, \(x\in {\mathrm {span}}({\mathbf {P}})\) if and only if \(x\perp {\mathrm {span}}({\mathbf {N}}^{T})\). The “only if” part implies (i), while the “if” part implies (ii).
\(\square \)
Define the \(m\times \frac{1}{2}n(n+1)\) matrix \({\mathbf {M}}\) as the vectorization of the linear constraints in (SDP), as in
In reformulating (SDP) into (CTC), the splitting conditions (10) can be rewritten as the following
where \(c=[\mathrm {svec}\,(C_{j})]_{j=1}^{\ell }\) and \({\mathbf {A}}=[\mathrm {svec}\,(A_{i,j})^{T}]_{i,j=1}^{m,\ell }\) are the data for the vectorized version of (CTC).
Proof of Lemma 1
We will prove that
(\(\implies \)) We must have \(u\ne 0\), because \({\mathbf {N}}\) has full row rank by Lemma 10, and so \({\mathbf {A}}^{T}0+{\mathbf {N}}^{T}v=0\) if and only if \(v=0\). Multiplying by \({\mathbf {P}}^{T}\) yields \({\mathbf {P}}^{T}({\mathbf {A}}^{T}u+{\mathbf {N}}^{T}v)={\mathbf {M}}^{T}u+0=0\) and so setting \(y=u\ne 0\) yields \({\mathbf {M}}^{T}y=0\). (\(\impliedby \)) We use Lemma 11 to decompose \({\mathbf {A}}^{T}y={\mathbf {P}}z+{\mathbf {N}}^{T}v\). If \(\mathrm {svec}\,(\sum _{i}y_{i}A_{i})={\mathbf {M}}^{T}y={\mathbf {P}}^{T}{\mathbf {A}}^{T}y=0,\) then \({\mathbf {P}}^{T}{\mathbf {P}}z={\mathbf {P}}^{T}{\mathbf {A}}^{T}y-{\mathbf {P}}^{T}{\mathbf {N}}^{T}v=0\) and so \({\mathbf {P}}z=0\). Setting \(u=-y\ne 0\) yields \({\mathbf {A}}^{T}u+{\mathbf {N}}^{T}v=0\). \(\square \)
Proof of Lemma 2
We will prove that
Define the chordal completion F as in (8). Observe that \({\mathbf {M}}\,\mathrm {svec}\,(X)={\mathbf {M}}\,\mathrm {svec}\,(Z)\) holds for all pairs of \(P_{F}(X)=Z\), because each \(A_{i}\in {\mathbb {S}}_{F}^{n}\) satisfies \(A_{i}\bullet X=A_{i}\bullet P_{F}(X)\). Additionally, the positive definite version of Theorem 3 is written
This result was first established by Grone et al. [52]; a succinct proof can be found in [15, Theorem 10.1]. (\(\implies \)) For every x satisfying \({\mathbf {N}}x=0\), there exists Z such that \({\mathbf {P}}\,\mathrm {svec}\,(Z)=x\) due to Lemma 11. If additionally \(x\in \mathrm {Int}({\mathcal {K}})\), then there exists \(X\succ 0\) satisfying \(Z=P_{F}(X)\) due to (39). We can verify that \({\mathbf {M}}\,\mathrm {svec}\,(X)={\mathbf {M}}\,\mathrm {svec}\,(Z)={\mathbf {A}}{\mathbf {P}}\,\mathrm {svec}\,(Z)={\mathbf {A}}x=b\). (\(\impliedby \)) For every \(X\succ 0,\) there exists Z satisfying \(Z=P_{F}(X)\) and \({\mathbf {P}}\,\mathrm {svec}\,(Z)\in \mathrm {Int}({\mathcal {K}})\) due to (39). Set \(u={\mathbf {P}}\,\mathrm {svec}\,(Z)\) and observe that \(u\in \mathrm {Int}({\mathcal {K}})\) and \({\mathbf {N}}u={\mathbf {N}}{\mathbf {P}}\,\mathrm {svec}\,(Z)=0\). If additionally \({\mathbf {M}}\,\mathrm {svec}\,(X)=b\), then \({\mathbf {A}}u={\mathbf {A}}{\mathbf {P}}\,\mathrm {svec}\,(Z)={\mathbf {M}}\,\mathrm {svec}\,(Z)=b\). \(\square \)
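The completion result (39) of Grone et al. can be illustrated on the smallest nontrivial chordal pattern; the tridiagonal values below are hypothetical, and the filled entry is the familiar maximum-determinant completion:

```python
import numpy as np

# A partial matrix with chordal (tridiagonal) pattern whose clique
# submatrices are positive definite admits a positive definite completion.
# For the 3x3 tridiagonal pattern, the max-determinant completion fills
# X[0,2] = X[0,1] * X[1,2] / X[1,1].
X = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
# Cliques {0,1} and {1,2}: both 2x2 principal submatrices are PD.
assert np.all(np.linalg.eigvalsh(X[:2, :2]) > 0)
assert np.all(np.linalg.eigvalsh(X[1:, 1:]) > 0)
X[0, 2] = X[2, 0] = X[0, 1] * X[1, 2] / X[1, 1]  # complete the free entry
```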
Proof of Lemma 3
We will prove that
Define the chordal completion F as in (8). Theorem 3 in (39) has a dual theorem
This result readily follows from the positive semidefinite version proved by Agler et al. [76]; see also [15, Theorem 9.2]. (\(\implies \)) For each \(h=c-{\mathbf {A}}^{T}u-{\mathbf {N}}^{T}v\), define \(S=C-\sum _{i}u_{i}A_{i}\) and observe that
If additionally \(h\in {\mathcal {K}}_{*}\), then \(S\succ 0\) due to (40). (\(\impliedby \)) For each \(S=C-\sum _{i}y_{i}A_{i}\succ 0\), there exists an \(h\in \mathrm {Int}({\mathcal {K}}_{*})\) satisfying \(\mathrm {svec}\,(S)={\mathbf {P}}^{T}h\) due to (40). We use Lemma 11 to decompose \(h={\mathbf {P}}u+{\mathbf {N}}^{T}v\). Given that \(\mathrm {svec}\,(S)={\mathbf {P}}^{T}h={\mathbf {P}}^{T}{\mathbf {P}}u+0\), we must actually have \({\mathbf {P}}u=c-{\mathbf {A}}^{T}y\) since \({\mathbf {P}}^{T}(c-{\mathbf {A}}^{T}y)=\mathrm {svec}\,(C)-{\mathbf {M}}^{T}y=\mathrm {svec}\,(S)\). Hence \(h=c-{\mathbf {A}}^{T}y+{\mathbf {N}}^{T}v\) and \(h\in \mathrm {Int}({\mathcal {K}}_{*})\). \(\square \)
B Extension to inequality constraints
Consider modifying the equality constraint in (11) into an inequality constraint, as in
The corresponding dualization reads
where m denotes the number of rows in \({\mathbf {A}}\) and f now denotes the number of rows in \({\mathbf {N}}\). Embedding the equality constraint into a second-order cone, the associated normal equation takes the form
where \({\mathbf {D}}_{s}\) and \({\mathbf {D}}_{f}\) are as defined before in (15) and (23), and \({\mathbf {D}}_{l}\) is a diagonal matrix with positive diagonal elements. This matrix has the same sparse-plus-rank-1 structure as (22), and can therefore be solved using the same rank-1 update
where \({\mathbf {H}}\) and \({\mathbf {q}}\) now read
The matrix \({\mathbf {H}}\) has the same block sparsity graph as the tree graph T, so we can invoke Lemma 5 to show that the cost of computing \(\varDelta y\) is again \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) memory.
C Interior-point method complexity analysis
We solve the dualized problem (21) by solving its extended homogeneous self-dual embedding
where the data is given in standard form
and the residual vectors are defined
Here, \(\nu \) is the order of the cone \({\mathcal {C}}\), and \({\mathbf {1}}_{{\mathcal {C}}}\) is its identity element
Problem (41) has optimal value \(\theta ^{\star }=0\). Under the primal-dual Slater’s conditions (Assumption 2), an interior-point method is guaranteed to converge to an \(\epsilon \)-accurate solution with \(\tau >0\), and this yields an \(\epsilon \)-feasible and \(\epsilon \)-accurate solution to the dualized problem (21) by rescaling \(x\leftarrow x/\tau \), \(y\leftarrow y/\tau \), and \(s\leftarrow s/\tau \). The following result is adapted from [38, Lemma 5.7.2] and [77, Theorem 22.7].
Lemma 12
(\(\epsilon \)-accurate and \(\epsilon \)-feasible) If \((x,y,s,\tau ,\theta ,\kappa )\) satisfies (41b) and (41c) and
for constants \(\epsilon ,\gamma >0\), then the rescaled point \((x/\tau ,y/\tau ,s/\tau )\) satisfies
where K is a constant.
Proof
Note that (41b) implies \(\mu =\theta \) and
Hence, we obtain our desired result by upper-bounding \(1/\tau \). Let \((x^{\star },y^{\star },s^{\star },\tau ^{\star },\theta ^{\star },\kappa ^{\star })\) be a solution of (41), and note that for every \((x,y,s,\tau ,\theta ,\kappa )\) satisfying (41b) and (41c), we have the following via the skew-symmetry of (41b)
Rearranging yields
and hence
If (SDP) satisfies the primal-dual Slater’s condition, then (CTC) also satisfies the primal-dual Slater’s condition (Lemmas 2 and 3). Therefore, the vectorized version (11) of (CTC) attains a solution \(({\hat{x}},{\hat{y}},{\hat{s}})\) with \({\hat{x}}^{T}{\hat{s}}=0\), and the following
with \(\theta ^{\star }=\kappa ^{\star }=0\) is a solution to (41). This proves the following upper-bound
Setting \(K=\max \{\Vert r_{p}\Vert K_{\tau },\Vert r_{d}\Vert K_{\tau },K_{\tau }^{2}\}\) yields our desired result. \(\square \)
We solve the homogeneous self-dual embedding (41) using the short-step method of Nesterov and Todd [78, Algorithm 6.1] (see also Sturm and Zhang [79, Section 5.1]), noting that SeDuMi reduces to it in the worst case; see [61] and [80]. Beginning at the following strictly feasible, perfectly centered point
with barrier parameter \(\mu =1\), we take the following steps
along the search direction defined by the linear system [81, Eqn. 9]
Here, F is the usual self-concordant barrier function on \({\mathcal {C}}\)
and \(w\in \mathrm {Int}({\mathcal {C}})\) is the unique scaling point satisfying \(\nabla ^{2}F(w)x=s\), which can be computed from x and s in closed-form. The following iteration bound is an immediate consequence of [78, Theorem 6.4]; see also [79, Theorem 5.1].
Lemma 13
(Short-Step Method) The sequence in (44) arrives at an iterate \((x,y,s,\tau ,\theta ,\kappa )\) satisfying the conditions of Lemma 12 with \(\gamma =9/10\) in at most \(O(\sqrt{\nu }\log (1/\epsilon ))\) iterations.
The cost of each interior-point iteration is dominated by the cost of computing the search direction in (45). Using elementary but tedious linear algebra, we can show that if
where \({\mathbf {D}}=\nabla ^{2}F(w)\) and \(d=-s-\mu ^{+}\nabla F(x)\), and
then
where \({\mathbf {D}}_{0}=\kappa /\tau \) and \(d_{0}=-\kappa +\mu ^{+}\tau ^{-1}\). Hence, the cost of computing the search direction is dominated by the cost of solving the normal equation for three different right-hand sides. Here, the normal matrix is written
where \(\sigma =\frac{1}{2}(w_{0}^{2}-w_{1}^{T}w_{1})>0\) and \(\otimes _{s}\) denotes the symmetric Kronecker product [82] implicitly defined to satisfy
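In its standard form, the symmetric Kronecker product acts on \(\mathrm {svec}\,(C)\) for symmetric C as \((A\otimes _{s}B)\,\mathrm {svec}\,(C)=\tfrac{1}{2}\,\mathrm {svec}\,(BCA^{T}+ACB^{T})\). A minimal numerical sketch of this action, together with the \(\sqrt{2}\)-scaled svec that makes the vector inner product match the matrix inner product:

```python
import numpy as np

def svec(X):
    # sqrt(2)-scaled lower-triangular vectorization: svec(X).svec(Y) = tr(XY)
    n = X.shape[0]
    idx = np.tril_indices(n)
    scale = np.where(idx[0] == idx[1], 1.0, np.sqrt(2.0))
    return X[idx] * scale

def symkron_apply(A, B, C):
    # Action of the symmetric Kronecker product (A x_s B) on svec(C).
    return svec(0.5 * (B @ C @ A.T + A @ C @ B.T))

rng = np.random.default_rng(2)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
C = rng.standard_normal((n, n)); C = C + C.T  # symmetric test matrix
```

In particular, \((A\otimes _{s}A)\,\mathrm {svec}\,(C)=\mathrm {svec}\,(ACA^{T})\), which is the congruence transformation used by the scaled normal matrix.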
Under the hypothesis on \({\mathbf {A}}\) stated in Theorem 5, the normal matrix satisfies the assumptions of Lemma 5, and can therefore be solved in linear O(n) time and memory.
Proof of Theorem 5
Combining Lemmas 12 and 13 shows that the desired \(\epsilon \)-accurate, \(\epsilon \)-feasible iterate is obtained after \(O(\sqrt{\nu }\log (1/\epsilon ))\) interior-point iterations. At each iteration we perform the following steps: (1) compute the scaling point w; (2) solve the normal equation (47a) for three right-hand sides; (3) back-substitute (47b)–(47f) for the search direction and take the step in (44). Note from the proof of Lemma 5 that the matrix \([{\mathbf {A}};{\mathbf {N}}]\) has at most \(O(\omega ^{2}n)\) rows under Assumption 1, and therefore \(\mathrm {nnz}\,({\mathbf {M}})=O(\omega ^{4}n)\) under the hypothesis of Theorem 5. Below, we show that the cost of each step is bounded by \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) memory.
Scaling point We partition \(x=[x_{0};x_{1};\mathrm {svec}\,(X_{1});\ldots ;\mathrm {svec}\,(X_{\ell })]\) and similarly for s. Then, the scaling point w is given in closed-form [61, Section 5]
Noting that \(\mathrm {nnz}\,(w_{1})\le O(\omega ^{2}n)\), \(\ell \le n\) and each \(W_{j}\) is at most \(\omega \times \omega \), the cost of forming \(w=[w_{0};w_{1};\mathrm {svec}\,(W_{1});\ldots ;\mathrm {svec}\,(W_{\ell })]\) is at most \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory. Also, since
the cost of each matrix-vector product with \({\mathbf {D}}\) and \({\mathbf {D}}^{-1}\) is also \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory.
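For a single semidefinite block, the closed-form scaling point referenced above can be sketched as follows. Random positive definite X and S stand in for an actual iterate; with the log-det barrier, \(\nabla ^{2}F(W)X=W^{-1}XW^{-1}=S\), and the unique solution is \(W=X^{1/2}(X^{1/2}SX^{1/2})^{-1/2}X^{1/2}\):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(3)
n = 4
def rand_spd():
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

X, S = rand_spd(), rand_spd()  # stand-ins for one primal/dual block pair

# Closed-form Nesterov-Todd scaling point for the PSD cone.
Xh = np.real(sqrtm(X))
W = Xh @ np.linalg.inv(np.real(sqrtm(Xh @ S @ Xh))) @ Xh
W = (W + W.T) / 2  # symmetrize away roundoff
```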
Normal equation The cost of matrix-vector products with \({\mathbf {M}}\) and \({\mathbf {M}}^{T}\) is \(\mathrm {nnz}\,({\mathbf {M}})=O(\omega ^{4}n)\) time and memory. Using Lemma 5, we form the right-hand sides and solve the three normal equations in (47a) in \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) memory.
Back-substitution The cost of back substituting (47b)–(47f) and making the step (44) is dominated by matrix-vector products with \({\mathbf {D}}\), \({\mathbf {D}}^{-1}\), \({\mathbf {M}}\), and \({\mathbf {M}}^{T}\) at \(O(\omega ^{4}n)\) time and memory.\(\square \)
Zhang, R.Y., Lavaei, J. Sparse semidefinite programs with guaranteed near-linear time complexity via dualized clique tree conversion. Math. Program. 188, 351–393 (2021). https://doi.org/10.1007/s10107-020-01516-y