1 Introduction

Given \(n\times n\) real symmetric matrices \(C,A_{1},\ldots ,A_{m}\) and real scalars \(b_{1},\ldots ,b_{m}\), we consider the standard form semidefinite program

$$\begin{aligned} \begin{aligned}&\text {minimize }&C\bullet X&\\&\text {subject to }&A_{i}\bullet X&=b_{i}\quad \text {for all }i\in \{1,\ldots ,m\} \\&X&\succeq 0, \end{aligned} \end{aligned}$$
(SDP)

over the \(n\times n\) real symmetric matrix variable X. Here, \(A_{i}\bullet X=\mathrm {tr}\,A_{i}X\) denotes the usual matrix inner product, and \(X\succeq 0\) restricts X to be symmetric positive semidefinite. Instances of (SDP) arise as some of the best convex relaxations to nonconvex problems like graph optimization [1, 2], integer programming [3,4,5,6], and polynomial optimization [7, 8].

Interior-point methods are the most reliable approach for solving small- and medium-scale instances of (SDP), but become prohibitively time- and memory-intensive for large-scale instances. A fundamental difficulty is the constraint \(X\succeq 0\), which densely couples all \(O(n^{2})\) elements within the matrix variable X to each other. The linear system solved at each iteration, known as the normal equation or the Schur complement equation, is usually fully-dense, irrespective of sparsity in the data matrices \(C,A_{1},\ldots ,A_{m}\). With a naïve implementation, the per-iteration cost of an interior-point method is roughly the same for highly sparse semidefinite programs as it is for fully-dense ones of the same dimensions: at least cubic \((n+m)^{3}\) time and quadratic \((n+m)^{2}\) memory. (See e.g. Nesterov [9, Section 4.3.3] for a derivation.)

Much larger instances of (SDP) can be solved using the clique tree conversion technique of Fukuda et al. [10]. The main idea is to use an interior-point method to solve a reformulation whose matrix variables \(X_{1},\ldots ,X_{n}\succeq 0\) represent principal submatrices of the original matrix variable \(X\succeq 0\), as in

$$\begin{aligned} X_{j}= X[J_{j},J_{j}]\succeq 0\qquad \text { for all }j\in \{1,\ldots ,n\} \end{aligned}$$
(1)

where \(J_{1},J_{2},\dots ,J_{n}\subseteq \{1,2,\dots ,n\}\) denote row/column indices, and to use its solution to recover a solution to the original problem in closed-form. Here, different \(X_{i}\) and \(X_{j}\) interact only through the linear constraints

$$\begin{aligned} A_{i}\bullet X=A_{i,1}\bullet X_{1}+\cdots +A_{i,n}\bullet X_{n}=b_{i}\qquad \text { for all }i\in \{1,\ldots ,m\}, \end{aligned}$$
(2)

and the need for their overlapping elements to agree,

$$\begin{aligned} X_{i}[\alpha ,\beta ]=X_{j}[\alpha ',\beta ']\qquad \text { for all }J_{i}(\alpha )=J_{j}(\alpha '),\quad J_{i}(\beta )=J_{j}(\beta '). \end{aligned}$$
(3)

As a consequence, the normal equation associated with the reformulation is often block sparse—sparse over fully-dense blocks. When the maximum order of the submatrices

$$\begin{aligned} \omega =\max \{|J_{1}|,|J_{2}|,\ldots ,|J_{n}|\} \end{aligned}$$
(4)

is significantly smaller than n, the number of linearly independent constraints is bounded by \(m\le \omega n\), and the per-iteration cost of an interior-point method scales as low as linearly with respect to \(n+m\). This is a remarkable speed-up over a direct interior-point solution of (SDP), particularly in view of the fact that the original matrix variable \(X\succeq 0\) already contains more than \(n^{2}/2\) degrees of freedom on its own.

In practice, clique tree conversion has successfully solved large-scale instances of (SDP) with n as large as tens of thousands [11,12,13,14]. Where applicable, the empirical time complexity is often as low as linear \(O(n+m)\). However, this speed-up is not guaranteed, not even on highly sparse instances of (SDP). We give an example in Sect. 4 whose data matrices \(A_{1},\ldots ,A_{m}\) each contains just a single nonzero element, and show that it nevertheless requires at least \((n+m)^{3}\) time and \((n+m)^{2}\) memory to solve using clique tree conversion.

The core issue, and indeed the main weakness of clique tree conversion, is the overlap constraints (3), which are imposed in addition to the constraints (2) already present in the original problem [15, Section 14.2]. These overlap constraints can significantly increase the size of the normal equation solved at each interior-point iteration, thereby offsetting the benefits of increased sparsity [16]. In fact, they may contribute more nonzeros to the normal matrix of the converted problem than are contained in the fully-dense normal matrix of the original problem. In [17], omitting some of the overlap constraints made the converted problem easier to solve, but at the cost of also making the reformulation from (SDP) inexact.

1.1 Contributions

In this paper, we show that it is possible to fully address the density of the overlap constraints using the dualization technique of Löfberg [18]. By dualizing the reformulation generated via clique tree conversion, the overlap constraints are guaranteed to contribute \(O(\omega ^{4}n)\) nonzero elements to the normal matrix. Moreover, these nonzero elements appear with a block sparsity pattern that coincides with the adjacency matrix of a tree. Under suitable assumptions on the original constraints (2), this favorable block sparsity pattern allows us to guarantee an interior-point method per-iteration cost of \(O(\omega ^{6}n)\) time and memory, by using a specific fill-reducing permutation in computing the Cholesky factor of the normal matrix. After \(O(\sqrt{\omega n}\log (1/\epsilon ))\) iterations, we arrive at an \(\epsilon \)-accurate solution of (SDP) in near-linear \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time.

Our first main result guarantees these complexity figures for a class of semidefinite programs that we call partially separable semidefinite programs. Our notion is an extension of the partially separable cones introduced by Sun, Andersen, and Vandenberghe [16], based in turn on the notion of partial separability due to Griewank and Toint [19]. We show that if an instance of (SDP) is partially separable, then an optimally sparse clique tree conversion reformulation can be constructed in \(O(\omega ^{3}n)\) time, and then solved using an interior-point method to \(\epsilon \)-accuracy in \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time. Afterwards, a corresponding \(\epsilon \)-accurate solution to (SDP) is recovered in \(O(\omega ^{3}n)\) time, for a complete end-to-end cost of \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time.

Semidefinite programs that are not partially separable can be systematically “separated” by introducing auxiliary variables, at the cost of increasing the number of variables that must be optimized. For a class of semidefinite programs that we call network flow semidefinite programs, the number of auxiliary variables can be bounded in closed-form. This insight allows us to prove our second main result, which guarantees the near-linear time figure for network flow semidefinite programs on graphs with small degrees and treewidth.

1.2 Comparisons to prior work

At the time of writing, clique tree conversion is primarily used as a preprocessor for an off-the-shelf interior-point method, such as SeDuMi or MOSEK. It is often implemented using a parser like CVX [20] or YALMIP [21] that converts mathematical expressions into a compatible data format for the solver, but this process is very slow, and usually destroys the inherent structure in the problem. Solver-specific implementations of clique tree conversion like SparseColo [22, 23] and OPFSDR [24] are much faster while also preserving the structure of the problem for the solver. Nevertheless, the off-the-shelf solver is itself structure-agnostic, so an improved complexity figure cannot be guaranteed.

In the existing literature, solvers designed specifically for clique tree conversion are generally first-order methods [16, 25, 26]. While their per-iteration cost is often linear time and memory, they require up to \(O(1/\epsilon )\) iterations to achieve \(\epsilon \)-accuracy, which is exponentially worse than the \(O(\log (1/\epsilon ))\) figure of interior-point methods. It is possible to incorporate a first-order method within an outer interior-point iteration [27,28,29], but this does not improve upon the \(O(1/\epsilon )\) iteration bound, because the first-order method solves an increasingly ill-conditioned subproblem, with condition number that scales \(O(1/\epsilon ^{2})\) for \(\epsilon \)-accuracy.

Andersen, Dahl, and Vandenberghe [30] describe an interior-point method that exploits the same chordal sparsity structure that underlies clique tree conversion, with a per-iteration cost of \(O(\omega ^{3}nm+\omega m^{2}n+m^{3})\) time. The algorithm solves instances of (SDP) with a small number of constraints \(m=O(1)\) in near-linear \(O(\omega ^{3}n^{1.5}\log (1/\epsilon ))\) time. However, substituting \(m\le \omega n\) yields a general time complexity figure of \(O(\omega ^{3}n^{3.5}\log (1/\epsilon ))\), which is comparable to the cubic time complexity of a direct interior-point solution of (SDP).

In this paper, we show that off-the-shelf interior-point methods can be modified to exploit the structure of clique tree conversion, by forcing a specific choice of fill-reducing permutation in factorizing the normal equation. For partially separable semidefinite programs, the resulting modified solver achieves a guaranteed per-iteration cost of \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) memory on the dualized version of the clique tree conversion.

Our complexity guarantees are independent of the actual algorithm used to factorize the normal equation. Most off-the-shelf interior-point methods use a standard implementation of the multifrontal method [31, 32], but further efficiency can be gained by adopting a parallel and/or distributed implementation. For example, the interior-point method of Khoshfetrat Pakazad et al. [33, 34] factorizes the normal equation using a message passing algorithm, which can be understood as a distributed implementation of the multifrontal method. Of course, distributed algorithms are most efficient when the workload is evenly distributed, and when communication is minimized. It remains an important future work to understand these issues in the context of the sparsity patterns analyzed within this paper.

Finally, a reviewer noted that if the original problem (SDP) has a low-rank solution, then the interior-point method iterates approach a low-dimensional face of the semidefinite cone, which could present conditioning issues. In contrast, the clique tree conversion might expect solutions strictly in the interior of the semidefinite cone, which may be better conditioned. It remains an interesting future direction to understand the relationship in complementarity, uniqueness, and conditioning [35] between (SDP) and its clique tree conversion.

2 Main results

2.1 Assumptions

To guarantee an exact reformulation, clique tree conversion chooses the index sets \(J_{1},\ldots ,J_{\ell }\) in (1) as the bags of a tree decomposition for the sparsity graph of the data matrices \(C,A_{1},\ldots ,A_{m}\). Accordingly, the parameter \(\omega \) in (4) can only be small if the sparsity graph has a small treewidth. Below, we define a graph G by its vertex set \(V(G)\subseteq \{1,2,\ldots ,n\}\) and its edge set \(E(G)\subseteq V(G)\times V(G)\).

Definition 1

(Sparsity graph) The \(n\times n\) matrix M (resp. the set of \(n\times n\) matrices \(\{M_{1},\ldots ,M_{m}\}\)) is said to have sparsity graph G if G is an undirected simple graph on n vertices \(V(G)=\{1,\ldots ,n\}\) such that \((i,j)\in E(G)\) whenever \(M[i,j]\ne 0\) (resp. whenever there exists \(M\in \{M_{1},\ldots ,M_{m}\}\) such that \(M[i,j]\ne 0\)).

Definition 2

(Tree decomposition) A tree decomposition \({\mathcal {T}}\) of a graph G is a pair \(({\mathcal {J}},T)\), where each bag of vertices \(J_{j}\in {\mathcal {J}}\) is a subset of V(G), and T is a tree on \(|{\mathcal {J}}|\le n\) vertices, such that:

  1. (Vertex cover) For every \(v\in V(G)\), there exists \(J_{k}\in {\mathcal {J}}\) such that \(v\in J_{k}\);

  2. (Edge cover) For every \((u,v)\in E(G)\), there exists \(J_{k}\in {\mathcal {J}}\) such that \(u\in J_{k}\) and \(v\in J_{k}\); and

  3. (Running intersection) If \(v\in J_{i}\) and \(v\in J_{j}\), then we also have \(v\in J_{k}\) for every k that lies on the path from i to j in the tree T.

The width \(\mathrm {wid}({\mathcal {T}})\) of the tree decomposition \({\mathcal {T}}=({\mathcal {J}},T)\) is the size of its largest bag minus one, as in \(\max \{|J_{k}|:J_{k}\in {\mathcal {J}}\}-1.\) The treewidth \(\mathrm {tw}(G)\) of the graph G is the minimum width amongst all tree decompositions \({\mathcal {T}}\).
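
For illustration, the three defining properties can be checked directly. The following Python sketch is our own helper, not part of the paper; it stores bags as sets and the tree as adjacency lists, and verifies vertex cover, edge cover, and running intersection for a candidate decomposition.

```python
from collections import deque

def is_tree_decomposition(n_vertices, edges, bags, tree_adj):
    """Check the three properties of Definition 2 for a candidate decomposition.

    n_vertices : number of graph vertices, labeled 0 .. n_vertices - 1
    edges      : iterable of graph edges (u, v)
    bags       : list of bags J_k, one set of vertices per tree node
    tree_adj   : adjacency lists of the tree T, tree_adj[k] = neighbors of node k
    """
    # 1. Vertex cover: every vertex appears in some bag.
    if not all(any(v in J for J in bags) for v in range(n_vertices)):
        return False
    # 2. Edge cover: both endpoints of every edge appear together in some bag.
    if not all(any(u in J and v in J for J in bags) for (u, v) in edges):
        return False
    # 3. Running intersection: for each vertex v, the tree nodes whose bags
    #    contain v must induce a connected subtree of T (checked by BFS).
    for v in range(n_vertices):
        nodes = {k for k, J in enumerate(bags) if v in J}
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:
            k = queue.popleft()
            for nb in tree_adj[k]:
                if nb in nodes and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != nodes:
            return False
    return True

# Path graph 0-1-2-3 with bags {0,1}, {1,2}, {2,3} arranged on a path tree.
bags = [{0, 1}, {1, 2}, {2, 3}]
tree_adj = {0: [1], 1: [0, 2], 2: [1]}
print(is_tree_decomposition(4, [(0, 1), (1, 2), (2, 3)], bags, tree_adj))  # True
print(max(len(J) for J in bags) - 1)                                      # width = 1
```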

Throughout this paper, we make the implicit assumption that a tree decomposition with small width is known a priori for the sparsity graph. In practice, such a tree decomposition can usually be found using fill-reducing heuristics for sparse linear algebra; see Sect. 3.

We also make two explicit assumptions, which are standard in the literature on interior-point methods.

Assumption 1

(Linear independence) We have \(\sum _{i=1}^{m}y_{i}A_{i}=0\) if and only if \(y=0\).

The assumption is without loss of generality: whenever it fails, either a redundant constraint \(A_{i}\bullet X=b_{i}\) can be eliminated, or else the constraints are inconsistent. Under Assumption 1, the total number of constraints is bounded by \(m\le \omega n\) (due to the fact that \(|E(G)|\le n\cdot \mathrm {tw}(G)\) [36]).

Assumption 2

(Primal-dual Slater’s condition) There exist \(X\succ 0,\) y,  and \(S\succ 0\) satisfying \(A_{i}\bullet X=b_{i}\) for all \(i\in \{1,\ldots ,m\}\) and \(\sum _{i=1}^{m}y_{i}A_{i}+S=C\).

In fact, our proofs solve the homogeneous self-dual embedding [37], so our conclusions can be extended with few modifications to a much larger array of problems that mostly do not satisfy Assumption 2; see de Klerk et al. [38] and Permenter et al. [39]. Nevertheless, we adopt Assumption 2 to simplify our discussions, by focusing our attention towards the computational aspects of the interior-point method, and away from the theoretical intricacies of the self-dual embedding.

2.2 Partially separable

We define the class of partially separable semidefinite programs based on the partially separable cones introduced by Sun, Andersen, and Vandenberghe [16]. The general concept of partial separability is due to Griewank and Toint [19].

Definition 3

(Partially separable) Let \({\mathcal {T}}=({\mathcal {J}},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\). The matrix \(A_{i}\) is said to be partially separable on \({\mathcal {T}}\) if there exist \(J_{j}\in {\mathcal {J}}\) and some choice of \(A_{i,j}\) such that

$$\begin{aligned} A_{i}\bullet X=A_{i,j}\bullet X[J_{j},J_{j}] \end{aligned}$$

for all \(n\times n\) matrices X. We say that (SDP) is partially separable on \({\mathcal {T}}\) if every constraint matrix \(A_{1},\ldots ,A_{m}\) is partially separable on \({\mathcal {T}}\).

Due to the edge cover property of the tree decomposition, any \(A_{i}\) that indexes a single element of X (can be written as \(A_{i}\bullet X=X[j,k]\) for suitable jk) is automatically partially separable on any valid tree decomposition \({\mathcal {T}}\). In this way, many of the classic semidefinite relaxations for NP-hard combinatorial optimization problems can be shown as partially separable.
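
As a concrete check, a constraint matrix \(A_{i}\) is partially separable on \({\mathcal {T}}\) exactly when its support fits inside \(J_{j}\times J_{j}\) for some bag \(J_{j}\). The following sketch performs this test; the helper name and data layout are ours, with 0-based indexing assumed.

```python
import numpy as np

def partially_separable_bag(A, bags, tol=0.0):
    """Return an index j with supp(A) contained in J_j x J_j, or None if no bag works.

    A    : (n, n) symmetric constraint matrix
    bags : list of index sets J_j (0-based vertex labels)
    """
    rows, cols = np.nonzero(np.abs(A) > tol)
    support = {int(v) for v in rows} | {int(v) for v in cols}
    for j, J in enumerate(bags):
        if support <= set(J):
            return j          # A . X depends only on the submatrix X[J_j, J_j]
    return None

# A MAXCUT-style constraint X[2, 2] = 1 touches a single vertex,
# so its support fits inside any bag containing vertex 2.
n = 4
A = np.zeros((n, n))
A[2, 2] = 1.0
print(partially_separable_bag(A, bags=[{0, 1, 2}, {2, 3}]))  # 0
```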

Example 1

(MAXCUT and MAX k-CUT) Let C be the (weighted) Laplacian matrix for a graph G with n vertices. Frieze and Jerrum [40] proposed a randomized algorithm to solve MAX k-CUT with an approximation ratio of \(1-1/k\) based on solving

$$\begin{aligned} \begin{aligned}&\text {maximize }\quad&\frac{k-1}{2k}C\bullet X&\\&\text {subject to}&X[i,i]&=1\quad&\text {for all }&i\in \{1,\ldots ,n\}\\&X[i,j]&\ge \frac{-1}{k-1}\quad&\text {for all }&(i,j)\in E(G)\\&X&\succeq 0. \end{aligned} \end{aligned}$$
(MkC)

The classic Goemans–Williamson 0.878 algorithm [2] for MAXCUT is recovered by setting \(k=2\) and removing the redundant constraint \(X[i,j]\ge -1\). In both the MAXCUT relaxation and the MAX k-CUT relaxation, observe that each constraint affects a single matrix element in X, so the problem is partially separable on any tree decomposition. \(\square \)

Example 2

(Lovasz Theta) The Lovasz number \(\vartheta (G)\) of a graph G [1] is the optimal value to the following dual semidefinite program

$$\begin{aligned} \begin{aligned}&\text {minimize }\quad&\lambda \\&\text {subject to}&{\mathbf {1}}{\mathbf {1}}^{T}-\sum _{(i,j)\in E}y_{i,j}(e_{i}e_{j}^{T}+e_{j}e_{i}^{T})\preceq \lambda I \end{aligned} \end{aligned}$$
(LT)

over \(\lambda \in {\mathbb {R}}\) and \(y_{i,j}\in {\mathbb {R}}\) for \((i,j)\in E(G)\). Here, \(e_{j}\) is the j-th column of the \(n\times n\) identity matrix and \({\mathbf {1}}\) is the length-n vector-of-ones. Problem (LT) is not partially separable. However, given that \(\vartheta (G)\ge 1\) holds for all graphs G, we may divide the linear matrix inequality through by \(\lambda \), redefine \(y\leftarrow y/\lambda \), apply the Schur complement lemma, and take the Lagrangian dual to yield a sparse formulation

$$\begin{aligned} \begin{aligned}&\text {minimize }\quad&\begin{bmatrix}I &{} {\mathbf {1}}\\ {\mathbf {1}}^{T} &{} 0 \end{bmatrix}\bullet X&\\&\text {subject to}&X[i,j]&=0\quad \text {for all }(i,j)\in E \\&X[n+1,n+1]&=1\\&X&\succeq 0. \end{aligned} \end{aligned}$$
(LT')

Each constraint affects a single matrix element in X, so (LT') is again partially separable on any tree decomposition. \(\square \)

We remark that instances of the MAXCUT, MAX k-CUT, and Lovasz Theta problems constitute a significant part of the DIMACS [41] and the SDPLIB [42] test libraries. In Sect. 5, we prove that partially separable semidefinite programs like these admit a clique tree conversion reformulation that can be dualized and then solved using an interior-point method in \(O(n^{1.5}\log (1/\epsilon ))\) time, under the assumption that the parameter \(\omega \) in (4) is significantly smaller than n. Moreover, we prove in Sect. 6 that this reformulation can be efficiently found using an algorithm based on the running intersection property of the tree decomposition. Combining these results with an efficient low-rank matrix completion algorithm [43, Algorithm 2] yields the following.

Theorem 1

Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\). If (SDP) is partially separable on \({\mathcal {T}}\), then under Assumptions 1 and 2, there exists an algorithm that computes \(U\in {\mathbb {R}}^{n\times \omega }\), \(y\in {\mathbb {R}}^{m}\), and \(S\succeq 0\) satisfying

$$\begin{aligned} \sqrt{\sum _{i=1}^{m}|A_{i}\bullet UU^{T}-b_{i}|^{2}}\le \epsilon ,\quad \left\| \sum _{i=1}^{m}y_{i}A_{i}+S-C\right\| _{F}\le \epsilon ,\quad \frac{UU^{T}\bullet S}{n}\le \epsilon \end{aligned}$$

in \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) space, where \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\) and \(\Vert M\Vert _{F}=\sqrt{M\bullet M}\) denotes the Frobenius norm.

The proof of Theorem 1 is given at the end of Sect. 6.

2.3 Network flow

Problems that are not partially separable can be systematically separated by introducing auxiliary variables. The complexity of solving the resulting problem then becomes parameterized by the number of additional auxiliary variables. In a class of graph-based semidefinite programs that we call network flow semidefinite programs, the number of auxiliary variables can be bounded using properties of the tree decomposition.

Definition 4

(Network flow) Given a graph \(G=(V,E)\) on n vertices \(V=\{1,\ldots ,n\}\), we say that the linear constraint \(A\bullet X=b\) is a network flow constraint (at vertex k) if the \(n\times n\) constraint matrix A can be rewritten

$$\begin{aligned} A=\alpha _{k}e_{k}e_{k}^{T}+\frac{1}{2}\sum _{(j,k)\in E}\alpha _{j}(e_{j}e_{k}^{T}+e_{k}e_{j}^{T}), \end{aligned}$$

in which \(e_{k}\) is the k-th column of the identity matrix and \(\{\alpha _{j}\}\) are scalars. We say that an instance of (SDP) is a network flow semidefinite program if every constraint matrix \(A_{1},\ldots ,A_{m}\) is a network flow constraint, and G is the sparsity graph for the objective matrix C.

Such problems frequently arise on physical networks subject to Kirchhoff’s conservation laws, such as electrical circuits and hydraulic networks.
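
For concreteness, the sketch below assembles the constraint matrix A of Definition 4 from the scalars \(\alpha _{j}\) at a vertex k; the function name and data layout are ours, with 0-based indexing assumed.

```python
import numpy as np

def network_flow_matrix(n, k, alpha_k, neighbor_alphas):
    """Assemble A = alpha_k e_k e_k^T + (1/2) sum_j alpha_j (e_j e_k^T + e_k e_j^T).

    n               : matrix order
    k               : vertex at which the constraint is imposed (0-based)
    alpha_k         : coefficient of the diagonal term
    neighbor_alphas : dict {j: alpha_j} over the neighbors j of k in the graph
    """
    A = np.zeros((n, n))
    A[k, k] = alpha_k
    for j, a in neighbor_alphas.items():
        A[j, k] += 0.5 * a
        A[k, j] += 0.5 * a
    return A

# A constraint at vertex 1 of a 4-vertex graph with neighbors 0 and 2:
# all nonzeros of A lie in row and column 1, and A is symmetric by construction.
A = network_flow_matrix(4, k=1, alpha_k=2.0, neighbor_alphas={0: 1.0, 2: -1.0})
print(np.allclose(A, A.T))  # True
```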

Example 3

(Optimal power flow) The AC optimal power flow (ACOPF) problem is a nonlinear, nonconvex optimization that plays a vital role in the operations of an electric power system. Let G be a graph representation of the power system. Then, ACOPF has a well-known semidefinite relaxation

$$\begin{aligned} \begin{aligned} \text {minimize }\quad&\sum _{i\in W}(f_{i,i}X[i,i]+\sum _{(i,j)\in E(G)}{\mathrm {Re}}\{f_{i,j}X[i,j]\})&\end{aligned} \end{aligned}$$
(OPF)

over a Hermitian matrix variable X, subject to

$$\begin{aligned} a_{i,i}X[i,i]+\sum _{(i,j)\in E(G)}{\mathrm {Re}}\{a_{i,j}X[i,j]\}&\le b_{i}\quad \text { for all }i\in V(G)\\ {\mathrm {Re}}\{c_{i,j}X[i,j]\}&\le d_{i,j}\quad \text { for all }(i,j)\in E(G)\\ X&\succeq 0. \end{aligned}$$

Here, each \(a_{i,j}\) and \(c_{i,j}\) is a complex scalar, each \(b_{i}\) and \(d_{i,j}\) is a real scalar, and \(W\subseteq V(G)\) is a subset of vertices. If a rank-1 solution \(X^{\star }\) is found, then the relaxation (OPF) is exact, and a globally-optimal solution to the original NP-hard problem can be extracted. Clearly, each constraint in (OPF) is a network flow constraint, so the overall problem is also a network flow semidefinite program. \(\square \)

In Sect. 7, we prove that network flow semidefinite programs can be reformulated in closed-form, dualized, and efficiently solved using an interior-point method.

Theorem 2

Let (SDP) be a network flow semidefinite program on a graph G on n vertices, and let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for G. Then, under Assumptions 1 and 2, there exists an algorithm that computes \(U\in {\mathbb {R}}^{n\times \omega }\), \(y\in {\mathbb {R}}^{m}\), and \(S\succeq 0\) satisfying

$$\begin{aligned} \sqrt{\sum _{i=1}^{m}|A_{i}\bullet UU^{T}-b_{i}|^{2}}\le \epsilon ,\quad \left\| \sum _{i=1}^{m}y_{i}A_{i}+S-C\right\| _{F}\le \epsilon ,\quad \frac{UU^{T}\bullet S}{n}\le \epsilon \end{aligned}$$

in

$$\begin{aligned} O((\omega +d_{\max }m_{k})^{3.5}\cdot \omega ^{3.5}\cdot n^{1.5}\cdot \log (1/\epsilon ))\text { time }\\ \text {and }O((\omega +d_{\max }m_{k})^{2}\cdot \omega ^{2}\cdot n)\text { memory} \end{aligned}$$

where:

  • \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\),

  • \(d_{\max }\) is the maximum degree of the tree T,

  • \(m_{k}\) is the maximum number of network flow constraints at any vertex \(k\in V(G)\).

The proof of Theorem 2 is given at the end of Sect. 7.

3 Preliminaries

3.1 Notation

The sets \({\mathbb {R}}^{n}\) and \({\mathbb {R}}^{m\times n}\) are the length-n real vectors and \(m\times n\) real matrices. We use “MATLAB notation” in concatenating vectors and matrices:

$$\begin{aligned}{}[a,b]=\begin{bmatrix}a&b\end{bmatrix},\qquad [a;b]=\begin{bmatrix}a\\ b \end{bmatrix},\qquad \mathrm {diag}(a,b)=\begin{bmatrix}a &{} 0\\ 0 &{} b \end{bmatrix}, \end{aligned}$$

and the following short-hand to construct them:

$$\begin{aligned}{}[x_{i}]_{i=1}^{n}=\begin{bmatrix}x_{1}\\ \vdots \\ x_{n} \end{bmatrix},\qquad [x_{i,j}]_{i,j=1}^{m,n}=\begin{bmatrix}x_{1,1} &{} \cdots &{} x_{1,n}\\ \vdots &{} \ddots &{} \vdots \\ x_{m,1} &{} \cdots &{} x_{m,n} \end{bmatrix}. \end{aligned}$$

The notation X[i, j] refers to the element of X in the i-th row and j-th column, and X[I, J] refers to the submatrix of X formed from the rows in \(I\subseteq \{1,\ldots ,m\}\) and columns in \(J\subseteq \{1,\ldots ,n\}\). The Frobenius inner product is \(X\bullet Y=\mathrm {tr}\,(X^{T}Y)\), and the Frobenius norm is \(\Vert X\Vert _{F}=\sqrt{X\bullet X}\). We use \(\mathrm {nnz}\,(X)\) to denote the number of nonzero elements in X.

The sets \({\mathbb {S}}^{n}\subseteq {\mathbb {R}}^{n\times n},\) \({\mathbb {S}}_{+}^{n}\subset {\mathbb {S}}^{n},\) and \({\mathbb {S}}_{++}^{n}\subset {\mathbb {S}}_{+}^{n}\) are the \(n\times n\) real symmetric matrices, positive semidefinite matrices, and positive definite matrices, respective. We write \(X\succeq Y\) to mean \(X-Y\in {\mathbb {S}}_{+}^{n}\) and \(X\succ Y\) to mean \(X-Y\in {\mathbb {S}}_{++}^{n}\). The (symmetric) vectorization

$$\begin{aligned} \mathrm {svec}\,(X)=[X[1,1];\sqrt{2}X[2,1];\ldots ;\sqrt{2}X[n,1];X[2,2];\ldots ] \end{aligned}$$

outputs the lower-triangular part of a symmetric matrix as a vector, with factors of \(\sqrt{2}\) added so that \(\mathrm {svec}\,(X)^{T}\mathrm {svec}\,(Y)=X\bullet Y\).
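
The \(\sqrt{2}\) scaling makes \(\mathrm {svec}\) compatible with the Frobenius inner product. A minimal numpy sketch of this convention (our own helper, stacking the lower triangle column by column):

```python
import numpy as np

def svec(X):
    """Stack the lower triangle of a symmetric X column by column,
    scaling the off-diagonal entries by sqrt(2)."""
    n = X.shape[0]
    out = []
    for j in range(n):
        out.append(X[j, j])
        out.extend(np.sqrt(2.0) * X[j + 1:, j])
    return np.array(out)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
X = M + M.T                     # a symmetric test matrix
Y = np.eye(4) + np.ones((4, 4))
# svec preserves the Frobenius inner product: svec(X)^T svec(Y) = X . Y
print(np.isclose(svec(X) @ svec(Y), np.trace(X.T @ Y)))  # True
```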

A graph G is defined by its vertex set \(V(G)\subseteq \{1,2,3,\ldots \}\) and its edge set \(E(G)\subseteq V(G)\times V(G)\). The graph T is a tree if it is connected and does not contain any cycles; we refer to its vertices V(T) as its nodes. Designating a special node \(r\in V(T)\) as the root of the tree allows us to define the parent p(v) of each node \(v\ne r\) as the first node encountered on the path from v to r, and \(p(r)=r\) for consistency. The set of children is defined \(\mathrm {ch}(v)=\{u\in V(T)\backslash v:p(u)=v\}\). Note that the edges E(T) are fully determined by the parent pointer p as \(\{v,p(v)\}\) for all \(v\ne r\).

The set \({\mathbb {S}}_{G}^{n}\subseteq {\mathbb {S}}^{n}\) is the set of \(n\times n\) real symmetric matrices with sparsity graph G. We denote \(P_{G}(X)=\min _{Y\in {\mathbb {S}}_{G}^{n}}\Vert X-Y\Vert _{F}\) as the Euclidean projection of \(X\in {\mathbb {S}}^{n}\) onto \({\mathbb {S}}_{G}^{n}\).

3.2 Tree decomposition via the elimination tree

The standard procedure for solving \(Sx=b\) with \(S\succ 0\) consists of a factorization step, where S is decomposed into the unique Cholesky factor L satisfying

$$\begin{aligned} LL^{T}=S,\quad L\text { is lower-triangular},\quad L_{i,i}>0\quad \text {for all }i, \end{aligned}$$
(5)

and a substitution step, where the two triangular systems \(Lu=r\) and \(L^{T}x=u\) are back-substituted to yield x.

In the case that S is sparse, the location of nonzero elements in L encodes a tree decomposition for the sparsity graph of S known as the elimination tree [44]. Specifically, define the index sets \(J_{1},\ldots ,J_{n}\subseteq \{1,\ldots ,n\}\) as in

$$\begin{aligned} J_{j}=\{i\in \{1,\ldots ,n\}:L[i,j]\ne 0\}, \end{aligned}$$
(6)

and the tree T via the parent pointers

$$\begin{aligned} p(j)={\left\{ \begin{array}{ll} \hbox {min}_{i}\{i>j:L[i,j]\ne 0\} &{} |J_{j}|>1,\\ j &{} |J_{j}|=1. \end{array}\right. } \end{aligned}$$
(7)

Then, ignoring perfect numerical cancellation, \({\mathcal {T}}=(\{J_{1},\ldots ,J_{n}\},T)\) is a tree decomposition for the sparsity graph of S.
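
A minimal sketch of (6) and (7): factor a small positive definite matrix, read the index sets \(J_{j}\) off the columns of L, and take the first subdiagonal nonzero as the parent pointer. The helper below is our own dense illustration; a practical implementation would operate symbolically on the sparsity pattern alone.

```python
import numpy as np

def elimination_tree(S, tol=1e-12):
    """Index sets J_j and parent pointers p(j) read off the Cholesky factor of S,
    following (6) and (7) with 0-based indices."""
    L = np.linalg.cholesky(S)
    n = S.shape[0]
    J = [{int(i) for i in np.flatnonzero(np.abs(L[:, j]) > tol)} for j in range(n)]
    p = []
    for j in range(n):
        below = [i for i in J[j] if i > j]
        p.append(min(below) if below else j)   # roots (|J_j| = 1) point to themselves
    return J, p

# An "arrow" matrix whose sparsity graph is a star with center 3.
S = 4.0 * np.eye(4)
S[3, :3] = 1.0
S[:3, 3] = 1.0
J, p = elimination_tree(S)
print(J)  # [{0, 3}, {1, 3}, {2, 3}, {3}]
print(p)  # [3, 3, 3, 3]
```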

Elimination trees with reduced widths can be obtained by reordering the rows and columns of S using a fill-reducing permutation \(\varPi \), because the sparsity graph of \(\varPi S\varPi ^{T}\) is just the sparsity graph of S with its vertices reordered. The minimum width of an elimination tree over all permutations \(\varPi \) is precisely the treewidth of the sparsity graph of S; see Bodlaender et al. [45] and the references therein. Computing this minimum is well known to be NP-complete [36], but polynomial-time approximation algorithms exist that solve the problem to within a logarithmic factor [45,46,47]. In practice, heuristics like the minimum degree [48] and nested dissection [49] are considerably faster while still producing high-quality choices of \(\varPi \).

Note that the sparsity pattern of L is completely determined by the sparsity pattern of S, and not by its numerical value. The former can be computed from the latter using a symbolic Cholesky factorization algorithm, a standard routine in most sparse linear algebra libraries, in time linear to the number of nonzeros in L; see [50, Section 5] and [51, Theorem 5.4.4], and also the discussion in [49].

3.3 Clique tree conversion

Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition with small width for the sparsity graph G of the data matrices \(C,A_{1},\ldots ,A_{m}\). We define the graph \(F\supseteq G\) by taking each index set \(J_{j}\) of \({\mathcal {T}}\) and interconnecting all pairs of vertices \(u,v\in J_{j}\), as in

$$\begin{aligned} V(F)&=V(G),&E(F)&=\bigcup _{j=1}^{\ell }\{(u,v):u,v\in J_{j}\}. \end{aligned}$$
(8)

The following fundamental result was first established by Grone et al. [52]. Constructive proofs allow us to recover all elements in \(X\succeq 0\) from only the elements in \(P_{F}(X)\) using a closed-form formula.

Theorem 3

(Grone et al. [52]) Given \(Z\in {\mathbb {S}}_{F}^{n}\), there exists an \(X\succeq 0\) satisfying \(P_{F}(X)=Z\) if and only if \(Z[J_{j},J_{j}]\succeq 0\) for all \(j\in \{1,2,\ldots ,\ell \}\).

We can use Theorem 3 to reformulate (SDP) into a reduced-complexity form. The key is to view (SDP) as an optimization over \(P_{F}(X)\), since

$$\begin{aligned} C\bullet X=\sum _{i,j=1}^{n}C_{i,j}X_{i,j}=\sum _{(i,j)\in F}C_{i,j}X_{i,j}=C\bullet P_{F}(X), \end{aligned}$$

and similarly \(A_{i}\bullet X=A_{i}\bullet P_{F}(X)\). Theorem 3 allows us to account for \(X\succeq 0\) implicitly, by optimizing over \(Z=P_{F}(X)\) in the following

$$\begin{aligned} \begin{aligned}&\text {minimize }\quad&C\bullet Z\\&\text {subject to }&A_{i}\bullet Z&=b_{i}&\qquad \text {for all }&i\in \{1,\ldots ,m\}, \\&Z[J_{j},J_{j}]&\succeq 0&\text {for all }&j\in \{1,\ldots ,\ell \}. \end{aligned} \end{aligned}$$
(9)

Next, we split the principal submatrices into distinct matrix variables, coupled by the need for their overlapping elements to agree. Define the overlap operator \({\mathcal {N}}_{i,j}(\cdot )\) to output the overlapping elements of two principal submatrices given the latter as input:

$$\begin{aligned} {\mathcal {N}}_{i,j}(X[J_{j},J_{j}])=X[J_{i}\cap J_{j},\;J_{i}\cap J_{j}]={\mathcal {N}}_{j,i}(X[J_{i},J_{i}]). \end{aligned}$$

The running intersection property of the tree decomposition allows us to enforce this agreement using \(\ell -1\) pairwise block comparisons.

Theorem 4

(Fukuda et al. [10]) Given \(X_{1},X_{2},\ldots ,X_{\ell }\) for \(X_{j}\in {\mathbb {S}}^{|J_{j}|}\), there exists Z satisfying \(Z[J_{j},J_{j}]=X_{j}\) for all \(j\in \{1,2,\ldots ,\ell \}\) if and only if \({\mathcal {N}}_{i,j}(X_{j})={\mathcal {N}}_{j,i}(X_{i})\) for all \((i,j)\in E(T)\).
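
For illustration, the overlap operator and the pairwise agreement test of Theorem 4 can be coded directly. In the sketch below (our own data layout), each bag is a sorted index list and each \(X_{j}\) is stored as a dense submatrix.

```python
import numpy as np

def overlap(X_j, J_j, J_i):
    """The overlap operator N_{i,j}: the block of X_j indexed by the intersection J_i ∩ J_j."""
    common = sorted(set(J_i) & set(J_j))
    pos = [J_j.index(v) for v in common]   # positions of the shared vertices within J_j
    return X_j[np.ix_(pos, pos)]

def overlaps_agree(blocks, bags, tree_edges, tol=1e-12):
    """Check N_{i,j}(X_j) = N_{j,i}(X_i) for every edge (i, j) of the tree T."""
    return all(
        np.allclose(overlap(blocks[j], bags[j], bags[i]),
                    overlap(blocks[i], bags[i], bags[j]), atol=tol)
        for (i, j) in tree_edges)

# Bags J_1 = [0, 1, 2] and J_2 = [1, 2, 3] extracted from one global matrix Z,
# so their shared entries (rows/columns {1, 2}) agree by construction.
Z = np.arange(16.0).reshape(4, 4)
Z = Z + Z.T
bags = [[0, 1, 2], [1, 2, 3]]
blocks = [Z[np.ix_(J, J)] for J in bags]
print(overlaps_agree(blocks, bags, tree_edges=[(0, 1)]))  # True
```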

Splitting the objective C and constraint matrices \(A_{1},\ldots ,A_{m}\) into \(C_{1},\ldots ,C_{\ell }\) and \(A_{1,1},\ldots ,A_{m,\ell }\) to satisfy

$$\begin{aligned} C_{1}\bullet X[J_{1},J_{1}]+C_{2}\bullet X[J_{2},J_{2}]+\cdots +C_{\ell }\bullet X[J_{\ell },J_{\ell }]&=C\bullet X,\nonumber \\ A_{i,1}\bullet X[J_{1},J_{1}]+A_{i,2}\bullet X[J_{2},J_{2}]+\cdots +A_{i,\ell }\bullet X[J_{\ell },J_{\ell }]&=A_{i}\bullet X, \end{aligned}$$
(10)

and applying Theorem 4 yields the following

$$\begin{aligned} \begin{aligned}&\text {minimize }\quad&\sum _{j=1}^{\ell }C_{j}\bullet X_{j}&&\\&\text {subject to }&\sum _{j=1}^{\ell }A_{i,j}\bullet X_{j}&=b_{i}&\text {for all }&i\in \{1,\ldots ,m\}, \\&{\mathcal {N}}_{i,j}(X_{j})&={\mathcal {N}}_{j,i}(X_{i})&\qquad \text {for all }&(i,j)\in E(T), \\&X_{j}&\succeq 0&\text {for all }&j\in \{1,\ldots ,\ell \}, \end{aligned} \end{aligned}$$
(CTC)

which vectorizes into a linear conic program in standard form

$$\begin{aligned} \begin{aligned}&\text {minimize }&c^{T}x,&\qquad \qquad&\text {maximize }&\begin{bmatrix}b\\ 0 \end{bmatrix}^{T}y,\\&\text {subject to }&\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}x&=\begin{bmatrix}b\\ 0 \end{bmatrix},&\text {subject to }&\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}^{T}y+s&=c, \\&x&\in {\mathcal {K}},&&s&\in {\mathcal {K}}_{*} \end{aligned} \end{aligned}$$
(11)

over the Cartesian product of \(\ell \le n\) smaller semidefinite cones

$$\begin{aligned} {\mathcal {K}}={\mathcal {K}}_{*}={\mathbb {S}}_{+}^{|J_{1}|}\times {\mathbb {S}}_{+}^{|J_{2}|} \times \cdots \times {\mathbb {S}}_{+}^{|J_{\ell }|}. \end{aligned}$$
(12)

Here, \({\mathbf {A}}=[\mathrm {svec}\,(A_{i,j})^{T}]_{i,j=1}^{m,\ell }\) and \(c=[\mathrm {svec}\,(C_{j})]_{j=1}^{\ell }\) correspond to (10), and the overlap constraints matrix \({\mathbf {N}}=[{\mathbf {N}}_{i,j}]_{i,j=1}^{\ell ,\ell }\) is implicitly defined by the relation

$$\begin{aligned} {\mathbf {N}}_{i,j}\mathrm {svec}\,(X_{j})={\left\{ \begin{array}{ll} +\mathrm {svec}\,({\mathcal {N}}_{p(i),i}(X_{i})) &{} j=i,\\ -\mathrm {svec}\,({\mathcal {N}}_{i,p(i)}(X_{p(i)})) &{} j=p(i),\\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(13)

for every non-root node i on T. (To avoid all-zero rows in \({\mathbf {N}}\), we define \({\mathbf {N}}_{i,j}\,\mathrm {svec}\,(X_{j})\) as the empty length-zero vector \({\mathbb {R}}^{0}\) if i is the root node.)

The converted problem (CTC) inherits the standard regularity assumptions from (SDP). Accordingly, an interior-point method is well-behaved in solving (11). (Proofs for the following statements are deferred to “Appendix A”.)

Lemma 1

(Linear independence) There exists \([u;v]\ne 0\) such that \({\mathbf {A}}^{T}u+{\mathbf {N}}^{T}v=0\) if and only if there exists \(y\ne 0\) such that \(\sum _{i}y_{i}A_{i}=0\).

Lemma 2

(Primal Slater) There exists \(x\in \mathrm {Int}({\mathcal {K}})\) satisfying \({\mathbf {A}}x=b\) and \({\mathbf {N}}x=0\) if and only if there exists an \(X\succ 0\) satisfying \(A_{i}\bullet X=b_{i}\) for all \(i\in \{1,\ldots ,m\}\).

Lemma 3

(Dual Slater) There exists uv satisfying \(c-{\mathbf {A}}^{T}u-{\mathbf {N}}^{T}v\in \mathrm {Int}({\mathcal {K}}_{*})\) if and only if there exists y satisfying \(C-\sum _{i}y_{i}A_{i}\succ 0\).

After an \(\epsilon \)-accurate solution \(X_{1}^{\star },\ldots ,X_{\ell }^{\star }\) to (CTC) is found, we recover, in closed-form, a positive semidefinite completion \(X^{\star }\succeq 0\) satisfying \(X^{\star }[J_{j},J_{j}]=X_{j}^{\star }\), which in turn serves as an \(\epsilon \)-accurate solution to (SDP). Of all possible choices of \(X^{\star }\), a particularly convenient one is the low-rank completion \(X^{\star }=UU^{T}\), in which U is a dense matrix with n rows and at most \(\omega =\max _{j}|J_{j}|\) columns. While the existence of the low-rank completion has been known since Dancis [53] (see also Laurent and Varvitsiotis [54] and Madani et al. [55]), Sun [43, Algorithm 2] gave an explicit algorithm to compute \(U^{\star }\) from \(X_{1}^{\star },\ldots ,X_{\ell }^{\star }\) in \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory. The practical effectiveness of Sun’s algorithm was later validated on a large array of power systems problems by Jiang [56].

4 Cost of an interior-point iteration on (CTC)

When the vectorized version (11) of the converted problem (CTC) is solved using an interior-point method, the cost of each iteration is dominated by the cost of forming and solving the normal equation (also known as the Schur complement equation)

$$\begin{aligned} \begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}{\mathbf {D}}_{s}\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}^{T}\varDelta y=\begin{bmatrix}{\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {A}}^{T} &{}\quad {\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {N}}^{T}\\ {\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {A}}^{T} &{}\quad {\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {N}}^{T} \end{bmatrix}\begin{bmatrix}\varDelta y_{1}\\ \varDelta y_{2} \end{bmatrix}=\begin{bmatrix}r_{1}\\ r_{2} \end{bmatrix}, \end{aligned}$$
(14)

where the scaling matrix \({\mathbf {D}}_{s}\) is block-diagonal with fully-dense blocks

$$\begin{aligned} {\mathbf {D}}_{s}=\mathrm {diag}({\mathbf {D}}_{s,1},\ldots ,{\mathbf {D}}_{s,\ell }),\qquad {\mathbf {D}}_{s,j}\succ 0\quad \text {for all }j\in \{1,\ldots ,\ell \}. \end{aligned}$$
(15)

Typically, each dense block in \({\mathbf {D}}_{s}\) is the Hessian of a log-det barrier, as in \({\mathbf {D}}_{s,j}=\nabla ^{2}[-\log \det (X_{j})]\). The submatrix \({\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {A}}^{T}\) is often sparse [16], with a sparsity pattern that coincides with the correlative sparsity [57] of the problem.

Unfortunately, \({\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {N}}^{T}\) can be fully-dense, even when \({\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {A}}^{T}\) is sparse or even diagonal. To explain, observe from (13) that the block sparsity pattern of \({\mathbf {N}}=[{\mathbf {N}}_{i,j}]_{i,j=1}^{\ell ,\ell }\) coincides with the incidence matrix of the tree decomposition tree T. Specifically, for every i with parent p(i), the block \({\mathbf {N}}_{i,j}\) is nonzero if and only if \(j\in \{i,p(i)\}\). As an immediate corollary, the block sparsity pattern of \({\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {N}}^{T}\) coincides with the adjacency matrix of the line graph of T:

$$\begin{aligned} \sum _{k=1}^{\ell }{\mathbf {N}}_{i,k}{\mathbf {D}}_{s,k}{\mathbf {N}}_{j,k}^{T}\ne 0\quad \iff \quad j\in \{i,p(i)\}\text { or }p(j)\in \{i,p(i)\}. \end{aligned}$$
(16)

The line graph of a tree is not necessarily sparse. If T were the star graph on n vertices, then its associated line graph \({\mathcal {L}}(T)\) would be the complete graph on \(n-1\) vertices. Indeed, consider the following example.

Example 4

(Star graph) Given \(b\in {\mathbb {R}}^{n}\), embed \(\max \{b^{T}y:\Vert y\Vert \le 1\}\) into the order-\((n+1)\) semidefinite program:

$$\begin{aligned} \begin{aligned}&\text {minimize }&\mathrm {tr}\,(X)\\&\text {subject to }&X[i,(n+1)]&=b_{i}&\quad \text {for all }i\in \{1,\ldots ,n\}\\&X&\succeq 0 \end{aligned} \end{aligned}$$

The associated sparsity graph G is the star graph on \(n+1\) nodes, and its elimination tree \({\mathcal {T}}=(\{J_{1},\ldots ,J_{n}\},T)\) has index sets \(J_{j}=\{j,n+1\}\) and parent pointer \(p(j)=n\) for all \(j<n\). Applying clique tree conversion and vectorizing yields an instance of (11) with

$$\begin{aligned} {\mathbf {A}}&=\begin{bmatrix}e_{2}^{T} &{}\quad &{}\quad 0\\ &{}\quad \ddots \\ 0 &{}\quad &{}\quad e_{2}^{T} \end{bmatrix},&{\mathbf {N}}&=\begin{bmatrix}e_{3}^{T} &{}\quad &{}\quad 0 &{}\quad -e_{3}^{T}\\ &{}\quad \ddots &{}\quad &{}\quad \vdots \\ 0 &{}\quad &{}\quad e_{3}^{T} &{}\quad -e_{3}^{T} \end{bmatrix}, \end{aligned}$$

where \(e_{j}\) is the j-th column of the \(3\times 3\) identity matrix. It is straightforward to verify that \({\mathbf {A}}{\mathbf {D}}_{s}{\mathbf {A}}^{T}\) is \(n\times n\) diagonal but \({\mathbf {N}}{\mathbf {D}}_{s}{\mathbf {N}}^{T}\) is \((n-1)\times (n-1)\) fully dense for the \({\mathbf {D}}_{s}\) in (15). The cost of solving the corresponding normal equation (14) must include the cost of factoring this fully dense submatrix, which is at least \((n-1)^{3}/3\) operations and \((n-1)^{2}/2\) units of memory. \(\square \)

On the other hand, observe that the block sparsity graph of \({\mathbf {N}}^{T}{\mathbf {N}}\) coincides with the tree graph T

$$\begin{aligned} \sum _{k=1}^{\ell }{\mathbf {N}}_{k,i}^{T}{\mathbf {N}}_{k,j}\ne 0\quad \iff \quad i=j\text { or }(i,j)\in E(T). \end{aligned}$$
(17)

Such a matrix is guaranteed to be block sparse: sparse over dense blocks. More importantly, after a topological block permutation \(\varPi \), the matrix \(\varPi ({\mathbf {N}}^{T}{\mathbf {N}})\varPi ^{T}\) factors into \({\mathbf {L}}{\mathbf {L}}^{T}\) with no block fill.
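
The contrast between the two Gram matrices can be seen numerically. The sketch below builds the \({\mathbf {N}}\) of Example 4 for a small n, taking \({\mathbf {D}}_{s}=I\) for simplicity (our own construction), and counts the nonzeros of \({\mathbf {N}}{\mathbf {N}}^{T}\) versus \({\mathbf {N}}^{T}{\mathbf {N}}\).

```python
import numpy as np

def star_overlap_matrix(n):
    """The overlap matrix N of Example 4: bags J_j = {j, n+1} all share vertex n+1.
    Each svec(X_j) has length 3, and the shared element X_j[2, 2] sits in position 2."""
    N = np.zeros((n - 1, 3 * n))
    root = n - 1                        # the bag J_n plays the role of the root block
    for i in range(n - 1):
        N[i, 3 * i + 2] = 1.0           # +X_i[2, 2]
        N[i, 3 * root + 2] = -1.0       # -X_root[2, 2]
    return N

N = star_overlap_matrix(6)
# N N^T couples every pair of overlap constraints through the shared root block,
# so it is fully dense; N^T N only couples each block with the root block (a star tree).
print(np.count_nonzero(N @ N.T))   # 25 = (n - 1)^2 nonzeros: fully dense
print(np.count_nonzero(N.T @ N))   # 16 nonzeros, confined to a star-shaped block pattern
```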

Definition 5

(Topological ordering) An ordering \(\pi :\{1,2,\ldots ,n\}\rightarrow V(T)\) on the tree graph T with n nodes is said to be topological [15, p. 10] if, by designating \(\pi (n)\) as the root of T, each node is indexed before its parent:

$$\begin{aligned} \pi ^{-1}(v)<\pi ^{-1}(p(v))\qquad \text { for all }v\ne r, \end{aligned}$$

where \(\pi ^{-1}(v)\) denotes the index associated with the node v.

Lemma 4

(No block fill) Let \(J_{1},\ldots ,J_{n}\) satisfy \(\bigcup _{j=1}^{n}J_{j}=\{1,\ldots ,d\}\) and \(J_{i}\cap J_{j}=\emptyset \) for all \(i\ne j\), and let \(H\succ 0\) be a \(d\times d\) matrix satisfying

$$\begin{aligned} H[J_{i},J_{j}]\ne 0\qquad \implies \qquad (i,j)\in E(T) \end{aligned}$$

for a tree graph T on n nodes. If \(\pi \) is a topological ordering on T and \(\varPi \) is a permutation matrix satisfying

$$\begin{aligned} (\varPi H\varPi ^{T})[J_{i},J_{j}]=H[J_{\pi (i)},J_{\pi (j)}]\qquad \text {for all }i,j\in \{1,\ldots ,n\}, \end{aligned}$$

then \(\varPi H\varPi ^{T}\) factors into \(LL^{T}\) where the Cholesky factor L satisfies

$$\begin{aligned} L[J_{i},J_{j}]\ne 0\qquad \implies \qquad (\varPi H\varPi ^{T})[J_{i},J_{j}]\ne 0\qquad \text { for all }i>j. \end{aligned}$$

Therefore, sparse Cholesky factorization solves \(Hx=b\) for x by: (i) factoring \(\varPi H\varPi ^{T}\) into \(LL^{T}\) in \(O(\beta ^{3}n)\) operations and \(O(\beta ^{2}n)\) memory where \(\beta =\max _{j}|J_{j}|\), and (ii) solving \(Ly=\varPi b\) and \(L^{T}z=y\) and \(x=\varPi ^{T}z\) in \(O(\beta ^{2}n)\) operations and memory.

This is a simple block-wise extension of the tree elimination result originally due to Parter [58]; see also George and Liu [51, Lemma 6.3.1]. In practice, a topological ordering can be found by assigning indices \(n,n-1,n-2,\ldots \) in decreasing order during a depth-first search traversal of the tree. In fact, the minimum degree heuristic is guaranteed to generate a topological ordering [48].
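
A minimal sketch of this construction (our own helper, with the tree given as adjacency lists): assign indices \(n,n-1,\ldots ,1\) in the order nodes are first reached by a depth-first search from the root, so that every node receives a larger index than its children.

```python
def topological_ordering(tree_adj, root):
    """Assign indices n, n-1, ..., 1 in the order nodes are first reached by a
    depth-first search from the root, so every node is indexed before its parent
    (Definition 5).  Returns the map node -> index."""
    n = len(tree_adj)
    index = {}
    next_idx = n
    stack, seen = [root], {root}
    while stack:
        v = stack.pop()
        index[v] = next_idx             # the root receives n; descendants receive less
        next_idx -= 1
        for u in tree_adj[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return index

# A star tree with center 0: the center is the root and must receive the largest index.
tree_adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
index = topological_ordering(tree_adj, root=0)
print(index[0] == len(tree_adj))                     # True: root is indexed last
print(all(index[v] < index[0] for v in (1, 2, 3)))   # True: children precede their parent
```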

One way of exploiting the favorable block sparsity of \({\mathbf {N}}^{T}{\mathbf {N}}\) is to view the normal equation (14) as the Schur complement equation to an augmented system with \(\epsilon =0\):

$$\begin{aligned} \begin{bmatrix}{\mathbf {D}}_{s}^{-1} &{}\quad {\mathbf {A}}^{T} &{}\quad {\mathbf {N}}^{T}\\ {\mathbf {A}}&{}\quad -\epsilon I &{}\quad 0\\ {\mathbf {N}}&{}\quad 0 &{}\quad -\epsilon I \end{bmatrix}\begin{bmatrix}\varDelta x\\ \varDelta y_{1}\\ \varDelta y_{2} \end{bmatrix}=\begin{bmatrix}0\\ -r_{1}\\ -r_{2} \end{bmatrix}. \end{aligned}$$
(18)

Instead, we can solve the dual Schur complement equation for \(\epsilon >0\)

$$\begin{aligned} \left( {\mathbf {D}}_{s}^{-1}+\frac{1}{\epsilon }{\mathbf {A}}^{T}{\mathbf {A}}+\frac{1}{\epsilon }{\mathbf {N}}^{T}{\mathbf {N}}\right) \varDelta x=-\frac{1}{\epsilon }{\mathbf {A}}^{T}r_{1}-\frac{1}{\epsilon }{\mathbf {N}}^{T}r_{2} \end{aligned}$$
(19)

and recover an approximate solution. Under suitable sparsity assumptions on \({\mathbf {A}}^{T}{\mathbf {A}}\), the block sparsity graph of the matrix in (19) coincides with that of \({\mathbf {N}}^{T}{\mathbf {N}}\), which is itself the tree graph T. Using sparse Cholesky factorization with a topological block permutation, we solve (19) in linear time and back substitute to obtain a solution to (18) in linear time. In principle, a sufficiently small \(\epsilon >0\) will approximate the exact case at \(\epsilon =0\) to arbitrary accuracy, and this is all we need for the outer interior-point method to converge in polynomial time.

A more subtle way to exploit the block sparsity of \({\mathbf {N}}^{T}{\mathbf {N}}\) is to reformulate (CTC) into a form whose normal equation is exactly (19). As we show in the next section, this is achieved by a simple technique known as dualization.

5 Dualized clique tree conversion

The dualization technique of Löfberg [18] swaps the roles played by the primal and the dual problems in a linear conic program, by rewriting a primal standard form problem into dual standard form, and vice versa. Applying dualization to (11) yields the following

$$\begin{aligned} \begin{aligned}&\text {minimize }&\begin{bmatrix}b\\ 0 \end{bmatrix}^{T}x_{1}&\qquad&\text {maximize }&-c^{T}y\\&\text {subject to }&\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}^{T}x_{1}-x_{2}&=-c,&\text {subject to }&\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}y+s_{1}&=\begin{bmatrix}b\\ 0 \end{bmatrix}, \\&x_{1}\in {\mathbb {R}}^{f},\,x_{2}&\in {\mathcal {K}}.&&-y+s_{2}&=0, \\&&&s_{1}\in \{0\}^{f},\,s_{2}&\in {\mathcal {K}}. \end{aligned} \end{aligned}$$
(20)

where we use f to denote the number of equality constraints in (CTC). Observe that the dual problem in (20) is identical to the primal problem in (11), so that a dual solution \(y^{\star }\) to (20) immediately serves as a primal solution to (11), and hence also (CTC).

Modern interior-point methods solve (20) by embedding the free variable \(x_{1}\in {\mathbb {R}}^{f}\) and fixed variable \(s_{1}\in \{0\}^{f}\) into a second-order cone (see Sturm [59] and Andersen [60]):

$$\begin{aligned} \begin{aligned}&\text {minimize }&\begin{bmatrix}b\\ 0 \end{bmatrix}^{T}x_{1}&\qquad&\text {maximize }&-c^{T}y\\&\text {subject to }&\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}^{T}x_{1}-x_{2}&=-c,&\text {subject to }&\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}y+s_{1}&=\begin{bmatrix}b\\ 0 \end{bmatrix}, \\&\Vert x_{1}\Vert \le x_{0},\,x_{2}&\in {\mathcal {K}}.&&-y+s_{2}&=0, \\&&&s_{0}&=0, \\&&&\Vert s_{1}\Vert \le s_{0},\,s_{2}&\in {\mathcal {K}}. \end{aligned} \end{aligned}$$
(21)

When (21) is solved using an interior-point method, the normal equation solved at each iteration takes the form

$$\begin{aligned} \left( {\mathbf {D}}_{s}+\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}^{T}{\mathbf {D}}_{f}\begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}\right) \varDelta y=r \end{aligned}$$
(22)

where \({\mathbf {D}}_{s}\) is block-diagonal as before in (15), and

$$\begin{aligned} {\mathbf {D}}_{f}=\sigma I+ww^{T},\qquad \sigma >0 \end{aligned}$$
(23)

is the rank-1 perturbation of a scaled identity matrix. The standard procedure, as implemented in SeDuMi [59, 61] and MOSEK [62], is to form the sparse matrix \({\mathbf {H}}\) and dense vector \({\mathbf {q}}\), defined

$$\begin{aligned} {\mathbf {H}}&={\mathbf {D}}_{s}+\sigma {\mathbf {A}}^{T}{\mathbf {A}}+\sigma {\mathbf {N}}^{T}{\mathbf {N}},&{\mathbf {q}}&=\begin{bmatrix}{\mathbf {A}}^{T}&{\mathbf {N}}^{T}\end{bmatrix}w. \end{aligned}$$
(24)

and then solve (22) using a rank-1 update

$$\begin{aligned} \varDelta y=({\mathbf {H}}+{\mathbf {q}}{\mathbf {q}}^{T})^{-1}r=\left( I-\frac{({\mathbf {H}}^{-1}{\mathbf {q}}){\mathbf {q}}^{T}}{1+{\mathbf {q}}^{T}({\mathbf {H}}^{-1}{\mathbf {q}})}\right) {\mathbf {H}}^{-1}r, \end{aligned}$$
(25)

at a cost comparable to the solution of \({\mathbf {H}}u=r\) for two right-hand sides. (In “Appendix B”, we repeat these derivations for the version of (11) in which \({\mathbf {A}}x=b\) is replaced by the inequality \({\mathbf {A}}x\le b\).)
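
A minimal sketch of the rank-1 update (25): factor \({\mathbf {H}}\) once, then obtain the solution from two solves and a Sherman-Morrison correction. The sketch uses dense factorizations from scipy for brevity; in the setting of this paper, \({\mathbf {H}}\) would instead be factored sparsely under the block topological permutation of Lemma 4.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_rank1_update(H, q, r):
    """Solve (H + q q^T) u = r as in (25): one factorization of H, two solves,
    and a Sherman-Morrison correction."""
    factor = cho_factor(H)              # H = L L^T, reused for both right-hand sides
    Hinv_r = cho_solve(factor, r)
    Hinv_q = cho_solve(factor, q)
    return Hinv_r - Hinv_q * (q @ Hinv_r) / (1.0 + q @ Hinv_q)

rng = np.random.default_rng(1)
n = 8
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)             # a positive definite stand-in for (24)
q = rng.standard_normal(n)
r = rng.standard_normal(n)
u = solve_rank1_update(H, q, r)
print(np.allclose((H + np.outer(q, q)) @ u, r))  # True
```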

The matrix \({\mathbf {H}}\) is exactly the dual Schur complement derived in (19) with \(\sigma =1/\epsilon \). If \({\mathbf {A}}^{T}{\mathbf {A}}\) shares its block sparsity pattern with \({\mathbf {N}}^{T}{\mathbf {N}}\), then the block sparsity graph of \({\mathbf {H}}\) coincides with the tree graph T, and \({\mathbf {H}}u=r\) can be solved in linear time. The cost of making the rank-1 update is also linear time, so the cost of solving the normal equation is linear time.

Lemma 5

(Linear-time normal equation) Let there exist \(v_{i}\in V(T)\) for each \(i\in \{1,\ldots ,m\}\) such that

$$\begin{aligned} A_{i,j}\ne 0\qquad \implies \qquad j=v_{i}\text { or }j=p(v_{i}). \end{aligned}$$
(26)

Define \({\mathbf {H}}\) and \({\mathbf {q}}\) according to (24). Then, under Assumption 1:

  1. (Forming) It costs \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) space to form \({\mathbf {H}}\) and \({\mathbf {q}}\), where \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\).

  2. (Factoring) Let \(\pi \) be a topological ordering on T, and define the associated block topological permutation \(\varPi \) as in Lemma 4. Then, it costs \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) space to factor \(\varPi {\mathbf {H}}\varPi ^{T}\) into \({\mathbf {L}}{\mathbf {L}}^{T}\).

  3. (Solving) Given r, \({\mathbf {q}}\), \(\varPi \), and the Cholesky factor \({\mathbf {L}}\) satisfying \({\mathbf {L}}{\mathbf {L}}^{T}=\varPi {\mathbf {H}}\varPi ^{T}\), it costs \(O(\omega ^{4}n)\) time and space to solve \(({\mathbf {H}}+{\mathbf {q}}{\mathbf {q}}^{T})u=r\) for u.

Proof

For an instance of (CTC), denote \(\ell =|{\mathcal {J}}|\le n\) as its number of conic constraints, and \(d=\frac{1}{2}\sum _{j=1}^{\ell }|J_{j}|(|J_{j}|+1)\le \omega ^{2}\ell \) as its total number of variables. Under linear independence (Assumption 1), the constraint matrix \([{\mathbf {A}};{\mathbf {N}}]\) associated with (CTC) has d columns and at most d rows (Lemma 1). Write \(\xi _{i}^{T}\) as the i-th row of \([{\mathbf {A}};{\mathbf {N}}]\), and assume without loss of generality that \([{\mathbf {A}};{\mathbf {N}}]\) has exactly d rows. Observe that \(\mathrm {nnz}\,(\xi _{i})\le \omega (\omega +1)\) by the definition of \({\mathbf {N}}\) (13) and the hypothesis on \({\mathbf {A}}\) via (26), so \(\mathrm {nnz}\,([{\mathbf {A}};{\mathbf {N}}])\le 2\omega ^{4}\ell \).

(i) We form \({\mathbf {H}}\) by setting \({\mathbf {H}}\leftarrow {\mathbf {D}}_{s}\) and then adding \({\mathbf {H}}\leftarrow {\mathbf {H}}+\sigma \xi _{i}\xi _{i}^{T}\) one at a time, for \(i\in \{1,2,\ldots ,d\}\). The first step forms \({\mathbf {D}}_{s}=\mathrm {diag}({\mathbf {D}}_{s}^{(1)},{\mathbf {D}}_{s}^{(2)},\ldots ,{\mathbf {D}}_{s}^{(\ell )})\) where \({\mathbf {D}}_{s}^{(j)}=W_{j}\otimes W_{j}\). Each \(\mathrm {nnz}\,({\mathbf {D}}_{s}^{(j)})=\mathrm {nnz}\,(W_{j})^{2}=|J_{j}|^{2}(|J_{j}|+1)^{2}/4\le \omega ^{4}\) for \(j\in \{1,2,\ldots ,\ell \}\), so the total cost is \(O(\omega ^{4}n)\) time and space. The second step adds \(\mathrm {nnz}\,(\xi _{i}\xi _{i}^{T})\le \mathrm {nnz}\,(\xi _{i})^{2}\le \omega ^{2}(\omega +1)^{2}\) nonzeros per constraint over d total constraints, for a total cost of \(O(\omega ^{6}n)\) time and apparently \(O(\omega ^{6}n)\) space. However, by the definition of \({\mathbf {N}}\) (13) and the hypothesis on \({\mathbf {A}}\) via (26), the (j, k)-th off-diagonal block of \(\xi _{i}\xi _{i}^{T}\) is nonzero only if (j, k) is an edge of the tree T, as in

$$\begin{aligned} \xi _{i}\xi _{i}^{T}[J_{j},J_{k}]\ne 0\qquad \implies \qquad (j,k)\in E(T)\qquad \text {for all }j\ne k. \end{aligned}$$

Hence, adding \({\mathbf {H}}\leftarrow {\mathbf {H}}+\sigma \xi _{i}\xi _{i}^{T}\) one at a time results in at most \(|V(T)|+|E(T)|\) dense blocks of at most \(\frac{1}{2}\omega (\omega +1)\times \frac{1}{2}\omega (\omega +1)\), for a total memory cost of \(O(\omega ^{4}n)\).

(ii) We form \({\mathbf {q}}=[{\mathbf {A}}^{T},{\mathbf {N}}^{T}]w_{1}\) using a sparse matrix-vector product. Given that \(\mathrm {nnz}\,(w_{1})\le d\) and \(\mathrm {nnz}\,([{\mathbf {A}};{\mathbf {N}}])\le 2\omega ^{4}\ell \), this step costs \(O(\omega ^{4}n)\) time and space.

(iii) We partition \({\mathbf {H}}\) into \([{\mathbf {H}}_{i,j}]_{i,j=1}^{\ell }\) to reveal a block sparsity pattern that coincides with the adjacency matrix of T:

$$\begin{aligned} {\mathbf {H}}_{i,j}&={\left\{ \begin{array}{ll} {\mathbf {D}}_{s,i}+\sigma {\sum }_{k=1}^{\ell }{\mathbf {N}}_{k,i}^{T}{\mathbf {N}}_{k,i}+\sigma {\sum }_{q=1}^{m}a_{q,i}a_{q,i}^{T} &{} i=j\\ \sigma {\sum }_{k=1}^{\ell }{\mathbf {N}}_{k,i}^{T}{\mathbf {N}}_{k,j}+\sigma {\sum }_{q=1}^{m}a_{q,i}a_{q,j}^{T} &{} (i,j)\in E(T)\\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

where \(a_{q,i}=\mathrm {svec}\,(A_{q,i})\). According to Lemma 4, the permuted matrix \(\varPi {\mathbf {H}}\varPi ^{T}\) factors into \({\mathbf {L}}{\mathbf {L}}^{T}\) with no block fill in \(O(\omega ^{6}n)\) time and \(O(\omega ^{4}n)\) space, because each block \({\mathbf {H}}_{i,j}\) for \(i,j\in \{1,2,\ldots ,\ell \}\) is at most order \(\frac{1}{2}\omega (\omega +1)\).

(iv) Using the rank-1 update formula (25), the cost of solving \(({\mathbf {H}}+{\mathbf {q}}{\mathbf {q}}^{T})u=r\) is the same as the cost of solving \({\mathbf {H}}u=r\) for two right-hand sides, plus algebraic manipulations in \(O(d)=O(\omega ^{2}n)\) time. Applying Lemma 4 shows that the cost of solving \({\mathbf {H}}u=r\) for each right-hand side is \(O(\omega ^{4}n)\) time and space. \(\square \)

Incorporating the block topological permutation of Lemma 5 within any off-the-shelf interior-point method yields a fast interior-point method with near-linear time complexity.

Theorem 5

(Near-linear time) Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\in {\mathbb {S}}^{n}\). In the corresponding instance of (CTC), let each constraint be written

$$\begin{aligned} \sum _{j=1}^{\ell }A_{i,j}\bullet X_{j}=A_{i,j}\bullet X_{j}+A_{i,k}\bullet X_{k}=b_{i}\qquad (j,k)\in E(T). \end{aligned}$$
(27)

Under Assumptions 1 and 2, there exists an algorithm that computes an iterate \((x,y,s)\in {\mathcal {K}}\times {\mathbb {R}}^{p}\times {\mathcal {K}}_{*}\) satisfying

$$\begin{aligned} \left\| \begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}x-\begin{bmatrix}b\\ 0 \end{bmatrix}\right\|&\le \epsilon ,&\left\| \begin{bmatrix}{\mathbf {A}}\\ {\mathbf {N}}\end{bmatrix}^{T}y-s+c\right\|&\le \epsilon ,&\frac{x^{T}s}{\sum _{j=1}^{\ell }|J_{j}|}&\le \epsilon \end{aligned}$$
(28)

in \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) space, where \(\omega =\max _{j}|J_{j}|=1+\mathrm {wid}({\mathcal {T}})\).

For completeness, we give a proof of Theorem 5 in “Appendix C”, based on the primal-dual interior-point method found in SeDuMi [59, 61]. Our proof amounts to replacing the fill-reducing permutation (usually a minimum degree ordering) by the block topological permutation of Lemma 5. In practice, the minimum degree ordering is often approximately block topological, and as such, Theorem 5 is often attained by off-the-shelf implementations without modification.

[Algorithm 1 (figure): end-to-end procedure for solving (SDP) via dualized clique tree conversion]

The complete end-to-end procedure for solving (SDP) using dualized clique tree conversion is summarized as Algorithm 1. Before we can use Algorithm 1 to prove our main results, however, we must first address the cost of the pre-processing involved in Step 1. Indeed, naively converting (SDP) into (CTC) by comparing each nonzero element of \(A_{i}\) against each index set \(J_{j}\) would result in \(\ell m=O(n^{2})\) comparisons, and this would cause Step 1 to become the overall bottleneck of the algorithm.

In the next section, we show that if (SDP) is partially separable, then the cost of Step 1 is no more than \(O(\omega ^{3}n)\) time and memory. This is the final piece in the proof of Theorem 1.

6 Optimal constraint splitting

A key step in clique tree conversion is the splitting of a given \(M\in {\mathbb {S}}_{F}^{n}\) into \(M_{1},\ldots ,M_{\ell }\) that satisfy

$$\begin{aligned} M_{1}\bullet X[J_{1},J_{1}]+M_{2}\bullet X[J_{2},J_{2}]+\cdots +M_{\ell }\bullet X[J_{\ell },J_{\ell }]=M\bullet X\qquad \text {for all }X\in {\mathbb {S}}^{n}. \end{aligned}$$
(29)

The choice is not unique, but has a significant impact on the complexity of an interior-point solution. The problem of choosing the sparsest splitting, namely the one with the fewest nonzero \(M_{j}\) matrices, can be written

$$\begin{aligned} {\mathcal {S}}^{\star }=\underset{{\mathcal {S}}\subseteq \{1,\ldots ,\ell \}}{\text { minimize }}\quad |{\mathcal {S}}|\quad \text { subject to }\quad \bigcup _{j\in {\mathcal {S}}}(J_{j}\times J_{j})\supseteq {\mathcal {M}}, \end{aligned}$$
(30)

where \({\mathcal {M}}=\{(i,j):M[i,j]\ne 0\}\) are the nonzero matrix elements to be covered. Problem (30) is an instance of SET COVER, one of Karp’s 21 NP-complete problems, but becomes solvable in polynomial time given a tree decomposition (with small width) for the covering sets [64].

In this section, we describe an algorithm that computes the sparsest splitting for each M in \(O(\mathrm {nnz}\,(M))\) time and space, after a precomputation step taking \(O(\omega n)\) time and memory. Using this algorithm, we convert a partially separable instance of (SDP) into (CTC) in \(O(\omega ^{3}n)\) time and memory. Then, we give a complete proof of Theorem 1 by using this algorithm to convert (SDP) into (CTC) in Step 1 of Algorithm 1.

Our algorithm is adapted from the leaf-pruning algorithm of Guo and Niedermeier [64], but appears to be new within the context of clique tree conversion. Observe that the covering sets inherit the edge cover and running intersection properties of \({\mathcal {T}}\):

$$\begin{aligned} \bigcup _{j=1}^{\ell }(J_{j}\times J_{j})&\supseteq {\mathcal {M}}\quad \text {for all possible choices of }{\mathcal {M}}, \end{aligned}$$
(31)
$$\begin{aligned} (J_{i}\times J_{i})\cap (J_{j}\times J_{j})&\subseteq (J_{k}\times J_{k})\quad \text {for all }k\text { on the path from }i\text { to }j. \end{aligned}$$
(32)

For every leaf node j with parent node p(j) on T, property (32) implies that the subset \((J_{j}\times J_{j})\backslash (J_{p(j)}\times J_{p(j)})\) contains elements unique to \(J_{j}\times J_{j}\), because p(j) lies on the path from j to all other nodes in T. If \({\mathcal {M}}\) contains an element from this subset, then j must be included in the cover set \({\mathcal {S}}\), so we set \({\mathcal {S}}\leftarrow {\mathcal {S}}\cup \{j\}\) and \({\mathcal {M}}\leftarrow {\mathcal {M}}\backslash (J_{j}\times J_{j})\); otherwise, we do nothing. Pruning the leaf node reveals new leaf nodes, and we repeat this process until the tree T is exhausted of nodes. Then, property (31) guarantees that \({\mathcal {M}}\) will eventually be covered.
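To make the leaf-pruning procedure concrete, the following MATLAB sketch implements it directly on the nonzero pattern of M. The function name, the parent-pointer convention parT(root) = 0, and the assumption that the nodes are numbered topologically (parT(j) > j for every non-root j, as in an elimination tree) are our own illustrative choices; this is the plain procedure described above, not the optimized Algorithm 2.

    function S = leafprune_cover(M, J, parT)
    % Leaf-pruning set cover (illustrative sketch, not the optimized Algorithm 2).
    % M    - sparse symmetric matrix whose nonzero pattern is to be covered
    % J    - cell array of index sets J{1},...,J{ell}
    % parT - parent pointers of the tree T, with parT(root) = 0; nodes are assumed
    %        numbered topologically, i.e. parT(j) > j for every non-root node j
    ell = numel(J);
    S = [];
    [I, K] = find(M);                            % remaining elements of M to cover
    for j = 1:ell                                % leaves are "pruned" before their parents
        if parT(j) == 0
            Uj = J{j};                           % root: all of J_r counts as unique
        else
            Uj = setdiff(J{j}, J{parT(j)});      % unique set U_j = J_j \ J_p(j)
        end
        inJ_I = ismember(I, J{j});
        inJ_K = ismember(K, J{j});
        if any((ismember(I, Uj) & inJ_K) | (inJ_I & ismember(K, Uj)))
            S(end+1) = j;                        %#ok<AGROW> an element unique to J_j x J_j remains
            keep = ~(inJ_I & inJ_K);             % M <- M \ (J_j x J_j)
            I = I(keep);  K = K(keep);
        end
    end
    end

Because the check at every node touches the whole list of remaining nonzeros, this plain version costs roughly O(n) per constraint; Algorithm 2 removes this overhead, as described next.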

[Algorithm 2: optimal constraint splitting — pseudocode figure]

Algorithm 2 is an adaptation of the leaf-pruning algorithm described above, with three important simplifications. First, it uses a topological traversal (Definition 5) to simulate the process of leaf pruning without explicitly deleting nodes from the tree. Second, it notes that the unique subset \((J_{j}\times J_{j})\backslash (J_{p(j)}\times J_{p(j)})\) can be written in terms of another unique set \(U_{j}\):

$$\begin{aligned} (J_{j}\times J_{j})\backslash (J_{p(j)}\times J_{p(j)})=(U_{j}\times J_{j})\cup (J_{j}\times U_{j})\qquad \text {where }U_{j}\equiv J_{j}\backslash J_{p(j)}. \end{aligned}$$

Third, it notes that the unique sets \(U_{1},\ldots ,U_{\ell }\) defined above partition \(\{1,\ldots ,n\}\), so the map from each element to the unique set containing it has a well-defined inverse. The following is taken from [65, 66], where \(U_{j}\) is denoted \({\mathrm {new}}(J_{j})\) and referred to as the “new set” of \(J_{j}\); see also [67].

Lemma 6

(Unique partition) Define \(U_{j}=J_{j}\backslash J_{p(j)}\) for all nodes j with parent p(j), and \(U_{r}=J_{r}\) for the root node r. Then: (i) \(\bigcup _{j=1}^{\ell }U_{j}=\{1,\ldots ,n\}\); and (ii) \(U_{i}\cap U_{j}=\emptyset \) for all \(i\ne j\).

In the case that \({\mathcal {M}}\) contains just O(1) items to be covered, we may use the inverse map associated with \(U_{j}\) to directly identify covering sets whose unique sets contain elements from \({\mathcal {M}}\), without exhaustively iterating through all O(n) covering sets. This final simplification reduces the cost of processing each \(M_{i}\) from linear O(n) time to \(O(\mathrm {nnz}\,(M_{i}))\) time, after setting up the inverse map in \(O(\omega n)\) time and space.
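As a sketch of this lookup (the variable names are ours), the inverse map can be stored as a length-n vector invU, built once and then queried per nonzero of each \(M_{i}\):

    % One-time precomputation, O(omega*n): invU(k) = the unique node j with k in U_j.
    ell  = numel(J);
    n    = max(cellfun(@max, J));
    invU = zeros(n, 1);
    for j = 1:ell
        if parT(j) == 0
            Uj = J{j};
        else
            Uj = setdiff(J{j}, J{parT(j)});
        end
        invU(Uj) = j;                  % well defined by Lemma 6
    end
    % Per-constraint work, O(nnz(M)): only these nodes can be selected by leaf
    % pruning on M, so the traversal needs to examine them alone.
    [r, c] = find(M);
    candidates = unique([invU(r); invU(c)]);

Only the nodes in candidates need to be visited by the leaf-pruning traversal, in topological order, which is what brings the per-constraint cost down to \(O(\mathrm {nnz}\,(M_{i}))\).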

Theorem 6

Algorithm 2 has complexity

$$\begin{aligned} O(\omega n+\mathrm {nnz}\,(M_{1})+\mathrm {nnz}\,(M_{2})+\cdots +\mathrm {nnz}\,(M_{m}))\text { time and memory,} \end{aligned}$$

where \(\omega \equiv 1+\mathrm {wid}({\mathcal {T}})\).

For partially separable instances of (SDP), the sparsest instance of (CTC) contains exactly one nonzero split matrix \(A_{i,j}\ne 0\) for each i, and Algorithm 2 is guaranteed to find it. Using Algorithm 2 to convert (SDP) into (CTC) in Step 1 of Algorithm 1 yields the complexity figures quoted in Theorem 1.

Proof of Theorem 1

By hypothesis, \({\mathcal {T}}=\{J_{1},\ldots ,J_{\ell }\}\) is a tree decomposition for the sparsity graph of the data matrices \(C,A_{1},\ldots ,A_{m}\), and (SDP) is partially separable on \({\mathcal {T}}\). We proceed to solve (SDP) using Algorithm 1, while performing the splitting into \(C_{j}\) and \(A_{i,j}\) using Algorithm 2. Below, we show that each step of the algorithm costs no more than \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) memory:

Step 1 (Matrix \({\mathbf {A}}\) and vector \({\mathbf {c}}\)). We have \(\dim ({\mathbb {S}}_{G}^{n})=|V(G)|+|E(G)|\le n+n\cdot \mathrm {wid}({\mathcal {T}})\le \omega n\), and hence \(\mathrm {nnz}\,(C)\le \omega n\). Under partial separability (Definition 3), we also have \(\mathrm {nnz}\,(A_{i})\le \omega ^{2}\). Assuming linear independence (Assumption 1) yields \(m\le \dim ({\mathbb {S}}_{G}^{n})\le \omega n\), and this implies that \(\mathrm {nnz}\,(C)+\sum _{i}\mathrm {nnz}\,(A_{i})=O(\omega ^{3}n)\), so the cost of forming \({\mathbf {A}}\) and \({\mathbf {c}}\) using Algorithm 2 is \(O(\omega ^{3}n)\) time and memory via Theorem 6.

Step 1 (Matrix \({\mathbf {N}}\)). For \({\mathbf {N}}=[{\mathbf {N}}_{i,j}]_{i,j=1}^{\ell }\), we note that each block \({\mathbf {N}}_{i,j}\) is diagonal, and hence \(\mathrm {nnz}\,({\mathbf {N}}_{i,j})\le \omega ^{2}\). The overall \({\mathbf {N}}\) contains \(\ell \) block-rows, with 2 nonzero blocks per block-row, for a total of \(2\ell \) nonzero blocks. Therefore, the cost of forming \({\mathbf {N}}\) is \(\mathrm {nnz}\,({\mathbf {N}})=O(\omega ^{2}n)\) time and memory.

Step 2. We dualize by forming the matrix \({\mathbf {M}}=[0,-{\mathbf {A}}^{T},{\mathbf {N}}^{T},+I]\) and the vectors \({\mathbf {c}}^{T}=[0,b^{T},0,0]\) and \({\mathbf {b}}=-c\) in \(O(\mathrm {nnz}\,({\mathbf {A}})+\mathrm {nnz}\,({\mathbf {N}}))=O(\omega ^{3}n)\) time and memory.

Step 3. The resulting instance of (CTC) satisfies the assumptions of Theorem 5 and therefore costs \(O(\omega ^{6.5}n^{1.5}\log (1/\epsilon ))\) time and \(O(\omega ^{4}n)\) memory to solve.

Step 4. The low-rank matrix completion algorithm [43, Algorithm 2] makes \(\ell \le n\) iterations, where each iteration performs O(1) matrix-matrix operations over \(\omega \times \omega \) dense matrices. Its cost is therefore \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory. \(\square \)

7 Dualized clique tree conversion with auxiliary variables

Theorem 5 bounds the cost of solving instances of (CTC) that satisfy the sparsity assumption (27) by near-linear time and linear memory. Instances of (CTC) that do not satisfy the sparsity assumption can be systematically transformed into ones that do by introducing auxiliary variables. Let us illustrate this idea with an example.

Example 5

(Path graph) Given \((n+1)\times (n+1)\) symmetric tridiagonal matrices \(A\succ 0\) and C with \(A[i,j]=C[i,j]=0\) for all \(|i-j|>1\), consider the Rayleigh quotient problem

$$\begin{aligned} \text {minimize }C\bullet X\qquad \text { subject to }\qquad A\bullet X=1,\quad X\succeq 0. \end{aligned}$$
(33)

The associated sparsity graph is the path graph on \(n+1\) nodes, and its elimination tree decomposition \({\mathcal {T}}=(\{J_{1},\ldots ,J_{n}\},T)\) has index sets \(J_{j}=\{j,j+1\}\) and parent pointer \(p(j)=j+1\). Applying clique tree conversion and vectorizing yields an instance of (11) with

$$\begin{aligned} {\mathbf {A}}&=\begin{bmatrix}a_{1}^{T}&\quad \cdots&\quad a_{n}^{T}\end{bmatrix},&{\mathbf {N}}&=\begin{bmatrix}e_{3}^{T} &{}\quad -e_{1}^{T}\\ &{}\quad \ddots &{} \quad \ddots \\ &{} \quad &{}\quad e_{3}^{T} &{}\quad -e_{1}^{T} \end{bmatrix} \end{aligned}$$

where \(e_{j}\) is the j-th column of the \(3\times 3\) identity matrix, and \(a_{1},\ldots ,a_{n}\in {\mathbb {R}}^{3}\) are appropriately chosen vectors. The dualized Schur complement \({\mathbf {H}}={\mathbf {D}}_{s}+\sigma {\mathbf {A}}^{T}{\mathbf {A}}+\sigma {\mathbf {N}}^{T}{\mathbf {N}}\) is fully dense, so dualized clique tree conversion (Algorithm 1) would have a complexity of at least cubic \(n^{3}\) time and quadratic \(n^{2}\) memory. Instead, introducing \(n-1\) auxiliary variables \(u_{1},\ldots ,u_{n-1}\) yields the following problem

$$\begin{aligned} \begin{aligned}&\text {minimize }&&\sum _{j=1}^{n}c_{j}^{T}x_{j}\\&\text {subject to }&&a_{1}^{T}x_{1}-\begin{bmatrix}0&\quad 1\end{bmatrix}\begin{bmatrix}x_{2}\\ u_{2} \end{bmatrix}=b \\&&&\begin{bmatrix}a_{i}^{T}&\quad 1\end{bmatrix}\begin{bmatrix}x_{i}\\ u_{i} \end{bmatrix}-\begin{bmatrix}0&\quad 1\end{bmatrix}\begin{bmatrix}x_{i+1}\\ u_{i+1} \end{bmatrix}=0\qquad \text {for all }i\in \{2,\ldots ,n-1\} \\&&&x_{1}\in \mathrm {svec}\,({\mathbb {S}}_{+}^{2}),\quad \begin{bmatrix}x_{j}\\ u_{j} \end{bmatrix}\in \mathrm {svec}\,({\mathbb {S}}_{+}^{2})\times {\mathbb {R}}\qquad \text {for all }j\in \{2,\ldots ,n\} \end{aligned} \end{aligned}$$
(34)

which does indeed satisfy the sparsity assumption (27) of Theorem 5. In turn, solving (34) using Steps 2-3 of Algorithm 1 recovers an \(\epsilon \)-accurate solution in \(O(n^{1.5}\log \epsilon ^{-1})\) time and O(n) memory. \(\square \)
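The structural effect of the auxiliary variables can be seen from the sparsity patterns alone. The following MATLAB sketch uses made-up dimensions and tracks only zero/nonzero structure (the values of \(a_{j}\), \(c_{j}\) are irrelevant here, and for simplicity every block, including the first, is given width 4); it contrasts the fully dense normal matrix produced by the single split constraint with the block tridiagonal one obtained after decoupling.

    n = 8;                                   % number of 2x2 blocks X_1,...,X_n
    % Before: the single split constraint is one dense block-row of width 3n,
    % so A'*A (and hence the Schur complement H) is fully dense.
    Arow = ones(1, 3*n);
    nnz(Arow'*Arow)                          % = (3n)^2
    % After: constraint i of (34) touches only blocks i and i+1 (pattern only).
    blk  = 4;                                % svec(S_+^2) plus one auxiliary variable
    Aaux = spalloc(n-1, blk*n, 8*n);
    for i = 1:n-1
        Aaux(i, blk*(i-1)+(1:blk)) = 1;      % block i
        Aaux(i, blk*i+(1:blk))     = 1;      % block i+1
    end
    spy(Aaux'*Aaux)                          % block tridiagonal: factorable in linear time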

For a constraint \(A_{i}\bullet X=b_{i}\) in (SDP), we assume without loss of generalityFootnote 4 that the corresponding constraint in (CTC) is split over a connected subtree of T induced by a subset of vertices \(W\subseteq V(T)\), as in

$$\begin{aligned} \sum _{j\in W}A_{i,j}\bullet X[J_{j},J_{j}]=b_{i},\qquad T_{W}\equiv (W,\;E(T))\text { is connected}. \end{aligned}$$
(35)

Then, the coupled constraint (35) can be decoupled into |W| constraints, by introducing \(|W|-1\) auxiliary variables, one for each edge of the connected subtree \(T_{W}\):

$$\begin{aligned} A_{i,j}\bullet X[J_{j},J_{j}]+\sum _{k\in \mathrm {ch}(j)}u_{k}&={\left\{ \begin{array}{ll} b_{i} &{} j\text { is the root of }T_{W},\\ u_{j} &{} \text {otherwise,} \end{array}\right. }\qquad \text {for all }j\in W. \end{aligned}$$
(36)

It is easy to see that (35) and (36) are equivalent by applying Gaussian elimination on the auxiliary variables.

Lemma 7

The matrix X satisfies (35) if and only if there exists \(\{u_{j}\}\) such that X satisfies (36).
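A tiny numerical check, with arbitrary made-up values, illustrates the Gaussian-elimination argument on a path subtree with three nodes:

    % Path subtree W = {1,2,3} with parent pointers p(1) = 2, p(2) = 3 and root 3.
    % Write t(j) for the inner product A_{i,j} . X[J_j,J_j].
    t  = [0.3, -1.2, 2.5];                % arbitrary values of the three split terms
    bi = sum(t);                          % coupled constraint (35): t(1)+t(2)+t(3) = b_i
    % Decoupled system (36):  leaf j=1:  t(1)        = u(1)
    %                         node j=2:  t(2) + u(1) = u(2)
    %                         root j=3:  t(3) + u(2) = b_i
    u = zeros(1, 2);
    u(1) = t(1);
    u(2) = t(2) + u(1);
    assert(abs(t(3) + u(2) - bi) < 1e-12) % eliminating u recovers the coupled constraint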

Repeating the splitting procedure for every constraint in (CTC) yields a problem of the form

$$\begin{aligned} \begin{aligned}&\text {minimize }&c^{T}x,\\&\text {subject to }&\sum _{j\in W_{i}}({\mathbf {A}}_{i,j}x_{j}+{\mathbf {B}}_{i,j}u_{i,j})&={\mathbf {f}}_{i}\qquad \text {for all }i\in \{1,\ldots ,m\} \\&\sum _{j=1}^{\ell }{\mathbf {N}}_{i,j}x_{j}&=0\qquad \text {for all }i\in \{1,\ldots ,\ell \} \\&\begin{bmatrix}x_{j}\\ {} [u_{i,j}]_{i=1}^{m} \end{bmatrix}\in \mathrm {svec}\,({\mathbb {S}}_{+}^{|J_{j}|})\times&{\mathbb {R}}^{\gamma _{j}}\qquad \text {for all }j\in \{1,\ldots ,\ell \} \end{aligned} \end{aligned}$$
(37)

where \(W_{i}\) is the vertex subset that induces the connected subtree associated with the i-th constraint, and \(\gamma _{j}\) is the total number of auxiliary variables added to the j-th variable block. When (21) is dualized and solved using an interior-point method, the matrix \({\mathbf {H}}=[{\mathbf {H}}_{i,j}]_{i,j=1}^{\ell }\) satisfies \({\mathbf {H}}_{i,j}=0\) for every \((i,j)\notin E(T)\), so by repeating the proof of Lemma 5, the cost of solving the normal equation is again linear time. Incorporating this within any off-the-shelf interior-point method again yields a fast interior-point method.

Lemma 8

Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the sparsity graph of \(C,A_{1},\ldots ,A_{m}\in {\mathbb {S}}^{n}\), and convert the corresponding instance of (CTC) into (37). Under Assumptions 1–2, there exists an algorithm that computes an iterate \((x,y,s)\in {\mathcal {K}}\times {\mathbb {R}}^{p}\times {\mathcal {K}}_{*}\) satisfying (28) in

$$\begin{aligned} O((\omega ^{2}+\gamma _{\max })^{3}\omega ^{0.5}n^{1.5}\log \epsilon ^{-1})\text { time and }O((\omega ^{2}+\gamma _{\max })^{2}n)\text { memory,} \end{aligned}$$

where \(\omega =1+\mathrm {wid}({\mathcal {T}})\) and \(\gamma _{\max }=\max _{j}\gamma _{j}\) is the maximum number of auxiliary variables added to a single variable block.

Proof

We repeat the proof of Theorem 5, but slightly modify the linear time normal equation result in Lemma 5. Specifically, we repeat the proof of Lemma 5, but note that each block \({\mathbf {H}}_{i,j}\) of \({\mathbf {H}}\) now has order at most \(\frac{1}{2}\omega (\omega +1)+\gamma _{\max }\), so that factoring in (ii) now costs \(O((\omega ^{2}+\gamma _{\max })^{3}n)\) time and \(O((\omega ^{2}+\gamma _{\max })^{2}n)\) memory, and substituting in (iii) costs \(O((\omega ^{2}+\gamma _{\max })^{2}n)\) time and memory. After \(O(\sqrt{\omega n}\log \epsilon ^{-1})\) interior-point iterations, we again arrive at an \(\epsilon \)-accurate and \(\epsilon \)-feasible solution to (CTC). \(\square \)

[Algorithm 3: dualized clique tree conversion with auxiliary variables — pseudocode figure]

The complete end-to-end procedure for solving (SDP) using the auxiliary variables method is summarized as Algorithm 3. In the case of network flow semidefinite programs, the splitting in Step 2 can be performed in closed form using the induced subtree property of the tree decomposition [68].

Definition 6

(Induced subtrees) Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition. We define \(T_{k}\) as the connected subtree of T induced by the nodes that contain the element k, as in

$$\begin{aligned} V(T_{k})&=\{j\in \{1,\ldots ,\ell \}:k\in J_{j}\},&E(T_{k})&=E(T). \end{aligned}$$
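In code, \(V(T_{k})\) is a simple membership query over the bags; for instance, with the index sets stored in a cell array J as before and a given element k:

    % Nodes of the tree decomposition whose bag J{j} contains the element k.
    VTk = find(cellfun(@(Jj) any(Jj == k), J));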

Lemma 9

Let \({\mathcal {T}}=(\{J_{1},\ldots ,J_{\ell }\},T)\) be a tree decomposition for the graph G. For every \(i\in V(G)\) and

$$\begin{aligned} A=\alpha _{i}e_{i}e_{i}^{T}+\sum _{(i,j)\in E(G)}\alpha _{j}(e_{i}e_{j}^{T}+e_{j}e_{i}^{T}), \end{aligned}$$

there exists \(A_{j}\) for \(j\in V(T_{i})\) such that

$$\begin{aligned} A\bullet X=\sum _{j\in V(T_{i})}A_{j}\bullet X[J_{j},J_{j}]\qquad \text {for all }X\in {\mathbb {S}}^{n}. \end{aligned}$$

Proof

We give an explicit construction. Iterate j over the neighbors \({\mathrm {nei}}(i)=\{j:(i,j)\in E(G)\}\) of i. By the edge cover property of the tree decomposition, there exists \(k\in \{1,\ldots ,\ell \}\) satisfying \(i,j\in J_{k}\). Moreover, \(k\in V(T_{i})\) because \(i\in J_{k}\). Define \(A_{k}\) to satisfy

$$\begin{aligned} A_{k}\bullet X[J_{k},J_{k}]=(\alpha _{i}/\deg _{i})X[i,i]+\alpha _{j}(X[i,j]+X[j,i]), \end{aligned}$$

where \(\deg _{i}=|{\mathrm {nei}}(i)|\), accumulating contributions whenever several neighbors \(j\) select the same node \(k\). Summing over all \(j\in {\mathrm {nei}}(i)\) then recovers \(A\bullet X\), since the \(\alpha _{i}/\deg _{i}\) terms add up to \(\alpha _{i}\). \(\square \)
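The construction can be written out directly. In the MATLAB sketch below, the inputs i, alpha, nei and the cell array J of bags are assumed given (the names are ours), and the choice of which bag receives each neighbor is made by a simple search; contributions are accumulated so that summing \(A_{j}\bullet X[J_{j},J_{j}]\) over \(j\in V(T_{i})\) reproduces \(A\bullet X\).

    % Split the network-flow constraint at vertex i (coefficients alpha, neighbors nei)
    % onto the bags J{1},...,J{ell} of the tree decomposition.
    ell    = numel(J);
    Asplit = cell(ell, 1);                % Asplit{k} acts on X[J{k}, J{k}]
    degi   = numel(nei);
    for j = nei(:).'
        % edge cover property: some bag contains both endpoints {i, j}
        k = find(cellfun(@(Jk) all(ismember([i j], Jk)), J), 1);
        if isempty(Asplit{k}), Asplit{k} = zeros(numel(J{k})); end
        ii = find(J{k} == i);  jj = find(J{k} == j);
        Asplit{k}(ii, ii) = Asplit{k}(ii, ii) + alpha(i)/degi;   % spread alpha_i evenly
        Asplit{k}(ii, jj) = Asplit{k}(ii, jj) + alpha(j);
        Asplit{k}(jj, ii) = Asplit{k}(jj, ii) + alpha(j);
    end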

If each network flow constraint is split according to Lemma 9, then the number of auxiliary variables needed to decouple the problem can be bounded. This results in a proof of Theorem 2.

Proof of Theorem 2

By hypothesis, \({\mathcal {T}}=\{J_{1},\ldots ,J_{\ell }\}\) is a tree decomposition for the sparsity graph of the data matrices \(C,A_{1},\ldots ,A_{m}\), and each \(A_{i}\) can be split according to Lemma 9 onto a connected subtree of T. We proceed to solve (SDP) using Algorithm 3. We perform Step 1 in closed form, by splitting each \(A_{i}\) according to Lemma 9. The costs of Steps 2 and 3 are then bounded by \(\mathrm {nnz}\,({\mathbf {A}})+\mathrm {nnz}\,({\mathbf {N}})=O(\omega ^{3}n)\) time and memory. The cost of Step 5 is also \(O(\omega ^{3}n)\) time and \(O(\omega ^{2}n)\) memory, using the same reasoning as the proof of Theorem 1.

To quantify the cost of Step 4, we must show that under the conditions stated in the theorem, the maximum number of auxiliary variables added to each variable block is bounded as \(\gamma _{j}\le m_{k}\cdot \omega \cdot d_{\max }\). We do this via the following line of reasoning:

  • A single network flow constraint at vertex k contributes \(|\mathrm {ch}(j)|\le d_{\max }\) auxiliary variables to every index set \(J_{j}\) with \(j\in V(T_{k})\).

  • Having one network flow constraint at every \(k\in \{1,\ldots ,\ell \}\) contributes at most \(\omega \cdot d_{\max }\) auxiliary variables to every clique \(J_{j}\). This is because the set of elements k for which \(j\in V(T_{k})\) is exactly \(\{k:j\in V(T_{k})\}=J_{j}\), and \(|J_{j}|\le \omega \) by definition.

  • Having \(m_{k}\) network flow constraints at each \(k\in \{1,\ldots ,\ell \}\) contributes at most \(m_{k}\cdot \omega \cdot d_{\max }\) auxiliary variables to every clique \(J_{j}\).

Finally, applying \(\gamma _{j}\le m_{k}\cdot \omega \cdot d_{\max }\) to Lemma 8 yields the desired complexity figure, which dominates the cost of the entire algorithm. \(\square \)

8 Numerical experiments

Using the techniques described in this paper, we solve sparse semidefinite programs posed on the 40 power system test cases in the MATPOWER suite [69], each with a number of constraints m comparable to n. The largest two cases have \(n=9241\) and \(n=13659\), and are designed to accurately represent the size and complexity of the European high voltage electricity transmission network [70]. In all of our trials below, the accuracy of a primal-dual iterate (X, y, S) is measured using the DIMACS feasibility and duality gap metrics [71] and stated as the number of accurate decimal digits:

$$\begin{aligned} {\mathrm {pinf}}&=-\log _{10}\left[ \;\Vert {\mathcal {A}}(X)-b\Vert _{2}/(1+\Vert b\Vert _{2})\;\right] ,\\ {\mathrm {dinf}}&=-\log _{10}\left[ \;\lambda _{\max }({\mathcal {A}}^{T}(y)-C)/(1+\Vert C\Vert _{2})\;\right] ,\\ {\mathrm {gap}}&=-\log _{10}\left[ \;(C\bullet X-b^{T}y)/(1+|C\bullet X|+|b^{T}y|)\;\right] , \end{aligned}$$

where \({\mathcal {A}}(X)=[A_{i}\bullet X]_{i=1}^{m}\) and \({\mathcal {A}}^{T}(y)=\sum _{i=1}^{m}y_{i}A_{i}\). We will frequently measure the overall number of accurate digits as \(L=\min \{{\mathrm {gap}},{\mathrm {pinf}},{\mathrm {dinf}}\}\).
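For reference, the three metrics can be evaluated in a few lines of MATLAB; the variable names below (the cell array A of constraint matrices, the data C and b, and the iterate X, y) are placeholders of our own.

    % DIMACS-style accuracy of an iterate, following the formulas above.
    AX   = cellfun(@(Ai) full(Ai(:)' * X(:)), A(:));     % A(X) = [A_i . X]_{i=1}^m
    ATy  = zeros(size(C));
    for i = 1:numel(A), ATy = ATy + y(i) * A{i}; end     % A^T(y) = sum_i y_i A_i
    pinf = -log10( norm(AX - b) / (1 + norm(b)) );
    dinf = -log10( max(eig(full(ATy - C))) / (1 + norm(full(C))) );
    gap  = -log10( (C(:)'*X(:) - b'*y) / (1 + abs(C(:)'*X(:)) + abs(b'*y)) );
    L    = min([gap, pinf, dinf]);                       % accurate decimal digits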

In our trials, we implement Algorithm 1 and Algorithm 3 in MATLAB using a version of SeDuMi v1.32 [59] that is modified to force a specific fill-reducing permutation during symbolic factorization. The actual block topological ordering that we force SeDuMi to use is a simple postordering of the elimination tree. For comparison, we also implement both algorithms using the standard off-the-shelf version of MOSEK v8.0.0.53 [72], without forcing a specific fill-reducing permutation. The experiments are performed on a Xeon 3.3 GHz quad-core CPU with 16 GB of RAM.

8.1 Elimination trees with small widths

We begin by computing tree decompositions using MATLAB’s internal approximate minimum degree heuristic (due to Amestoy, Davis and Duff [73]). A simplified version of our code is shown as the snippet in Fig. 1. (Our actual code uses Algorithm 4.1 in [15] to reduce the computed elimination tree to the supernodal elimination tree, for a slight reduction in the number of index sets \(\ell \).) Table 1 gives the details and timings for the 40 power system graphs from the MATPOWER suite [69]. As shown, we compute tree decompositions with \(\mathrm {wid}({\mathcal {T}})\le 34\) in less than 2 s. In practice, the bottleneck of the preprocessing step is not the tree decomposition, but the constraint splitting step in Algorithm 2.

Fig. 1  MATLAB code for computing the tree decomposition of a given sparsity graph. The code terminates with the tree decomposition \({\mathcal {T}}=({\mathcal {J}},T)\), in which the index sets \({\mathcal {J}}=\{J_{1},\ldots ,J_{n}\}\) are stored as the cell array J, and the tree T is stored in terms of its parent pointer parT
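Figure 1 itself is not reproduced here, but a minimal sketch consistent with its caption might look as follows, using MATLAB's built-in amd and symbfact; the function name and the convention that the bags are returned in the permuted ordering are our own simplifications, and the supernodal reduction mentioned above is omitted.

    function [J, parT] = treedecomp(G)
    % Elimination tree decomposition of the sparsity graph with adjacency pattern G
    % (sparse symmetric). Bags and parent pointers are returned in the permuted order.
    n = size(G, 1);
    A = spones(G) + speye(n);               % pattern with a nonzero diagonal
    p = amd(A);                             % approximate minimum degree ordering [73]
    [~, ~, parT, ~, R] = symbfact(A(p, p)); % symbolic Cholesky: R has the pattern of chol(A(p,p))
    J = cell(n, 1);
    for j = 1:n
        J{j} = find(R(j, :));               % J_j = {j} and its higher neighbors in the filled graph
    end
    end

The width is then recovered as \(\mathrm {wid}({\mathcal {T}})=\max _{j}|J_{j}|-1\), i.e. max(cellfun(@numel, J)) - 1.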

Table 1 Tree decompositions for the 40 test power systems under study: |V(G)|—number of vertices; |E(G)|—number of edges; \(|{\mathcal {J}}|\)—number of bags in \({\mathcal {T}}\); \(\omega =1+\mathrm {wid}({\mathcal {T}})\)—computed clique number; “Time”—total computation time in seconds
Table 2 Accuracy (in decimal digits) and timing (in seconds) for 20 largest MAX 3-CUT problems: n—order of matrix variable; m—number of constraints; “Pre-proc”—pre-processing time; “gap”—duality gap; “pinf”—primal infeasibility; “dinf”—dual infeasibility; k—number of interior-point iterations; T—total interior-point time; “Post-proc”—post-processing time

8.2 MAX 3-CUT and Lovasz Theta

We begin by considering the MAX 3-CUT and Lovasz Theta problems, which are partially separable by default, and hence have solution complexities of \(O(n^{1.5})\) time and O(n) memory. For each of the 40 test cases, we use the MATPOWER function makeYbus to generate the bus admittance matrix \(Y_{bus}=[Y_{i,j}]_{i,j=1}^{n},\) and symmetrize to yield \(Y_{abs}=\frac{1}{2}[|Y_{i,j}|+|Y_{j,i}|]_{i,j=1}^{n}\). We view this matrix as the weighted adjacency matrix for the system graph. For MAX 3-CUT, we define the weighted Laplacian matrix \(C=\mathrm {diag}(Y_{abs}{\mathbf {1}})-Y_{abs}\), and set up problem (MkC). For Lovasz Theta, we extract the location of the graph edges from \(Y_{abs}\) and set up (\(\hbox {LT}'\)).
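As a concrete sketch of this setup, assuming a MATPOWER case struct mpc has been loaded with loadcase (only makeYbus is a MATPOWER function; the remaining names are ours):

    Ybus = makeYbus(mpc.baseMVA, mpc.bus, mpc.branch);   % bus admittance matrix Y_bus
    n    = size(Ybus, 1);
    Yabs = (abs(Ybus) + abs(Ybus.')) / 2;                % symmetrized weighted adjacency matrix
    C    = spdiags(Yabs*ones(n,1), 0, n, n) - Yabs;      % weighted Laplacian for MAX 3-CUT
    [ei, ej] = find(triu(Yabs, 1));                      % edge list used for the Lovasz Theta problem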

First, we use Algorithm 1 with the modified version of SeDuMi to solve the 80 instances of (SDP). Of the 80 instances considered, 79 solved to \(L\ge 5\) digits in \(k\le 23\) iterations and \(T\le 306\) s; the largest instance solved to \(L=4.48\). Table 2 shows the accuracy and timing details for the 20 largest problems solved. Figure 2a plots T/k, the mean time taken per-iteration. As we guaranteed in Lemma 1, the per-iteration time is linear with respect to n. A log-log regression yields \(T/k=10^{-3}n\), with \(R^{2}=0.9636\). Figure 2b plots k/L, the number of iterations to a factor-of-ten error reduction. We see that SeDuMi’s guaranteed iteration complexity \(k=O(\sqrt{n}\log \epsilon ^{-1})=O(\sqrt{n}L)\) is a significant over-estimate; a log-log regression yields \(k/L=0.929n^{0.123}\approx n^{1/8}\), with \(R^{2}=0.5432\). Combined, the data suggests an actual time complexity of \(T\approx 10^{-3}n^{1.1}L\).

Fig. 2  SeDuMi Timings for MAX 3-CUT (\(\circ \)) and Lovasz Theta (\(\times \)) problems: a Time per iteration, with regression \(T/k=10^{-3}n\); b Iterations per decimal digit of accuracy, with (solid) regression \(k/L=0.929n^{0.123}\) and (dashed) bound \(k/L=\sqrt{n}\)

Next, we use Algorithm 1 alongside the off-the-shelf version of MOSEK to solve the same 80 instances. It turns out that MOSEK is both more accurate than SeDuMi and a factor of 5–10 faster. It manages to solve all 80 instances to \(L\ge 6\) digits in \(k\le 21\) iterations and \(T\le 24\) s. Table 3 shows the accuracy and timing details for the 20 largest problems solved. Figure 3a plots T/k, the mean time taken per iteration. Despite not forcing the use of a block topological ordering, MOSEK nevertheless attains an approximately linear per-iteration cost. Figure 3b plots k/L, the number of iterations to a factor-of-ten error reduction. Again, we see that MOSEK’s guaranteed iteration complexity \(k=O(\sqrt{n}\log \epsilon ^{-1})=O(\sqrt{n}L)\) is a significant over-estimate. A log-log regression yields an empirical time complexity of \(T\approx 10^{-4}n^{1.12}L\), which is very close to being linear-time.

Table 3 Accuracy and timing for 20 largest Lovasz Theta problems
Fig. 3  MOSEK Timings for MAX 3-CUT (\(\circ \)) and Lovasz Theta (\(\times \)) problems: a Time per iteration, with regression \(T/k=1.488\times 10^{-4}n\); b Iterations per decimal digit of accuracy, with (solid) regression \(k/L=0.697n^{0.123}\) and (dashed) bound \(k/L=\sqrt{n}\)

Table 4 Accuracy (in decimal digits) and timing (in seconds) for 20 largest OPF problems: n—order of matrix variable; m—number of constraints; “Pre-proc”—pre-processing time; \(L=\min \{{\mathrm {gap}},{\mathrm {pinf}},{\mathrm {dinf}}\}\)—accurate decimal digits; k—number of interior-point iterations; T—total interior-point time; “Post-proc”—post-processing time

8.3 Optimal power flow

We now solve instances of the OPF posed on the same 40 power systems as mentioned above. Here, we use the MATPOWER function makeYbus to generate the bus admittance matrix \(Y_{bus}\), and then manually generate each constraint matrix \(A_{i}\) from \(Y_{bus}\) using the recipes described in [74]. Specifically, we formulate each OPF problem given the power flow case as follows:

  • Minimize the cost of generation. This is the sum of real-power injection at each generator times $1 per MW.

  • Constrain all bus voltages to be from 95 to 105% of their nominal values.

  • Constrain all load bus real-power and reactive-power values to be from 95 to 105% of their nominal values.

  • Constrain all generator bus real-power and reactive-power values within their power curve. The actual minimum and maximum real and reactive power limits are obtained from the case description.

We use three different algorithms to solve the resulting semidefinite program: (1) The original clique tree conversion of Fukuda et al. and Nakata et al. [10, 75] in Sect. 3.3; (2) Dualized clique tree conversion in Algorithm 1; (3) Dualized clique tree conversion with auxiliary variables in Algorithm 3. We solved all 40 problems using the three algorithms, with MOSEK as the internal interior-point solver. Table 4 shows the accuracy and timing details for the 20 largest problems solved. All three algorithms achieved near-linear time performance, solving each problem instance to 7 digits of accuracy within 6 minutes. Upon closer examination, we see that the two dualized algorithms are both about a factor of two faster than the basic CTC method. Figure 4 plots T/k, the mean time taken per iteration, and k/L, the number of iterations for a factor-of-ten error reduction, and their respective log-log regressions. The data suggests an empirical time complexity of \(T\approx 2.3\times 10^{-4}n^{1.3}L\) over the three algorithms.

Fig. 4  OPF problems solved using clique tree conversion (\(\times \)), dualized clique tree conversion (\(\circ \)) and dualized clique tree conversion with auxiliary variables (\(\triangle \)): a Time per iteration, with regression \(T/k=2.931\times 10^{-4}n\); b Iterations per decimal digit of accuracy, with (solid) regression \(k/L=0.807n^{0.271}\) and (dashed) bound \(k/L=\sqrt{n}\)

9 Conclusion

Clique tree conversion splits a large \(n\times n\) semidefinite variable \(X\succeq 0\) into up to n smaller semidefinite variables \(X_{j}\succeq 0\), coupled by a large number of overlap constraints. These overlap constraints are a fundamental weakness of clique tree conversion, and can cause highly sparse semidefinite programs to be solved in as much as cubic time and quadratic memory.

In this paper, we apply dualization to clique tree conversion. Under a partially separable sparsity assumption, we show that the resulting normal equations have a block-sparsity pattern that coincides with the adjacency matrix of a tree graph, so the per-iteration time and memory complexity of an interior-point method is guaranteed to be linear with respect to n, the order of the matrix variable X. Problems that do not satisfy the partial separability assumption can be systematically decoupled by introducing auxiliary variables. In the case of network flow semidefinite programs, the number of auxiliary variables can be bounded, so an interior-point method again has a per-iteration time and memory complexity that is linear with respect to n.

Using these insights, we prove that the MAXCUT and MAX k-CUT relaxations, the Lovasz Theta problem, and the AC optimal power flow relaxation can all be solved with a guaranteed time and memory complexity that is near-linear with respect to n, assuming that a tree decomposition with small width for the sparsity graph is known. Our numerical results confirm an empirical time complexity that is near-linear with respect to n on the MAX 3-CUT and Lovasz Theta relaxations.