A manifold-based approach to sparse global constraint satisfaction problems
Abstract
We consider square, sparse nonlinear systems of equations whose Jacobian is structurally nonsingular, with reasonable bound constraints on all variables. We propose an algorithm for finding good approximations to all well-separated solutions of such systems. We assume that the input system is ordered such that its Jacobian is in bordered block lower triangular form with small diagonal blocks and with small border width; this can be performed fully automatically with off-the-shelf decomposition methods. Five decades of numerical experience show that models of technical systems tend to decompose favorably in practice. Once the block decomposition is available, we reduce the task of solving the large nonlinear system of equations to that of solving a sequence of low-dimensional ones. The most serious weakness of this approach is well-known: It may suffer from severe numerical instability. The proposed method resolves this issue with the novel backsolve step. We study the effect of the decomposition on a sequence of challenging problems. Beyond a certain problem size, the computational effort of multistart (no decomposition) grows exponentially. In contrast, thanks to the decomposition, for the proposed method the computational effort grows only linearly with the problem size. It depends on the problem size and on the hyperparameter settings whether the decomposition and the more sophisticated algorithm pay off. Although there is no theoretical guarantee that all solutions will be found in the general case, increasing the so-called sample size hyperparameter improves the robustness of the proposed method.
Keywords
Decomposition methods, Diakoptics, Large-scale systems of equations, Numerical instability, Sparse matrices, Tearing
1 Introduction
1.1 Aims
The task that we just posed is computationally intractable in general; we have to make further assumptions. We assume that (1) has already been ordered such that its Jacobian is in bordered block lower triangular form with small blocks and with small border width; the formal definition of bordered block lower triangular forms is given in Sect. 1.3. In Sect. 1.4 we give references on how (1) can be ordered to the desired bordered block lower triangular form fully automatically and efficiently. We argue in Sect. 1.5 why the models of technical systems tend to decompose favorably in practice, and why the proposed method is expected to be useful across many engineering fields, e.g., mechanical, electrical, chemical, and aerospace engineering. Further (less limiting) assumptions are given in Sect. 1.6. The last of our assumptions is given in Sect. 3, after the overview of the proposed method, because it is easier to understand once the method has been outlined.
1.2 Terminology
We refer to the number \(\dim x\) of components of a vector x as its dimension. The structural rank of a matrix A is the maximum number of nonzero entries that can be permuted onto the diagonal with suitable row and column permutations. (It is also known as the maximal size of a transversal, of a maximum assignment, or of a maximum matching in the bipartite sparsity graph of A.) The structural rank is an upper bound on the numerical rank of A. A is nonsingular for some numerical values of its nonzero entries if and only if it is possible to permute the rows and columns of A in such a way that the diagonal is zero free. Such a matrix is called structurally nonsingular.
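The matching characterization of the structural rank can be made concrete with a small sketch. The function below is an illustrative augmenting-path matching written for this text, not the routine used in the paper; the name `structural_rank` and the row-wise sparsity-pattern input format are assumptions made for the example.

```python
def structural_rank(pattern):
    """Return the size of a maximum matching between rows and columns.

    pattern[i] lists the column indices of the structurally nonzero
    entries in row i (an assumed input format for this sketch).
    """
    row_of_col = {}  # column index -> row currently matched to that column

    def try_assign(row, visited):
        # Look for an augmenting path that starts at `row`.
        for col in pattern[row]:
            if col in visited:
                continue
            visited.add(col)
            # Take a free column, or re-route the row that occupies it.
            if col not in row_of_col or try_assign(row_of_col[col], visited):
                row_of_col[col] = row
                return True
        return False

    return sum(try_assign(row, set()) for row in range(len(pattern)))


# 3x3 pattern whose diagonal can be made zero-free: structurally nonsingular.
print(structural_rank([[0, 1], [0], [0, 2]]))  # 3
# Rows 0 and 1 compete for column 0 only: structural rank 2.
print(structural_rank([[0], [0], [0, 1, 2]]))  # 2
```

The recursion is adequate for small patterns only; for large sparse matrices a dedicated sparse matching code is the more appropriate choice.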
In an engineering application it is usually not meaningful to distinguish two solutions that are too close due to the intrinsic uncertainty of every real-life model. We therefore call a set P of points well-separated if, for any distinct points \(p,q \in P\), the distance \(\left\| p-q\right\| _2\) is above a small threshold \(\delta \) specified by the user, for example \(\delta =10^{-4}\).
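The definition translates directly into a pairwise distance check. The following is a minimal sketch; the function name `is_well_separated` is hypothetical, and the quadratic-time loop over all pairs is chosen for clarity rather than efficiency.

```python
import math
from itertools import combinations


def is_well_separated(points, delta):
    """True if every pair of distinct points is more than delta apart (Euclidean norm)."""
    return all(math.dist(p, q) > delta for p, q in combinations(points, 2))


print(is_well_separated([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], delta=1e-4))  # True
print(is_well_separated([(0.0, 0.0), (0.0, 5e-5)], delta=1e-4))             # False
```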
Array slicing notation. The shorthand p:q is used for the ordered index set \(p,p+1,\dots ,q\), where \(p\le q\). When forming the subvector \(v_{p:q}\) of a vector v, p:q is cropped appropriately if necessary; that is, invalid indices are ignored. The index set p:q is considered empty if \(p>q\), and the expression \(v_{p:q}\) is a valid subvector of v that has no components. In case of block vectors, the shorthand \(v_{i:k}\) is used for a block vector with consecutive blocks \(v_j\) (\(j=i:k\)).
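The slicing convention differs from Python's built-in slices in that the upper index q is included. A minimal sketch of the convention follows, assuming 0-based component indices; the helper name `subvector` is hypothetical.

```python
def subvector(v, p, q):
    """Return v_{p:q} with inclusive bounds; indices outside v are ignored."""
    lo = max(p, 0)
    hi = min(q, len(v) - 1)
    # An empty index set (p > q, or no valid index left) gives an empty subvector.
    return v[lo:hi + 1] if lo <= hi else []


v = [10, 20, 30, 40]
print(subvector(v, 1, 2))  # [20, 30]
print(subvector(v, 2, 9))  # [30, 40], cropped at the end of v
print(subvector(v, 3, 2))  # [], since p > q
```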
A point cloud is a set of scattered points, intended to approximate a manifold.
1.3 Bordered block lower triangular forms
By construction, the diagonal blocks are structurally nonsingular. We refer to the set S of arguments where some block is singular as the singular set of the system. The structural nonsingularity implies that S has measure zero. For arguments x outside this set, all blocks are nonsingular, and \(F_{1:i}(x_{0:i})=0\) \((i=1,\dots ,N)\) implicitly defines a (possibly disconnected) \(d_0\)-dimensional manifold in \(\mathbb {R}^{d_i}\), where \(d_i=\dim x_{0:i}\). We refer to the full solution set of this subsystem for arguments within the original bounds as the solution manifold associated with the bordered block lower triangular form. (If the singular set S is nonempty, this is a manifold only in a generalized sense since it has singularities at the points of S, e.g., self-crossings and cusps.) In our algorithm we resolve this manifold through a coarse discretization by a point cloud.
1.4 Creating the desired block decomposition automatically
Sparse matrix ordering algorithms are a well-researched subject with a vast literature; we only mention some key points and references here. Both the Jacobian of (1) and the square blocks along the diagonal are required to have full structural row rank. The structural rank is revealed by the Dulmage–Mendelsohn decomposition (Dulmage and Mendelsohn [18, 19, 20], Johnson et al. [30], Duff et al. [17, Ch. 6], Pothen and Fan [43], and Davis [11, Ch. 7]). This decomposition is a standard procedure, and efficient computer implementations are available, for example HSL_MC79 from the HSL [29]. Practical ordering algorithms are applied next; these include the Hellerman–Rarick family of ordering algorithms [17, 21, 27, 28], and the algorithms of Stadtherr and Wood [45, 46]. An efficient computer implementation of the Hellerman–Rarick algorithms is MC33 from the HSL [29]. Although there are subtle differences among the various ordering algorithms, they all fit the same pattern when viewed from a high level of abstraction, see Fletcher and Hall [22].
1.5 Tearing heuristics to create bordered block lower triangular forms
Besides the references given in Sect. 1.4, the engineering literature is also rich in sparse matrix ordering algorithms. Decomposing to bordered block lower triangular form has a long tradition in engineering applications: It is usually referred to as tearing, diakoptics, or the sequential modular approach, depending on the engineering discipline. When dealing with distillation columns, tearing is called stage-to-stage or stage-by-stage calculations. Tearing dates back to the 1930s [34, 48], and has been widely adopted across many engineering fields since: State-of-the-art steady-state and dynamic simulation environments all implement some variant of tearing, see for example Aspen Technology, Inc. [1], MOSAICmodeling [9], Dymola [10], JModelica [37], or OpenModelica [40]. The applicability of tearing is not limited to a particular engineering discipline: It is generic, and it is used in all state-of-the-art Modelica simulators to model “complex physical systems containing, e.g., mechanical, electrical, electronic, hydraulic, thermal, control, electric power or process-oriented subcomponents” [36].
The various tearing heuristics are concerned with selecting a minimal subset of variables called the torn variables; when these torn variables are moved to the border of the matrix, and the Dulmage–Mendelsohn decomposition is applied to the rest of the matrix, the blocks of the resulting bordered block lower triangular form correspond to the devices (or machines) of the technical system. The block sizes therefore tend to be O(1), that is, they are typically bounded by a small constant. More than five decades of practical experience and the widespread usage of tearing show that the tearing heuristics also tend to produce a narrow border when applied to technical systems.
1.6 Further assumptions
Our algorithm assumes that the variables are adequately scaled. This allows us to use one of the standard norms to measure distances; unless otherwise indicated, we use the Euclidean norm (\(\ell _2\)-norm).
We also assume that the bound constraints \(\underline{x} \le x \le \overline{x}\) are finite and reasonable; this is needed to allow an adequate sampling of the search space. Therefore, our method may not work well when a variable is unbounded or its upper bound is not known, and the user circumvents this by specifying a huge number such as \(10^{20}\) as upper bound. Finite bound constraints are also important from an engineering perspective: These bounds often exclude those solutions of \(F(x)=0\) that either have no physical meaning or lie outside the validity of the model.
2 Overview of the proposed algorithm
 (a)
Many or all points become bound infeasible.
 (b)
The \(x_i\) components of many points in the point cloud accumulate around one point or around a particular subspace. In this case, the remaining part of the feasible region is no longer adequately represented by the other points.
The last subproblem at \(i=N+1\) is different from (8) in that it is an overdetermined system, while all the other subproblems (8) are square. At \(i=N+1\) the algorithm skips the forward solve step (since there is no square block to eliminate), and performs a backsolve-like step: It solves (9) with \(J=\varnothing \), and with \(x_{i-h:N}\) as starting points for \(y_{i-h:N}\). (For the variables, the variable slices \(i-h:i\) are truncated to \(i-h:N\), see Sect. 1.2.)
The output of the proposed (main) algorithm, after finishing the last subproblem \(i=N+1\), is a point cloud approximating the solution set of (1). The implementation details of the algorithm will be discussed in Sect. 4. The algorithm of the present paper is a significant improvement over older algorithms discussed in [3, 4], both algorithmically and on the implementation level. The entire algorithm has been redesigned and rewritten from scratch, and in particular, the backsolve step is radically different. Our numerical results show several orders of magnitude improvements in speed, while achieving better robustness at the same time.
3 Exponential worst-case time complexity in the border width
Let N(s) denote the number of boxes intersecting the solution manifold when uniformly covering the bound constraint box by a grid of boxes of side \(s>0\). We call \(e:=\lim _{s\rightarrow 0}e(s)\), where \(e(s):=-\log N(s)/\log s\), the effective dimension of the solution manifold. As a consequence, the size of a cloud with the property that every point on the manifold has distance at most s to a point of the cloud grows for small s proportionally to \(s^{-e}\). In other words, constructing the point cloud will have a time complexity that is exponential in the effective dimension e. Creating the point cloud is therefore computationally tractable only for small effective dimensions e; how small depends on the resolution s needed, which fortunately is not high when (as usual) the total number of solutions of the original system is small, and the solutions are well-separated. Thus a small effective dimension e is the main assumption under which our new method can operate efficiently.
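The box-counting definition can be tried out numerically on a finite sample. The sketch below evaluates \(e(s)=\log N(s)/\log (1/s)\), the form of the definition that is positive for \(s<1\), at a single box size instead of taking the limit. It assumes that the bound constraint box has been normalized to the unit box and that a dense point sample stands in for the manifold; the function name is hypothetical.

```python
import math


def effective_dimension_estimate(points, s):
    """Box-counting estimate log N(s) / log(1/s) for points in the unit box."""
    # N(s): number of distinct grid boxes of side s that contain a sample point.
    boxes = {tuple(math.floor(c / s) for c in p) for p in points}
    return math.log(len(boxes)) / math.log(1.0 / s)


# A densely sampled diagonal of the unit square, i.e. a one-dimensional curve.
curve = [(k / 20000, k / 20000) for k in range(20000)]
print(effective_dimension_estimate(curve, 1 / 64))  # 1.0
```

The estimate is only meaningful while the sample is much finer than the box size s; with too few points, N(s) merely counts the points.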
If the Jacobian of (10) has full rank then (since d equations are “missing” from the square system) the solution manifold has dimension d, the d-dimensional volume of the solution manifold is finite, and thus the effective dimension is \(e=d\). But in general, pathological cases might be possible, such as Peano-like curves (\(d=1\)) that come close to every point in the box and then have large e. Excluding such pathological cases, which do not arise in most applications of interest, the border width d agrees with the effective dimension e.
In engineering applications, the presence of important bounds further decreases the effective dimension of the manifold. For example, we have the natural nonnegativity bound on many variables. Each such bound will be active at many solutions, effectively amounting to an additional equation, typically decreasing the effective dimension by one. In addition, if the lower and upper bounds on some variable differ by significantly less than the threshold s, this variable is effectively constant and also decreases the effective dimension. Such strong specifications are fairly common since the designer wants the system to perform something useful and therefore pushes the system to its limits (for example to create almost pure chemicals). We give numerical examples in Sects. 5 and 6 showing that the method is practical for certain difficult engineering applications.
To locate the solution manifold, i.e., to construct the approximating point cloud, we need to sample function values without knowing beforehand where the useful points lie that should go into the point cloud. To achieve this efficiently is the main reason why the bordered block triangular decomposition is needed. Indeed, we could sample the solution set of \(F_{1:N}(x_{0:N})=0\) within the bound constraints directly, without decomposition. However, the volume to be sampled then grows exponentially with the dimension \(p:=\dim {x_{0:i}}\), which gets larger and larger (ultimately \(p=n\)). This makes good sampling in this naive way prohibitively expensive. The proposed method avoids this scalability trap by sampling only at the square blocks along the diagonal (see Fig. 1): The volume to be sampled grows exponentially only with the largest block size, which is assumed to be reasonably small. In engineering applications, this assumption is usually satisfied since typically the largest block corresponds to the largest device/machine in the technical system being modeled.
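The scaling argument can be illustrated with back-of-the-envelope arithmetic. The sketch below counts grid samples under a deliberately crude model, assumed only for this illustration: k grid points per coordinate axis, with the border variables and the coupling between blocks ignored. The function names and the numbers (50 blocks of size 4, k = 10) are hypothetical.

```python
def grid_samples_without_decomposition(k, n):
    """Grid points needed to cover all n variables at once."""
    return k ** n


def grid_samples_with_decomposition(k, block_sizes):
    """Grid points needed when each diagonal block is sampled on its own."""
    return sum(k ** b for b in block_sizes)


block_sizes = [4] * 50  # hypothetical system: 50 blocks, 4 variables each
n = sum(block_sizes)    # 200 variables in total
print(grid_samples_with_decomposition(10, block_sizes))          # 500000
print(len(str(grid_samples_without_decomposition(10, n))) - 1)   # 200, i.e. 10**200 samples
```

Under this model the blockwise count is linear in the number of blocks and exponential only in the block size.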
4 Implementation details of the proposed algorithm
A high-level overview of the algorithm was already given in Sect. 2. In this section we discuss the building blocks in more detail. These building blocks are mostly implementation-level details, and there could be other ways to fill in these low-level details that the high-level overview left open.
4.1 The source code of the algorithm
The most complete description of the algorithm is its source code; the Python source code of the algorithm is therefore available on GitHub [2] under the very permissive 3-Clause BSD License. For convenience the source code is distilled down to its essence, and it is also given in “Appendix A” as pseudocode. Algorithm 1 of “Appendix A” is the core of the algorithm. We use the VA27 solver from HSL [29] to solve the equations and NLPs at each block. Since this solver cannot handle variable bounds, we enforce them with Algorithm 2. The backsolve step is given by Algorithm 3. The pseudocode is less than 50 lines in total.
4.2 The farthest-first subsampling algorithm
The goal of the subsampling algorithm is to select a spatially well-distributed subset of a given scattered set of points S. A greedy heuristic is implemented, based on the so-called farthest-first traversal. The algorithm starts by choosing a point in S. We currently pick the point closest to the mean of S; other choices are also possible, including a random choice. Then, points are selected one by one, always picking next the not-yet-chosen point that is farthest away from the already chosen ones, breaking ties arbitrarily. The subsampling algorithm stops when the desired sample size is reached.
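The greedy heuristic just described can be sketched in a few lines. This is a minimal pure-Python illustration written for this text, not the code from the repository [2]; the function name is hypothetical, ties are broken by taking the lowest index, and the quadratic running time is accepted for the sake of brevity.

```python
import math


def farthest_first_subsample(points, sample_size):
    """Return up to sample_size points of `points` in farthest-first order."""
    points = list(points)
    if not points or sample_size <= 0:
        return []
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    # Start from the point closest to the mean of the set.
    first = min(range(len(points)), key=lambda i: math.dist(points[i], mean))
    chosen = [first]
    # dist[i]: distance from point i to its nearest chosen point (-1 once chosen).
    dist = [math.dist(p, points[first]) for p in points]
    dist[first] = -1.0
    while len(chosen) < min(sample_size, len(points)):
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
        dist[nxt] = -1.0
    return [points[i] for i in chosen]


cloud = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
print(farthest_first_subsample(cloud, 3))  # [(3.0,), (10.0,), (0.0,)]
```

Requesting as many points as the set contains yields a complete ordering of the set, which is how the same routine can serve to rank points as well as to thin them out.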
4.3 Generating the new random points in the backsolve step
We refer back to Sect. 2, and to Fig. 2: After each forward solve we must insert new points into the sample where the manifold is not approximated properly. One way of populating such deserted areas would be interpolation and extrapolation; this would assume that the spatial distribution of the points is already appropriate for interpolation and extrapolation tasks, and that the manifold is connected. While this could be a viable approach, we chose a much simpler and more robust approach. Essentially we propose brute-force oversampling at the block level: We try to insert significantly more \((\tilde{x}_i)_J\) points than what we need. We do not know where to insert them, so we generate them uniformly at random within the variable bounds (brute force). Then, the NLPs (9) of the backsolve step are solved, and those points whose objective (norm of the constraint violation) is above a user-defined threshold are discarded. Finally, we keep only the most distant ones of the remaining points by applying the subsampling algorithm.
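The generate-and-filter part of this procedure can be sketched as follows. The sketch is a simplification under stated assumptions: the callable `violation` is a hypothetical stand-in for solving the NLP (9) and returning the final constraint violation, and the concluding farthest-first subsampling of Sect. 4.2 is left out. All names are illustrative.

```python
import random


def oversample_block(lower, upper, violation, n_candidates, threshold, seed=0):
    """Draw candidates uniformly within the bounds; keep those below the threshold."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_candidates):
        # Brute force: no attempt is made to guess where the manifold lies.
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        if violation(x) <= threshold:
            kept.append(x)
    return kept


# Toy stand-in for the constraint violation: residual of the unit circle equation.
def circle_residual(x):
    return abs(x[0] ** 2 + x[1] ** 2 - 1.0)


kept = oversample_block([-2.0, -2.0], [2.0, 2.0], circle_residual, 2000, 0.1)
print(len(kept), "of 2000 candidates survive the threshold")
```

Because most candidates are discarded, the number of candidates has to be chosen considerably larger than the number of points that are eventually needed.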
This approach for populating deserted areas of the manifold is very robust, and fairly simple to implement. It does not assume connectedness, and it does not assume anything about the spatial distribution of the already existing points in the sample. In fact, if we lose all points in the forward solve, the backsolve may still succeed in inserting new points, and the algorithm can continue. In contrast, it is impossible to interpolate or extrapolate if we have lost all our points. Since we cannot assume connectedness of the manifold, some sort of (block-level) global sampling is inevitable.
4.4 Efficient implementation of the backsolve step
A significant fraction of the execution time is spent in the backsolve step, solving (9). Three improvements proved to be crucial to perform the backsolve step efficiently: (i) trying only a small, carefully selected subset of all the possible combinations of the \(((\tilde{x}_i)_J,~x_{0:i-h-1})\) matches in (9) instead of trying all of them, (ii) estimating a good starting point for (9), and (iii) skipping those matches that are very likely to have above-threshold objective value (constraint violation) at the optimum, and would most likely be discarded anyway.
As Sect. 2 is written, we try all the possible \(((\tilde{x}_i)_J,~x_{0:i-h-1})\) matches in a brute-force manner. The previous implementation of the algorithm [3] also worked this way. Numerical evidence shows that it can be very wasteful: If two distinct points in the point cloud are close in their \(x_{i-h:i}\) components, it is very likely that the \(((\tilde{x}_i)_J,~x_{0:i-h-1})\) matches will have very similar objective value in (9) too; there is little to no benefit in trying both of them. An optional heuristic that we propose is to apply the subsampling algorithm of Sect. 4.2 to the points of the point cloud, considering their \(x_{i-h:i}\) components only. We then try to match the points \((\tilde{x}_i)_J\) with this selected subset only. This heuristic can be disabled at the user’s discretion.
The best match \(((\tilde{x}_i)_J,~x_{0:i-h-1})\) for each \((\tilde{x}_i)_J\) is always tried. For those matches for which \(\left\| F_{i-h:i}(x_{0:i-h-1},y_{i-h:i})\right\| \) at the starting point is below the predefined threshold (hyperparameter), we select at most \(m-1\) additional candidate \(((\tilde{x}_i)_J,~x_{0:i-h-1})\) matches with subsampling. For each candidate match, we launch the local solver from the estimated \(y_{i-h:i}\) to solve (9). The value of m is arbitrary and user-defined; in our numerical experiments \(m=20\) was used, and we did not attempt to tune this hyperparameter.
5 Numerical results: the effect of decomposition
We give numerical results where the computational gains, if any, are thanks to the block decomposition. The benchmark problems are coded in the AMPL modeling language [23], and are available on GitHub [2] together with the source code of the algorithm.
5.1 Series of test problems
The steady-state simulation of distillation columns can be a major numerical challenge [13]. Our example is a series of challenging distillation columns; these columns have 3 solutions, one of which is missed even with problem-specific methods, see Sect. 5.2. Distillation columns consist of so-called stages. The natural order of the stages directly yields the desired block structure (2) and (4) by virtue of the internal physical layout of distillation columns; no preprocessing is necessary. (Even if this were not the case, we could use any of the ordering algorithms referenced in Sects. 1.4 and 1.5 to create the block structure fully automatically.) There is a one-to-one correspondence between the stages and the blocks.
In engineering applications it is common to optimize the total cost by varying the number of stages, which makes distillation columns perfect test problems from the perspective of the present paper: Distillation columns have a natural parameter, namely the number of stages, for examining how different numerical methods scale as the number of blocks changes. As the number of blocks is varied (within reasonable limits) each column is interesting from an engineering point of view. Let N denote the number of blocks. In our examples the size of each block is \(4\times 4\) except the first block which is \(2\times 2\); the problem size is 4N; the number of nonzeros is \(25N-10\). The manifold dimension is \(d=2\), and it is independent of N.
The model equations are the MESH equations: The component material balance (M), vapor-liquid equilibrium (E), summation (S), and heat balance (H) equations are solved. The liquid phase activity coefficient is computed from the Wilson equations. The model and its parameters correspond to the Auto model [25], except for the number of stages N and the feed stage location \(N_F\). The specifications are the feed composition (methanol–methyl butyrate–toluene), the reflux ratio, and the vapor flow rate.
There are three steady-state branches: two stable steady-state branches and an unstable branch; this was experimentally verified in an industrial pilot column operated at finite reflux [16, 25]. Multiple steady states can be predicted by analyzing columns with infinite reflux and infinite length [5, 26, 42]. These predictions for infinite columns have relevant implications for columns of finite length operated at finite reflux.
5.2 Numerical results published in the literature
The published numerical results for our test problem indicate numerical difficulties. Both the conventional inside-out procedure [8] and the simultaneous correction procedure [38] were reported to miss the unstable steady-state solution, see Vadapalli and Seader [49] and Kannan et al. [31] (all input variables specified; output multiplicity). However, all steady-state branches were computed either with the AUTO software package [12] or with an appropriate continuation method [25, 31, 49]. In both cases, the initial estimates were carefully chosen with the \(\infty / \infty \) analysis [5, 26], and special attention was paid to the turning points and branch switching. Unfortunately, those papers do not include execution times, most likely because the computations involved human interactions too (initial estimates, turning points and branch switching).
5.3 The baseline for comparisons
As discussed in Sect. 5.2, the literature clearly indicates that our benchmark problems are challenging. Unfortunately, the execution times are not available for comparisons, so we have to establish a baseline ourselves.
5.3.1 Requirements for the baseline algorithm
 (1)
state-of-the-art;
 (2)
able to enumerate all solutions of large, sparse systems;
 (3)
able to handle transcendental equations and bound constraints;
 (4)
usable from an advanced modeling language without user input beyond equations and variable bounds;
 (5)
a generic algorithm not tailored to a specific class of problems;
 (6)
easy to use without any expert knowledge;
 (7)
publicly available as an off-the-shelf solver.
5.3.2 Results with the baseline algorithm
Table 1 Relative frequencies (percentages) of IPOPT finding a particular solution when starting points are generated uniformly at random between the variable bounds
N  Sol. 1  Sol. 2  Sol. 3  None
50  82.3  17.0  0.7  0.0 
51  83.1  16.2  0.7  0.0 
52  84.0  15.3  0.7  0.0 
53  84.8  14.5  0.7  0.0 
54  85.6  13.7  0.6  0.0 
55  86.1  13.2  0.6  0.0 
56  86.7  12.6  0.6  0.0 
57  87.2  12.2  0.6  0.0 
58  87.6  11.7  0.5  0.1 
59  88.0  11.2  0.5  0.3 
60  88.3  10.6  0.5  0.5 
61  88.7  9.8  0.5  1.0 
62  89.2  9.0  0.4  1.4 
63  89.4  8.2  0.4  2.0 
64  89.6  7.3  0.4  2.7 
65  89.7  6.4  0.3  3.6 
66  89.8  5.6  0.3  4.3 
67  90.0  4.8  0.2  5.0 
68  90.1  4.1  0.2  5.6 
69  90.3  3.5  0.2  6.1 
70  90.2  3.0  0.1  6.6 
71  90.4  2.6  0.1  6.9 
72  90.5  2.2  0.1  7.2 
73  90.6  1.9  0.1  7.4 
74  90.6  1.7  0.1  7.7 
75  90.6  1.5  0.1  7.8 
5.4 Results with the proposed method
5.4.1 Illustrating the point cloud computed with the proposed method
5.4.2 Illustrating the point cloud with manifold learning
We also investigated the manifold structure using manifold learning. We tried each manifold learning algorithm available in scikit-learn: Isomap [47], locally linear embedding (LLE) [44], modified locally linear embedding (MLLE) [53], Hessian Eigenmapping (also known as Hessian-based LLE or HLLE) [15], Spectral Embedding (Laplacian Eigenmaps) [6], local tangent space alignment (LTSA) [52], and multidimensional scaling (MDS) [7, 32, 33]. (Although the t-distributed Stochastic Neighbor Embedding [35, 50] algorithm, or t-SNE, also proved to be robust, we did not use t-SNE for the present paper: It was designed to artificially exaggerate structure in the data to reveal clusters, but that is undesirable in our case.)
It is not uncommon that the embeddings show false structures that in reality are not present in the data. Problem-specific knowledge was used to recognize any false structure in the embeddings as follows. We colored each point: The mole fractions of the 3 chemical components in the liquid phase on stage k of the distillation column are chosen as the coordinates in the RGB color space, where the stage index k is a parameter. In short, the color of the point corresponds to the chemical composition of the liquid phase on stage k. The theory of distillation, in particular the so-called residue curve map [14] of the mixture, tells us that for a fixed k we should see a smooth color transition in the embeddings, similar to the smooth shade transition in Fig. 6. Furthermore, the coloring of the points in the embeddings should change only smoothly when k is increased or decreased by 1.
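The coloring rule amounts to reading the three mole fractions as red, green, and blue intensities. A minimal sketch follows; which component is assigned to which channel, the clamping to the interval [0, 1], and the hexadecimal output format are assumptions made for this illustration, and the function name is hypothetical.

```python
def composition_to_rgb(mole_fractions):
    """Map three liquid-phase mole fractions to a hex color string such as '#ff0000'."""
    # Clamp to [0, 1] so that slightly infeasible points still get a valid color.
    r, g, b = (max(0.0, min(1.0, x)) for x in mole_fractions)
    return "#{:02x}{:02x}{:02x}".format(round(255 * r), round(255 * g), round(255 * b))


print(composition_to_rgb((1.0, 0.0, 0.0)))  # '#ff0000', pure first component
print(composition_to_rgb((0.0, 0.0, 1.0)))  # '#0000ff', pure third component
```

Since the mole fractions vary continuously along the manifold, neighboring points of a correct embedding receive similar colors, and abrupt color changes point to a false structure.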
We inspected these color transitions for each algorithm offered by scikit-learn. If the manifold learning algorithm creates a wrong embedding or false structure, it is obvious at first glance. In our numerical experience, among the manifold learning algorithms implemented in scikit-learn, only multidimensional scaling was robust enough to consistently produce correct embeddings without any hyperparameter tuning. A possible explanation for its robustness could be that it randomly chooses the initial configuration; the other embedding techniques that we listed are based on a nearest-neighbor search which can be fooled if the points happen to have an unfortunate distribution in the original high-dimensional space. The downside of multidimensional scaling is that it was by far the most computationally expensive manifold learning algorithm of all those tried. Note, however, that multidimensional scaling is not part of the proposed method; it is used only for visualization.
5.4.3 Running a local solver from the output of the proposed algorithm
The subsampling algorithm of Sect. 4.2 selects the points in a specific order; the subsampling procedure can be used to order the points in any set S. This order is the so-called greedy permutation or the farthest-first traversal. When the main algorithm finishes, we propose that a local solver for large-scale, sparse problems (like IPOPT) is launched from the points of the final point cloud in this order. The numerical experiments suggest that this increases the likelihood of finding all solutions early, because the point tried next is always the one that is least similar to those already tried. As shown in Fig. 8, the first 3 points picked by the farthest-first heuristic suffice to find all solutions in this case. Note that in Sect. 5.3 the probability of finding the third solution was \(0.5\%\) for starting points generated uniformly at random between the variable bounds; see Table 1, row \(N=60\).
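To put the \(0.5\%\) figure in perspective, one can ask how many uniformly random starting points a multistart method would need before it finds the third solution with a given probability. The sketch below assumes that the starts are independent and that each succeeds with the same probability p, so that the chance of at least one success in k starts is \(1-(1-p)^k\); the function name and the 95% confidence level are choices made for this illustration.

```python
import math


def starts_needed(p, confidence):
    """Smallest k such that 1 - (1 - p)**k >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


p = 0.005  # success probability per random start, i.e. 0.5%
print(starts_needed(p, 0.95))  # 598
print(round(1.0 / p))          # 200, the expected number of starts until the first success
```

Under these assumptions, several hundred random starts are needed for a result that the farthest-first ordering delivers with the first 3 points.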
Numerical experiments also show that the final constraint violations are nondistinctive with respect to the goodness of the starting points: Below a certain threshold, the constraint violations are due to the random perturbations applied in the backsolve step, and they do not convey any information regarding the goodness of the starting points. In other words, the constraint violation is not a good metric for ordering the final starting points; we propose the farthestfirst traversal instead.
5.5 Comparisons: The effect of decomposition
The effect of the decomposition (2)–(5) can be studied by requesting all solutions for a given column length, and comparing the execution times of the proposed method with the baseline multistart algorithm (no decomposition). As we discussed in Sect. 5.3, if the starting points are generated uniformly at random within the variable bounds, the computational effort grows exponentially for \(N\ge 64\). For the proposed method, the computational effort grows linearly, thanks to the decomposition. It depends on the problem size (column length) and on the hyperparameter settings whether the decomposition and the more sophisticated algorithm pay off; see the left column of Fig. 9, comparing the execution times.
6 Numerical results: reusing shared substructure
A frequent task in engineering is to solve a series of related square systems \(F^{\ell }(x)=0\), where the number \(N_\ell \) of blocks of the \(\ell \)-th problem and hence the Jacobian varies, but the equations in the first \(B_\ell \) blocks of \(F^{\ell }\) and \(F^{\ell +1}\) are identical; the remainder may deviate arbitrarily. If \(B_\ell \) is close to \(N_\ell \), the major part of the point cloud can be reused without any change.
7 Future work

the reflux ratio, that is, provide one more equation, making the system square;

or the reflux molar flow rate, making the system square;

or append an objective function, and look for the cost optimal reflux ratio.
Acknowledgements
Open access funding provided by Austrian Science Fund (FWF).
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.