Balancing and reconciling large multiregional input–output databases using parallel optimisation and highperformance computing
 203 Downloads
Abstract
Over the past decade, largescale multiregional input–output (MRIO) tables have advanced the knowledge about pressing global issues. At the same time, the data reconciliation strategies required to construct such MRIOs have vastly increased in complexity: largescale MRIOs are more detailed and hence require large amounts of different source data which are in return often varying in quality and reliability, overlapping, and, as a result, conflicting. Current MRIO reconciliation approaches—mainly RAStype algorithms—cannot fully address this complexity adequately, since they are either tailored to handle certain classes of constraints only, or their mathematical foundations are currently unknown. Leastsquarestype approaches have been identified to offer a robust mathematical framework, but the added complexity in terms of numerical handling and computing requirements has so far prevented the use of these methods for MRIO reconciliation tasks. We present a new algorithm (ACIM) based on a weighted leastsquares approach. ACIM is able to reconcile MRIO databases of equal or greater size than then currently largest global MRIO databases. ACIM can address arbitrary linear constraints and consider lower and upper bounds as well as reliability information for each given data point. ACIM is based on judicious data preprocessing, stateofthe art quadratic optimisation, and highperformance computing in combination with parallel programming. ACIM addresses all shortcomings of RAStype MRIO reconciliation approaches. ACIM’s was tested on the Eora model, and it was able to demonstrate improved runtimes and source data adherences.
Keywords
Input–output analysis Constrained optimisation Matrix balancing Leastsquares approach RAS method1 Introduction
Restricting the constraints set to linear constraints does not pose a problem in MRIO reconciliation, as most conditions that are posed on the MRIO table elements by superior data (such as balancing, adhered to sector totals, matching an aggregated table) can be interpreted as linear constraints.
The formulation given in Eq. (1) also includes reliability data for \({\varvec{\tau }}^0\) and \({\mathbf {c}}\), given in the vectors \({\varvec{\sigma }}_{{\varvec{\tau }}^0}\), and \({\varvec{\sigma }}_{\mathbf {c}}\). We are following the interpretation of the reliability data as it has been proposed in previous publications [see, for example, Lenzen et al. (2010)]. That is, each element of an MRIO table is interpreted as the expected value of a normally distributed random variable, with the standard deviation being a measure for the reliability of each value. Hence, each element in \({{\varvec{\tau }}^0}\) is accompanied by a value for its standard deviation, and these values are pooled in the vector \({\varvec{\sigma }}_{{\varvec{\tau }}^0}\).
The objective function f typically measures the deviation between the original unbalanced and the balanced estimate.
From a mathematical viewpoint, it is usually the problem that defines which objective function should be used to measure this deviation. For example, in order to minimise the travelled distance between two arbitrary points on a plane, a standard Pythagorean distance measure is applied. If the ways on which one can travel across the plane are restricted, for example by a road, then the Pythagorean measure does not suit the problem anymore, and the distance between two points must be measured so that the possible pathways between the two points of interest are considered correctly. After all, one cannot walk straight through a building! Only once the correct objective function is found, an appropriate algorithm to find the shortest way can be chosen.
This pathway towards finding the appropriate objective function for compiling MRIOs has been somewhat reversed: suitable algorithms were developed first, and their mathematical understanding grew in subsequent years.
Over the decades, two mathematical approaches prevailed for the task of data reconciliation during MRIO compilation. These can be broadly classified as crossentropy methods and quadratic programming (QP) methods. In this paper, we are proposing a new algorithm for a leastsquares approach, which falls into the category of QP methods.
The idea of using a leastsquares objective function was first proposed in Stone et al. (1942). It was further developed into a fully quadratic model by Byron (1978, 1996) and van der Ploeg (1982, 1984, 1988), who also introduced error terms to control uncertainties [see also Harrigan and Buchanan (1984), Morrison and Thumann (1980)]. In order to motivate this function, van der Ploeg distinguishes between socalled soft and hard constraints in the matrix \({\mathbf {M}}\) and the vector \({\mathbf {c}}\).
In the following, \({\mathbf {m}}^T\) describes the transposed of the vector \({\mathbf {m}}\), where the different \({\mathbf {m}}^T_i\) are the rows of \({\mathbf {M}}\). Each row in \({\mathbf {m}}^T_i\) in \({\mathbf {M}}\) and its corresponding value \(c_i\) in \({\mathbf {c}}\) represent one constraint. As a preparatory measure, the aim is now to identify for each row of \({\mathbf {M}}\) if it presents a socalled hard or soft constraint. The index s is used as the counter for the soft constraints, and the index h is the counter for the hard constraints.
A row \({\mathbf {m}}^T_{s}\) of \({\mathbf {M}}\) and the corresponding value \({\mathbf {c}}_{s}\) of \({\mathbf {c}}\) are considered a soft constraints, if for the corresponding standard deviation \({\varvec{\sigma }}_{\mathbf {c},s}\ne 0\) holds (standard deviation for the superior data point is not zero). Consequently, \({\mathbf {m}}^T_{h}\) and \({\mathbf {c}}_{h}\) form a hard constraint if \({\varvec{\sigma }}_{\mathbf {c}},h= 0\). Plainly speaking, a soft constraint is a constraint that can remain unfulfilled (within the limits defined by the standard deviation), whereas hard constraints must be adhered to perfectly.
Hard constraints are typically the socalled balancing constraints, which ensure that the final MRIO adheres to the condition that the total output of each sector must equals the sector’s inputs. Other data, such as national statistical data or international trade data, do not have to be adhered to perfectly, and this is represented by the positive values in \({\varvec{\sigma }}_{\mathbf {c},s}\ne 0\), which is the condition for a soft constraint. For example, the GDP for a country is never perfectly reliable, and a deviation by a few per cent does not render the final table as faulty. However, official statistics data should of course be adhered to as much as possible.
The soft and hard constraints can then be pooled in different matrices, splitting the original matrix \({\mathbf {M}}\) into separate matrices. Let \({\mathbf {M}}_{\text {soft}}\) only contain soft constraints and \({\mathbf {M}}_{\text {hard}}\) only contain hard constraints. The vector \({\mathbf {c}}\) is split into \({\mathbf {c}}_{\text {soft}}\) and \({\mathbf {c}}_{\text {hard}}\) accordingly.
Here, the hat operator \({\varvec{{\hat{\sigma }}}}_{{\mathbf {p}}^0}\) depicts the diagonalisation of the vector \({\varvec{\sigma }}_{{\mathbf {p}}^0}\). It should be pointed out that van der Ploeg’s formulation elegantly reinterprets the reliability values for the constraint values in \({\mathbf {c}}_\text {soft}\) as inverted reliability values for the error terms \(\varepsilon\). Note also that the error terms are unbounded, that is \({\varvec{l}}\in ({\mathbb {R}}\cup \{\infty \})^N\) and \(u\in ({\mathbb {R}}\cup \{+\infty \})^N\). Note further that due to its construction the values in \({\varvec{\sigma }}_{{\mathbf {p}}^0}\) cannot be 0, and an inversion and squaring of the values as it is carried out in \({\varvec{{\hat{\sigma }}}}_{{\mathbf {p}}^0}^{2}\) does not pose a numerical problem.
This reconciliation problem fulfils the form given in Eq. (1), since the reliability data for both the MRIO elements and the superior data are considered.
2 RAStype crossentropy methods and the quadratic programming approach
Thus far, the discussion around the compiling MRIO matrices identified two different types of algorithms that were considered for this task: quadratic programming algorithms and RAStype methods, which are specialised crossentropy methods. RAStype methods have so far enjoyed more attention, but there are fundamental differences between the development of the algorithm presented in this paper and the algorithms summarised in the family of RAStype methods.
Stone (1961) presented RAS as a calculation method for one task only: balancing a matrix according to given row and column totals. It took another 9 years before Bacharach (1970) discovered that RAS is in fact an optimisation method that solves a version of the problem given in Eq. (1). The original RAS method is limited to balancing constraints only, it can only handle positive elements in the matrix (in our case: the MRIO), and it can neither consider reliability information nor boundary conditions. To cater for various needs such as these, the algorithm has been modified and extended numerous times. For example, Gilchrist and Louis (1999) presented a threestage RAS (TRAS) aimed at allowing the consideration of partially aggregated data, Junius and Oosterhaven (2003) presented a generalised RAS (GRAS) that allowed positive and negative elements within the MRIO. Mínguez et al. (2009) proposed the CRAS algorithm which takes into account additional information when a time series of MRIO tables is available. The most recent extension of RAS was presented by Lenzen et al. (2010). Their Konfliktfreies RAS (KRAS) allows the use of arbitrary constraints and the use of reliability data for both the MRIO elements and the superior data. KRAS is the first RAStype method capable of solving the problem given in Eq. (1). Lahr and de Mesnard (2004) and Lenzen et al. (2010) gave detailed overviews of the different RAS variants.
The development of RAS and its extensions was usually driven by the need of solving the reconciliation problem for a certain class of constraints. The mathematical interpretation of the method usually followed after the development of the method itself. Although this incremental development approach has enabled steady progress in MRIO compilation capabilities, the process of analysing the RAS modifications can be tedious. Such rigorous mathematical analyses allow for rational comparison of algorithms and informed reconciliation methodologies. In the case of the most recent RAS development—KRAS—the mathematical interpretation has not been carried out yet and the objective function of the algorithm remains unknown.
In contrast, QPbased algorithms provide a robust mathematical base. However, a generalpurpose, singlestep QP algorithm for MRIO compilation is currently not available.
A number of largescale global MRIO databases such as Eora, EXIOBASE, WIOD, and GTAP [see Tukker and Dietzenbacher (2013)] were developed over recent years. These databases require large amounts of input data from varying sources. This inevitably leads to conflicting information during the data processing stage. Due to the amount of data to be considered during the reconciliation and the—compared to RAStype approaches—relatively high complexity of van der Ploeg’s QP approach, the developers of these databases have so far employed QPtype approaches—if at all—only for subproblems during the data reconciliation stage. For example, EXIOBASE, EXIOBASE2, and EXIOBASE3 were compiled using multistep reconciliation processes. Some of the intermediate steps in this processes were completed using QPtype algorithms, with the final global balancing step being carried using a RAStype algorithm (Wood et al. 2014, 2015; Stadler et al. 2018).
The mathematical effects of mixing both RAStype and QPtype reconciliation techniques in a multistep process have so not been examined. To the authors’ knowledge, only the Eora database is reconciled using a reconciliation method that considers all available input data in a singlestep reconciliation process (namely the KRAS method). While this removes any potential effects of using a multistep process on the data quality, a mathematical analysis has not been carried out for KRAS.
Thanks to its rigorous mathematical properties, the quadratic approach proposed by van der Ploeg (1982, 1984, 1988), Harrigan and Buchanan (1984) and Morrison and Thumann (1980) has attracted some attention. Canning and Wang (2005) noted its flexibility and use it to construct an IRIO model. However, the lack of efficient and easytouse methods and algorithms for the use of this approach with larger MRIO tables and their corresponding dimensionality has prevented a more widespread adoption. But, as mentioned above, partial uptake of QPtype methods has occurred. In general, the dimension of practical reconciliation problems such as the construction of the global MRIO databases mentioned above is too large for general, singlestep quadratic optimisation algorithms.

based on van der Ploeg’s approach and is hence based on a robust mathematical theory,

is able to consider arbitrary linear constraints during the reconciliation of MRIO databases, and

uses mathematical and computational parallelisation in order to allow for it to be applied to largescale MRIO databases of equal or greater size than the currently biggest MRIO databases in existence.
The algorithm is motivated by starting from a mathematical formulation of the problem of reconciling an MRIO subject to a set of arbitrary constraints. Exploiting the structure of this formulation (in particular by interpreting the constraints geometrically), the authors present an algorithm that solves the given problem. In this paper, we will exploit the structure of the system to develop such an algorithm and apply it to reconcile one the global MRIO databases: the Eora model (Lenzen et al. 2012, 2013).
The remainder of this paper is organised as follows: Sect. 3 states the requirements for a leastsquares algorithm designed to reconcile large MRIO databases, the ACIM algorithm is presented in Sect. 4, the implementation is described in Sects. 5 and 6 contains conclusions.
3 Requirements for a leastsquares algorithm to reconcile large MRIO databases
System (4) is split into two subsystems: the soft constraints matrix is very large, but very sparse (a typical row may contain only a few nonzero values), while the hard constraints matrix represents balancing constraints, which are present more structure (though also denser). In this paper, we propose to use a splitting algorithm. Such algorithms allow one to solve subsystems independently from each other and use these solutions to construct a new iterate. Using this approach, we will then develop strategies for each of the soft and hard constraints. The idea of using a splitting algorithm for matrix reconciliation problem was suggested, for example, by Nagurney and Eydeland (1992).
A leastsquares algorithm designed to solve the problem given in Eq. (4) must be able to handle large amounts of data, and to provide an accurate solution within an acceptable time frame. In the case of Eora, the vector \({\varvec{\tau }}\) has around \(10^9\) entries and the matrix \({\mathbf {G}}\) has around \(10^6\) rows. Together, the complete set of input data for the problem in Eq. (4) occupy more than 80 GB of space. In order to accommodate for this amount of data, we design an algorithm to be run in parallel on a HPC cluster. This will enable the simultaneous reconciliation of subproblems and hence reduce runtime. Splitting methods are particularly well suited to a HPC environment, where each subsystem can be solved in parallel. At the same time, the algorithm features a number of mathematical acceleration measures that ensure faster convergence. In order to achieve this, certain geometric properties of the individual rows of \({\mathbf {G}}\) are considered during the calculation process.
Our findings show that given the size and complexity of the reconciliation task at hand, both the computational parallelisation and the mathematical accelerations must be applied when designing the ACIM algorithm.
Definition 1
4 The ACIM algorithm
 1.
Prepare the problem for the computational parallelisation by splitting \({\mathbf {G}}\) and \({\mathbf {c}}\) into blocks (or subproblems) that will later be processed in parallel, and
 2.
Design each subproblem so that it can be solved efficiently
In order to decide on how we split \({\mathbf {G}}\) and \({\mathbf {c}}\), we must understand which algorithms we will use to construct ACIM. By understanding the strengths of the different algorithms, we can then design a blocking concept for \({\mathbf {G}}\) and \({\mathbf {c}}\) such that the resulting subproblems are tailored to exploit the strengths of ACIM and it components.
ACIM is a nested algorithm, consisting of three individual algorithms, paired with mathematical acceleration measures. We will first present the different algorithms that make up ACIM, and then present the acceleration measures, and finally, using the lessons learned from the different algorithms, we will present the blocking approach.
Before we start looking at the individual algorithms, it is worthwhile looking at the geometric interpretation of the constraints sets
4.1 Geometric interpretation of the constraints set as hyperplanes
Lemma 1
Proof
Available in any standard literature on linear algebra, for example Fischer (2000). \(\square\)
Remark 1
Lemma 1 also holds if \({\mathbf {p}}\) is already part of the hyperplane \(H_{{\mathbf {g}},c}\). In this case, \({\mathbf {p}}\) and its projection \({\mathbf {p}}_p\) are identical, i.e. \({\mathbf {p}}={\mathbf {p}}_p\).
Remark 2
4.2 The components of ACIM
ACIM follows the concept of Cimmino’s algorithm (see Fig. 1), which will be presented first.
Algorithm 1
 0.
Initialise \(k=0\)
 1.
Project \({\mathbf {p}}^k\) onto each hyperplane: \({\mathbf {p}}^{k}_{p,m}={\mathbf {p}}^k+(c_m{\mathbf {g}}_m^T {\mathbf {p}}^k){\mathbf {g}}_m\).
\({\mathbf {p}}^{k}_{p,m}\) is the corresponding projection of \({\mathbf {p}}^k\) onto the hyperplane defined by the mth row \({\mathbf {g}}_m^T\) of matrix \({{\mathbf {G}}}\), \(m=1,\ldots ,M\).
 2.
Compute the barycentre of all projections: \({\mathbf {p}}_b^{k}=\sum _{m=1}^M w_m {\mathbf {p}}^k_{p,m}\).
 3.
Compute the new iterate: \({\mathbf {p}}^{k+1}={\mathbf {p}}^k+\alpha ({\mathbf {p}}_b^{k}{\mathbf {p}}^k)\).
 4.
If \({\mathbf {G}}{\mathbf {p}}^{k+1}{\mathbf {c}}<\varepsilon\), a solution has been found. Set \({\mathbf {p}}^{*}={\mathbf {p}}^{k+1}\) and terminate the algorithm.
If \({\mathbf {G}}{\mathbf {p}}^{k+1}\ne {\mathbf {c}}\) set \(k=k+1\) and return to Step 1.
It is shown by Scolnik et al. (2000) that the algorithm converges (although it is not actually shown in Scolnik et al. (2000), it is easy to verify that the algorithm terminates after at most \(m\) iterations in a typical conjugate direction manner. However, here we apply it as an iterative algorithm) (Fig. 1).
The main step in this algorithm is the projection step 1, and our aim is to make this projection step as efficient as possible. In order to achieve this, we will attempt to split \({\mathbf {G}}\) into blocks such that the projection onto each of these blocks is robust and fast. This block projection will be carried out by Kaczmarz’s algorithm (Fig. 2).
Algorithm 2
 Step 1If \({\mathbf {G}} {\mathbf {p}}^{k+1}={\mathbf {c}}\), terminate the algorithm with the solution \({\mathbf {p}}^{k+1}\).$$\begin{aligned} {\mathbf {p}}^{k+1}={\mathbf {p}}^{k}+\frac{c_{m}\langle {\mathbf {g}}_{m},{\mathbf {p}}^k \rangle }{\Vert {\mathbf {g}}_{m}\Vert ^2}{\mathbf {g}}_{m}. \end{aligned}$$(8)If \({\mathbf {G}} {\mathbf {p}}^{k+1}\ne {\mathbf {c}}\), set \(k=k+1\) andand return to Step 1. Note that M is the number of constraints.

Step 1.1 Set \(m=m+1\) if \(m<M\) or

Step 1.2 Set \(m=1\) if \(m=M\),

Kaczmarz’s algorithm uses elements that were already present in Cimmino’s algorithm: orthogonal projections onto hyperplanes. But Kaczmarz’s algorithm only projects onto one hyperplane per iteration. Hence, if the hyperplanes were all orthogonal to one another, individual projections carried out in one iteration would not be voided by subsequent projections. The projection onto the next hyperplane would occur on the hyperplanes that previous iterations projected onto. Hence, Kaczmarz’s algorithm can find a solution for a linear system very efficiently if the hyperplanes of the problem are orthogonal to one another. The blocking algorithm will be based on the approach to build individual blocks that only contain hyperplanes that fulfil this condition.
Algorithm 3
 Step 1
 For each element \(p_n^k,l_n,u_n\), \(n=1,\dots ,N\) of the vectors \({\mathbf {p}}^k,\mathbf {l},{\mathbf {u}}\) define$$\begin{aligned} {\hat{p}}_n^{k+1}= {\left\{ \begin{array}{ll} l_n &\quad \hbox { if}\ p_n^k\le l_n\\ p_n^k &\quad \hbox { if}\ l_n\le p_n^k \le u_n\\ u_n &\quad \hbox { if}\ p_n^k\ge u_n \end{array}\right. }. \end{aligned}$$(9)
 Step 2

Set \(p^{k+1} = p^k + ({\hat{p}}^{k+1}  {\hat{p}}^k)\)
4.3 Blocking the problem in subproblems
As described before, the blocking algorithm is aimed at splitting the system of linear constraints given by \({\mathbf {G}}\) and \({\mathbf {c}}\) into blocks of hyperplanes that are orthogonal to one another. The aim is to use Kaczmarz’s algorithm on each of these blocks to obtain an exact solution for each of the subproblems within a very short time. The constraints matrix \({\mathbf {G}}\) features the socalled balancing constraints, which are represented by densely populated, nearorthogonal rows in \({\mathbf {G}}\), and other constraints which are sparsely populated. Additionally, the number of other constraints usually greatly outnumbers the balancing constraints. Since the balancing constraints present a nearorthogonal constraints set, they will be bundled in a single block \({\mathbf {G}}_1\).
 1.
Test the current row \({\mathbf {g}}_m\) against all rows that have previously been assigned to the B blocks \({\mathbf {G}}_b\), \(b1,\ldots ,B\). If a block is found in which \({\mathbf {g}}_m\) is orthogonal to all previously assigned rows, add \({\mathbf {g}}_m\) to this block.
 2.
If in each existing block \({\mathbf {G}}_b\) there is at least one row that is not orthogonal to \({\mathbf {g}}_m\), then open a new block \({\mathbf {G}}_{B+1}\) and assign \({\mathbf {g}}_m\) to it as its first row.
The condition to detect orthogonality is: two rows of \({\mathbf {G}}\) are considered orthogonal if they do not share any nonzero coefficients in the same columns. This condition is stricter than the general concept of orthogonality, but it is numerically easy to achieve and proved sufficient for this application. This procedure will generate a set of orthogonal blocks \({\mathbf {G}}_b\) on which Kaczmarz’s algorithm can be applied to find an exact solution in a finite number of steps.
The aim of blocking the matrix \({\mathbf {G}}\) is to reduce the computational efforts and storage requirements when implementing the final algorithm.
Definition 2
(Block Projection) The orthogonal projection of a vector \(\mathbf {p}\) onto the subspace of hyperplanes defined by the rows of a block \({\mathbf {G}}_b\) is called a block projection. A block projection will be referred to as \(\mathbf {p}_{p,b}\) (unlike the projection over a onto a single hyperplane, which was defined as \(\mathbf {p}_p\) in Cimmino’s algorithm).
For a block projection \(\mathbf {p}_b\), the equation \({\mathbf {G}}_b\mathbf {p}_{p,b}={\mathbf {c}}_b\) holds. Hence, \(\mathbf {p}_{p,b}\) lies within the intersection of all hyperplanes defined by \({\mathbf {G}}_b\) and \({\mathbf {c}}_b\).
Remark 3
Kaczmarz’s Algorithm 2 delivers this orthogonal projection for each orthogonal blocks \({\mathbf {G}}_b\), \(b=2,\ldots ,B\) in one iteration. Note that \({\mathbf {G}}_1\) is not generally orthogonal as it contains the balancing constraints.
4.4 Accelerating Cimmino’s algorithm
Theorem 1
Proof
See Scolnik et al. (2000). \(\square\)
4.5 The accelerated Cimmino algorithm (ACIM)
Algorithm 4
(ACIM: Main Algorithm for Parallel MRIO Reconciliation) Assume that \(\Vert {\mathbf {g}}_m\Vert =1\) for all \(m=1,\ldots ,M\).
 1.
Block the matrix \({\mathbf {G}}\) into the balancing block \({\mathbf {G}}_1\) and orthogonal blocks \({\mathbf {G}}_b\), \(b=2,\ldots ,B\). Define the first iterate (this is the vectorization of the raw data table).
 2.
Apply Hildreth’s Algorithm to the iterate to obtain a vector that is located within the given boundaries.
 3.
Perform one step of Cimmino’s Algorithm by projection the current iterate onto the B blocks. Use Kaczmarz’s Algorithm to calculate the individual blocks projections.
 4.
Calculate the barycentre from the B different block projections and calculate the direction vector.
 5.
Use Scolnik’s acceleration to find the ideal steps length for the given direction and calculate the next iterate. Step back to Step 2 to start the next iteration.
A full mathematical description of the ACIM algorithm can be found in “Appendix”.
 Step 1

Ensuring that the boundary conditions \(\mathbf {l}\le {\mathbf {p}}\le {\mathbf {u}}\) are adhered to. This is achieved by performing Hildreth’s algorithm.
 Step 2

Creating a sequence of iterates that converges towards a solution of \({\mathbf {G}}{\mathbf {p}}={\mathbf {c}}\). This is achieved by performing Cimmino’s algorithm on the matrix blocks \({\mathbf {G}}_b\) and by using Scolnik’s acceleration.
This concept results in the performance of several iterations of Cimmino’s algorithm between two iterations of Hildreth’s algorithm. This technique will be used in the following section. We will refer to this concept as a certain number of Cimmino iterations per Hildreth iteration.
5 Implementation and results
The ACIM algorithm 4 was programmed in parallel in C, using OpenMP for the parallelisation. The implemented version of ACIM will be referred to as the ACIM code.
5.1 Parallelisation of the ACIM code

The block projections. The calculation of the block projection over each block \({\mathbf {G}}_b\) yielding a vector \({\mathbf {p}}^k_{p,b}\) for each block.

The computation of the barycentre \({\mathbf {p}}^k_b=\sum _{b=1}^Bw_b{\mathbf {p}}^k_{p,b}\).
Since a potentially large number of block projections \({\mathbf {p}}^k_{p,b}\) must be added up during the calculation of the barycentre, a sharedmemory architecture was chosen for the parallelisation of the ACIM code. A highperformance computing system with 12 hyperthreaded 3.4 GHz cores and 296 GB of fully shared memory is available for the execution of the ACIM code.
5.2 ACIM code: parallelisation of the block projections
For the Eora model, the blocking algorithm generates more blocks \({\mathbf {G}}_b\) than processors available. The calculations of the block projections are arranged within a forloop which is parallelised using OpenMP. The fact that there are more blocks than available computing cores has significant implications on the memory management. Limiting the number of threads results in each thread getting assigned a certain amount of block projections, which are calculated by the thread sequentially. Hence, it is possible to sum up the block projections calculated by one thread within a single variable without risking data racing.
The sum of all block projections that were calculated by a single thread will be referred to as a thread projection \({\mathbf {p}}^{k,t}\). The index t indicates which thread the thread projections were calculated by. The total number of threads is given by the variable T. Hence, \(t=1,\ldots ,T\).
The ACIM code only requires to save as many vectors in memory as there are threads, as opposed to saving as many vectors in memory as there are blocks. For the Eora model, each block projection requires about 4 GB of memory. If 24 threads calculate the block projections in parallel, around 100 GB of memory is occupied by the 24 thread projections.
5.3 ACIM code: parallelisation of the barycentre calculation
5.4 Numerical testing: using the ACIM code for the Eora Model
Figure 6 shows the development of the residual norm for the Eora model for Cimmino iterations only (no boundary adjustment by Hildreth iterations). Despite the size of the problem, the residual norm decreases at a very high rate, especially during the first iterations, and reaches the termination criterion after only 13 iterations.
Runtime results of the ACIM code for the Eora model
Code section  Duration (s) 

Block projections (all orthogonal blocks)  80.74 
Balancing constraints  73.98 
Parallel summation  5.28 
It is noteworthy that the calculation of the balancing constraints requires a large portion of the entire runtime for the calculation of the block projections. In order to speed up the ACIM code in the future, most of the attention should be directed to speeding up the balancing process.
Also, Table 1 shows that the ACIM code finishes the balancing task for the entire Eora model in less than 80 s (if no bounds are defined at Hildreth’s algorithm does not have to be used). This is a significant improvement in performance compared to earlier methods such as RAS.
6 Conclusions and outlook
We have presented a parallelised leastsquarestype algorithm (ACIM) for the reconciliation of large MRIO databases. ACIM uses the combination of mathematical acceleration measures and parallel programming executed on highperformance computing hardware to achieve a acceptable runtimes. To our knowledge, the ACIM algorithm is currently the only leastsquares algorithm that is capable of reconciling global MRIOs of the size of the Eora database with satisfactory accuracy and runtime.
Future research will focus on enhancing the speed and accuracy as well as the numerical robustness of the ACIM algorithm further. Similar to current version of the ACIM algorithm, this will be achieved by a combination of introducing further computational measures and mathematical features. As a first step, the parallelisation of the code will be optimised in order to improve the load balancing and ultimately the overall performance of the ACIM code. The introduction of libraries such as Intel’s Math Kernel Library^{2} (MKL), the Basic Linear Algebra Subprograms^{3} (BLAS) or CHOLMOD^{4} to the code, will reduce the runtime further. Additional mathematical features will include the introduction of additional, more efficient accelerations as well as more efficient use of the geometric structure of the different constraints given in matrix \({\mathbf {G}}\).
Footnotes
 1.
In the literature, the reconciliation of MRIO tables is often referred to as balancing. Strictly speaking, balancing limits the reconciliation task to a very narrow class of constraints. The authors are using the broader term reconciliation to indicate that there is no limitation in terms of classes of constraints that can be handled.
 2.
For information on MKL, see http://software.intel.com/enus/articles/intelmkl/.
 3.
For information on BLAS, see http://www.netlib.org/blas/.
 4.
For information on CHOLMOD, see http://www.cise.ufl.edu/research/sparse/cholmod/.
Notes
Authors' contributions
AG and JU wrote the algorithm. JU developed the mathematical framework for the algorithm. AG tested and analysed the algorithm and wrote the paper. ML provided the ground work for the development of the algorithm. KK and DDM ensured the flawless deployment on the highperformance computing systems at the University of Sydney, and ensured the operation of the Eora/algorithm interface. All authors read and approved the final manuscript.
Acknowledgements
This work was supported by the Australian Research Council through its Discovery Projects DP0985522 and DP130101293, and Linkage Infrastructure, Equipment and Facilities Grant LE160100066, and the National eResearch Collaboration Tools and Resources project (NeCTAR) through its Industrial Ecology Virtual Laboratory VL201. The Eora database is available under www.worldmrio.com. Sebastian Jursazek expertly managed the advanced computation requirements, and Charlotte Jarabak from the University of Sydney’s SciTec Library collected data.
Competing interest
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 Bacharach M (1970) Biproportional matrices and input–output change. Cambridge University Press, CambridgeGoogle Scholar
 Byron R (1996) Diagnostic testing and sensitivity analysis in the construction of social accounting matrices. J R Stat Soc Ser A (Stat Soc) 159(1):133–148CrossRefGoogle Scholar
 Byron RP (1978) The estimation of large social account matrices. J R Stat Soc Ser A (Gen) 141(3):359–367CrossRefGoogle Scholar
 Canning P, Wang Z (2005) A flexible mathematical programming model to estimate interregional input–output accounts. J Reg Sci 45(3):539–563CrossRefGoogle Scholar
 Fischer G (2000) Lineare algebra. Vieweg Studium, BraunschweigCrossRefGoogle Scholar
 Gilchrist D, Louis LS (1999) Completing input–output tables using partial information, with an application to Canadian data. Econ Syst Res 11(2):185–193CrossRefGoogle Scholar
 Harrigan F, Buchanan I (1984) A quadratic programming approach to input–output estimation and simulation. J Reg Sci 24:339–358CrossRefGoogle Scholar
 Hildreth C (1957) A quadratic programming procedure. Naval Res Logist Q 4(1):79–85CrossRefGoogle Scholar
 Junius T, Oosterhaven J (2003) The solution of updating or regionalizing a matrix with both positive and negative entries. Econ Syst Res 15:87–96CrossRefGoogle Scholar
 Lahr ML, de Mesnard L (2004) Biproportional techniques in input–output analysis: table updating and structural analysis. Econ Syst Res 16(2):115–134CrossRefGoogle Scholar
 Lenzen M, Gallego B, Wood R (2010) Matrix balancing under conflicting information. Econ Syst Res 21:23–44CrossRefGoogle Scholar
 Lenzen M, Kanemoto K, Moran D, Geschke A (2012) Mapping the structure of the world economy. Environ Sci Technol 46(15):8374–8381CrossRefGoogle Scholar
 Lenzen M, Moran D, Kanemoto K, Geschke A (2013) Building EORA: a global multiregion inputoutput database at high country and sector resolution. Econ Syst Res 25(1):20–49CrossRefGoogle Scholar
 Mínguez R, Oosterhaven J, Escobedo F (2009) Cellcorrected RAS method (CRAS) for updating or regionalizing an input–output matrix. J Reg Sci 49:329–348CrossRefGoogle Scholar
 Morrison W, Thumann RG (1980) A Lagrangian multiplier approach to the solution of a special constrained matrix problem. J Reg Sci 20:279–292CrossRefGoogle Scholar
 Nagurney A, Eydeland A (1992) A splitting equilibration algorithm for the computation of largescale constrained matrix problems: theoretical analysis and applications. In: Amman HM, Belsley DA, Pau LF (eds) Computational economics and econometrics, advanced studies in theoretical and applied econometrics, vol. 22. Springer, Netherlands, pp 65–105Google Scholar
 Scolnik H, Echebest N, Guardarucci MT, Vacchino MC (2000) A class of optimized row projection methods for solving large nonsymmetric linear systems. Appl Numer Math 41(4):499–513CrossRefGoogle Scholar
 Stadler K, Wood R, Bulavskaya T, Södersten CJ, Simas M, Schmidt S, Usubiaga A, AcostaFernández J, Kuenen J, Bruckner M, Giljum S, Lutter S, Merciai S, Schmidt JH, Theurl MC, Plutzar C, Kastner T, Eisenmenger N, Erb KH, Koning A, Tukker A (2018) EXIOBASE 3: developing a time series of detailed environmentally extended multiregional input–output tables. J Ind Ecol 22(3):502–515. https://doi.org/10.1111/jiec.12715 CrossRefGoogle Scholar
 Stone R (1961) Input–output and national accounts. Organisation for European Economic Cooperation, ParisGoogle Scholar
 Stone R, Champernowne DG, Meade JE (1942) The precision of national income estimates. Rev Econ Stud 9(2):111–125CrossRefGoogle Scholar
 Tukker A, Dietzenbacher E (2013) Global multiregional input–output frameworks: an introduction and outlook. Econ Syst Res 25(1):1–19CrossRefGoogle Scholar
 van der Ploeg F (1982) Reliability and the adjustment of sequences of large economic accounting matrices. J R Stat Soc Ser A 145(2):169–194CrossRefGoogle Scholar
 van der Ploeg F (1984) Generalized least squares methods for balancing large systems and tables of national accounts. Rev Public Data Use 12:17–33Google Scholar
 van der Ploeg F (1988) Balancing large systems of national accounts. Comput Sci Econ Manag 1:31–39CrossRefGoogle Scholar
 Wood R, Hawkins TR, Hertwich EG, Tukker A (2014) Harmonising national input–output tables for consumptionbased accounting—experiences from EXIOPOL. Econ Syst Res 26(4):387–409. https://doi.org/10.1080/09535314.2014.960913 CrossRefGoogle Scholar
 Wood R, Stadler K, Bulavskaya T, Lutter S, Giljum S, de Koning A, Kuenen J, Schütz H, AcostaFernández J, Usubiaga A, Simas M, Ivanova O, Weinzettel J, Schmidt J, Merciai S, Tukker A (2015) Global sustainability accounting–developing EXIOBASE for multiregional footprint analysis. Sustainability 7(1):138CrossRefGoogle Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.