1 Introduction

Model simulations and forecasts of volcanic aerosol transport are of great importance in many fields, e.g., aviation safety [1], studies of global climate change [2, 3] and atmospheric dynamics [4]. However, existing observation techniques, e.g., satellite measurements, cannot provide detailed and complete spatial-temporal information due to their own limitations. With appropriate initial conditions, numerical simulations can provide relatively complete and high-resolution information in time and space. Model predictions can help to provide early warning information for air traffic control or input to studies of complex global or regional atmospheric transport processes.

In order to achieve accurate atmospheric transport simulations, it is necessary to first combine a series of numerical techniques with limited observational data to achieve high-resolution estimates of the emission sources. These techniques include backward-trajectory methods [5], empirical estimates [6] and inverse approaches. Among them, the inverse approaches are universal and systematic in the identification of atmospheric emission sources due to their mathematical rigor.

For instance, Stohl et al. [7] used an inversion scheme to estimate the volcanic ash emissions related to the eruptions of Eyjafjallajökull in 2010 and Kelut in 2014. They utilized Tikhonov regularization to deal with the ill-posedness of the inverse problem. Flemming and Inness [8] applied the Monitoring Atmospheric Composition and Climate (MACC) system to estimate the sulfur dioxide (SO2) emissions from Eyjafjallajökull in 2010 and Grímsvötn in 2011, where the resolution of the emission rates is about 2–3 km in altitude and more than 6 h in time. Due to limitations in computational power and algorithms, the spatial-temporal resolution of the reconstructed sources obtained in previous studies is relatively low.

The main limitations of real-time atmospheric transport forecasts are the large computational effort and data I/O issues. Some researchers have employed graphics processing units to reduce the computational time and obtained impressive results [9,10,11]. Lagrangian particle dispersion models are particularly well suited to distributed-memory parallelization, as each trajectory is calculated independently of the others. To reduce the computational cost, Larson et al. [12] applied shared- and distributed-memory parallelizations to a Lagrangian particle dispersion model and achieved nearly linear scaling of the execution time with the distributed-memory version and a speed-up factor of about 1.4 with the shared-memory version. In the study of Müller et al. [11], the parallelization of the Lagrangian particle model was implemented in the OpenMP shared-memory framework and good strong scalability up to 12 cores was achieved.

In this work, we implement the Lagrangian particle dispersion model Massive-Parallel Trajectory Calculations (MPTRAC) [5] on the Tianhe-2 supercomputer, along with an inverse modeling algorithm based on the concept of sequential importance resampling [13] to estimate time- and altitude-dependent volcanic emission rates. In order to realize SO2 transport simulations on a global scale, high-resolution emission reconstructions and real-time forecasts, the implementation is based on state-of-the-art techniques of supercomputing and big-data processing. The computing performance is assessed in the form of strong and weak scalability tests. The good scalability and computational efficiency of our codes make it possible to reconstruct emission rates with unprecedented resolution in both time and altitude and enable real-time forecasts.

The remainder of this manuscript is organized as follows: Sect. 2 introduces the forward model, the inverse modeling algorithm and the parallelization strategies. Section 3 presents the parallel performance of the forward and inverse code on the Tianhe-2 supercomputer. In Sect. 4, the results of the emission reconstruction and forward simulation are presented for a case study. Discussions and conclusions are provided in Sect. 5.

2 Data and Methods

2.1 Lagrangian Particle Dispersion Model

In this work, the forward simulations are conducted with the Lagrangian particle dispersion model MPTRAC, which has been successfully applied to the volcanic eruption cases of Grímsvötn, Puyehue-Cordón Caulle and Nabro [5]. Meteorological fields of the ERA-Interim reanalysis [14] provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) are used as input data for the transport simulations. The trajectory of an individual air parcel is calculated by

$$ \frac{d\mathbf{x}(t)}{dt} = \mathbf{v}\left(\mathbf{x}(t), t\right), $$
(1)

where \( \mathbf{x} = (x, y, z) \) denotes the spatial position and \( \mathbf{v} = (u, v, w) \) denotes the velocity of the air parcel at time t. Here, the x and y coordinates refer to longitude and latitude, whereas the z coordinate refers to pressure. The horizontal wind components u and v and the vertical velocity \( w = dp/dt \) are obtained by 4-D linear interpolation from the meteorological data, which is common practice in Lagrangian particle dispersion models [15]. Small-scale diffusion and subgrid-scale wind fluctuations are simulated based on a Markov model following Stohl et al. [16].

In our previous work [17], truncation errors of different numerical integration schemes of MPTRAC have been analyzed in order to obtain an optimal numerical solution strategy with accurate results and minimum computational cost. The accuracy of the MPTRAC trajectory calculations has been analyzed in different studies, including [18], which compared trajectory calculations to superpressure balloon tracks.
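To illustrate how a trajectory is advanced according to Eq. (1), the following minimal sketch propagates a single air parcel with an explicit midpoint step and a 4-D linear interpolation of synthetic wind fields. It is a toy example under stated assumptions, not MPTRAC's actual implementation: the analytic wind fields, the grid axes and the function names are placeholders for the ERA-Interim input and the model's internal routines.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Synthetic wind fields on a coarse (lon, lat, pressure, time) grid; in MPTRAC
# the winds are read from ERA-Interim reanalysis data instead.
lon = np.linspace(-180.0, 180.0, 73)     # deg
lat = np.linspace(-90.0, 90.0, 37)       # deg
prs = np.linspace(100.0, 1000.0, 19)     # hPa
tim = np.linspace(0.0, 86400.0, 5)       # s
LON, LAT, PRS, TIM = np.meshgrid(lon, lat, prs, tim, indexing="ij")
u = 10.0 + 0.01 * LAT                    # zonal wind (m/s), toy field
v = 0.1 * np.cos(np.deg2rad(LON))        # meridional wind (m/s), toy field
w = 1.0e-3 * np.ones_like(u)             # omega = dp/dt (hPa/s), toy field

interp = {name: RegularGridInterpolator((lon, lat, prs, tim), f)
          for name, f in (("u", u), ("v", v), ("w", w))}

def wind(x, t):
    """4-D linear interpolation of (u, v, omega) at x = (lon, lat, p) and time t."""
    pt = np.array([[x[0], x[1], x[2], t]])
    return np.array([interp[k](pt)[0] for k in ("u", "v", "w")])

def advect(x, t, dt, radius=6371.2e3):
    """One explicit midpoint step of dx/dt = v(x, t) for a single air parcel,
    converting horizontal winds (m/s) to degrees per second."""
    def tendency(x, t):
        u_, v_, w_ = wind(x, t)
        dlon = np.rad2deg(u_ / (radius * np.cos(np.deg2rad(x[1]))))
        dlat = np.rad2deg(v_ / radius)
        return np.array([dlon, dlat, w_])
    k1 = tendency(x, t)
    return x + dt * tendency(x + 0.5 * dt * k1, t + 0.5 * dt)

# Advance one parcel, released near the Nabro volcano, for ten 10-minute steps.
x = np.array([41.7, 13.4, 200.0])        # lon (deg), lat (deg), pressure (hPa)
for step in range(10):
    x = advect(x, t=step * 600.0, dt=600.0)
print(x)
```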

2.2 Evaluation of Goodness-of-Fit of Forward Simulation Results

Atmospheric InfraRed Sounder (AIRS) satellite observations are used to detect volcanic SO2 based on a brightness temperature difference (BTD) algorithm [19]. To evaluate the goodness-of-fit of the forward simulation results obtained by MPTRAC, the critical success index (CSI) [20] is calculated by

$$ \mathrm{CSI} = \frac{C_x}{C_x + C_y + C_z}. $$
(2)

Here, the number of positive forecasts with positive observations is \( C_x \), the number of negative forecasts with positive observations is \( C_y \), and the number of positive forecasts with negative observations is \( C_z \). The CSI, representing the ratio of successful predictions to the total number of predictions that were either made (\( C_x + C_z \)) or needed (\( C_y \)), is commonly used for the assessment of simulation results for volcanic eruptions and other large-scale SO2 transport problems. Basically, it provides a measure of the overlap of the simulated volcanic SO2 plume from the model with the real plume as found in the satellite observations. CSI time series are calculated from the AIRS satellite observations and the MPTRAC simulation results mapped on a discrete grid, which are essential to the inverse modeling algorithm presented in the next section.
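As an illustration of Eq. (2), the following sketch computes the CSI from two boolean fields on a common grid, one marking grid cells where the simulation predicts SO2 and one marking cells where the satellite detects it; the toy arrays are purely illustrative.

```python
import numpy as np

def critical_success_index(forecast, observed):
    """CSI = Cx / (Cx + Cy + Cz) for boolean fields on a common grid."""
    cx = np.sum(forecast & observed)    # positive forecasts, positive observations
    cy = np.sum(~forecast & observed)   # negative forecasts, positive observations
    cz = np.sum(forecast & ~observed)   # positive forecasts, negative observations
    denom = cx + cy + cz
    return cx / denom if denom > 0 else np.nan

# Toy 3 x 4 grid: 3 hits, 1 miss, 1 false alarm -> CSI = 3 / 5 = 0.6
sim = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]], dtype=bool)
obs = np.array([[1, 0, 0, 0], [0, 1, 1, 1], [0, 0, 0, 0]], dtype=bool)
print(critical_success_index(sim, obs))
```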

2.3 Inverse Source Estimation Algorithm

The strategy for the inverse estimation of time- and altitude-dependent emission rates is shown in Fig. 1 and Algorithm 1. The time- and altitude-dependent emissions are considered for the domain \( E := [t_0, t_f] \times \varOmega \), which is discretized with \( n_t \) and \( n_h \) uniform intervals into \( N = n_t \cdot n_h \) subdomains. For each subdomain, a forward calculation of a set of air parcel trajectories is conducted with MPTRAC, which is referred to here as a ‘unit simulation’ for a given time and altitude. Each unit simulation is assigned a certain amount of SO2, where we assume that the total SO2 mass over all unit simulations is known a priori. During the inversion, a set of importance weights \( w_i \) \( (i = 1, \ldots, N) \), which satisfy \( \sum_{i=1}^{N} w_i = 1 \), is estimated to represent the relative posterior probabilities of the occurrence of SO2 emission mass.
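The sketch below shows one possible way to set up the discretization of the emission domain E into N = n_t · n_h subdomains and to initialize the importance weights uniformly; the time span, altitude range and resolution are illustrative values, not necessarily those used in the case study.

```python
import numpy as np

# Hypothetical emission domain: 5 days of emissions between 0 and 20 km altitude.
t0, tf = 0.0, 5 * 86400.0            # emission period (s)
z0, zf = 0.0, 20.0e3                 # altitude range (m)
n_t, n_h = 120, 80                   # e.g. 1 h x 250 m resolution
N = n_t * n_h

t_edges = np.linspace(t0, tf, n_t + 1)
z_edges = np.linspace(z0, zf, n_h + 1)

# Each unit simulation i corresponds to one (time, altitude) subdomain of E.
subdomains = [(t_edges[i], t_edges[i + 1], z_edges[j], z_edges[j + 1])
              for i in range(n_t) for j in range(n_h)]

# Equal-probability initialization of the importance weights, sum(w_i) = 1.
weights = np.full(N, 1.0 / N)
```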

At first, the subdomains are populated with SO2 emissions (air parcels) according to an equal-probability strategy. N parallel unit simulations with a certain number of air parcels are performed in an iterative process, and the corresponding CSI time series \( \mathrm{CSI}_k^i \) with \( k = 1, \ldots, n_k \) at different times \( t_k \) and \( i = 1, \ldots, N \) are calculated to evaluate the agreement of the simulations with the satellite observations. Then, the importance weights are updated according to the following formulas:

$$ w_i = \frac{m_i}{\sum_{a=1}^{N} m_a}, $$
(3)
$$ m_i = \frac{1}{n_k} \sum_{k=1}^{n_k} \mathrm{CSI}_k^i. $$
(4)

During the iteration, \( m_i \) represents the probability that emitted air parcels fall into the i-th temporal and spatial subdomain. Finally, after the termination criterion is satisfied, the emission source is obtained based on the final importance weight distribution. To define the stopping criterion, we calculate the relative difference d by

$$ d\left(\mathbf{W}^{r+1}, \mathbf{W}^{r}\right) = \frac{\left\| \mathbf{W}^{r+1} - \mathbf{W}^{r} \right\|}{\max\left( \left\| \mathbf{W}^{r+1} \right\|, \left\| \mathbf{W}^{r} \right\| \right)}, \quad r \ge 1, $$
(5)

where r denotes the iterative step and the norm is defined by

$$ \left\| \mathbf{W}^{r} \right\| = \sqrt{\sum_{i=1}^{N} \left| w_i^{r} \right|^2}, \qquad \mathbf{W}^{r} = \left( w_i^{r} \right)_{i=1,\ldots,N}. $$
(6)

As the stopping criterion, the threshold for the relative difference d is chosen to be 1%.
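A minimal sketch of the iteration described above, combining the weight update of Eqs. (3) and (4) with the stopping criterion of Eqs. (5) and (6); the function run_unit_simulations is a hypothetical placeholder for the parallel MPTRAC runs and the CSI evaluation.

```python
import numpy as np

def update_weights(csi):
    """Eqs. (3)-(4): importance weights from the mean CSI of each unit simulation.

    csi has shape (N, n_k): one CSI time series per unit simulation."""
    m = csi.mean(axis=1)
    return m / m.sum()

def relative_difference(w_new, w_old):
    """Eq. (5) with the Euclidean norm of Eq. (6)."""
    return np.linalg.norm(w_new - w_old) / max(np.linalg.norm(w_new),
                                               np.linalg.norm(w_old))

def invert(run_unit_simulations, weights, tol=0.01, max_iter=20):
    """Iterate until the weights change by less than tol (here 1%)."""
    for _ in range(max_iter):
        csi = run_unit_simulations(weights)   # placeholder, returns shape (N, n_k)
        new_weights = update_weights(csi)
        if relative_difference(new_weights, weights) < tol:
            return new_weights
        weights = new_weights
    return weights
```

In the full system, run_unit_simulations would redistribute the emitted air parcels over the subdomains according to the current weights before launching the next set of forward runs.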

In practice, in order to deal with the complexity of the SO2 air parcel transport, a so-called “product rule” is utilized in the resampling process, in which the average CSI time series is replaced by the product of two average CSI time series over consecutive, separate time periods:

$$ m_i = \left( \frac{1}{n'_k} \sum_{k=1}^{n'_k} \mathrm{CSI}_k^i \right) \cdot \left( \frac{1}{n_k - n'_k} \sum_{k=n'_k+1}^{n_k} \mathrm{CSI}_k^i \right), \quad 1 \le n'_k < n_k, $$
(7)

where \( n'_k \) is a “split point” of the time series. This strategy better suppresses low-probability local emissions during the source reconstruction and thus leads to more accurate final forward simulation results both locally and globally. A detailed description of the inverse algorithm and of the improvements due to applying the product rule can be found in [21].
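Under the same assumptions as above, the product rule of Eq. (7) replaces the single time average by the product of the two sub-period averages, e.g.:

```python
import numpy as np

def product_rule_m(csi, split):
    """Eq. (7): product of the mean CSI before and after the split point.

    csi has shape (N, n_k) and 1 <= split < n_k."""
    early = csi[:, :split].mean(axis=1)
    late = csi[:, split:].mean(axis=1)
    return early * late

# The weights then follow from Eq. (3): w = m / m.sum().
```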

Fig. 1. Flow chart of the inverse modeling strategy

Algorithm 1

2.4 Parallel Implementation

The Tianhe-2 supercomputer at the National Supercomputing Center of Guangzhou (NSCC-GZ) consists of 16000 compute nodes, each containing two 12-core Intel Xeon E5-2692 CPUs and 64 GB of memory [22]. The advanced computing performance and massive computing resources of Tianhe-2 make it possible to conduct more complex mathematical research and simulations on much larger scales than before. Based on off-line simulations in previous work, we expect that Tianhe-2 will facilitate applications of real-time forecasts for larger-scale problems. The computational efficiency of the high-precision inverse reconstruction of emission sources will directly determine whether the atmospheric SO2 transport process can be predicted in real time.

To the best of our knowledge, few studies have addressed both direct inverse source estimation and forecasting in near real time. Fu et al. [23] conducted a near-real-time prediction study of volcanic eruptions based on the LOTOS-EUROS model and an ensemble Kalman filter. Santos et al. [10] developed a GPU-based code to process the calculations in near real time. In this work, we further develop the parallel inverse algorithm for the reconstruction of volcanic SO2 emission rates based on sequential importance resampling, utilizing the computational power of the Tianhe-2 supercomputer to achieve large-scale SO2 transport simulations and real-time or near-real-time predictions.

The parallelization of MPTRAC and the inverse algorithm is realized by means of a hybrid scheme based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). Since each trajectory can be computed independently, the ensemble of unit simulations is distributed to different compute nodes using the MPI distributed-memory parallelization. On a particular compute node, the trajectory calculations of the individual unit simulations are distributed using the OpenMP shared-memory parallelization. Theoretically, the calculation time will decrease nearly linearly with an increasing number of compute processes. Therefore, sufficient computational resources can greatly reduce the time to solution and enable simulations of hundreds of millions of air parcels on the supercomputer system.
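The following simplified sketch, written with mpi4py purely for illustration, mimics this distribution scheme: each MPI rank processes a subset of the N unit simulations and only the small CSI results are gathered at the root, while the trajectory loop inside each unit simulation is assumed to be handled by the model's OpenMP parallelization. The function run_unit_simulation is a hypothetical stand-in for an MPTRAC run followed by the CSI evaluation.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 9600      # number of unit simulations (n_t * n_h)
n_k = 48      # length of each CSI time series

def run_unit_simulation(i):
    """Placeholder for one MPTRAC unit simulation of subdomain i followed by
    the CSI evaluation; here it just returns random numbers."""
    rng = np.random.default_rng(i)
    return rng.random(n_k)

# Round-robin distribution of the unit simulations over the MPI ranks; no
# communication is needed during the trajectory calculations themselves.
my_results = [(i, run_unit_simulation(i)) for i in range(rank, N, size)]

# Gather only the small CSI arrays at the root process.
gathered = comm.gather(my_results, root=0)
if rank == 0:
    csi = np.empty((N, n_k))
    for chunk in gathered:
        for i, series in chunk:
            csi[i] = series
    # csi can now be fed into the weight update of Eqs. (3) and (4).
    print(csi.shape)
```

Because only the CSI arrays are communicated, the communication overhead stays negligible compared to the trajectory calculations, which is the basis of the high-throughput strategy described below.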

The implementation of the inverse algorithm is designed based on a high-throughput computing strategy. At each iterative step, the time- and altitude-dependent domain \( E := [t_0, t_f] \times \varOmega \) is discretized with \( n_t \) and \( n_h \) uniform intervals, which leads to \( N = n_t \cdot n_h \) unit simulations that are calculated in parallel, as shown in Fig. 1. Theoretically, the high-throughput parallel computing strategy can greatly improve the resolution of the inversion in time and altitude by increasing the value of N. Only little communication overhead is needed to distribute the tasks and gather the results. With more computing resources, it is possible to operate on more detailed spatial-temporal grids and to obtain more accurate results. In this work, we have achieved a resolution of 30 min in time and 100 m in altitude for the first time with our modeling system.

In summary, the goal of this work is to develop an inverse modeling system using parallel computing on a scale of millions of cores, including high-throughput job submission, monitoring, fault-tolerance management and analysis of results. A multi-level task scheduling strategy has been employed, i.e., the computational performance of each sub-task was analyzed to maximize load balancing. During the calculation, every task is monitored by a daemon, and a fault-tolerance mechanism has been established to avoid accidental interruptions and invalid calculations.

3 Parallel Performance Analysis

In this section, we evaluate the model parallel performance on the Tianhe-2 supercomputer based on the single-node performance of MPTRAC for the unit simulations and the multi-node performance of the sequential importance resampling algorithm.

Since our parallel strategy is based on high-throughput computing to avoid communication across compute nodes, the single-node computing performance is essential for the global computing efficiency. To test the single-node performance, we employ the Paratune Application Runtime Characterization Analyzer to measure the floating-point speed. An ensemble of 100 million air parcels was simulated on a single node, yielding a floating-point performance of 13.16 Gflops. The strong scalability test on a single node is conducted by simulating an ensemble of 1 million air parcels. The results on strong scaling are listed in Table 1 and the results on weak scaling are listed in Table 2. Relative to a single-process calculation, the strong and weak scaling efficiencies using 16 computing processes reach 84.25% and 85.63%, respectively.

Table 1. Strong scaling of a single-node MPTRAC simulation
Table 2. Weak scaling of a single-node MPTRAC simulation
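For reference, the scaling efficiencies quoted above can be computed from measured run times as in the following sketch; the timings used here are placeholders chosen only to reproduce the quoted 16-process values, not the measurements of Tables 1 and 2.

```python
def strong_scaling_efficiency(t1, tp, p):
    """Speed-up over the single-process run time, divided by the process count p."""
    return t1 / (p * tp)

def weak_scaling_efficiency(t1, tp):
    """Run-time ratio when the problem size grows with the process count."""
    return t1 / tp

# Hypothetical timings (seconds) chosen only to reproduce the quoted values.
print(strong_scaling_efficiency(t1=1600.0, tp=118.7, p=16))   # ~0.84
print(weak_scaling_efficiency(t1=100.0, tp=116.8))            # ~0.86
```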

Since the ensemble simulations with MPTRAC covering multiple unit simulations are conducted independently on each node, the scaling efficiency of the MPI parallelization is mostly limited by I/O rather than by communication or computation. Nevertheless, we tested the strong and weak scalability of the high-throughput based inverse calculation, with a maximum computing scale of up to 38400 computing processes. The results are shown in Tables 3 and 4. The scaling is nearly linear with respect to the number of compute nodes. Especially for weak scaling, the efficiency is close to ideal, apart from small extra costs related to the calculation of the CSI and to I/O.

Table 3. Strong scaling of the multi-node inverse algorithm (each unit simulation covers 1 million air parcels; the same applies to Table 4)
Table 4. Weak scaling of multi-node inverse algorithm

In summary, the high-throughput based hybrid MPI/OpenMP parallel strategy of MPTRAC and the inverse algorithm shows good strong and excellent weak scalability on the Tianhe-2 supercomputer. This means that the inverse modeling system has high potential for massively parallel applications, meeting the requirements of real-time forecasts. However, the forward calculation still has some potential for optimization. In future work, we will investigate the possibility of cross-node computing with MPTRAC and try to further improve the single-node computing performance by using hyper-threading. Further improvements may also be possible for the multi-node parallelization, in particular regarding the I/O issues and the efficiency of temporary file storage.

4 Case Study of the Nabro Volcanic Eruption

Following Heng et al. [21], we choose an eruption of the Nabro volcano, Eritrea, as a case study to test the inverse modeling system on the Tianhe-2 supercomputer. The Nabro volcano erupted at about 20:30 UTC on 12 June 2011, causing a release of about \( 1.5 \times 10^{9} \) kg of volcanic SO2 into the troposphere and lower stratosphere. The volcanic activity lasted over 5 days with varying plume altitudes. The simulation results obtained for the Nabro volcanic eruption are of particular interest for studies of the Asian monsoon circulation [4, 24].

4.1 Reconstructed Emission Results with Different Resolutions

In general, with increasing resolution of the initial emissions, the forward simulation results are expected to become more accurate, but the calculation cost also becomes much larger. In this work, the resolution of the volcanic SO2 emission rates has been raised to 30 min in time and 100 m in altitude for the first time. The largest computing scale employs 60250 compute nodes on Tianhe-2 simultaneously. Each node calculated the kinematic trajectories of 1 million air parcels using a total of 24 cores. On this computing scale, the inverse reconstruction and final forward simulation take about 22 min and require about 530,000 core hours in total.
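As a rough consistency check, the quoted core-hour budget follows directly from the number of nodes, the cores per node and the wall-clock time:

```python
# Nodes x cores per node x wall-clock time gives the quoted core-hour budget.
nodes, cores_per_node = 60250, 24
wall_clock_hours = 22.0 / 60.0
print(nodes * cores_per_node * wall_clock_hours)   # about 5.3e5 core hours
```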

Based on the inverse algorithm and parallel strategy described in Sects. 2 and 3, the SO2 emission rates are reconstructed at different temporal and spatial resolutions, as shown in Fig. 2. The resolutions are (a) 6 h in time and 2.5 km in altitude, (b) 3 h in time and 1 km in altitude, (c) 1 h in time and 250 m in altitude, and (d) 30 min in time and 100 m in altitude. Finer structures in the emission rates become visible with increasing resolution from Fig. 2a to Fig. 2c. However, the result in Fig. 2d at the highest resolution appears to be unstable, with oscillations occurring between 12 and 16 km altitude. The reason for this is not yet clear and will require further study, e.g., in terms of regularization of the inverse problem. For the time being, we employ the results shown in Fig. 2c for the final forward simulation.

Compared with our previous work performed on the JuRoPA supercomputer at the Jülich Supercomputing Centre [21], the simulation results of this work obtained on Tianhe-2 are rather similar. The reconstructed emissions show that the Nabro volcano had three strong eruptions, on June 13, 14 and 16. For validation, Table 5 compares the altitudes and times of the major eruptions with observations from different satellite sensors, which shows that the emission data reconstructed by the inverse modeling approach qualitatively agree with the measurements. Here, we also refer to the time line of the 2011 Nabro eruption based on Meteosat Visible and InfraRed Imager (MVIRI) infrared (IR) and water-vapor (WV) measurements, which was used as a validation data set in [21] and is shown in Fig. 3.

Table 5. Major eruption altitudes of the Nabro volcano on different days
Fig. 2. Reconstructed volcanic SO2 emission rates of the Nabro eruption in June 2011. The x-axis refers to time, the y-axis refers to altitude (km), and the color bar refers to the emission rate (kg m−1 s−1). (Color figure online)

Fig. 3. Time line of the 2011 Nabro eruption based on MVIRI IR and WV measurements. Here, white indicates no activity, light blue a low level, blue a medium level and dark blue a high level of activity [21]. (Color figure online)

4.2 Final Forward Simulation Results

Based on the reconstructed emission data with a resolution of 1 h in time and 250 m in altitude, the final simulation applying the product rule was conducted on the Tianhe-2 supercomputer for further evaluation. Figure 5 illustrates the simulated SO2 transport, providing information on both altitude and concentration that is comparable to the AIRS observation maps shown in Fig. 4, suggesting that the results are stable and accurate.

Fig. 4. AIRS satellite observations on 14, 16, 18 and 20 June 2011, 06:00 UTC. The SO2 index is a function of the column density obtained from radiative transfer calculations; see [19] for a more detailed description of the detection of volcanic emissions based on the brightness temperature difference (BTD) technique.

Fig. 5. Final forward simulation results of volcanic SO2 released by the Nabro eruption. The black square indicates the location of the Nabro volcano.

5 Conclusions and Outlook

The high-resolution reconstruction of source information is critical to obtain precise simulations of atmospheric aerosol and trace gas transport. The work presented in this paper has potential applications for studying the effects of large-scale industrial emissions, nuclear leaks and other pollution events affecting the atmosphere and environment. The computational costs and efficiency of the inverse model directly determine whether atmospheric pollutant transport can be predicted in real time or near real time. For this purpose, we implemented and assessed a high-throughput based inverse algorithm using the MPTRAC model on the Tianhe-2 supercomputer. The good scalability demonstrates that the algorithm is well suited for large-scale parallel computing. In our case study, the computational costs for the inverse reconstruction and the final forward simulation at unprecedented resolution satisfy the requirements of real-time forecasts.

In future work, we will study further improvements of the computational efficiency, e.g., multi-node parallel usage of MPTRAC, mitigation of the remaining I/O issues, reduction of post-processing overhead and efficient storage of temporary files. Also, the stability problems at the highest resolution need to be addressed, e.g., by means of regularization techniques. Nevertheless, we think that the inverse modeling system in its present form is ready to be tested in further applications.