1 Introduction

Networks are everywhere. Complex interactions between different entities play a significant role in modeling the behavior of both society and the natural world. Such networks – which include the World Wide Web, metabolic networks, neural networks, communication and collaboration networks, and social networks – are the subject of a growing number of research efforts. Indeed, many interesting phenomena are structured as networks (i.e., sets of entities joined in pairs by lines representing relations or interactions).

The study of networked phenomena has experienced a particular surge of interest due to the increasing availability of massive data about the static topology of real networks as well as the dynamic behavior generated by the interactions among network entities. The analysis of real network topologies has revealed several interesting structural features, such as the small-world phenomenon and the power-law degree distribution [10], which appear in several real networks and can be extremely helpful for the design of artificial networks. On the other hand, understanding the dynamic behavior generated by complex network systems is extremely hard: networks are often characterized by dynamic feedback effects which are hard to predict analytically.

More generally, complex systems require innovative study methodologies. In this regard, in recent years, the two branches of the classical sciences, theoretical and experimental, have merged into the computational sciences, where scientists design mathematical models and perform computational simulations of complex phenomena on real systems which are too complex to be studied analytically on theoretical grounds and too risky or expensive to be tested experimentally [23]. In particular, Agent-Based simulation Models (ABMs) have spread in many fields, from the social sciences to the life sciences, from economics to artificial intelligence. The successes of the computational sciences have led to an increased demand for computation-intensive software implementations, in order to improve the performance of ABMs in terms of both size (number of agents) and quality (complexity of interactions). Such an amount of computing power can only be achieved by parallel computing (indeed, serial-processing speed is reaching a physical limit [22]). However, exploiting parallel systems is not an easy task: many parallel applications fail to leverage the hardware parallelism and experience scalability limits.

The computer science community has responded to the need for tools and platforms that can help the development and testing of new models in each specific field by providing libraries and frameworks that speed up and simplify the task of developing and running parallel ABMs for complex phenomena. For instance, D-Mason [5, 6, 26, 28] is a parallel version of the Mason [4, 16, 17] library for running ABMs on distributed systems. D-Mason adopts a framework-level parallelization mechanism, which harnesses the computational power of a parallel environment and, at the same time, hides the details of the architecture, so that users, even with limited knowledge of parallel programming, can easily develop and run simulation models.

Mason, like several other ABM systems, provides one or more fields to represent the space where the agents lie and interact with each other. A field is a specific data structure relating various objects (agents) or values together. In more detail, Mason provides a number of built-in fields, such as 2D/3D geometric discrete and continuous spaces, plus a network field typically used to model social interactions. Currently, D-Mason allows modellers to parallelize simulations based on geometric fields. It adopts a space partitioning approach [9] which balances the workload among the computing resources with a limited amount of communication overhead.

The space partitioning approach described above is devoted to decomposing ABMs based on geometric fields. On the other hand, when agents lie and/or interact on a network [1] – where the network can represent a social, geographical or even semantic space – a different approach is needed. The problem is to (dynamically) partition the network into a fixed set of sub-networks in such a way that: (i) the components have roughly the same size and (ii) both the number of connections and the communication volume between vertices belonging to different components are minimized.

1.1 Our Results

In this paper we provide an extensive evaluation of the most efficient and effective solutions for the problem defined above, which is well known in the literature as the graph partitioning problem. We evaluate several algorithms both analytically and in a real distributed simulation setting. Results show that a good partitioning strategy strongly influences the performance of the distributed simulation environment. Analyzing the results in detail, we were also able to discover the parameters that need to be optimized for the best performance on networks in ABMs.

2 The Graph Partitioning Problem

Finding good network partitions is a well-studied problem in graph theory [2]. Many applications motivate its study: they range from computer science problems such as integrated circuit design, VLSI circuits, domain decomposition for parallel computing, image segmentation and data mining [13, 15], to problems raised by physicists, biologists and applied mathematicians, with applications to social and biological networks (community structure detection, structuring cellular networks and matrix decomposition [12, 18]).

The most common formulation of the balanced graph partitioning problem is the following:

Balanced k -way partitioning ( \(G,k,\epsilon \) ).

Instance: A graph \(G=(V,E)\), an integer \(k>1\) (number of components) and a rational \(\epsilon \) (imbalance factor).

Problem: Compute a partition \(\Pi \) of V into k pairwise disjoint subsets (components) \(V_1,V_2, \ldots , V_k\) of size at most \((1 +\epsilon ) \lceil |V|/k \rceil \), while minimizing the size of the edge-cut \(\sum _{i<j} |E_{ij}|\), where \(E_{ij}=\{ \{u,v\} \in E : u \in V_i, v\in V_j\}\subseteq E\).
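For concreteness, the edge-cut objective and the balance constraint can be checked directly on a candidate partition. The following Python sketch (the helper names are ours, not part of any cited tool) assumes the graph is given as an edge list and the partition as a vertex-to-component map:

```python
import math

def edge_cut(edges, part):
    """Size of the edge-cut: edges whose endpoints lie in different components."""
    return sum(1 for u, v in edges if part[u] != part[v])

def is_balanced(part, k, eps):
    """True iff every component has at most (1 + eps) * ceil(|V| / k) vertices."""
    sizes = {}
    for comp in part.values():
        sizes[comp] = sizes.get(comp, 0) + 1
    cap = (1 + eps) * math.ceil(len(part) / k)
    return all(s <= cap for s in sizes.values())

# A 4-cycle split into two components of size two: two edges cross the cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(edges, part))      # 2
print(is_balanced(part, 2, 0.0))  # True
```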

This problem has been extensively studied (see [3] for a comprehensive presentation) and is known to be NP-hard [11].

Being a hard problem, exact solutions can be found in reasonable time only for small graphs. However, the applications of this problem require partitioning much larger graphs, so several heuristic solutions have been proposed.

The graph partitioning problem has been tackled using several approaches. Two versions of the problem have been considered: the former takes into account the coordinate information of the vertices in space (this is common in graphs describing a physical domain) while, in the latter, vertices are coordinate-free. In this paper we discuss the coordinate-free problem, which better fits the ABM domain.

The coordinate-free graph partitioning problem requires combinatorial heuristics. Consider, for instance, the simplest version of the partitioning problem (2-way partitioning): find a bisection of the graph \(G=(V,E)\) that minimizes the size of the cut. A really simple solution uses a breadth-first search (BFS) visit of the graph to generate a subgraph \(T=(V,E'\subseteq E)\) of G, also called a BFS tree. Given the subgraph T, it is possible to find a cut that generates two disjoint subnetworks \(N_1\) and \(N_2\) such that (i) \(N_1 \cup N_2 =V\) and (ii) \(|N_1|\approx |N_2|\). The fact that T has been built using BFS ensures that the size of the edge-cut is bounded.
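A minimal Python sketch of this BFS-based bisection (our own illustrative code, assuming a connected adjacency-list graph): the first half of the BFS visit order becomes \(N_1\) and the rest \(N_2\), so the cut roughly follows a BFS frontier.

```python
from collections import deque

def bfs_bisect(adj, source):
    """Bisect a connected graph by BFS visit order: the first half of the
    visited vertices forms N1, the remaining vertices form N2."""
    order, seen = [], {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    half = len(order) // 2
    return set(order[:half]), set(order[half:])

# Path 0-1-2-3: BFS from 0 visits the vertices in order; the cut is one edge.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bfs_bisect(adj, 0))  # ({0, 1}, {2, 3})
```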

This solution, which works well for planar graphs, is not efficient for complex graphs. Another approach is the Kernighan-Lin (KL) algorithm [15] which, starting with two sets \(N_1\) and \(N_2\) (describing a partition of V), greedily improves the quality of the partitioning by iteratively swapping vertices between the two sets. This solution converges to the global optimum if the initial partition is fairly good. Other approaches are spectral partitioning [21] and the multilevel approach [14]. We will focus on the most promising techniques, which either use a multilevel approach or a distributed algorithm that exploits local search.
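The KL idea can be sketched in Python as follows. This is an illustrative simplification of ours: the full KL algorithm also performs tentative, temporarily worsening swaps within each pass, while this sketch keeps only strictly improving ones.

```python
def cut(edges, side):
    """Edge-cut size for a 0/1 side assignment."""
    return sum(1 for u, v in edges if side[u] != side[v])

def kl_refine(edges, n1, n2, max_passes=10):
    """Greedy KL-style refinement: repeatedly swap the pair of vertices that
    most reduces the edge-cut, stopping when no improving swap exists."""
    n1, n2 = set(n1), set(n2)
    side = {**{v: 0 for v in n1}, **{v: 1 for v in n2}}
    for _ in range(max_passes):
        best = None
        cur = cut(edges, side)
        for a in n1:
            for b in n2:
                side[a], side[b] = 1, 0   # tentative swap
                gain = cur - cut(edges, side)
                side[a], side[b] = 0, 1   # undo
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, a, b)
        if best is None:
            break
        _, a, b = best
        n1.remove(a); n2.remove(b)
        n1.add(b); n2.add(a)
        side[a], side[b] = 1, 0
    return n1, n2

# Two triangles joined by the bridge (2, 3), from a deliberately poor bisection.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n1, n2 = kl_refine(edges, {0, 1, 3}, {2, 4, 5})
print(n1, n2)  # {0, 1, 2} {3, 4, 5}
```

Swapping vertices (rather than moving them) is what preserves the balance of the initial partition.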

METIS is a multilevel k-way graph partitioning suite developed in the Karypis lab at the University of Minnesota. In short, METIS comprises three phases: during the coarsening phase, vertices are collapsed in order to decrease the size of the initial graph G; consequently, starting from \(G=G_0\), a sequence of graphs \(G_0, G_1, \ldots , G_\ell \) is generated. Then a k-way partitioning is performed on the smallest graph \(G_\ell \). Finally, during the uncoarsening phase, the partitioning is refined, using a variant of the KL algorithm, and projected back onto the larger graphs of the sequence.
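A single coarsening step can be illustrated with a toy Python sketch of ours: each vertex is greedily matched with an unmatched neighbor and matched pairs are merged, roughly halving the graph. (METIS itself uses heavy-edge matching on weighted graphs; this unweighted version only conveys the idea.)

```python
import random

def coarsen(adj, rng=random):
    """One coarsening step: greedily match each vertex with an unmatched
    neighbor and merge matched pairs. Returns the coarse adjacency and
    the fine-to-coarse vertex map."""
    vertices = list(adj)
    rng.shuffle(vertices)
    matched, merge = set(), {}
    for u in vertices:
        if u in matched:
            continue
        partner = next((v for v in adj[u] if v not in matched), None)
        matched.add(u)
        merge[u] = u
        if partner is not None:
            matched.add(partner)
            merge[partner] = u
    coarse = {merge[u]: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            if merge[u] != merge[v]:
                coarse[merge[u]].add(merge[v])
    return coarse, merge

# One step on the path 0-1-2-3 yields a coarse graph of 2 or 3 vertices,
# depending on the matching order.
coarse, merge = coarsen({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, random.Random(1))
print(len(coarse))
```

Iterating this step produces the sequence \(G_0, G_1, \ldots, G_\ell\) described above.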

KaHIP (Karlsruhe High Quality Partitioning) is a suite of graph partitioning algorithms. The suite comprises two main algorithms: KaFFPa (Karlsruhe Fast Flow Partitioner) [20], which is a multilevel graph partitioning algorithm, and KaFFPaE (KaFFPa Evolutionary), which uses an evolutionary algorithm approach. In this paper we analyze KaFFPa. Like METIS, KaFFPa uses the multilevel graph partitioning approach, but it adopts a different strategy for the uncoarsening phase, exploiting a local search method instead of the KL approach.

Ja-be-Ja [19] exploits a distributed computing approach. It uses a local search technique (simulated annealing) to find a good partition of the graph, minimizing the edge-cut size. The energy of the system is measured by counting the number of edges that have endpoints in different components. Ja-be-Ja starts with a random balanced partitioning and then iteratively applies the local search heuristic to reach a configuration having a lower energy (edge-cut size). The size of the initial components is preserved, since Ja-be-Ja only allows swapping vertices between two components.
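The core swap rule can be sketched in Python as follows. This is a single-threaded simplification of ours: the real algorithm is distributed, samples candidate partners beyond direct neighbors, and uses a utility exponent \(\alpha\); here two adjacent vertices of different colors swap when doing so increases their number of same-colored neighbors, with a cooling temperature relaxing the acceptance test early on.

```python
import random

def same_color_degree(adj, color, v, c, excl=None):
    """Neighbors of v having color c, optionally ignoring one vertex."""
    return sum(1 for w in adj[v] if w != excl and color[w] == c)

def cut_size(adj, color):
    """Edges whose endpoints have different colors (adjacency is symmetric)."""
    return sum(1 for v in adj for w in adj[v] if color[v] != color[w]) // 2

def jabeja_sketch(adj, color, rounds=200, t0=2.0, cooling=0.01, seed=0):
    """Ja-be-Ja-style local search: vertices swap colors pairwise (so the
    component sizes never change) when the swap increases the number of
    same-colored neighbors; the temperature t, as in simulated annealing,
    lets early rounds accept weaker swaps and cools down to 1."""
    rng = random.Random(seed)
    t = t0
    nodes = list(adj)
    for _ in range(rounds):
        v = rng.choice(nodes)
        for u in adj[v]:
            if color[u] == color[v]:
                continue
            old = (same_color_degree(adj, color, v, color[v], u)
                   + same_color_degree(adj, color, u, color[u], v))
            new = (same_color_degree(adj, color, v, color[u], u)
                   + same_color_degree(adj, color, u, color[v], v))
            if new * t > old:  # temperature relaxes the acceptance test
                color[v], color[u] = color[u], color[v]
                break
        t = max(1.0, t - cooling)
    return color

# Two triangles joined by the edge (2, 3), with a deliberately bad coloring.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
color = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}   # initial cut: 5 edges
color = jabeja_sketch(adj, color, t0=1.0)       # t0=1: only improving swaps
```

With \(t=1\) every accepted swap strictly decreases the edge-cut, while the color counts (component sizes) are invariant by construction.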

3 Experiment Setting

We report on simulation experiments that compare five k-way partitioning algorithms on several networks, taken from [27]. The data sets we considered include networks having different structural features (see Table 1). For each network, partitions into \(k=2,4,8,16,32\) and 64 components have been considered.

We compare the analytical results obtained by each algorithm (i.e., size of the edge-cut, number of communication channels required and imbalance) with the real performance (overall simulation time) in an ABM scenario.

Table 1. Networks.

Simulation Environment. To evaluate real performance we developed a toy distributed SIR (Susceptible, Infected, and Removed) simulation where, at each simulation step, each agent (a vertex of the network) has to communicate with its neighbors. The SIR simulation has been developed on top of D-Mason, exploiting the novel communication strategy which realizes a Publish/Subscribe paradigm through a layer based on the MPI standard [7, 8]. Simulations have been performed on a cluster of eight computing nodes, each equipped as follows:

  • Hardware:

    • CPUs: 2 x Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz (#core 16, #threads 32)

    • RAM: 256 GB

    • Network: adapters Intel Corporation I350 Gigabit

  • Software:

    • Ubuntu 12.04.4 LTS (GNU/Linux 3.11.0-15-generic x86_64)

    • Java JDK 1.6.25

    • OpenMPI 1.7.4 (feature version of Feb 5, 2014).

Simulation results for k-way partitioning have been obtained using k logical processors (one logical processor per component). We notice that, when the simulation is distributed, the communication between agents in the same component is much faster than the communication between agents belonging to different components. On the other hand, balancing is important because the simulation is synchronized and evolves at the speed of the slowest component.

The Competing Algorithms. We have analyzed five algorithms, briefly discussed in Sect. 2:

  • Multilevel approach:

    • METIS: (cf. Sect. 2);

    • METIS Relaxed: this version of the METIS algorithm uses a relaxed version of the balancing constraint (i.e., a larger value of the parameter \(\epsilon \)), in order to improve on other parameters (like the edge-cut size);

    • KaFFPa: (cf. Sect. 2);

  • Distributed Computing Approach:

Ja-be-Ja: (cf. Sect. 2). Unfortunately, we were not able to find a truly distributed implementation of the algorithm; we used the implementation available on the public Ja-be-Ja GitHub repository [24]. This implementation is not truly distributed but is simulated through the Java library GraphChi [25], which enables modellers to simulate a distributed computation on multi-core machines. Clearly, the computational efficiency of this implementation is limited and, for this reason, we could only run 100 iterations of the algorithm for each test setting. We assume that the poor results of the algorithm (cf. Sect. 4) are at least partially due to the small number of iterations used in our tests. In order to better evaluate the real performance of the algorithm, a truly distributed implementation of Ja-be-Ja is needed.

  • Random: This algorithm assigns each vertex to a random component. It always provides an optimal balancing. We use this algorithm as baseline in our comparisons.

Performance Metrics. Let \(G=(V,E)\) be the analyzed network and let \(\Pi =(V_1,V_2,\ldots ,V_k)\) be the partition provided by a given algorithm. We evaluate the algorithms' performance using the following metrics:

  • Edge-cut size (W), the total number of edges having their incident vertices in different components;

  • Number of communication channels (E): two components \(U_1\) and \(U_2\) require a communication channel when \(\exists v_1 \in U_1, \ v_2 \in U_2\) such that \(\{v_1,v_2\}\in E\). In other words, we count the number of edges in the supergraph \(S_G\) obtained by clustering the nodes of each component into a single node. We notice that this unconventional metric is motivated by our specific distributed ABM scenario: in our simulation environment, a communication channel between two components \(U_1\) and \(U_2\) is established when at least two vertices (agents) \(u_1 \in U_1\) and \(u_2 \in U_2\) share an edge. Thereafter, the same channel is used for every communication between \(U_1\) and \(U_2\); consequently, these additional communications have less impact on system performance;

  • Imbalance (I), the minimum value of \(\epsilon \) such that each component has size at most \((1 +\epsilon ) \lceil |V|/k \rceil \).
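Under the same assumptions as before (edge-list graph, vertex-to-component map; the function name is ours), the three metrics can be computed in Python as:

```python
import math

def partition_metrics(edges, part, k):
    """Edge-cut size W, number of communication channels E (edges of the
    component supergraph), and imbalance I for a k-way partition."""
    w = 0
    channels = set()
    sizes = [0] * k
    for comp in part.values():
        sizes[comp] += 1
    for u, v in edges:
        if part[u] != part[v]:
            w += 1
            channels.add(frozenset((part[u], part[v])))
    imbalance = max(sizes) / math.ceil(len(part) / k) - 1
    return w, len(channels), imbalance

# Two triangles joined by the bridge (2, 3), split into their natural halves:
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(partition_metrics(edges, part, 2))  # (1, 1, 0.0)
```

Note that several cut edges between the same pair of components still count as a single communication channel, which is exactly why W and E can diverge.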

Moreover, we evaluate the real performances of each strategy by measuring the overall simulation time (T) to perform 10 simulation steps on the distributed SIR simulation.

Summarizing, our experiments compare the performance (both analytically and in a real setting) of 5 k-way partitioning algorithms (\(A\in \{\)METIS, METIS Relaxed, KaFFPa, Ja-be-Ja and Random\(\}\)) with \(k\in \{2,4,8,16,32,64\}\) on 9 networks (\(N\in \{\)uk, data, 4elt, cti, t60k, wing, finan512, fe_ocean, powergrid\(\}\)). Overall we performed \(5 \times 6 \times 9=270\) tests.

4 Results

4.1 Analytical Results

Figures 1, 2 and 3 depict the analytical results. For each plot the networks appear along the X-axis, while the values of the measured parameter appear along the Y-axis. We present the results only for \(k\in \{4,64\}\) because of space limitations; results for the other values of k exhibit similar behaviors.

Analyzing the results in Figs. 1 and 2, we notice that the performances of the multilevel algorithms are comparable both in terms of edge-cut size and number of communication channels. Ja-be-Ja performs a bit worse (probably due to the small number of iterations used in our tests, as observed in Sect. 3) but always better than the random strategy.

Results on imbalance are fluctuating (see Fig. 3). In general all the algorithms provide a quite balanced partition. Apart from the random strategy that by construction provides the optimal solution, no strategy dominates the others.

Fig. 1. Edge-cut size (W) comparison: (left) \(k=4\), (right) \(k=64\). Y-axes appear in log scale.

Fig. 2. Number of communication channels (E) comparison: (left) \(k=4\), (right) \(k=64\). Y-axes appear in log scale.

Fig. 3. Imbalance (I) comparison: (left) \(k=4\), (right) \(k=64\).

4.2 Real Setting Results

Figure 4 reports on the results obtained in the real simulation setting. The results are consistent with the analytical ones, in terms of both edge-cut size and number of communication channels, although the gaps are amplified. The results thus confirm that the choice of partitioning strategy has a significant impact on performance in a real scenario.

In order to better understand how the metrics evolve with k, Fig. 5 depicts four plots describing, for each algorithm, the growth of the edge-cut size (top-left), the imbalance (top-right), the number of communication channels (bottom-left) and the simulation time (bottom-right) on the fe_ocean network as a function of the parameter k (X-axis).

Fig. 4. Simulation time (T) comparison: (left) \(k=4\), (right) \(k=64\). Y-axes appear in log scale.

Fig. 5. Edge-cut size (top-left), Imbalance (top-right), Number of communication channels (bottom-left) and Simulation time (bottom-right) on the fe_ocean network, \(k\in \{2,4,8,16,32,64\}\).

4.3 Correlation Between Analytical and Real Setting Results

Analyzing the results in Figs. 1, 2, 3 and 4, we observe that the performance of the distributed simulations is influenced by the analytical metrics. In order to better evaluate the correlation between the overall simulation times and the performance of the algorithms (measured considering the edge-cut size, the number of communication channels and the imbalance), we used a statistical metric: the Pearson product-moment Correlation Coefficient (PCC). The PCC quantifies the strength as well as the direction of the relationship between two variables. The correlation coefficient ranges from \(-1\) (strong negative correlation) to 1 (strong positive correlation); a value of 0 implies that there is no correlation between the variables. We computed the PCC between the simulation time (T) and the three analytical metrics (W, E and I), for all the considered values of the parameter k.
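For reference, the PCC of two equally long samples is their covariance divided by the product of their standard deviations; a direct Python implementation:

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3], [2, 4, 6]))  # ≈ 1.0  (perfect positive correlation)
print(pearson([1, 2, 3], [6, 4, 2]))  # ≈ -1.0 (perfect negative correlation)
```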

In particular, we considered four variables parametrized by the network \(N\in \{\)uk, data, 4elt, cti, t60k, wing, finan512, fe_ocean, powergrid\(\}\), the algorithm \(A\in \{\)METIS, METIS Relaxed, KaFFPa, Ja-be-Ja, Random\(\}\) and the parameter \(k\in \{2,4,8,16,32,64\}\). The variable T(N, A, k) denotes the simulation time; W(N, A, k) denotes the edge-cut size; E(N, A, k) denotes the number of communication channels; finally, I(N, A, k) denotes the maximum imbalance. Table 2 presents the correlation values obtained.

We observed that:

  • there is a strong positive correlation between simulation time and edge-cut size (the PCC is always over 0.92);

  • there is a weak/moderate positive correlation between simulation time and the number of communication channels (see Footnote 1); the PCC ranges between 0.22 and 0.4. Moreover, this correlation seems to increase with k;

  • there is a weak negative correlation between simulation time and imbalance (the PCC ranges between \(-0.22\) and \(-0.32\)).

This final result is counterintuitive: theoretically, the greater the imbalance, the larger the simulation time should be, which would lead to a positive correlation. The key observation is that a small amount of imbalance has a limited impact on the simulation time but can be extremely helpful in reducing both the edge-cut size and the number of communication channels, which has a considerable payoff in terms of real performance.

Table 2. Correlation between analytical and real setting results.

5 Conclusion

We considered the problem of partitioning a network into k balanced components such that the number of edges crossing the boundaries of the components is minimized. We evaluated, both analytically and in a real distributed ABM scenario, the performance of 5 heuristic approaches which, to the best of our knowledge, represent the current state of the art on the problem. Experimental results show that the choice of the partitioning strategy strongly influences the performance of a real distributed environment. Moreover, the analytical results (the edge-cut size in particular) correlate with the overall simulation time in a real setting. On the other hand, according to our results, the quality of the balance among components does not relate to the real performance in the field; likely, this is due to the fact that we analyzed a very small imbalance range. As future work, we plan to investigate heuristics which allow identifying more efficient partitionings at the expense of a slightly worse balance.