Abstract
The identification of the most central nodes of a graph is a fundamental task of data analysis. Current flow betweenness is a centrality index that considers how information flows along all the paths of a graph, not only along the shortest ones. Computing the exact value of current flow betweenness is computationally expensive for large graphs, so algorithms that approximate this measure are needed. In this paper we propose a solution, based on the Gather Apply Scatter (GAS) model, that estimates current flow betweenness in a distributed setting using the Apache Spark framework. The experimental evaluation shows that the algorithm achieves a high correlation with the exact values of the index and outperforms the other algorithms it is compared against.
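To make the definition concrete, here is a minimal single-machine Python sketch of the standard electrical-network formulation of current flow betweenness. It assumes a connected, undirected, simple graph with at least three nodes and unit edge resistances, given as adjacency lists over nodes 0..n-1. For each pair {s, t}, a unit current is injected at s and extracted at t. The node potentials solve the Laplacian system with t grounded, and the current through a node v is half the sum of the absolute currents on its incident edges. The normalization by (n-1)(n-2)/2 is one common choice and is an assumption here. All function names are hypothetical. The pair-sampling estimator is a generic illustration of approximation, not the GAS-based Spark algorithm proposed in the paper.

```python
import random
from itertools import combinations


def solve(a, b):
    """Solve a dense linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def potentials(adj, s, t):
    """Node potentials when a unit current enters at s and leaves at t.

    With unit resistances the potentials solve L p = e_s - e_t (L = Laplacian).
    Grounding t (p_t = 0) and dropping its row and column leaves a
    non-singular system on a connected graph.
    """
    n = len(adj)
    others = [v for v in range(n) if v != t]
    index = {v: i for i, v in enumerate(others)}
    lap = [[0.0] * len(others) for _ in others]
    for v in others:
        lap[index[v]][index[v]] = float(len(adj[v]))  # degree on the diagonal
        for u in adj[v]:
            if u != t:
                lap[index[v]][index[u]] -= 1.0
    rhs = [0.0] * len(others)
    rhs[index[s]] = 1.0  # unit current injected at s
    x = solve(lap, rhs)
    p = [0.0] * n
    for v in others:
        p[v] = x[index[v]]
    return p


def throughput(adj, p, v):
    """Current through v: half the sum of |edge current| over v's incident edges."""
    return 0.5 * sum(abs(p[v] - p[u]) for u in adj[v])


def exact_cfb(adj):
    """Exact score: sum over unordered pairs {s, t} with v not in {s, t},
    divided by (n - 1)(n - 2) / 2. Performs one linear solve per pair."""
    n = len(adj)
    score = [0.0] * n
    for s, t in combinations(range(n), 2):
        p = potentials(adj, s, t)
        for v in range(n):
            if v != s and v != t:
                score[v] += throughput(adj, p, v)
    norm = (n - 1) * (n - 2) / 2
    return [c / norm for c in score]


def sampled_cfb(adj, k, seed=0):
    """Unbiased estimate of exact_cfb from k uniformly sampled unordered pairs
    (a generic sampling scheme, not the paper's GAS-based algorithm)."""
    n = len(adj)
    rng = random.Random(seed)
    score = [0.0] * n
    for _ in range(k):
        s, t = rng.sample(range(n), 2)  # two distinct nodes, uniform pair
        p = potentials(adj, s, t)
        for v in range(n):
            if v != s and v != t:
                score[v] += throughput(adj, p, v)
    # Mean over all n(n-1)/2 pairs, rescaled to the (n-1)(n-2)/2 pairs avoiding v.
    scale = n / (n - 2)
    return [scale * c / k for c in score]
```

On a four-node path 0-1-2-3 the exact sketch assigns 2/3 to the two inner nodes and 0 to the endpoints. This matches normalized shortest-path betweenness, as expected on a tree, where each pair is joined by a single path. On a four-node cycle every node scores 1/3. Because the exact version performs one dense linear solve per pair, it illustrates why exact computation does not scale to large graphs and why sampled or distributed estimates are attractive.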
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Bertolucci, M., Lulli, A., Ricci, L. (2016). Current Flow Betweenness Centrality with Apache Spark. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_21
DOI: https://doi.org/10.1007/978-3-319-49583-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49582-8
Online ISBN: 978-3-319-49583-5
eBook Packages: Computer Science, Computer Science (R0)