Current Flow Betweenness Centrality with Apache Spark

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10048)

Abstract

Identifying the most central nodes of a graph is a fundamental task in data analysis. Current flow betweenness is a centrality index that considers how information flows along all the paths of a graph, not only the shortest ones. Computing its exact value is expensive for large graphs, so algorithms that approximate the measure are needed. In this paper we propose a solution, based on the Gather Apply Scatter model, that estimates current flow betweenness in a distributed setting using the Apache Spark framework. The experimental evaluation shows that the algorithm achieves high correlation with the exact value of the index and outperforms other algorithms.
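As a point of reference for the exact index that the abstract calls expensive, the following is a minimal sketch of current flow betweenness on a small connected, unweighted graph. It assumes the standard electrical-network definition (unit current injected at a source and extracted at a target, with a node's throughput equal to half the absolute current on its incident edges, averaged over all source-target pairs). It is not the distributed Gather Apply Scatter approximation proposed in the paper, and the function names `invert` and `current_flow_betweenness` are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def invert(matrix):
    """Invert a small dense square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] -> [I | A^-1].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the graph must be connected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def current_flow_betweenness(n, edges):
    """Exact current flow betweenness for nodes 0..n-1 of a connected graph.

    Assumption: every edge is an undirected unit resistor.
    """
    if n < 3:
        raise ValueError("need at least 3 nodes")
    # Graph Laplacian L = D - A.
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    # Ground the last node (potential 0): drop its row and column, invert,
    # then pad with zeros so the result can be indexed by any node.
    inv = invert([row[:n - 1] for row in lap[:n - 1]])
    c = [row + [0.0] for row in inv] + [[0.0] * n]

    score = [0.0] * n
    for s, t in combinations(range(n), 2):
        # Potentials when one unit of current enters at s and leaves at t.
        p = [c[v][s] - c[v][t] for v in range(n)]
        for u, v in edges:
            current = abs(p[u] - p[v])
            # Each interior endpoint of an edge gets half of its current;
            # the source and target themselves contribute no throughput.
            if u != s and u != t:
                score[u] += 0.5 * current
            if v != s and v != t:
                score[v] += 0.5 * current
    pairs = (n - 1) * (n - 2) / 2  # pairs that do not contain a given node
    return [x / pairs for x in score]


if __name__ == "__main__":
    # Path 0 - 1 - 2: all current between the ends passes through node 1.
    print(current_flow_betweenness(3, [(0, 1), (1, 2)]))
```

The sketch inverts a dense matrix and then loops over every node pair and every edge, which is practical only for small graphs; that cost is what motivates approximate, distributed approaches such as the one the paper proposes.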

Notes

  1. http://neo4j.com/developer/apache-spark/

  2. https://github.com/kbastani/neo4j-mazerunner

  3. http://snap.stanford.edu/

  4. http://konect.uni-koblenz.de/networks/

Author information

Corresponding author

Correspondence to Massimiliano Bertolucci.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Bertolucci, M., Lulli, A., Ricci, L. (2016). Current Flow Betweenness Centrality with Apache Spark. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_21

  • DOI: https://doi.org/10.1007/978-3-319-49583-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49582-8

  • Online ISBN: 978-3-319-49583-5

  • eBook Packages: Computer Science (R0)
