Abstract
In large networks it is often necessary to study patterns associated with structured groups of nodes, commonly referred to as communities. In this work we propose an approach for clustering the communities of a network using interval data. The approach is relevant to the analysis of large networks and, in particular, to discovering the different functionalities of the communities inside a network. We illustrate the approach on several example networks built from synthetic data, and then apply it to a large real network: the co-authorship network in Astrophysics.
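To make the idea of clustering communities through interval data concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each community is summarised by the [min, max] interval of a few node-level statistics, and that intervals are compared with a squared Euclidean distance on their lower and upper bounds. The toy values, the choice of statistics (degree and local clustering coefficient), and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]


def to_intervals(variables: Sequence[Sequence[float]]) -> List[Interval]:
    """Summarise each node-level variable of one community as a (min, max) interval."""
    return [(float(min(v)), float(max(v))) for v in variables]


def sq_distance(a: Sequence[Interval], b: Sequence[Interval]) -> float:
    """Squared Euclidean distance on lower and upper bounds (an assumed choice)."""
    return sum((al - bl) ** 2 + (au - bu) ** 2 for (al, au), (bl, bu) in zip(a, b))


def centroid(members: Sequence[Sequence[Interval]]) -> List[Interval]:
    """Interval prototype: mean lower bound and mean upper bound per variable."""
    n = len(members)
    return [
        (sum(m[j][0] for m in members) / n, sum(m[j][1] for m in members) / n)
        for j in range(len(members[0]))
    ]


def interval_kmeans(data: Sequence[Sequence[Interval]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd-style k-means on interval-valued observations."""
    # Deterministic start for reproducibility: the first k observations.
    centers = [list(x) for x in data[:k]]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_distance(x, centers[c])) for x in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old prototype if a cluster becomes empty
                centers[c] = centroid(members)
    return labels


# Hypothetical toy data: per community, node degrees and local clustering coefficients.
communities: Dict[str, Tuple[List[float], List[float]]] = {
    "A": ([2, 3, 4], [0.1, 0.2, 0.3]),
    "B": ([2, 4, 5], [0.1, 0.25, 0.3]),
    "C": ([20, 25, 30], [0.6, 0.7, 0.9]),
    "D": ([18, 27, 31], [0.5, 0.8, 0.9]),
}

data = [to_intervals(v) for v in communities.values()]
labels = interval_kmeans(data, k=2)
for name, label in zip(communities, labels):
    print(name, label)
```

On this toy input the two low-degree communities (A, B) end up in one cluster and the two high-degree communities (C, D) in the other. In practice the variables would need to be put on a comparable scale before clustering, since here the degree intervals dominate the distance.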
Notes
- 1. I thank the referees for their helpful suggestions.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Drago, C. (2019). Clustering Communities Using Interval K-Means. In: Petrucci, A., Racioppi, F., Verde, R. (eds) New Statistical Developments in Data Science. SIS 2017. Springer Proceedings in Mathematics & Statistics, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-030-21158-5_3
DOI: https://doi.org/10.1007/978-3-030-21158-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21157-8
Online ISBN: 978-3-030-21158-5
eBook Packages: Mathematics and Statistics (R0)