Abstract
This paper evaluates a new multi-level hybrid method of community detection, which combines density-based clustering with label propagation, and compares it with the k-means benchmark and DBSCAN (density-based spatial clustering of applications with noise). Despite sophisticated visualization methods, managers still usually find clustering results too difficult to evaluate and interpret. The paper presents a set of key assessment measures that can be used to evaluate the internal and external quality of the discovered clusters. The approach is validated on a real-life marketing database using the advanced analytics platform Upsaily.
Notes
- 1.
Between-Cluster Dispersion can be calculated as \( BCD(n) = \sum_{i} \overline{c_{i}} \cdot d^{2}(c_{i}, c) \), where \( n \) is the number of clusters, \( d(c_{i}, c) \) is the distance between the centroid of the cluster \( c_{i} \) and the global center \( c \) of all clusters, and \( \overline{c_{i}} \) is the number of elements in the cluster \( c_{i} \).
- 2.
Within-Cluster Dispersion can be calculated as \( WCD(n) = \sum_{i} \sum_{x \in c_{i}} d^{2}(x, c_{i}) \), where \( n \) is the number of clusters, \( x \) is an element of the cluster \( c_{i} \), and \( d(x, c_{i}) \) is the distance between the element \( x \) and the centroid of the cluster \( c_{i} \) to which it belongs.
- 3.
The code for the Dunn index calculation was found on GitHub (Dunn index for clusters analysis): https://gist.github.com/douglasrizzo/cd7e792ff3a2dcaf27f6. Computing the Dunn index is relatively simple, and the authors verified the published code before using it in order to confirm its validity.
- 4.
The implementation of the Davies–Bouldin index was taken from the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html.
- 5.
The implementation of the Silhouette index was taken from the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html.
- 6.
The implementation of the Calinski–Harabasz index was taken from the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html.
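The measures listed in the notes above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the synthetic data from `make_blobs`, the use of k-means to obtain labels, the Euclidean distance, the reading of the "global center" as the mean of all points, and the helper names `between_cluster_dispersion`, `within_cluster_dispersion` and `dunn_index` are all assumptions made for the example. The Dunn index shown is one common variant (smallest distance between points of different clusters divided by the largest cluster diameter) and may differ from the variant in the cited gist.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    pairwise_distances,
    silhouette_score,
)


def between_cluster_dispersion(X, labels):
    """BCD: sum over clusters of (cluster size) * d^2(centroid, global center)."""
    global_center = X.mean(axis=0)  # assumption: global center = mean of all points
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        total += len(members) * np.sum((centroid - global_center) ** 2)
    return total


def within_cluster_dispersion(X, labels):
    """WCD: sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return total


def dunn_index(X, labels):
    """One common Dunn variant: min inter-cluster distance / max cluster diameter."""
    dist = pairwise_distances(X)  # Euclidean by default
    ids = np.unique(labels)
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    )
    max_diameter = max(dist[np.ix_(labels == a, labels == a)].max() for a in ids)
    return min_separation / max_diameter


# Synthetic stand-in for a customer feature matrix (hypothetical data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

bcd = between_cluster_dispersion(X, labels)
wcd = within_cluster_dispersion(X, labels)
n_samples, n_clusters = len(X), len(np.unique(labels))

# The Calinski-Harabasz index is the ratio of the two dispersions,
# each normalized by its degrees of freedom.
ch_from_dispersions = (bcd / (n_clusters - 1)) / (wcd / (n_samples - n_clusters))

print("BCD:", bcd)
print("WCD:", wcd)
print("Dunn:", dunn_index(X, labels))
print("Davies-Bouldin:", davies_bouldin_score(X, labels))
print("Silhouette:", silhouette_score(X, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
```

Under these assumptions, the value of `ch_from_dispersions` should agree with `calinski_harabasz_score`, which gives a quick consistency check on the two dispersion helpers. Higher Dunn, Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate more compact, better separated clusters.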
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Korczak, J., Pondel, M., Sroka, W. (2020). Discovery of Customer Communities – Evaluation Aspects. In: Ziemba, E. (ed.) Information Technology for Management: Current Research and Future Directions. AITM/ISM 2019. Lecture Notes in Business Information Processing, vol 380. Springer, Cham. https://doi.org/10.1007/978-3-030-43353-6_10
DOI: https://doi.org/10.1007/978-3-030-43353-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43352-9
Online ISBN: 978-3-030-43353-6
eBook Packages: Computer Science, Computer Science (R0)