Discovery of Customer Communities – Evaluation Aspects

  • Conference paper
Information Technology for Management: Current Research and Future Directions (AITM 2019, ISM 2019)

Abstract

This paper evaluates a new multi-level hybrid method of community detection, which combines density-based clustering with label propagation, and compares it with the k-means benchmark and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Despite sophisticated visualization methods, managers often still find clustering results difficult to evaluate and interpret. The article presents a set of key assessment measures that can be used to evaluate the internal and external quality of discovered clusters. The approach is validated on a real-life marketing database using the advanced analytics platform Upsaily.

Notes

  1. Between-Cluster Dispersion can be calculated as \( BCD(n) = \sum_{i=1}^{n} \overline{c_i} \cdot d^{2}(c_i, c) \), where \( n \) is the number of clusters, \( d(c_i, c) \) is the distance between the centroid of cluster \( c_i \) and the global center \( c \) of all clusters, and \( \overline{c_i} \) is the number of elements in cluster \( c_i \).

  2. Within-Cluster Dispersion can be calculated as \( WCD(n) = \sum_{i=1}^{n} \sum_{x \in c_i} d^{2}(x, c_i) \), where \( n \) is the number of clusters, \( x \) is an element of cluster \( c_i \), and \( d(x, c_i) \) is the distance between the element \( x \) and the centroid of the cluster \( c_i \) to which it belongs.

  3. The code for the Dunn index calculation was found on GitHub: Dunn index for clusters analysis, https://gist.github.com/douglasrizzo/cd7e792ff3a2dcaf27f6. Computing the Dunn index is relatively simple; the authors verified the published code before using it to confirm its validity.

  4. The implementation of the Davies–Bouldin index was taken from the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html.

  5. The implementation of the Silhouette index was taken from the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html.

  6. The implementation of the Calinski-Harabasz index was taken from the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html.
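
The dispersion formulas in Notes 1 and 2 and the indices in Notes 3 to 6 can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes Euclidean distance, takes the global center to be the mean of all points, and uses one common definition of the Dunn index (smallest distance between points of different clusters divided by the largest cluster diameter), which may differ from the variant in the cited gist. The function names and the toy data are hypothetical. Noise points from a density-based method (for example, label -1 in scikit-learn's DBSCAN) would presumably need to be filtered out first.

```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def between_cluster_dispersion(X, labels):
    """BCD: sum over clusters of size * squared distance(centroid, global center)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    global_center = X.mean(axis=0)  # assumption: center = mean of all points
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        total += len(members) * np.sum((centroid - global_center) ** 2)
    return float(total)


def within_cluster_dispersion(X, labels):
    """WCD: sum of squared distances from each point to its cluster centroid."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return float(total)


def dunn_index(X, labels):
    """Min distance between points of different clusters / max cluster diameter.

    Needs at least two clusters and one cluster with two distinct points.
    Builds the full pairwise distance matrix, so it suits small data only.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    same_cluster = labels[:, None] == labels[None, :]
    return float(dist[~same_cluster].min() / dist[same_cluster].max())


# Hypothetical toy data: two well-separated unit squares.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print("BCD:", between_cluster_dispersion(X, labels))
print("WCD:", within_cluster_dispersion(X, labels))
print("Dunn:", dunn_index(X, labels))
print("Davies-Bouldin:", davies_bouldin_score(X, labels))
print("Silhouette:", silhouette_score(X, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
```

For this toy data the BCD is 400 and the WCD is 4, so with 2 clusters and 8 points the Calinski-Harabasz score equals (400 / 1) / (4 / 6) = 600. Higher values of the Dunn, Silhouette and Calinski-Harabasz indices and lower values of the Davies–Bouldin index indicate more compact, better-separated clusters.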

Author information

Corresponding authors

Correspondence to Jerzy Korczak, Maciej Pondel or Wiktor Sroka.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Korczak, J., Pondel, M., Sroka, W. (2020). Discovery of Customer Communities – Evaluation Aspects. In: Ziemba, E. (ed.) Information Technology for Management: Current Research and Future Directions. AITM 2019, ISM 2019. Lecture Notes in Business Information Processing, vol 380. Springer, Cham. https://doi.org/10.1007/978-3-030-43353-6_10

  • DOI: https://doi.org/10.1007/978-3-030-43353-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43352-9

  • Online ISBN: 978-3-030-43353-6

  • eBook Packages: Computer Science, Computer Science (R0)
