
An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7276)

Abstract

Community detection algorithms are widely used to study the structural properties of real-world networks. In this paper, we experimentally evaluate the qualitative performance of several community detection algorithms on large-scale email networks. The email networks were generated from real email traffic and contain both legitimate email (ham) and unsolicited email (spam). We compare the quality of the algorithms with respect to a number of structural quality functions and a logical quality measure, which assesses the ability of the algorithms to separate ham and spam emails by clustering them into distinct communities. Our study reveals that the algorithms that perform well with respect to structural quality do not achieve high logical quality. We also show that algorithms with similar structural quality have similar logical quality, regardless of their approach to clustering. Finally, we show that the algorithm that performs link community detection is better suited to clustering email networks than the node-based approaches, and that it creates more distinct communities of ham and spam edges.
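The abstract does not define its quality measures. As a rough illustration only, the following sketch pairs one widely used structural quality function, Newman-Girvan modularity, with a simple majority-label purity score standing in for a ham/spam "logical quality" measure. Both the choice of functions and the names `modularity` and `purity` are assumptions, not the paper's definitions. Purity is written over generic items so that it applies both to node partitions and to the edge partitions produced by link community detection. The toy graph and ham/spam labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity of a node partition (undirected, unweighted).

    Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ], where L_c is the
    number of edges inside c, d_c the total degree of c's nodes, and m the
    number of edges. Example structural quality function (assumed, see text).
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = Counter()    # L_c: edges with both endpoints in community c
    degree_sum = Counter()  # d_c: sum of degrees of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def purity(assignment: Dict[Hashable, Hashable], label: Dict[Hashable, str]) -> float:
    """Hypothetical logical quality: share of items carrying their community's majority label.

    Items may be nodes (node-based algorithms) or edges (link communities).
    """
    if not assignment:
        raise ValueError("no items to score")
    by_comm = defaultdict(Counter)  # community -> Counter of labels
    for item, comm in assignment.items():
        by_comm[comm][label[item]] += 1
    majority = sum(max(counts.values()) for counts in by_comm.values())
    return majority / len(assignment)


if __name__ == "__main__":
    # Toy graph: two triangles joined by the bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    nodes = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(round(modularity(edges, nodes), 4))  # 0.3571

    # Hypothetical link-community partition with ham/spam edge labels.
    edge_comm = {e: 0 for e in edges[:3]}
    edge_comm.update({e: 1 for e in edges[3:]})
    edge_label = {e: "ham" for e in edges[:3]}
    edge_label.update({e: "spam" for e in edges[3:6]})
    edge_label[(2, 3)] = "ham"
    print(round(purity(edge_comm, edge_label), 4))  # 0.8571
```

A purity of 1.0 would mean every community contains only ham or only spam items. Scoring edges rather than nodes lets one email address take part in both a ham and a spam community, which a node partition cannot express.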




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moradi, F., Olovsson, T., Tsigas, P. (2012). An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic. In: Klasing, R. (ed.) Experimental Algorithms. SEA 2012. Lecture Notes in Computer Science, vol 7276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30850-5_25


  • DOI: https://doi.org/10.1007/978-3-642-30850-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30849-9

  • Online ISBN: 978-3-642-30850-5

  • eBook Packages: Computer Science, Computer Science (R0)
