Exploiting the Spam Correlations in Scalable Online Social Spam Detection

  • Hailu Xu
  • Liting Hu
  • Pinchao Liu
  • Boyuan Guan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11513)


Large-scale social networks now generate huge volumes of social spam. Most prior research has focused on the algorithmic side, improving the efficiency of identifying social spam in limited amounts of data; few studies target the data correlations among large-scale distributed social spam or exploit benefits from the system side. In this paper, we propose a new scalable system, named SpamHunter, which utilizes the spam correlations from distributed data sources to enhance the performance of large-scale social spam detection. It identifies correlated social spam from various distributed servers/sources through DHT-based hierarchical functional trees. These functional trees act as bridges among data servers/sources to aggregate, exchange, and communicate updated and newly emerging social spam. Furthermore, by processing online social logs instantly, SpamHunter allows online streaming data to be processed in a distributed manner, which reduces online detection latency and avoids the inefficiency caused by outdated spam posts. Our experimental results with real-world social logs demonstrate that SpamHunter reaches a 95% F1 score in spam detection and scales efficiently to a large number of data servers with low latency.
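
As a rough illustration of the idea of aggregating spam information over a tree built on a DHT key space, the following minimal Python sketch hashes server names to identifiers, arranges the servers into a fixed-fanout tree, and merges per-server spam-signature counts up to the root. It is not SpamHunter's implementation: the function names (node_id, build_tree, aggregate), the 16-bit key space, the fanout of 2, and the sorted-by-ID tree layout are all assumptions made for this example, and real DHT overlays such as Pastry and Scribe build their trees by routing messages toward a key rather than by sorting nodes.

```python
import hashlib
from collections import Counter


def node_id(name: str, bits: int = 16) -> int:
    """Map a server name to a position in a hypothetical DHT key space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def build_tree(servers, fanout: int = 2):
    """Return a child -> parent map over servers ordered by DHT ID.

    Assumption: a simple fanout-ary layout stands in for overlay routing.
    """
    ordered = sorted(servers, key=node_id)
    parent = {ordered[0]: None}  # the first node acts as the tree root
    for i in range(1, len(ordered)):
        parent[ordered[i]] = ordered[(i - 1) // fanout]
    return parent


def aggregate(parent, local_counts):
    """Merge per-server spam-signature counts upward to the root."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    def merge(node):
        # Start from this node's own observations, then add each subtree.
        total = Counter(local_counts.get(node, {}))
        for child in children.get(node, []):
            total += merge(child)
        return total

    root = children[None][0]
    return root, merge(root)


if __name__ == "__main__":
    servers = ["srv-a", "srv-b", "srv-c", "srv-d", "srv-e"]
    local = {
        "srv-a": {"sig1": 2},
        "srv-b": {"sig1": 1, "sig2": 4},
        "srv-e": {"sig3": 5},
    }
    root, total = aggregate(build_tree(servers), local)
    print(root, dict(total))
```

In this sketch each interior node only combines counts from its own subtree, so the root ends up with a global view of signatures that individual servers saw only in part; that is the sense in which a tree can expose correlations across distributed sources.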


Keywords: Social spam detection; DHT-based overlay



We gratefully thank the anonymous reviewers for their feedback that significantly improved the paper. We thank Florida International University School of Computing and Information Sciences for the travel award to present this work.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computing and Information Science, Florida International University, Miami, USA
