Co-Detection of crowdturfing microblogs and spammers in online social networks

World Wide Web

Abstract

The rise of online crowdsourcing services has driven an evolution from traditional spamming accounts, which spread unwanted advertisements and fraudulent content, to a new type of spammer whose behavior resembles that of normal users. Prior research has mainly studied machine accounts and spam separately, but the characteristics of these new spammers and their spam make it difficult for traditional methods to perform well. In this paper, we integrate the study of these new spammers with the study of crowdturfing microblogs, investigating the mechanism of crowdsourcing and the close relationship between crowdturfing spammers and microblogs in order to detect both more precisely. We propose a novel semi-supervised learning framework for co-detecting crowdturfing microblogs and spammers by jointly modeling user behavior, message content, and users’ following and retweeting networks. To meet the challenge of sparsely labeled datasets, we design a co-detection objective function that minimizes empirical error and permits the propagation of sparse labels to unlabeled samples. The advantage of this framework is threefold. First, through in-depth mining of new-type spammers, we aggregate a number of newly found features that help distinguish new-type spammers from normal users. Second, by modeling both following networks and retweeting networks, we characterize the crowdsourcing mechanism that spammers abuse in crowdturfing microblog diffusion, which markedly increases detection performance. Third, through our objective function based on semi-supervised methods, we overcome the problem of label sparseness and thus handle large, sparsely labeled data more reliably. Extensive experiments on real datasets demonstrate that our method outperforms four baselines on various metrics (including Precision-Recall, AUC, and Precision@K).
We also develop a robust system whose functions include data collection and availability analysis, spam and spammer detection, and visualization. To make our experiments replicable, we have made our dataset and code openly available at https://github.com/sunxiangguo/Crowdturfing.
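
The idea of spreading a few labels to unlabeled samples over a network can be illustrated with a generic graph label-propagation sketch. This is not the objective function proposed in the paper, which is not given in the abstract. It assumes a small symmetric, undirected toy graph (the paper's following and retweeting networks are directed), and the names `propagate_labels`, `precision_at_k`, `alpha`, and `iters` are hypothetical. The second helper shows one common way Precision@K is computed.

```python
import numpy as np

def propagate_labels(W, y, alpha=0.8, iters=200):
    """Spread sparse seed labels over a graph (generic sketch, not the paper's method).

    W: symmetric non-negative adjacency matrix, shape (n, n).
    y: seed labels, +1 (spammer), -1 (normal), 0 (unlabeled).
    alpha: how much each node trusts its neighbours versus its own seed.
    """
    d = W.sum(axis=1)
    d[d == 0] = 1.0                       # avoid division by zero for isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetrically normalised adjacency: S = D^{-1/2} W D^{-1/2}
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    f = y.astype(float).copy()
    for _ in range(iters):
        # Mix neighbour scores with the original seed labels.
        f = alpha * (S @ f) + (1 - alpha) * y
    return f

def precision_at_k(scores, truth, k):
    """Fraction of true positives among the k highest-scoring items."""
    top = np.argsort(-scores)[:k]
    return float(np.mean(truth[top]))

# Toy graph (assumed data): nodes 0-2 form one tight group, nodes 3-5 another,
# joined by a single weak edge between nodes 2 and 3.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1

y = np.array([1, 0, 0, 0, 0, -1])         # only two of six nodes are labeled
truth = np.array([1, 1, 1, 0, 0, 0])      # 1 = spammer in this toy example

scores = propagate_labels(W, y)
p_at_3 = precision_at_k(scores, truth, 3)
```

With only two seeds, the unlabeled nodes in each group inherit the sign of the seed they are tightly connected to, which is the intuition behind using network structure to compensate for sparse labels.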

Notes

  1. https://www.mturk.com

  2. https://www.freelancer.com

  3. http://www.sandaha.cc

Acknowledgments

This work is supported by the National Key R&D Program of China (2017YFB1003000); the National Natural Science Foundation of China under Grants No. 61972087, No. 61772133, No. 61472081, and No. 61402104; the Jiangsu Provincial Key Project (BE2018706); the Key Laboratory of Computer Network Technology of Jiangsu Province; the Jiangsu Provincial Key Laboratory of Network and Information Security under Grant No. BM2003201; and the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China under Grant No. 93K-9.

Author information

Corresponding author

Correspondence to Bo Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Trust, Privacy, and Security in Crowdsourcing Computing

Guest Editors: An Liu, Guanfeng Liu, Mehmet A. Orgun, and Qing Li

About this article

Cite this article

Liu, B., Sun, X., Ni, Z. et al. Co-Detection of crowdturfing microblogs and spammers in online social networks. World Wide Web 23, 573–607 (2020). https://doi.org/10.1007/s11280-019-00727-4

  • DOI: https://doi.org/10.1007/s11280-019-00727-4
