Skip to main content

A Social Spam Detection Framework via Semi-supervised Learning

  • Conference paper
  • First Online:
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9794))

Included in the following conference series:

Abstract

With the increasing popularity of social networking websites such as Twitter, Facebook, Sina Weibo and MySpace, spammers on them are getting more and more rampant. Social spammers always create a mass of compromised or fake accounts to deceive users and lead them to access malicious websites which contain illegal, pornography or dangerous information. As we all know, most of the studies on social spam detection are based on supervised machine learning which requires plenty of annotated datasets. Unfortunately, labeling a large number of datasets manually is a complex, error-prone and tedious task which may costs a lot of human efforts and time. In this paper, we propose a novel semi-supervised classification framework for social spam detection, which combines co-training with k-medoids. First we utilize k-medoids clustering algorithm to acquire some informative and presentative samples for labelling as our initial seeds set. Then we take advantage of the content features and behavior features of users for our co-training classification framework. In order to illustrate the effectiveness of k-medoids, we compare the performance with random selecting strategy. Finally, we evaluate the effectiveness of our proposed detection framework compared with several classical supervised algorithms.

This work was supported by National Science Foundation of China (No. 61272374, 61300190) and 863 Project (No. 2015AA015463).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Amleshwaram, A.A., Reddy, N., Yadav, S., Gu, G., Yang, C.: Cats: characterizing automation of Twitter spammers. In: 2013 Fifth International Conference on Communication Systems and Networks (COMSNETS), pp. 1–10. IEEE (2013)

    Google Scholar 

  2. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on Twitter. In: Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS) (2010)

    Google Scholar 

  3. Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., Gonçalves, M.: Detecting spammers and content promoters in online video social networks. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 620–627. ACM (2009)

    Google Scholar 

  4. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pp. 92–100. ACM (1998)

    Google Scholar 

  5. Chen, F., Tan, P., Jain, A.: A co-classification framework for detecting web spam and spammers in social media web sites. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), pp. 1807–1810. ACM (2009)

    Google Scholar 

  6. Du, J., Ling, C.X., Zhou, Z.H.: When does cotraining work in real data? IEEE Trans. Knowl. Data Eng. 23(5), 788–799 (2011)

    Article  Google Scholar 

  7. Fu, H., Xie, X., Rui, Y.: Leveraging careful microblog users for spammer detection. In: Proceedings of the 24th International Conference on World Wide Web Companion, pp. 419–429. International World Wide Web Conferences Steering Committee (2015)

    Google Scholar 

  8. Gao, H., Hu, J., Wilson, C., Li, Z., Chen, Y., Zhao, B.: Detecting and characterizing social spam campaigns. In: Proceedings of the 10th Annual Conference on Internet Measurement (IMC), pp. 35–47. ACM (2010)

    Google Scholar 

  9. Goldman, S., Zhou, Y.: Enhancing supervised learning with unlabeled data. In: ICML, pp. 327–334. Citeseer (2000)

    Google Scholar 

  10. Grier, C., Thomas, K., Paxson, V., Zhang, M.: @ spam: the underground on 140 characters or less. In: Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS), pp. 27–37. ACM (2010)

    Google Scholar 

  11. Heymann, P., Koutrika, G., Garcia-Molina, H.: Fighting spam on social web sites: a survey of approaches and future challenges. IEEE Internet Comput. 11(6), 36–45 (2007)

    Article  Google Scholar 

  12. Kiritchenko, S., Matwin, S.: Email classification with co-training. In: Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research, pp. 301–312. IBM Corp. (2011)

    Google Scholar 

  13. Lee, K., Caverlee, J., Cheng, Z., Sui, D.: Content-driven detection of campaigns in social media. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM), pp. 551–556. ACM (2011)

    Google Scholar 

  14. Lee, K., Caverlee, J., Webb, S.: Uncovering social spammers: social honeypots+machine learning. In: Proceeding of the 33rd International ACM (SIGIR) Conference on Research and Development in Information Retrieval, pp. 435–442. ACM (2010)

    Google Scholar 

  15. Li, R., Wang, S., Deng, H., Wang, R., Chang, K.C.C.: Towards social user profiling: unified and discriminative influence model for inferring home locations. In: KDD, pp. 1023–1031 (2012)

    Google Scholar 

  16. Li, Z., Zhang, X., Shen, H., Liang, W., He, Z.: A semi-supervised framework for social spammer detection. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS, vol. 9078, pp. 177–188. Springer, Heidelberg (2015)

    Chapter  Google Scholar 

  17. Lin, C., He, J., Zhou, Y., Yang, X., Chen, K., Song, L.: Analysis and identification of spamming behaviors in sina weibo microblog. In: Proceedings of the 7th Workshop on Social Network Mining and Analysis, p. 5. ACM (2013)

    Google Scholar 

  18. Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: Proceedings of the 26th Annual Computer Security Applications Conference, pp. 1–9. ACM (2010)

    Google Scholar 

  19. Wang, A.: Don’t follow me: spam detection in Twitter. In: Proceedings of the 2010 International Conference on Security and Cryptography (SECRYPT), pp. 1–10. IEEE (2010)

    Google Scholar 

  20. Wang, B., Zubiaga, A., Liakata, M., Procter, R.: Making the most of tweet-inherent features for social spam detection on Twitter. arXiv preprint arXiv:1503.07405 (2015)

  21. Wang, D., Irani, D., Pu, C.: A social-spam detection framework. In: Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS), pp. 46–54. ACM (2011)

    Google Scholar 

  22. Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans. Inf. Forensics Secur. 8(8), 1280–1293 (2013)

    Article  Google Scholar 

  23. Zhang, Q., Zhang, C., Cai, P., Qian, W., Zhou, A.: Detecting spamming groups in social media based on latent graph. In: Sharaf, M.A., Cheema, M.A., Qi, J. (eds.) ADC 2015. LNCS, vol. 9093, pp. 294–305. Springer, Heidelberg (2015)

    Chapter  Google Scholar 

  24. Zhang, X., Zhu, S., Liang, W.: Detecting spam and promoting campaigns in the Twitter social network. In: Proceedings of the 2012 IEEE 12th International Conference on Data Mining, pp. 1194–1199. IEEE Computer Society (2012)

    Google Scholar 

  25. Zhou, Z.H., Li, M.: Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 17(11), 1529–1541 (2005)

    Article  Google Scholar 

  26. Zhou, Z.H., Li, M.: Semi-supervised learning by disagreement. Knowl. Inf. Syst. 24(3), 415–439 (2010)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wenxin Liang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, X., Bai, H., Liang, W. (2016). A Social Spam Detection Framework via Semi-supervised Learning. In: Cao, H., Li, J., Wang, R. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science(), vol 9794. Springer, Cham. https://doi.org/10.1007/978-3-319-42996-0_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-42996-0_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42995-3

  • Online ISBN: 978-3-319-42996-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics