A Social Spam Detection Framework via Semi-supervised Learning

Zhang, Xianchao; Bai, Haijun; Liang, Wenxin

doi:10.1007/978-3-319-42996-0_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9794)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

With the increasing popularity of social networking websites such as Twitter, Facebook, Sina Weibo and MySpace, spammers on these platforms have become increasingly rampant. Social spammers typically create large numbers of compromised or fake accounts to deceive users and lead them to malicious websites that contain illegal, pornographic or dangerous information. Most studies on social spam detection are based on supervised machine learning, which requires large annotated datasets. Unfortunately, manually labeling large datasets is a complex, error-prone and tedious task that may cost a great deal of human effort and time. In this paper, we propose a novel semi-supervised classification framework for social spam detection that combines co-training with k-medoids. First, we use the k-medoids clustering algorithm to select informative and representative samples for labeling as our initial seed set. Then we use the content features and behavior features of users as the two views of our co-training classification framework. To illustrate the effectiveness of k-medoids, we compare its performance with a random selection strategy. Finally, we evaluate the effectiveness of the proposed detection framework against several classical supervised algorithms.
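The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the base classifiers, the distance measure, or the features, so the nearest-centroid classifier, the Euclidean distance, clustering on the concatenated views, the synthetic data, and all function names (`k_medoids`, `co_train`, and so on) are hypothetical stand-ins.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, iters=20, seed=0):
    """Simple alternating k-medoids; returns the sorted indices of k medoids."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
        # Update step: the new medoid minimises total distance to its cluster.
        new_medoids = [
            min(members or [m],
                key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def fit_centroids(X, y):
    """Nearest-centroid 'classifier' (an assumed stand-in): one mean per class."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Return (label, confidence); confidence is the margin between the two nearest centroids."""
    ranked = sorted(cents, key=lambda c: dist(x, cents[c]))
    if len(ranked) == 1:
        return ranked[0], 0.0
    return ranked[0], dist(x, cents[ranked[1]]) - dist(x, cents[ranked[0]])


def co_train(view_a, view_b, seed_labels, rounds=10, per_round=2):
    """Co-training: each view labels its most confident unlabeled samples for the shared pool."""
    labels = dict(seed_labels)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabeled:
            break
        for view in (view_a, view_b):
            idx = list(labels)
            cents = fit_centroids([view[i] for i in idx], [labels[i] for i in idx])
            scored = sorted(
                ((predict(cents, view[i]), i) for i in unlabeled if i not in labels),
                key=lambda t: -t[0][1],
            )
            for (label, _margin), i in scored[:per_round]:
                labels[i] = label
    return labels


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic accounts (assumption): 20 legitimate (0) followed by 20 spam (1).
    truth = [0] * 20 + [1] * 20
    content = [[rng.gauss(3 * t, 0.5), rng.gauss(3 * t, 0.5)] for t in truth]
    behavior = [[rng.gauss(-3 * t, 0.5), rng.gauss(3 * t, 0.5)] for t in truth]
    joint = [c + b for c, b in zip(content, behavior)]
    seeds = k_medoids(joint, k=4)
    seed_labels = {i: truth[i] for i in seeds}  # stands in for manual annotation
    predicted = co_train(content, behavior, seed_labels)
    accuracy = sum(predicted[i] == truth[i] for i in predicted) / len(predicted)
    print(f"seeds={seeds} labelled={len(predicted)} accuracy={accuracy:.2f}")
```

In this sketch only the medoid accounts need manual labels. The random-selection baseline mentioned in the abstract would correspond to replacing `k_medoids(joint, k=4)` with a random sample of indices.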

This work was supported by National Science Foundation of China (No. 61272374, 61300190) and 863 Project (No. 2015AA015463).

Author information

Authors and Affiliations

Dalian University of Technology, Dalian, China
Xianchao Zhang, Haijun Bai & Wenxin Liang

Corresponding author

Correspondence to Wenxin Liang.

Editor information

Editors and Affiliations

New Mexico State University, Las Cruces, New Mexico, USA
Huiping Cao
University of Technology Sydney, Sydney, New South Wales, Australia
Jinyan Li
Massey University, Auckland, New Zealand
Ruili Wang

About this paper

Cite this paper

Zhang, X., Bai, H., Liang, W. (2016). A Social Spam Detection Framework via Semi-supervised Learning. In: Cao, H., Li, J., Wang, R. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9794. Springer, Cham. https://doi.org/10.1007/978-3-319-42996-0_18

DOI: https://doi.org/10.1007/978-3-319-42996-0_18
Published: 15 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42995-3
Online ISBN: 978-3-319-42996-0
eBook Packages: Computer Science, Computer Science (R0)
