Abstract
With the increasing popularity of social networking websites such as Twitter, Facebook, Sina Weibo and MySpace, spammers on these platforms have become increasingly rampant. Social spammers typically create large numbers of compromised or fake accounts to deceive users and lure them to malicious websites that contain illegal, pornographic or dangerous content. Most studies on social spam detection rely on supervised machine learning, which requires large annotated datasets. Unfortunately, labeling a large dataset manually is a complex, error-prone and tedious task that costs considerable human effort and time. In this paper, we propose a novel semi-supervised classification framework for social spam detection that combines co-training with k-medoids. First, we use the k-medoids clustering algorithm to select informative and representative samples for labeling as our initial seed set. Then we use the content features and behavior features of users as the two views of our co-training classification framework. To illustrate the effectiveness of k-medoids, we compare its performance with a random seed selection strategy. Finally, we evaluate the effectiveness of the proposed detection framework against several classical supervised algorithms.
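The two stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy feature vectors, the farthest-first initialisation, the nearest-centroid classifier and all function names (`k_medoids`, `fit`, `predict`, `co_train`) are hypothetical, and `true_labels` stands in for the human annotator who labels the selected seeds.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, iters=20):
    """Alternating k-medoids; returns the sorted indices of the k medoids."""
    # Assumption: deterministic farthest-first initialisation.
    medoids = [0]
    while len(medoids) < k:
        rest = [i for i in range(len(points)) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    for _ in range(iters):
        # Assign every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
        # Move each medoid to the member with the smallest total in-cluster distance.
        new_medoids = []
        for m, members in clusters.items():
            members = members or [m]
            new_medoids.append(min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)

def fit(X, y):
    """Nearest-centroid model (a stand-in for any base classifier): label -> centroid."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    """Return (label, confidence); confidence is the gap between the two nearest centroids.
    Assumes the model holds at least two classes."""
    ranked = sorted((dist(x, c), label) for label, c in model.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]

def co_train(view_a, view_b, seeds, rounds=10):
    """Each view's classifier labels its most confident unlabeled sample for the shared pool."""
    labeled = dict(seeds)
    for _ in range(rounds):
        for view in (view_a, view_b):
            pool = [i for i in range(len(view)) if i not in labeled]
            if not pool:
                return labeled
            idx = sorted(labeled)
            model = fit([view[i] for i in idx], [labeled[i] for i in idx])
            best = max(pool, key=lambda i: predict(model, view[i])[1])
            labeled[best] = predict(model, view[best])[0]
    return labeled

# Toy data (hypothetical): 8 users, a content view and a behavior view; 1 = spammer.
content = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.95, 0.9],
           [0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.05, 0.1]]
behavior = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.15], [0.85, 0.05],
            [0.1, 0.9], [0.2, 0.8], [0.15, 0.95], [0.05, 0.85]]
true_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Stage 1: cluster the joint features and ask the annotator to label only the medoids.
joint = [c + b for c, b in zip(content, behavior)]
seed_ids = k_medoids(joint, k=2)
seeds = {i: true_labels[i] for i in seed_ids}

# Stage 2: co-train on the two views, starting from the labeled seeds.
result = co_train(content, behavior, seeds)
print(seed_ids, result)
```

Choosing medoids rather than random samples as seeds means the first labeled examples are actual users that sit at the centre of each cluster, which is the intuition behind comparing k-medoids with a random selection strategy.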
This work was supported by the National Science Foundation of China (Nos. 61272374, 61300190) and the 863 Project (No. 2015AA015463).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, X., Bai, H., Liang, W. (2016). A Social Spam Detection Framework via Semi-supervised Learning. In: Cao, H., Li, J., Wang, R. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9794. Springer, Cham. https://doi.org/10.1007/978-3-319-42996-0_18
DOI: https://doi.org/10.1007/978-3-319-42996-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42995-3
Online ISBN: 978-3-319-42996-0
eBook Packages: Computer Science (R0)