International Conference on Web-Age Information Management

WAIM 2015: Web-Age Information Management, pp. 29-40

Spammer Detection on Online Social Networks Based on Logistic Regression

  • Xiang Zhu
  • Yuanping Nie
  • Songchang Jin
  • Aiping Li
  • Yan Jia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9391)


Millions of users generate and propagate information on online social networks, and search engines and data mining tools allow people to track hot topics and events online. However, the massive use of social media also makes it easier for malicious users, known as social spammers, to fill social networks with junk information. Addressing this problem requires a classifier that detects social spammers. One effective approach to spammer detection relies on content and user information. Social spammers, however, are tricky and can fool such systems by evolving their content and profile information. First, social spammers continually change their patterns to deceive detection systems. Second, spammers try to gain influence and disguise themselves as much as possible. Because of these dynamic patterns, it is difficult for existing methods to respond to social spammers effectively and efficiently. In this paper, we present a model based on logistic regression that considers both the content attributes and the behavior attributes of users in a social network. We analyze user attributes to identify the inherent differences between spammers and non-spammers. Experimental results on Twitter data show the effectiveness and efficiency of the proposed method.
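The abstract states that the classifier is a logistic regression over content and behavior attributes, but it does not list the exact features or the training procedure. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes four hypothetical per-user features (two content attributes, two behavior attributes, each pre-scaled to [0, 1]), fits the model with plain batch gradient descent plus a small L2 penalty, and flags a user when the estimated probability P(spammer | x) = σ(w·x + b) reaches an assumed threshold of 0.5. The toy data is made up for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL per-user features (not the paper's actual feature set), all scaled to [0, 1]:
#   x[0] content:  fraction of tweets containing URLs
#   x[1] content:  fraction of near-duplicate tweets
#   x[2] behavior: follower/following ratio (normalized)
#   x[3] behavior: posting rate (normalized tweets per day)

# Made-up toy data: label 1 = spammer, 0 = legitimate user.
TOY_X = [
    [0.90, 0.80, 0.05, 0.90],
    [0.85, 0.70, 0.10, 0.80],
    [0.95, 0.90, 0.02, 0.95],
    [0.10, 0.05, 0.90, 0.20],
    [0.20, 0.10, 0.70, 0.30],
    [0.05, 0.00, 0.80, 0.10],
]
TOY_Y = [1, 1, 1, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
    l2: float = 0.01,
) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    The L2 penalty (an assumption here) applies to w only; the bias is not regularized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # p - y is the gradient of the log-loss with respect to the logit w.x + b.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that the user with feature vector x is a spammer."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def is_spammer(w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag a user when the spam probability reaches the (assumed) threshold."""
    return predict_proba(w, b, x) >= threshold


if __name__ == "__main__":
    w, b = train_logistic_regression(TOY_X, TOY_Y)
    print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
    print(is_spammer(w, b, [0.88, 0.75, 0.08, 0.85]))  # spammer-like toy profile
    print(is_spammer(w, b, [0.12, 0.03, 0.85, 0.15]))  # legitimate-looking toy profile
```

In a real deployment the threshold would more likely be tuned on held-out data, since spammers are usually a small minority of accounts and the cost of a false positive differs from that of a miss.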


Keywords: Social network · Social spammer · Classifier · Logistic regression



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Xiang Zhu (1, corresponding author)
  • Yuanping Nie (1)
  • Songchang Jin (1)
  • Aiping Li (1)
  • Yan Jia (1)
  1. College of Computer, National University of Defense Technology, Changsha, China
