Novel Comment Spam Filtering Method on Youtube: Sentiment Analysis and Personality Recognition

  • Enaitz Ezpeleta
  • Iñaki Garitano
  • Ignacio Arenaza-Nuño
  • José María Gómez Hidalgo
  • Urko Zurutuza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10544)


The deeply entrenched use of Online Social Networks (OSNs), where millions of users unknowingly share all kinds of personal data, offers attackers a very attractive channel. OSNs make it possible to send spam messages through different channels (wall posts, comments, private messages). In this paper we propose a novel spam filtering method focused on social media spam. We aim to demonstrate that spam filtering results can be improved by using sentiment analysis and personality recognition techniques to analyze the content of the texts. We add these features to the OSN spam data both independently and jointly, and then compare Bayesian spam filters with and without the new features in terms of the number of false positives and accuracy. In the end, the results of the top ten filtering classifiers improved: the number of false positives was reduced (26.69% on average) and accuracy reached 82.55%.
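The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny word lexicon (`POSITIVE`, `NEGATIVE`), the `__polarity_*__` pseudo-token, and the hand-written `NaiveBayes` class are all assumptions made for illustration, standing in for the external sentiment tools and Bayesian filters used in the paper. A personality feature could be appended as a pseudo-token in the same way.

```python
import math
from collections import Counter

# Hypothetical toy polarity lexicon (an assumption, not the paper's sentiment tool).
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "bad", "awful"}

def tokens(text, with_polarity=False):
    """Lowercased word tokens, optionally extended with one polarity pseudo-token."""
    words = text.lower().split()
    if with_polarity:
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        label = "pos" if score > 0 else "neg" if score < 0 else "neu"
        words = words + ["__polarity_" + label + "__"]
    return words

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, doc):
        def log_score(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            # Words never seen in training are skipped.
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / total) for w in doc if w in self.vocab)
        return max(self.classes, key=log_score)

def evaluate(true_labels, predicted):
    """Return (accuracy, false positives); a false positive is ham flagged as spam."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    false_pos = sum(t == "ham" and p == "spam" for t, p in zip(true_labels, predicted))
    return correct / len(true_labels), false_pos

def compare(train, test):
    """Train and score the filter with and without the polarity feature."""
    results = {}
    for name, flag in (("baseline", False), ("with_polarity", True)):
        model = NaiveBayes().fit([tokens(t, flag) for t, _ in train],
                                 [y for _, y in train])
        preds = [model.predict(tokens(t, flag)) for t, _ in test]
        results[name] = evaluate([y for _, y in test], preds)
    return results
```

Calling `compare(train, test)` with two lists of `(comment_text, "spam" | "ham")` pairs returns an `(accuracy, false_positives)` tuple for each variant, which are the two measures the abstract uses to compare filters.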


Keywords: Spam, Social spam, Youtube, Polarity, Security, Personality



This work has been developed by the intelligent systems for industrial systems group supported by the Department of Education, Language policy and Culture of the Basque Government. It has been partially funded by the Basque Department of Education, Language policy and Culture under the project SocialSPAM (PI 2014 1 102).

We thank Mattias Östmar for the valuable tools he developed and published. We also thank Jon Kâgström (founder of uClassify) for the opportunity to use their API for research purposes.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Electronics and Computing Department, Mondragon University, Arrasate-Mondragón, Spain
  2. Pragsis Technologies, Madrid, Spain
