An Ensemble Learning Approach for Addressing the Class Imbalance Problem in Twitter Spam Detection

  • Conference paper
Information Security and Privacy (ACISP 2016)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9722)

Abstract

As an important source of real-time information in recent years, Twitter has inevitably become a prime target for spammers. It has been shown that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, many recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of these studies overlook a fundamental issue that is widespread in real-world Twitter data: the class imbalance problem. In this paper, we show that the unequal distribution between spam and non-spam classes in the data has a great impact on the spam detection rate. To address the problem, we propose an ensemble learning approach that involves three steps. In the first step, we adjust the class distribution in the imbalanced data set using various strategies, including random oversampling, random undersampling and fuzzy-based oversampling. In the next step, a classification model is built on each of the redistributed data sets. In the final step, a majority voting scheme is introduced to combine all the classification models. Experimental results obtained using real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate in data sets with imbalanced class distribution.
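The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: the toy data, all function names, and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity, and `jitter_oversample` is a simple noise-based placeholder for the third strategy, since the abstract does not specify how fuzzy-based oversampling works.

```python
import random
from collections import Counter


def split_by_class(X, y):
    """Group feature vectors by class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def flatten(groups):
    """Turn {label: rows} back into parallel feature and label lists."""
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out


def random_oversample(X, y, rng):
    """Duplicate random rows of smaller classes up to the largest class size."""
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    return flatten({label: rows + [rng.choice(rows) for _ in range(target - len(rows))]
                    for label, rows in groups.items()})


def random_undersample(X, y, rng):
    """Randomly keep only as many rows per class as the smallest class has."""
    groups = split_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    return flatten({label: rng.sample(rows, target) for label, rows in groups.items()})


def jitter_oversample(X, y, rng, scale=0.05):
    """ASSUMPTION: placeholder only, NOT the paper's fuzzy-based oversampling.
    Adds Gaussian-perturbed copies of rows from smaller classes."""
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    return flatten({label: rows + [[v + rng.gauss(0.0, scale) for v in rng.choice(rows)]
                                   for _ in range(target - len(rows))]
                    for label, rows in groups.items()})


class NearestCentroid:
    """Tiny stand-in classifier: predicts the class with the closest mean vector."""

    def fit(self, X, y):
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in split_by_class(X, y).items()}
        return self

    def predict_one(self, features):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(features, self.centroids[label]))
        return min(self.centroids, key=squared_distance)


def train_ensemble(X, y, seed=0):
    """Steps 1 and 2: resample with each strategy, then fit one model per data set."""
    rng = random.Random(seed)
    samplers = [random_oversample, random_undersample, jitter_oversample]
    return [NearestCentroid().fit(*sampler(X, y, rng)) for sampler in samplers]


def majority_vote(models, features):
    """Step 3: return the label predicted by most models."""
    votes = Counter(model.predict_one(features) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced toy data: 50 non-spam rows (label 0), 5 spam rows (label 1).
X = [[i * 0.01, i * 0.02] for i in range(50)] + [[5 + i * 0.1, 5 - i * 0.1] for i in range(5)]
y = [0] * 50 + [1] * 5
models = train_ensemble(X, y)
```

Calling `majority_vote(models, features)` on a new feature vector returns the ensemble's label. Using an odd number of models avoids ties in a two-class vote.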

Author information

Corresponding author

Correspondence to Yu Wang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, S., Wang, Y., Chen, C., Xiang, Y. (2016). An Ensemble Learning Approach for Addressing the Class Imbalance Problem in Twitter Spam Detection. In: Liu, J., Steinfeld, R. (eds) Information Security and Privacy. ACISP 2016. Lecture Notes in Computer Science, vol 9722. Springer, Cham. https://doi.org/10.1007/978-3-319-40253-6_13

  • DOI: https://doi.org/10.1007/978-3-319-40253-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40252-9

  • Online ISBN: 978-3-319-40253-6

  • eBook Packages: Computer Science, Computer Science (R0)
