Clustering of Tweets: A Novel Approach to Label the Unlabelled Tweets

  • Tabassum Gull Jan
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 597)


Twitter is one of the fastest-growing microblogging and online social networking sites, enabling users to send and receive messages in the form of tweets. Twitter is now a major venue for news analysis and discussion, which is why it has become a prime target of attackers and cybercriminals. These attackers not only compromise the security of Twitter but also erode the trust people place in it, polluting the platform through misuse. Misuse can take the form of hurtful gossip, cyberbullying, cyber harassment, spam, pornographic content, identity theft, and common Web attacks such as phishing and malware downloads. Because Twitter is growing fast, it is prone to spam, so spam detection on Twitter is needed. Spam detection using supervised algorithms depends entirely on labelled Twitter datasets. Labelling datasets manually is costly, time-consuming and challenging. Moreover, older labelled datasets are no longer available because of Twitter's data publishing policies. There is therefore a need for an approach that labels tweets as spam or non-spam in order to overcome the effect of spam drift (the change in the characteristics of spam tweets over time). In this paper, we downloaded a recent Twitter dataset and prepared an unlabelled dataset of tweets from it. We then applied the cluster-then-label approach to label the tweets as spam and non-spam. The resulting labelled dataset can be used for spam detection on Twitter and for categorising different types of spam.
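The abstract does not say which features, clustering algorithm or cluster-labelling rule the paper uses, so the following is only a minimal sketch of one common reading of cluster-then-label, not the author's method. It assumes: four hypothetical hand-crafted features (URL count, hashtag count, mention count, share of uppercase letters), k-means with k = 2, and a small hypothetical seed set of manually labelled tweets whose majority label is propagated to every tweet in the same cluster. The helper names (`tweet_features`, `min_max_scale`, `kmeans`, `cluster_then_label`) are illustrative.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")


def tweet_features(text):
    """Hypothetical hand-crafted features (an assumption, not the paper's feature set):
    URL count, hashtag count, mention count and share of uppercase letters."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [
        float(len(URL_RE.findall(text))),
        float(text.count("#")),
        float(text.count("@")),
        upper_ratio,
    ]


def min_max_scale(rows):
    """Rescale every feature column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - lo for c, lo in zip(cols, lows)]
    return [[(v - lo) / s if s else 0.0 for v, lo, s in zip(row, lows, spans)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next initial centroid: the point farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_then_label(tweets, seed_labels, k=2):
    """Cluster all tweets, then give every tweet the majority label of the
    hypothetical seed tweets in its cluster ("unknown" if the cluster has none)."""
    points = min_max_scale([tweet_features(t) for t in tweets])
    assignment = kmeans(points, k)
    cluster_label = {}
    for j in range(k):
        votes = Counter(lbl for i, lbl in seed_labels.items() if assignment[i] == j)
        cluster_label[j] = votes.most_common(1)[0][0] if votes else "unknown"
    return [cluster_label[a] for a in assignment]


if __name__ == "__main__":
    # Made-up example tweets; only tweets 0 and 3 carry a (hypothetical) manual label.
    tweets = [
        "WIN a FREE iPhone now!!! http://spam.example #win #free #iphone",
        "CLICK HERE for CHEAP meds http://x.example http://y.example #deal",
        "FREE followers!!! http://z.example #follow #free @you",
        "Had a lovely walk in the park with my dog today",
        "Looking forward to the conference next week",
        "Just finished reading a great book on history",
    ]
    print(cluster_then_label(tweets, {0: "spam", 3: "non-spam"}))
    # -> ['spam', 'spam', 'spam', 'non-spam', 'non-spam', 'non-spam']
```

Labelling whole clusters from a handful of seeds is what makes the approach cheap compared with labelling every tweet by hand; in practice the feature set, the number of clusters and the labelling rule would all need to be chosen and validated on real data.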


Keywords: Spam labelling, Clustering, Tweets


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science & Technology, Central University of Punjab, Bathinda, India
