Spam detection in social media using convolutional and long short term memory neural network

  • Gauri Jain
  • Manisha Sharma
  • Basant Agarwal
Article

Abstract

As Internet use grows, people are increasingly connected through text messages and social media platforms such as Facebook and Twitter. This has led to an increase in unsolicited messages, known as spam, which are used for marketing, for collecting personal information, or simply to offend people. It is therefore crucial to have a strong spam detection architecture that can block such messages. Spam detection on a noisy platform such as Twitter remains a problem because the texts are short and the language used on social media is highly variable. In this paper, we propose a novel deep learning architecture based on a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The model is supported by introducing semantic information into the word representations with the help of knowledge bases such as WordNet and ConceptNet. These knowledge bases improve performance by providing better semantic vector representations for test words that previously received random values because they were not seen during training. Experimental results on two benchmark datasets show the effectiveness of the proposed approach in terms of accuracy and F1-score.
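The idea of replacing a random vector for an unseen word with a semantically informed one can be illustrated with a minimal sketch. The abstract does not specify how the knowledge bases are queried or how vectors are combined, so everything here is an assumption: `related_words` is a hypothetical hand-written dictionary standing in for a WordNet or ConceptNet lookup, and averaging the vectors of known related words is one plausible strategy, not necessarily the one used in the paper.

```python
import random


def build_lookup(embeddings, related_words, dim, seed=0):
    """Return a function that maps a word to a vector.

    embeddings: dict of word -> list of floats learned from training data.
    related_words: dict of word -> list of related words. This is a
        hypothetical stand-in for a WordNet or ConceptNet query.
    dim: vector size used for the random fallback.
    """
    rng = random.Random(seed)

    def lookup(word):
        # Word seen during training: use its trained vector directly.
        if word in embeddings:
            return embeddings[word]
        # Unseen word: average the vectors of related words that are known
        # (assumed strategy; the paper's exact method is not given here).
        known = [embeddings[w] for w in related_words.get(word, [])
                 if w in embeddings]
        if known:
            return [sum(vals) / len(known) for vals in zip(*known)]
        # No known related word: fall back to a random vector.
        return [rng.uniform(-0.25, 0.25) for _ in range(dim)]

    return lookup


# Toy data, invented for illustration only.
embeddings = {"free": [1.0, 0.0], "prize": [0.0, 1.0]}
related_words = {"complimentary": ["free"], "reward": ["prize", "free"]}
lookup = build_lookup(embeddings, related_words, dim=2)
```

With this toy data, `lookup("complimentary")` reuses the vector of "free" instead of a random one, which is the kind of improvement for unseen test words that the abstract describes.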

Keywords

Spam detection; Deep learning; Sequential Stacked CNN-LSTM; CNN; LSTM

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Banasthali Vidyapith, Banasthali, India
  2. Department of Computer Science and Engineering, Swami Keshvanand Institute of Technology, Jaipur, India
