Short Messages Spam Filtering Using Sentiment Analysis

  • Conference paper
Text, Speech, and Dialogue (TSD 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9924))

Abstract

As short instant messages become more widely used, spam and other illegitimate campaigns carried over these communication systems are growing as well. Besides being an illegal online activity, these campaigns are a direct threat to users' privacy. Previous short-message spam filtering techniques focus on automatic text classification and do not take message polarity into account. Focusing on phone SMS messages, this work demonstrates that sentiment analysis techniques can improve spam filtering in short message services. Using a publicly available labelled (spam/legitimate) SMS dataset, we calculate the polarity of each message and add the polarity score to the original dataset, creating new datasets. We compare the results of the best classifiers and filters over the different datasets (with and without polarity) to measure the influence of polarity. Experiments show that the polarity score improves SMS spam classification: in the best case it reaches 98.91 % accuracy, and in another case it yields 0 false positives with 98.67 % accuracy.
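The dataset-augmentation step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the tiny word lexicon, the `polarity` scorer, the `legitimate` label string, and all function names are hypothetical stand-ins, and a real experiment would use a proper sentiment analyser and trained classifiers.

```python
# Hypothetical toy lexicon (assumption): a real study would replace this
# with the polarity score produced by a sentiment analysis tool.
TOY_LEXICON = {"free": 0.4, "win": 0.8, "great": 0.8, "love": 0.5,
               "sorry": -0.5, "bad": -0.7, "late": -0.3}


def polarity(message):
    """Mean lexicon score of the known words; 0.0 when no word matches."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def add_polarity(dataset):
    """Return a new dataset whose rows carry an extra polarity feature."""
    return [(label, text, polarity(text)) for label, text in dataset]


def accuracy_and_false_positives(gold, predicted):
    """Accuracy, plus the count of legitimate messages flagged as spam."""
    pairs = list(zip(gold, predicted))
    correct = sum(g == p for g, p in pairs)
    false_pos = sum(g == "legitimate" and p == "spam" for g, p in pairs)
    return correct / len(pairs), false_pos


# Two made-up messages, used only to show the shape of the data.
sample = [("spam", "WIN a FREE prize now!"),
          ("legitimate", "Sorry, I am running late.")]
augmented = add_polarity(sample)
```

Each row of `augmented` keeps the original label and text and gains one numeric feature, so the same classifier can be trained once on the original rows and once on the augmented rows, and the two runs compared with `accuracy_and_false_positives`.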

Notes

  1. https://blog.whatsapp.com/616/One-billion/

  2. http://goo.gl/yqzMDz

  3. http://elpais.com/elpais/2015/04/20/inenglish/1429529298_001329.html

  4. http://goo.gl/CaxweY

  5. https://goo.gl/g6R7uW

  6. https://goo.gl/g6R7uW

  7. https://www.cs.york.ac.uk/semeval-2013/task2/

  8. http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/

  9. http://textblob.readthedocs.org/

Acknowledgments

This work has been partially funded by the Basque Department of Education, Language Policy and Culture under the project SocialSPAM (PI_2014_1_102).

Author information

Corresponding author

Correspondence to Enaitz Ezpeleta.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ezpeleta, E., Zurutuza, U., Gómez Hidalgo, J.M. (2016). Short Messages Spam Filtering Using Sentiment Analysis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_17

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)