Subject-Related Message Filtering in Social Media Through Context-Enriched Language Models

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9630)

Abstract

Efficiently retrieving and understanding messages from social media is challenging because short messages depend strongly on context. Assuming that their audience is aware of background and real-world events, users can shorten their messages without compromising communication. However, traditional data mining algorithms do not account for contextual information. We argue that exploiting context can lead to advancements in the analysis of social media messages. Recall increases when context is taken into account, which enables context-aware methods that filter messages without relying only on keywords. We propose a novel approach for subject classification of social media messages that uses computational linguistics techniques and employs both textual and extra-textual (or contextual) information. Experimental analysis over sports-related messages indicates an improvement of over 50% in retrieval rate over text-based approaches due to the use of contextual information.
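
The general idea of combining textual and contextual evidence can be illustrated with a small sketch. This is not the chapter's model, which is not specified in the abstract. It assumes a smoothed unigram language model for the text and, as a stand-in for context, the share of on-subject messages seen in a fixed-length time window. All function names, the toy training messages, and the window counts are hypothetical.

```python
import math
from collections import Counter


def tokenize(message):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return message.lower().split()


def train_unigram(messages):
    """Count token occurrences over a list of messages."""
    return Counter(tok for msg in messages for tok in tokenize(msg))


def log_prob(message, counts, vocab_size):
    """Add-one smoothed unigram log-probability of a message."""
    total = sum(counts.values())
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in tokenize(message)
    )


def text_score(message, subject_counts, background_counts, vocab_size):
    """Log-likelihood ratio: positive when the text alone looks on-subject."""
    return (log_prob(message, subject_counts, vocab_size)
            - log_prob(message, background_counts, vocab_size))


def context_score(timestamp, subject_per_window, total_per_window, window_seconds):
    """Smoothed log-odds that a message posted in this time window is on-subject."""
    window = int(timestamp // window_seconds)
    on_subject = subject_per_window.get(window, 0)
    total = total_per_window.get(window, 0)
    p = (on_subject + 1) / (total + 2)  # Laplace smoothing keeps p in (0, 1)
    return math.log(p / (1 - p))


def is_subject_related(message, timestamp, subject_counts, background_counts,
                       vocab_size, subject_per_window, total_per_window,
                       window_seconds=60, threshold=0.0):
    """Accept a message when textual plus contextual evidence exceeds the threshold."""
    score = (text_score(message, subject_counts, background_counts, vocab_size)
             + context_score(timestamp, subject_per_window, total_per_window,
                             window_seconds))
    return score > threshold


# Hypothetical toy data: two on-subject and two off-subject training messages.
SUBJECT = train_unigram(["great goal by the striker", "the match ends with a goal"])
BACKGROUND = train_unigram(["i love this coffee", "traffic is bad today"])
VOCAB_SIZE = len(set(SUBJECT) | set(BACKGROUND)) + 1  # one extra slot for unseen tokens
# Hypothetical counts: in window 2 (seconds 120-179), 9 of 10 messages were on-subject.
SUBJECT_PER_WINDOW = {2: 9}
TOTAL_PER_WINDOW = {2: 10}
```

With this toy data, a keyword-free message such as "unbelievable" has a slightly negative text score, so a text-only filter would drop it. Posted at second 120, inside the busy window, the contextual term pushes it over the threshold. Posted at second 1000, where no activity was recorded, it is still rejected.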

Notes

  1. WhatsApp, according to http://www.statista.com/statistics/258743/daily-mobile-message-volume-of-whatsapp-messenger/.

  2. Source: https://zephoria.com/social-media/top-15-valuable-facebook-statistics/.

  3. By context we mean all non-written information that is relevant to understanding a message, such as real-world events and common-sense knowledge.

  4. More than 35 million tweets were issued during the Brazil vs Germany match in the FIFA World Cup 2014 semifinals, an event of global repercussion and the most tweeted event on record so far. It is unknown, however, how many related tweets are missing from this figure due to lack of contextual information.

  5. An uninterrupted chain of spoken or written language.

  6. For simplicity, fixed-length time windows are adopted.

  7. http://mashable.com/2013/04/25/nestivity-engaged-brands/.

Author information

Corresponding author

Correspondence to Alexandre Davis.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Davis, A., Veloso, A. (2016). Subject-Related Message Filtering in Social Media Through Context-Enriched Language Models. In: Nguyen, N.T., Kowalczyk, R., Rupino da Cunha, P. (eds) Transactions on Computational Collective Intelligence XXI. Lecture Notes in Computer Science, vol 9630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49521-6_5

  • DOI: https://doi.org/10.1007/978-3-662-49521-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49520-9

  • Online ISBN: 978-3-662-49521-6

  • eBook Packages: Computer Science, Computer Science (R0)
