Subject-Related Message Filtering in Social Media Through Context-Enriched Language Models

Davis, Alexandre; Veloso, Adriano

doi:10.1007/978-3-662-49521-6_5


Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9630)

Abstract

Efficiently retrieving and understanding messages from social media is challenging, considering that shorter messages are strongly dependent on context. Assuming that their audience is aware of background and real-world events, users can shorten their messages without compromising communication. However, traditional data mining algorithms do not account for contextual information. We argue that exploiting context can lead to advancements in the analysis of social media messages. Recall increases if context is taken into account, leading to context-aware methods for filtering messages without resorting only to keywords. A novel approach for subject classification of social media messages, using computational linguistics techniques, is proposed, employing both textual and extra-textual (or contextual) information. Experimental analysis over sports-related messages indicates over 50% improvement in retrieval rate over text-based approaches due to the use of contextual information.
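As a rough illustration of the idea of combining textual and contextual evidence, the sketch below scores a message with a smoothed unigram language model and interpolates that score with a separate context signal. It is a minimal sketch under stated assumptions, not the method proposed in the chapter: the function names, the add-one smoothing, the interpolation weight `alpha`, and the notion of a `context_signal` in [0, 1] (for example, how subject-heavy the surrounding time window is) are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def train_unigram(messages):
    """Count token frequencies over messages known to be about the subject."""
    counts = Counter(tok for msg in messages for tok in msg.lower().split())
    return counts, sum(counts.values())


def text_score(message, counts, total):
    """Geometric-mean token probability under an add-one-smoothed unigram model."""
    tokens = message.lower().split()
    if not tokens:
        return 0.0
    vocab_size = len(counts)
    log_p = sum(
        math.log((counts[tok] + 1) / (total + vocab_size)) for tok in tokens
    )
    return math.exp(log_p / len(tokens))


def combined_score(message, counts, total, context_signal, alpha=0.5):
    """Interpolate the text score with a context signal in [0, 1].

    `context_signal` and `alpha` are illustrative assumptions, not values
    taken from the chapter.
    """
    return (1 - alpha) * text_score(message, counts, total) + alpha * context_signal


# Hypothetical training data: messages known to be about a sports subject.
counts, total = train_unigram(
    ["great goal by the striker", "the keeper saved the penalty"]
)

# A message with no subject keywords gets a low text score on its own,
# but a strong context signal raises its combined score.
message = "what a comeback"
low_context = combined_score(message, counts, total, context_signal=0.1)
high_context = combined_score(message, counts, total, context_signal=0.9)
```

In this toy setting, a keyword-free message such as "what a comeback" is ranked higher when it arrives in a context that is strongly associated with the subject, which is the intuition behind filtering without relying only on keywords.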


Notes

1. WhatsApp, according to http://www.statista.com/statistics/258743/daily-mobile-message-volume-of-whatsapp-messenger/.
2. Source: https://zephoria.com/social-media/top-15-valuable-facebook-statistics/.
3. By context we mean all non-written information that is relevant to understanding a message, such as real-world events and common-sense knowledge.
4. More than 35 million tweets were issued during the Brazil vs Germany match in the FIFA World Cup 2014 semifinals, an event of global repercussion and the most tweeted event on record so far. But who knows how many tweets are not included in this figure due to lack of contextual information?
5. An uninterrupted chain of spoken or written language.
6. For simplicity, fixed-length time windows are adopted.
7. http://mashable.com/2013/04/25/nestivity-engaged-brands/.


Author information

Authors and Affiliations

Universidade Federal de Minas Gerais, Avenida Presidente Antônio Carlos, 6627, Belo Horizonte, MG, 31270-901, Brazil
Alexandre Davis & Adriano Veloso


Corresponding author

Correspondence to Alexandre Davis.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Swinburne University of Technology, Hawthorn, VIC, Australia
Ryszard Kowalczyk
CISUC, Department of Informatics Engineering, Universidade de Coimbra, Coimbra, Portugal
Paulo Rupino da Cunha


About this chapter

Cite this chapter

Davis, A., Veloso, A. (2016). Subject-Related Message Filtering in Social Media Through Context-Enriched Language Models. In: Nguyen, N.T., Kowalczyk, R., Rupino da Cunha, P. (eds) Transactions on Computational Collective Intelligence XXI. Lecture Notes in Computer Science, vol 9630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49521-6_5


DOI: https://doi.org/10.1007/978-3-662-49521-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49520-9
Online ISBN: 978-3-662-49521-6
eBook Packages: Computer Science, Computer Science (R0)
