Abstract
The rapid growth in the volume of opinionated Arabic posts on social media forums has increased the demand for Arabic sentiment analysis tools and resources. Social media posts, especially those made by the younger generation, are usually written in colloquial Arabic and include a great deal of slang, much of which evolves over time. While some work has been carried out to build Modern Standard Arabic sentiment lexicons, these need to be supplemented with dialectal terms and continuously updated with slang. This paper proposes a fully automated approach, based on lexico-syntactic patterns, for building a dialectal/slang subjectivity lexicon for use in Arabic sentiment analysis. Because existing Arabic part-of-speech taggers and other morphological resources have been found to handle colloquial Arabic very poorly, the approach does not employ any such tools, which allows it to generalize across dialects with minor modifications. Experiments targeting Egyptian Arabic show that the approach can detect subjective internet slang, whether expressed as single words or as multi-word expressions, and can classify its polarity with a high degree of precision.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
ElSahar, H., El-Beltagy, S.R. (2014). A Fully Automated Approach for Arabic Slang Lexicon Extraction from Microblogs. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_7
DOI: https://doi.org/10.1007/978-3-642-54906-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science (R0)