A Fully Automated Approach for Arabic Slang Lexicon Extraction from Microblogs

ElSahar, Hady; El-Beltagy, Samhaa R.

doi:10.1007/978-3-642-54906-9_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8403)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

With the rapid increase in the volume of opinionated Arabic posts on social media forums comes an increased demand for Arabic sentiment analysis tools and resources. Social media posts, especially those made by the younger generation, are usually written in colloquial Arabic and include a lot of slang, much of which evolves over time. While some work has been carried out to build Modern Standard Arabic sentiment lexicons, these need to be supplemented with dialectal terms and continuously updated with slang. This paper proposes a fully automated approach, based on lexico-syntactic patterns, for building a dialectal/slang subjectivity lexicon for use in Arabic sentiment analysis. Since existing Arabic part-of-speech taggers and other morphological resources have been found to handle colloquial Arabic very poorly, the approach does not employ any such tools, which allows it to generalize across dialects with minor modifications. Results of experiments that targeted Egyptian Arabic show the approach's ability to detect subjective internet slang, represented by single words or by multi-word expressions, and to classify its polarity with a high degree of precision.
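
To make the idea of tagger-free, pattern-based extraction concrete, the following is a minimal sketch and not the authors' method. It assumes a single hypothetical pattern, "candidate term followed by an intensifier", and the intensifier list, the function name `extract_candidates`, the `min_count` threshold, and the sample posts are all illustrative assumptions that do not come from the paper. It only collects candidate terms and does not attempt polarity classification.

```python
from collections import Counter

# Illustrative intensifiers ("very"-type words); an assumption, not the paper's list.
INTENSIFIERS = ("اوي", "جدا")


def extract_candidates(posts, intensifiers=INTENSIFIERS, min_count=2):
    """Count tokens that directly precede an intensifier.

    Uses whitespace tokenization only: no part-of-speech tagger or
    morphological analyzer is involved.
    """
    counts = Counter()
    for post in posts:
        tokens = post.split()
        # Slide over adjacent token pairs and match "<candidate> <intensifier>".
        for previous, current in zip(tokens, tokens[1:]):
            if current in intensifiers:
                counts[previous] += 1
    # Keep only candidates seen often enough to be plausible lexicon entries.
    return {term: n for term, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample_posts = [
        "الفيلم ده جامد اوي",
        "الاكل جامد اوي",
        "يوم وحش جدا",
    ]
    print(extract_candidates(sample_posts))
```

Lowering `min_count` trades precision for coverage: with `min_count=1`, terms seen only once next to an intensifier are also returned.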

Author information

Authors and Affiliations

Center of Informatics Sciences, Nile University, Cairo, Egypt
Hady ElSahar & Samhaa R. El-Beltagy

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico D.F., Mexico
Alexander Gelbukh

Cite this paper

ElSahar, H., El-Beltagy, S.R. (2014). A Fully Automated Approach for Arabic Slang Lexicon Extraction from Microblogs. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_7

DOI: https://doi.org/10.1007/978-3-642-54906-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science, Computer Science (R0)