A cross-study of Sentiment Classification on Arabic corpora

Mountassir, A.; Benbrahim, H.; Berrada, I.

doi:10.1007/978-1-4471-4739-8_21

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

Sentiment Analysis is a research area that focuses on processing and analyzing the opinions available on the web. Several interesting and advanced works have been carried out on English; in contrast, very few have been conducted on Arabic. This paper presents a study of supervised sentiment classification in an Arabic context. We use two Arabic corpora that differ in many aspects, and three common classifiers known for their effectiveness, namely Naïve Bayes, Support Vector Machines and k-Nearest Neighbor. We investigate several settings to identify those that yield the best results: stemming type, term frequency thresholding, term weighting and word n-grams. We show that Naïve Bayes and Support Vector Machines are competitively effective, whereas the effectiveness of k-Nearest Neighbor depends on the corpus. Based on this study, we recommend using light stemming rather than stemming, removing terms that occur only once, combining word unigrams and bigrams, and using presence-based rather than frequency-based weighting. Our results also show that classification performance may be influenced by document length, document homogeneity and the nature of the document authors. However, the size of the data sets does not have an impact on classification results.
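
The recommended feature settings can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy documents and the reading of "terms that occur once" as total corpus frequency are assumptions made here for illustration, and the English tokens merely stand in for light-stemmed Arabic tokens.

```python
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all word n-grams from unigrams up to n_max-grams, space-joined."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def build_vocabulary(docs, min_count=2):
    """Keep terms whose total corpus frequency is at least min_count.

    With min_count=2 this drops terms that occur only once
    (assumption: 'once' is counted over the whole corpus).
    """
    counts = Counter(term for doc in docs for term in ngrams(doc))
    return sorted(t for t, c in counts.items() if c >= min_count)

def presence_vector(doc, vocabulary):
    """Presence-based weighting: 1 if the term occurs in the document, else 0."""
    present = set(ngrams(doc))
    return [1 if term in present else 0 for term in vocabulary]

# Hypothetical, already tokenized toy documents.
docs = [
    ["good", "film", "good", "story"],
    ["bad", "film"],
    ["good", "story"],
]

vocab = build_vocabulary(docs)
vectors = [presence_vector(d, vocab) for d in docs]
```

The resulting binary vectors could then be passed to any of the three classifiers named in the abstract. Note that the repeated "good" in the first document contributes a single 1, which is the difference between presence-based and frequency-based weighting.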

Author information

Authors and Affiliations

ENSIAS, Mohamed 5 University, Rabat, Morocco
A. Mountassir, H. Benbrahim & I. Berrada

Corresponding author

Correspondence to A. Mountassir.

Editor information

Editors and Affiliations

School of Computing, University of Portsmouth, Whitepost Lane The Lilacs, Portsmouth, PO1 3AH, Hampshire, United Kingdom
Max Bramer
School of Computing, Engineering & Mathe, University of Brighton, Lewes Road, Brighton, BN2 4GJ, West Sussex, United Kingdom
Miltos Petridis

About this paper

Cite this paper

Mountassir, A., Benbrahim, H., Berrada, I. (2012). A cross-study of Sentiment Classification on Arabic corpora. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_21

DOI: https://doi.org/10.1007/978-1-4471-4739-8_21
Published: 09 October 2012
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science, Computer Science (R0)
