Abstract
Sentiment analysis is a research area that focuses on processing and analyzing the opinions available on the web. Many advanced studies have addressed English; in contrast, very few have addressed Arabic. This paper presents a study of supervised sentiment classification in an Arabic context. We use two Arabic corpora that differ in many respects, and three common classifiers known for their effectiveness, namely Naïve Bayes, Support Vector Machines and k-Nearest Neighbor. We investigate several settings to identify those that achieve the best results: stemming type, term frequency thresholding, term weighting and word n-grams. We show that Naïve Bayes and Support Vector Machines are competitively effective, whereas the effectiveness of k-Nearest Neighbor depends on the corpus. Based on this study, we recommend using light stemming rather than stemming, removing terms that occur once, combining unigram and bigram words, and using presence-based rather than frequency-based weighting. Our results also show that classification performance may be influenced by document length, document homogeneity and the nature of the document authors. However, the size of the data sets does not have an impact on classification results.
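The recommended feature settings can be illustrated with a minimal sketch. It is not the authors' implementation. The function names and the toy English corpus are hypothetical, tokenization is plain whitespace splitting, no stemming or light stemming is applied, and "terms that occur once" is assumed here to mean a total corpus frequency of one. The sketch builds a combined unigram and bigram vocabulary, drops terms below the frequency threshold, and contrasts presence-based with frequency-based weighting.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the word n-grams of one document, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_terms(text):
    """Whitespace tokenization, then unigrams followed by bigrams."""
    tokens = text.split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)

def build_vocabulary(docs, min_count=2):
    """Keep terms whose total corpus frequency is at least min_count.

    min_count=2 removes terms that occur only once (assumed reading
    of the term frequency threshold).
    """
    counts = Counter(term for doc in docs for term in extract_terms(doc))
    return sorted(term for term, c in counts.items() if c >= min_count)

def presence_vector(text, vocabulary):
    """Presence-based weighting: 1 if the term appears, else 0."""
    terms = set(extract_terms(text))
    return [1 if term in terms else 0 for term in vocabulary]

def frequency_vector(text, vocabulary):
    """Frequency-based weighting: raw count of each term."""
    counts = Counter(extract_terms(text))
    return [counts[term] for term in vocabulary]

# Toy corpus (hypothetical, English tokens for readability).
docs = ["good good film", "good film", "bad film"]
vocab = build_vocabulary(docs)
presence = presence_vector(docs[0], vocab)
frequency = frequency_vector(docs[0], vocab)
```

On this toy corpus the two weightings differ only for the repeated word in the first document. Either vector type could then be passed to a classifier such as Naïve Bayes, Support Vector Machines or k-Nearest Neighbor.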
Copyright information
© 2012 Springer-Verlag London
About this paper
Cite this paper
Mountassir, A., Benbrahim, H., Berrada, I. (2012). A cross-study of Sentiment Classification on Arabic corpora. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_21
DOI: https://doi.org/10.1007/978-1-4471-4739-8_21
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science, Computer Science (R0)