Evaluation of a General-Purpose Sentiment Lexicon on A Product Review Corpus

Khoo, Christopher S. G.; Johnkhan, Sathik Basha; Na, Jin-Cheon

doi:10.1007/978-3-319-27974-9_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9469)

Included in the following conference series:

International Conference on Asian Digital Libraries


Abstract

This paper introduces a new general-purpose sentiment lexicon, the WKWSCI Sentiment Lexicon, and compares it with three existing lexicons. The WKWSCI Sentiment Lexicon is based on the 6of12dict lexicon and currently covers adjectives, adverbs and verbs. Each word was manually coded with a value on a 7-point sentiment strength scale. The effectiveness of the four lexicons for sentiment categorization at the document level and the sentence level was evaluated on an Amazon product review dataset. The WKWSCI lexicon obtained the best results for document-level sentiment categorization, with an accuracy of 75%. The Hu & Liu lexicon obtained the best results for sentence-level sentiment categorization, with an accuracy of 77%. By comparison, the best bag-of-words machine learning model obtained an accuracy of 82% for document-level sentiment categorization. The strength of the lexicon-based method lies in sentence-level and aspect-based sentiment analysis, where machine learning is difficult to apply because of the small number of features.
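To make the lexicon-based approach concrete, here is a minimal sketch of document- and sentence-level categorization with a sentiment-strength lexicon. It assumes that the 7-point scale maps to the integers -3 to +3 (0 = neutral) and that a text is labelled by the sign of its summed word scores. The toy lexicon entries, the names `tokenize`, `lexicon_score`, `classify`, `classify_sentences` and `accuracy`, and the scoring rule are all hypothetical illustrations. They are not the WKWSCI lexicon or the authors' method, which may handle negation, part of speech and ties differently.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each word is coded on an assumed 7-point scale
# from -3 (strongly negative) to +3 (strongly positive), with 0 as neutral.
# These entries are illustrative only and are NOT taken from the WKWSCI lexicon.
TOY_LEXICON: Dict[str, int] = {
    "excellent": 3,
    "good": 2,
    "works": 1,
    "slow": -1,
    "poor": -2,
    "terrible": -3,
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and return its alphabetic word tokens."""
    return TOKEN_RE.findall(text.lower())


def lexicon_score(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the strength values of tokens found in the lexicon (others count 0)."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def classify(text: str, lexicon: Dict[str, int]) -> str:
    """Label a text by the sign of its summed lexicon score (assumed rule)."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def classify_sentences(text: str, lexicon: Dict[str, int]) -> List[str]:
    """Sentence-level categorization: apply the same scoring rule to each sentence."""
    return [classify(s, lexicon) for s in split_sentences(text)]


def accuracy(examples: List[Tuple[str, str]], lexicon: Dict[str, int]) -> float:
    """Fraction of (text, gold_label) pairs whose predicted label matches the gold label."""
    if not examples:
        return 0.0
    correct = sum(classify(text, lexicon) == gold for text, gold in examples)
    return correct / len(examples)


if __name__ == "__main__":
    reviews = [
        ("Excellent camera, works well.", "positive"),        # 3 + 1 = 4
        ("Terrible battery and slow charging.", "negative"),  # -3 - 1 = -4
        ("Good screen but poor speakers.", "negative"),       # 2 - 2 = 0 -> neutral
    ]
    for text, _ in reviews:
        print(classify(text, TOY_LEXICON), "|", text)
    print("accuracy:", accuracy(reviews, TOY_LEXICON))
    print(classify_sentences("Good screen. Terrible speakers!", TOY_LEXICON))
```

The third review shows one weakness of simple summation. Mixed opinions cancel out to a neutral score at the document level, while the sentence-level split can still recover the opposing judgments. This fits the abstract's point that lexicons are most useful for sentence-level and aspect-based analysis.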


References

Oomen, J., Aroyo, L.: Crowdsourcing in the cultural heritage domain: opportunities and challenges. In: Proceedings of the 5th International Conference on Communities and Technologies, pp. 138–149. ACM, June 2011
12Dicts introduction. http://wordlist.aspell.net/12dicts-readme/
Cortes, C., Vapnik, V.: Support-Vector Networks. Machine Learning 20(3), 273–297 (1995)
Vapnik, V.N.: Statistical Learning Theory. John Wiley and Sons, New York (1998)
Zhang, H.: The optimality of Naive Bayes. In: Proceedings of the Seventeenth Florida Artificial Intelligence Research Society Conference, pp. 562–567. The AAAI Press (2004)
Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 90–94. Association for Computational Linguistics (2012)
Stone, P.J., Dunphy, D.C., Smith, M.S., Ogilvie, D.M.: The General Inquirer: A Computer Approach to Content Analysis. MIT Press, Cambridge (1966)
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M.: Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics 37(2), 267–307 (2011)
Hatzivassiloglou, V., McKeown, K.: Predicting the semantic orientation of adjectives. In: Proceedings of the 35th Meeting of the Association for Computational Linguistics, pp. 174–181 (1997)
Turney, P., Littman, M.: Measuring Praise and Criticism: Inference of Semantic Orientation from Association. ACM Transactions on Information Systems 21(4), 315–346 (2003)
Esuli, A., Sebastiani, F.: SentiWordNet: a publicly available lexical resource for opinion mining. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), pp. 417–422 (2006)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD–2004), pp. 168–177. ACM, New York (2004)
Thet, T.T., Na, J.C., Khoo, C.: Aspect-Based Sentiment Analysis of Movie Reviews on Discussion Boards. Journal of Information Science 36(6), 823–848 (2010)
Wiebe, J., Wilson, T., Cardie, C.: Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation 39(2–3), 165–210 (2005)
Khoo, C., Nourbakhsh, A., Na, J.C.: Sentiment Analysis of News Text: A Case Study of Appraisal Theory. Online Information Review 36(6), 858–878 (2012)
Riloff, E., Wiebe, J.: Learning extraction patterns for subjective expressions. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP–2003), pp. 105–112. Association for Computational Linguistics (2003)
Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 219–230. ACM, New York (2008)
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
Bird, S., Loper, E., Klein, E.: Natural Language Processing with Python. O’Reilly Media (2009)
Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research 12, 2825–2830 (2011)


Author information

Authors and Affiliations

Wee Kim Wee School of Communication & Information, Nanyang Technological University, Singapore, Singapore
Christopher S. G. Khoo, Sathik Basha Johnkhan & Jin-Cheon Na

Corresponding author

Correspondence to Christopher S. G. Khoo.

Editor information

Editors and Affiliations

Yonsei University, Seoul, Korea (Republic of)
Robert B. Allen
School of ITEE, University of Queensland, St. Lucia, Queensland, Australia
Jane Hunter
School of Library & Info Sci, Kent State University, Kent, Ohio, USA
Marcia L. Zeng

About this paper

Cite this paper

Khoo, C.S.G., Johnkhan, S.B., Na, JC. (2015). Evaluation of a General-Purpose Sentiment Lexicon on A Product Review Corpus. In: Allen, R., Hunter, J., Zeng, M. (eds) Digital Libraries: Providing Quality Information. ICADL 2015. Lecture Notes in Computer Science, vol 9469. Springer, Cham. https://doi.org/10.1007/978-3-319-27974-9_9

DOI: https://doi.org/10.1007/978-3-319-27974-9_9
Published: 18 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27973-2
Online ISBN: 978-3-319-27974-9
eBook Packages: Computer Science, Computer Science (R0)