Minimally-Supervised Sentiment Lexicon Induction Model: A Case Study of Malay Sentiment Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10607)

Abstract

Vital to the task of mining sentiment from text is a sentiment lexicon: a dictionary of terms annotated with a priori information along the semantic dimension of sentiment. Each term is assigned a general, out-of-context sentiment polarity. Unfortunately, online dictionaries and similar lexical resources do not readily include information on the sentiment properties of their entries. Moreover, manually compiling sentiment lexicons is costly in annotator time and effort. This has given rise to a large body of research on automated sentiment lexicon generation algorithms. Most of these algorithms were designed for English, owing to the abundance of readily available lexical resources in that language. This is not the case for low-resource languages such as Malay. Although research on Malay sentiment analysis has increased sharply over the past few years, the subtask of sentiment lexicon induction for this language remains under-investigated. We present a minimally-supervised sentiment lexicon induction model specifically designed for the Malay language. It takes as input only two initial paradigm terms, one positive and one negative, and mines WordNet Bahasa’s synonym chains and Kamus Dewan’s gloss information to extract subjective, sentiment-laden terms. The model automatically bootstraps a reliable, high-coverage sentiment lexicon that can be employed in Malay sentiment analysis on full text. Intrinsic evaluation of the model against a manually annotated test set demonstrates that its ability to assign sentiment properties to terms is on par with human judgement.
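As a rough illustration of the bootstrapping idea, the following minimal Python sketch propagates the polarity of two seed terms along synonym links. It is not the authors' algorithm, which the abstract does not specify in detail: the breadth-first strategy, the `max_depth` cut-off, the function name `induce_lexicon`, and the toy synonym graph are all assumptions made for illustration, and the gloss-mining step over Kamus Dewan is omitted entirely. The synonym links below are invented and are not drawn from WordNet Bahasa.

```python
from collections import deque


def induce_lexicon(synonyms, positive_seed, negative_seed, max_depth=3):
    """Propagate seed polarity along synonym links, breadth-first.

    synonyms: dict mapping a term to a list of its synonyms (hypothetical
    stand-in for synonym chains taken from a WordNet-style resource).
    Returns a dict mapping each reached term to +1 or -1.
    """
    lexicon = {positive_seed: 1, negative_seed: -1}
    queue = deque([(positive_seed, 0), (negative_seed, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the assumed chain length
        for neighbour in synonyms.get(term, ()):
            if neighbour not in lexicon:  # first label reached wins
                lexicon[neighbour] = lexicon[term]
                queue.append((neighbour, depth + 1))
    return lexicon


# Toy synonym graph: illustrative links only, not real WordNet Bahasa data.
TOY_SYNONYMS = {
    "baik": ["bagus", "elok"],
    "bagus": ["hebat"],
    "buruk": ["teruk", "jahat"],
    "teruk": ["parah"],
}

lexicon = induce_lexicon(TOY_SYNONYMS, "baik", "buruk")
for term, polarity in sorted(lexicon.items()):
    print(term, "positive" if polarity > 0 else "negative")
```

In this sketch a term keeps the first polarity that reaches it. A practical system would need a way to resolve conflicts when both seeds reach the same term, for example by scoring the competing paths.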

Notes

  1. http://wn-msa.sourceforge.net/index.eng.html

  2. http://prpm.dbp.gov.my

  3. The convention ‘term.pos.sense’ is used to define WordNet synsets here. For example, good.a.01 refers to the first sense of the adjective ‘good’, while bad.a.12 refers to the 12th sense of the adjective ‘bad’.

  4. http://lrgs.ftsm.ukm.my/MalaySent
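The ‘term.pos.sense’ convention described in note 3 can be split into its three fields mechanically. The short Python sketch below shows one way to do so. The names `SynsetId` and `parse_synset_id` are hypothetical helpers introduced here for illustration and do not come from the paper or from any WordNet library.

```python
from typing import NamedTuple


class SynsetId(NamedTuple):
    term: str   # lemma, e.g. "good"
    pos: str    # part-of-speech tag, e.g. "a" for adjective
    sense: int  # sense number, e.g. 1 for "01"


def parse_synset_id(identifier: str) -> SynsetId:
    """Split a 'term.pos.sense' string into its three fields.

    Splitting from the right keeps any dots inside the term itself intact.
    Raises ValueError if the identifier has fewer than three fields or the
    sense field is not an integer.
    """
    term, pos, sense = identifier.rsplit(".", 2)
    return SynsetId(term, pos, int(sense))


good = parse_synset_id("good.a.01")
bad = parse_synset_id("bad.a.12")
print(good)
print(bad)
```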

Acknowledgement

This research was partially supported by the Malaysia Ministry of Education Grant FRGS/1/2014/ICT02/UKM/01/1 awarded to the Center for Artificial Intelligence Technology at Universiti Kebangsaan Malaysia.

Author information

Corresponding author

Correspondence to Mohammad Darwich.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Darwich, M., Noah, S.A.M., Omar, N. (2017). Minimally-Supervised Sentiment Lexicon Induction Model: A Case Study of Malay Sentiment Analysis. In: Phon-Amnuaisuk, S., Ang, SP., Lee, SY. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science, vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_19

  • DOI: https://doi.org/10.1007/978-3-319-69456-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69455-9

  • Online ISBN: 978-3-319-69456-6

  • eBook Packages: Computer Science (R0)
