
Factors Affecting Sentiment Prediction of Malay News Headlines Using Machine Learning Approaches

  • Rayner Alfred
  • Wong Wei Yee
  • Yuto Lim
  • Joe Henry Obit
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 652)

Abstract

Most sentiment analysis research relies on supervised machine learning techniques. Analyzing the sentiment of English text reviews, in order to gauge public perception and acceptance of a particular issue, is a non-trivial task. Nevertheless, few studies have analyzed the sentiment of Malay news headlines, owing to a lack of resources and tools. Malay news headlines normally consist of only a few words and are often written creatively to attract readers' attention. This paper proposes a standard framework for investigating the factors that affect sentiment prediction of Malay news headlines using machine learning approaches. Investigating these factors (e.g., the type of classifier, the proximity measure and the number of nearest neighbors, k) is important because it helps identify the parameters that can be tuned to optimize prediction performance. Based on the results obtained, the Support Vector Machine and Naïve Bayes classifiers achieved higher accuracy than the k-Nearest Neighbors (k-NN) classifier. In terms of the proximity measure and the number of nearest neighbors, k, the k-NN classifier achieved higher prediction performance when Cosine similarity was applied with a small value of k (e.g., 3 and 5) than when the Euclidean distance was used, because Euclidean distance measures can be affected by the high dimensionality of the data.
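The two k-NN factors named in the abstract, the proximity measure and the number of neighbors k, can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words representation, the function names (`bag_of_words`, `cosine_similarity`, `euclidean_distance`, `knn_predict`) and the toy English training pairs are all assumptions made here for illustration, and the sketch does not reproduce any result from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a headline to a sparse term-count vector (hypothetical tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two sparse vectors."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))


def knn_predict(query, training, k=3, metric="cosine"):
    """Label a headline by majority vote among its k closest training items."""
    q = bag_of_words(query)
    scored = []
    for text, label in training:
        v = bag_of_words(text)
        if metric == "cosine":
            score = cosine_similarity(q, v)    # higher means closer
        else:
            score = -euclidean_distance(q, v)  # negated so higher means closer
        scored.append((score, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented purely for illustration (not from the paper's corpus).
TOY_TRAINING = [
    ("good win", "pos"),
    ("good profit", "pos"),
    ("bad loss", "neg"),
    ("bad crash", "neg"),
]

if __name__ == "__main__":
    for metric in ("cosine", "euclidean"):
        print(metric, knn_predict("good day", TOY_TRAINING, k=3, metric=metric))
```

One property the sketch makes visible is that cosine similarity depends only on the direction of the term vectors, whereas the Euclidean distance also grows with differences in term counts across every dimension. Whether this explains the gap reported in the abstract is a matter for the paper's own experiments.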

Keywords

  • Sentiment analysis
  • Opinion mining
  • Naïve Bayes
  • k-Nearest neighbors
  • Text classification


Copyright information

© Springer Nature Singapore Pte Ltd. 2016

Authors and Affiliations

  • Rayner Alfred (1, corresponding author)
  • Wong Wei Yee (1)
  • Yuto Lim (2)
  • Joe Henry Obit (1)

  1. Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
  2. School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan
