Sentiment Analysis System for Roman Urdu

  • Khawar Mehmood
  • Daryl Essam
  • Kamran Shafi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 858)


Sentiment analysis is the computational task of identifying positive or negative sentiment expressed in a piece of text. In this paper, we present a sentiment analysis system for Roman Urdu. For this task, we collected a Roman Urdu dataset of 779 reviews from five domains: Drama, Movie/Telefilm, Mobile Reviews, Politics, and Miscellaneous (Misc). We used unigram, bigram, and uni-bigram (unigram + bigram) features with five different classifiers, and measured accuracy before and after feature reduction. In total, thirty-six (36) experiments were performed; they showed that Naïve Bayes (NB) and Logistic Regression (LR) outperformed the other classifiers on this task. Overall results also improved after feature reduction.
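The feature sets named in the abstract can be illustrated with a small, dependency-free Python sketch. It builds unigram, bigram, and uni-bigram features from tokenized text, applies a feature-reduction step, and trains a multinomial Naïve Bayes classifier with add-one smoothing. Several parts are assumptions, not the authors' pipeline. The tokenizer (lowercasing plus whitespace splitting) is assumed. The reduction rule (keep features that occur in at least `min_df` training reviews) is also assumed, because the abstract does not say how features were reduced. The helper names `ngrams`, `reduce_features`, `NaiveBayes`, `accuracy`, and `TRAIN` are hypothetical. The six `TRAIN` reviews are invented toy strings, not items from the 779-review dataset.

```python
import math
from collections import Counter

# Hypothetical toy data: invented Roman Urdu strings, NOT from the paper's dataset.
TRAIN = [
    ("drama bohat acha tha", "positive"),
    ("zabardast film bohat acha", "positive"),
    ("acha mobile hai", "positive"),
    ("drama bohat bura tha", "negative"),
    ("bakwas film bohat bura", "negative"),
    ("bura mobile hai", "negative"),
]


def ngrams(text, mode):
    """Return features for one review; mode is 'unigram', 'bigram' or 'uni-bigram'.
    Assumption: lowercase + whitespace tokenization."""
    tokens = text.lower().split()
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])]
    if mode == "unigram":
        return tokens
    if mode == "bigram":
        return bigrams
    if mode == "uni-bigram":
        return tokens + bigrams  # unigram + bigram features combined
    raise ValueError(f"unknown mode: {mode}")


def reduce_features(docs, min_df=2):
    """Assumed reduction rule: keep features present in at least min_df documents."""
    df = Counter()
    for feats in docs:
        df.update(set(feats))  # count each feature once per document
    return {f for f, c in df.items() if c >= min_df}


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, vocab=None):
        # Use the full training vocabulary unless a reduced one is supplied.
        self.vocab = vocab if vocab is not None else {f for d in docs for f in d}
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            counts = Counter(f for d in class_docs for f in d if f in self.vocab)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {f: math.log((counts[f] + 1) / denom) for f in self.vocab}
        return self

    def predict(self, feats):
        scores = {}
        for c in self.classes:
            # Features outside the vocabulary are ignored.
            scores[c] = self.log_prior[c] + sum(
                self.log_lik[c][f] for f in feats if f in self.vocab
            )
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


if __name__ == "__main__":
    labels = [y for _, y in TRAIN]
    for mode in ("unigram", "bigram", "uni-bigram"):
        docs = [ngrams(t, mode) for t, _ in TRAIN]
        full = NaiveBayes().fit(docs, labels)
        reduced = NaiveBayes().fit(docs, labels, vocab=reduce_features(docs))
        print(mode, len(full.vocab), len(reduced.vocab), accuracy(reduced, docs, labels))
```

In a library setting, scikit-learn's `CountVectorizer` produces the same three feature sets through `ngram_range=(1, 1)`, `(2, 2)`, and `(1, 2)`, and its `min_df` parameter applies a comparable document-frequency cut-off. The abstract does not state whether the authors used these settings.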


Keywords: Opinion mining · Roman Urdu · Urdu · Social media



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of New South Wales (UNSW), Kensington, Australia
