Mining the Urdu Language-Based Web Content for Opinion Extraction

  • Afraz Z. Syed
  • A. M. Martinez-Enriquez
  • Akhzar Nazir
  • Muhammad Aslam
  • Rida Hijab Basit
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10267)


People prefer to share and express opinions in their own language, and the Internet is one of the largest repositories of such opinions. Opinion mining uses Natural Language Processing (NLP), text analysis, and computational linguistics to identify and extract subjective information from data. Opinion mining for Urdu is not a well-explored area. Therefore, we propose a lexicon-based approach, focused on Urdu, that identifies and extracts adji-units and decisions from a given text. Adji-units are the expressions in a sentence that carry subjective content. The proposed approach uses a two-step lexicon to extract opinions from text chunks. Because no such lexicons exist for Urdu, the main aim is to develop a diverse two-step lexicon and to highlight the linguistic as well as technical aspects of this multidimensional research problem. The performance of the proposed system is evaluated on multiple texts, and the achieved results are quite satisfactory.
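The abstract does not spell out how the two lexicon steps interact. The following Python sketch is one plausible reading and is not the authors' implementation. It assumes that step 1 is an adjective lexicon with prior polarity. It assumes that step 2 is a modifier lexicon (intensifiers and negation) that grows each adjective into an adji-unit. Unit scores are then summed into a sentence-level decision. The lexicon entries, the names `ADJECTIVES`, `INTENSIFIERS`, `NEGATIONS`, `AdjiUnit`, `extract_adji_units`, and `decide`, and the whitespace tokenizer are all hypothetical.

```python
"""Hypothetical two-step lexicon lookup for Urdu adji-units (illustrative only)."""
from dataclasses import dataclass
from typing import List

# Step 1 (assumed): adjective lexicon mapping surface forms to prior polarity.
# Urdu adjectives inflect for gender/number, so each form is listed separately.
ADJECTIVES = {
    "اچھا": 1, "اچھی": 1, "اچھے": 1,  # acha / achi / ache: "good"
    "برا": -1, "بری": -1, "برے": -1,  # bura / buri / bure: "bad"
}

# Step 2 (assumed): modifier lexicon that rescales an adjective's polarity.
INTENSIFIERS = {"بہت": 2.0}  # bohat: "very", usually precedes the adjective
NEGATIONS = {"نہیں": -1.0}  # nahin: "not", usually follows the adjective


@dataclass
class AdjiUnit:
    tokens: List[str]  # the adjective plus any attached modifiers
    score: float       # polarity after applying the modifiers


def extract_adji_units(sentence: str) -> List[AdjiUnit]:
    """Return one adji-unit per adjective found via the step-1 lexicon."""
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    units = []
    for i, tok in enumerate(tokens):
        if tok not in ADJECTIVES:
            continue
        start, end = i, i + 1
        score = float(ADJECTIVES[tok])
        # Step 2a: an intensifier immediately to the left strengthens polarity.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
            start = i - 1
        # Step 2b: a negation immediately to the right flips polarity.
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
            score *= NEGATIONS[tokens[i + 1]]
            end = i + 2
        units.append(AdjiUnit(tokens[start:end], score))
    return units


def decide(sentence: str) -> str:
    """Aggregate adji-unit scores into a sentence-level decision."""
    total = sum(unit.score for unit in extract_adji_units(sentence))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(decide("یہ فلم بہت اچھی ہے"))   # "this film is very good" -> positive
    print(decide("یہ فلم اچھی نہیں ہے"))  # "this film is not good"  -> negative
```

Real Urdu web text would likely need Unicode normalization before any lexicon lookup, because the same letter can appear under different code points, such as Arabic versus Farsi ye. It would also likely need a proper tokenizer. The one-token window around each adjective is chosen only to keep the sketch short.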


Keywords: NLP, Opinion mining, Sentiment analysis, Urdu lexicon, Adji-units



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Afraz Z. Syed (1)
  • A. M. Martinez-Enriquez (2)
  • Akhzar Nazir (3)
  • Muhammad Aslam (3), corresponding author
  • Rida Hijab Basit (3)
  1. Information Technology Program (ITP), Lambton College of Applied Science and Technology, Sarnia, Canada
  2. Department of CS, CINVESTAV-IPN, D.F. Mexico, Mexico
  3. Department of CS and E, University of Engineering and Technology, Lahore, Pakistan
