Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging

  • Reza Karimpour
  • Amineh Ghorbani
  • Azadeh Pishdad
  • Mitra Mohtarami
  • Abolfazl AleAhmad
  • Hadi Amiri
  • Farhad Oroumchian
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5706)


With the emergence of vast information resources, methods are needed that retrieve the information most relevant to a user's needs. Such retrieval methods may exploit natural language constructs to achieve higher precision and recall. In this study, we use the part-of-speech properties of terms as an extra source of information about document and query terms, and evaluate the impact of this information on the performance of Persian retrieval algorithms. As a complement, we also examine the effect of stemming. Our findings indicate that part-of-speech tags may have only a small influence on retrieval effectiveness. However, when this information is combined with stemming, it improves the accuracy of the results considerably.


Keywords: Natural language; Persian information retrieval; Part of speech





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Reza Karimpour (1)
  • Amineh Ghorbani (1)
  • Azadeh Pishdad (1)
  • Mitra Mohtarami (1)
  • Abolfazl AleAhmad (1)
  • Hadi Amiri (1)
  • Farhad Oroumchian (2)
  1. Electrical and Computer Engineering Faculty, University of Tehran, Iran
  2. University of Wollongong in Dubai, United Arab Emirates
