FindStem: Analysis and Evaluation of a Turkish Stemming Algorithm

  • Hayri Sever
  • Yıltan Bitirim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2857)

Abstract

In this paper, we evaluate the effectiveness of a new stemming algorithm, FINDSTEM, for use with Turkish documents and queries, and compare it with two previously defined Turkish stemmers, namely the "A-F" and "L-M" algorithms. Of these, FINDSTEM and A-F apply both inflectional and derivational stemming, whereas L-M handles only inflectional rules. The stemming algorithms were compared manually on 5,000 distinct words, of which FINDSTEM, A-F, and L-M failed on 49, 270, and 559 cases, respectively (about 1.0%, 5.4%, and 11.2% of the words). To measure the retrieval effectiveness of FINDSTEM, we used a medium-size collection comprising 2,468 law records with 280K document words, 15 natural-language queries with an average length of 17 search words, and complete relevance information for each query. We localized the SMART retrieval system by adding a stop list, support for Turkish characters, i.e., the ISO 8859-9 (Latin-5) code set, a stemming algorithm (FINDSTEM), and a Turkish translation at the message level. Our results, based on average precision values at 11-point recall levels, show that indexing document terms as well as search terms with FINDSTEM is clearly and consistently more effective than indexing the terms as they are (that is, with no stemming at all).
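
The abstract reports effectiveness as average precision at 11-point recall levels but does not spell out the computation. The sketch below shows one common way to compute it for a single query, assuming the usual interpolation rule (precision at recall level r is the highest precision observed at any recall of at least r). The function names, document identifiers, and sample ranking are hypothetical and are not taken from the paper or from SMART.

```python
from typing import List, Sequence, Set


def eleven_point_precision(ranking: Sequence[str], relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query."""
    if not relevant:
        raise ValueError("query has no relevant documents")

    # Record (recall, precision) each time a relevant document is retrieved.
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Assumed interpolation rule: take the best precision at any recall >= level.
    # A small tolerance guards against floating-point rounding of the levels.
    levels = [i / 10 for i in range(11)]
    return [
        max((p for r, p in points if r >= level - 1e-9), default=0.0)
        for level in levels
    ]


def eleven_point_average(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the 11 interpolated precision values."""
    return sum(eleven_point_precision(ranking, relevant)) / 11


if __name__ == "__main__":
    # Hypothetical ranking: relevant documents appear at ranks 1 and 3.
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(eleven_point_precision(ranking, relevant))
    print(eleven_point_average(ranking, relevant))
```

To compare a stemmed run against an unstemmed one, these per-query values would be averaged over all queries for each run. The paper's own evaluation procedure may differ in detail.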

Keywords

Relevant Document; Average Precision; Retrieval Performance; Punctuation Mark; Word Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Hayri Sever (1)
  • Yıltan Bitirim (2)
  1. Department of Computer Engineering, Başkent University, Ankara, Turkey
  2. Department of Computer Engineering, Eastern Mediterranean University, Turkey
