Stemming in Agglutinative Languages: A Probabilistic Stemmer for Turkish

  • B. Taner Dinçer
  • Bahar Karaoğlan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2869)

Abstract

In this paper, we introduce a new lexicon-free, probabilistic stemmer intended for use in a Turkish information retrieval system that is under development. Its computational complexity is linear, and it achieved a success rate of 95.8% in testing. The main contribution of this paper is a thorough description of a probabilistic perspective on stemming, which can also be generalized to other agglutinative languages such as Finnish, Hungarian, Estonian and Czech.


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • B. Taner Dinçer (1)
  • Bahar Karaoğlan (1)
  1. Ege Üniversitesi, Uluslararasi Bilgisayar Enstitüsü, Bornova, Izmir, Türkiye
