Stemming in Agglutinative Languages: A Probabilistic Stemmer for Turkish

Dinçer, B. Taner; Karaoğlan, Bahar

doi:10.1007/978-3-540-39737-3_31

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2869)

Included in the following conference series:

International Symposium on Computer and Information Sciences

Abstract

In this paper, we introduce a new lexicon-free, probabilistic stemmer for use in a Turkish information retrieval system that is under development. It has linear computational complexity, and its success ratio in testing is 95.8%. The main contribution of this paper is a thorough description of a probabilistic perspective on stemming, which can also be generalized to other agglutinative languages such as Finnish, Hungarian, Estonian and Czech.

Author information

Authors and Affiliations

Ege Üniversitesi, Uluslararasi Bilgisayar Enstitüsü, 35100, Bornova, Izmir, Türkiye
B. Taner Dinçer & Bahar Karaoğlan

Editor information

Editors and Affiliations

Dept. of Computer Engineering, Middle East Technical University, Ankara, Turkey
Adnan Yazıcı
Department of Computer Engineering, Middle East Technical University, 06531, Ankara, Turkey
Cevat Şener

About this paper

Cite this paper

Dinçer, B.T., Karaoğlan, B. (2003). Stemming in Agglutinative Languages: A Probabilistic Stemmer for Turkish. In: Yazıcı, A., Şener, C. (eds) Computer and Information Sciences - ISCIS 2003. ISCIS 2003. Lecture Notes in Computer Science, vol 2869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39737-3_31

DOI: https://doi.org/10.1007/978-3-540-39737-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20409-1
Online ISBN: 978-3-540-39737-3
eBook Packages: Springer Book Archive