Skip to main content

Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5221))

Included in the following conference series:

Abstract

In this paper, we propose a set of language resources for building Turkish language processing applications. Specifically, we present a finite-state implementation of a morphological parser, an averaged perceptron-based morphological disambiguator, and compilation of a web corpus. Turkish is an agglutinative language with a highly productive inflectional and derivational morphology. We present an implementation of a morphological parser based on two-level morphology. This parser is one of the most complete parsers for Turkish and it runs independent of any other external system such as PC-KIMMO in contrast to existing parsers. Due to complex phonology and morphology of Turkish, parsing introduces some ambiguous parses. We developed a morphological disambiguator with accuracy of about 98% using averaged perceptron algorithm. We also present our efforts to build a Turkish web corpus of about 423 million words.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Collins, M.: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. In: EMNLP (2002)

    Google Scholar 

  2. Dilek, Z.H.T., Oflazer, K., Tür, G.: Statistical Morphological Disambiguation for Agglutinative Languages. Computers and the Humanities 36(4) (2002)

    Google Scholar 

  3. Ezeiza, N., Alegria, I., Arriola, J.M., Urizar, R., Aduriz, I.: Combining Stochastic and Rule-based Methods for Disambiguation in Agglutinative Languages. In: COLING-ACL (1998)

    Google Scholar 

  4. Hajic, J., Hladka, B.: Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset. In: COLING-ACL, pp. 483–490 (1998)

    Google Scholar 

  5. Koskenniemi, K.: A General Computational Model for Word-form Recognition and Production. In: 22nd Annual Meeting on Association for Computational Linguistics, pp. 178–181 (1984)

    Google Scholar 

  6. Lewis, G.: Turkish Grammar. Oxford University Press, Oxford (2001)

    Google Scholar 

  7. Megyesi, B.: Improving Brill’s PoS Tagger for an Agglutinative Language. In: Joint Sigdat Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (1999)

    Google Scholar 

  8. Mohri, M.: Finite-state Transducers in Language and Speech Processing. Computational Linguistics 23(2), 269–311 (1997)

    MathSciNet  Google Scholar 

  9. Oflazer, K., Tür, G.: Morphological Disambiguation by Voting Constraints. In: ACL, pp. 222–229 (1997)

    Google Scholar 

  10. Sak, H., Güngör, T., Saraçlar, M.: Morphological Disambiguation of Turkish Text with Perceptron Algorithm. In: Gelbukh, A. (ed.) CICLing 2007. LNCS, vol. 4394, pp. 107–118. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  11. Salor, Ö., Pellom, B.L., Çiloğlu, T., Hacıoğlu, K., Demirekler, M.: On Developing New Text and Audio Corpora and Speech Recognition Tools for the Turkish Language. In: ICSLP (2002)

    Google Scholar 

  12. Say, B., Zeyrek, D., Oflazer, K., Özge, U.: Development of a Corpus and a Treebank for Present-day Written Turkish. In: 11th International Conference of Turkish Linguistics (2002)

    Google Scholar 

  13. Yüret, D., Türe, F.: Learning Morphological Disambiguation Rules for Turkish. In: HLT-NAACL (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sak, H., Güngör, T., Saraçlar, M. (2008). Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science(), vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_40

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics