A Comparative Study to Determine the Effective Window Size of Turkish Word Sense Disambiguation Systems

  • Conference paper
Information Sciences and Systems 2013

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 264))

Abstract

This paper presents the effect of different windowing schemes on word sense disambiguation accuracy. The Turkish Lexical Sample Dataset was used in the experiments. We took the samples of ambiguous verbs and nouns from the dataset and used bag-of-words properties as context information. The experiments were repeated for different window sizes with several machine learning algorithms. We follow a 2/3 splitting strategy (2/3 for training, 1/3 for testing) and determine the most frequently used words in the training part. After removing stop words, we repeated the experiments using the most frequent 100, 75, 50 and 25 content words of the training data. Our findings show that using the most frequent 75 words as features improves accuracy for Turkish verbs. Similar results were obtained for Turkish nouns with the most frequent 100 words of the training set. Based on this, the selected algorithms were tested on window sizes of 30, 15, 10 and 5. The Naïve Bayes and Functional Tree methods yielded better accuracy, and the window size \(\pm 5\) gives the best average results for both the noun and verb groups. The best results of the two groups are 65.8 and 56 % points above the most frequent sense baseline of the verb and noun groups, respectively.
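The feature pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the binary (presence/absence) encoding, the simple ordered split, and the toy tokens are all assumptions made here for the example. The abstract does not say how the split was drawn or how the bag-of-words features were weighted.

```python
from collections import Counter


def context_window(tokens, target_index, size):
    """Return up to `size` tokens on each side of the target word (target excluded)."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + right


def split_two_thirds(samples):
    """Split samples into (train, test) with the first 2/3 used for training.

    Assumption: a plain ordered split; the paper may have shuffled or stratified.
    """
    cut = (2 * len(samples)) // 3
    return samples[:cut], samples[cut:]


def top_content_words(train_contexts, stop_words, n):
    """Pick the n most frequent non-stop words across the training contexts."""
    counts = Counter(
        word
        for context in train_contexts
        for word in context
        if word not in stop_words
    )
    return [word for word, _ in counts.most_common(n)]


def bag_of_words(context, vocabulary):
    """Binary feature vector: 1 if the vocabulary word occurs in the context, else 0."""
    present = set(context)
    return [1 if word in present else 0 for word in vocabulary]


def most_frequent_sense_accuracy(train_senses, test_senses):
    """Accuracy of always predicting the most frequent training sense."""
    mfs = Counter(train_senses).most_common(1)[0][0]
    return sum(sense == mfs for sense in test_senses) / len(test_senses)


# Toy usage with placeholder tokens (not taken from the dataset).
tokens = ["a", "b", "c", "TARGET", "e", "f", "g"]
window = context_window(tokens, target_index=3, size=2)   # ["b", "c", "e", "f"]
vocab = top_content_words([window], stop_words={"b"}, n=2)
features = bag_of_words(window, vocab)
```

With this shape, comparing window sizes amounts to rerunning the same steps with `size` set to 30, 15, 10 and 5, and comparing feature-set sizes amounts to varying `n` over 100, 75, 50 and 25.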

Notes

  1. Adjectives, adverbs and prepositions.

  2. An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.
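To make the ARFF description concrete, here is a small sketch that serializes feature vectors into ARFF text. The helper `to_arff`, the relation name, and the attribute names are hypothetical and chosen only for illustration. The sketch covers numeric and nominal attributes and does not quote names or values that contain spaces.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from attribute declarations and data rows.

    attributes: list of (name, kind) pairs, where kind is either an ARFF type
    string such as "numeric" or a list of nominal values.
    rows: list of instances, each a sequence with one value per attribute.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes list their allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical example: one bag-of-words feature plus the sense label.
arff_text = to_arff(
    "wsd_sample",
    [("has_word_1", "numeric"), ("sense", ["s1", "s2"])],
    [[1, "s1"], [0, "s2"]],
)
print(arff_text)
```

Each line after `@data` is one instance, and every instance shares the attribute set declared in the header, which is the property the note describes.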

Author information

Corresponding author

Correspondence to Bahar İlgen.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

İlgen, B., Adalı, E., Tantuğ, A.C. (2013). A Comparative Study to Determine the Effective Window Size of Turkish Word Sense Disambiguation Systems. In: Gelenbe, E., Lent, R. (eds) Information Sciences and Systems 2013. Lecture Notes in Electrical Engineering, vol 264. Springer, Cham. https://doi.org/10.1007/978-3-319-01604-7_17

  • DOI: https://doi.org/10.1007/978-3-319-01604-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01603-0

  • Online ISBN: 978-3-319-01604-7

  • eBook Packages: Computer Science (R0)
