Abstract
This paper examines the effect of different windowing schemes on word sense disambiguation accuracy. The experiments use the Turkish Lexical Sample Dataset. We took the samples of the ambiguous verbs and nouns in the dataset and used bag-of-words features as context information. The experiments were repeated for different window sizes with several machine learning algorithms. We follow a 2/3 splitting strategy (2/3 for training, 1/3 for testing) and determine the most frequently used words in the training part. After removing stop words, we repeated the experiments using the 100, 75, 50 and 25 most frequent content words of the training data. Our findings show that using the 75 most frequent words as features improves accuracy for Turkish verbs. Similar results were obtained for Turkish nouns when using the 100 most frequent words of the training set. With these settings, the selected algorithms were tested on window sizes of 30, 15, 10 and 5. Naïve Bayes and Functional Tree methods yielded the best accuracy, and a window size of \(\pm 5\) gives the best average results for both the noun and the verb groups. The best results for the two groups are 65.8 and 56 percentage points above the most-frequent-sense baseline for the verb and noun groups, respectively.
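The feature pipeline described in the abstract (a symmetric context window, stop-word removal, a vocabulary of the k most frequent content words taken from the training part, and a 2/3 split) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the tiny stop-word list, the binary feature encoding and the non-shuffled split are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only (not the list used in the paper).
STOP_WORDS = {"ve", "bir", "bu", "da", "de"}


def context_window(tokens, target_index, size):
    """Return up to `size` tokens on each side of the target, excluding the target."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + right


def split_two_thirds(samples):
    """Split samples into the first 2/3 (training) and the last 1/3 (testing).

    Assumption: a simple ordered split; the paper does not say how samples were ordered.
    """
    cut = (2 * len(samples)) // 3
    return samples[:cut], samples[cut:]


def top_content_words(train_samples, size, k):
    """Pick the k most frequent non-stop words seen inside the training windows.

    Each sample is a (tokens, target_index, sense_label) triple.
    """
    counts = Counter()
    for tokens, target_index, _sense in train_samples:
        for word in context_window(tokens, target_index, size):
            if word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(k)]


def to_features(tokens, target_index, size, vocabulary):
    """Encode one sample as a binary bag-of-words vector over `vocabulary`."""
    window = set(context_window(tokens, target_index, size))
    return [1 if word in window else 0 for word in vocabulary]
```

Under these assumptions, varying `size` over 30, 15, 10 and 5 and `k` over 100, 75, 50 and 25 reproduces the shape of the experimental grid; the resulting vectors would then be passed to a classifier of choice.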
Notes
1. Adjectives, adverbs and prepositions.
2. An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.
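To make the ARFF note concrete, the sketch below renders binary bag-of-words instances as ARFF text. The `@relation`, `@attribute` and `@data` sections are the standard parts of the format; the helper name `to_arff`, the `w_` prefix on attribute names and the sample words and sense labels are hypothetical and chosen only for illustration. It assumes plain ASCII tokens without spaces, so no quoting is needed.

```python
def to_arff(relation, vocabulary, senses, rows):
    """Render binary bag-of-words rows as ARFF text.

    rows is a list of (feature_vector, sense_label) pairs; each feature
    vector has one 0/1 entry per word in `vocabulary`.
    """
    lines = [f"@relation {relation}", ""]
    # One nominal {0,1} attribute per context word; the w_ prefix (an assumption)
    # keeps word attributes from clashing with the class attribute name.
    for word in vocabulary:
        lines.append(f"@attribute w_{word} {{0,1}}")
    # The class attribute lists the possible sense labels.
    lines.append(f"@attribute sense {{{','.join(senses)}}}")
    lines += ["", "@data"]
    # Each instance is one comma-separated line, with the class value last.
    for features, sense in rows:
        lines.append(",".join(str(v) for v in features) + "," + sense)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("yuz", ["su", "ev"], ["s1", "s2"], [([1, 0], "s1"), ([0, 1], "s2")]))
```

Each line after `@data` is one instance, and every instance shares the attribute list declared in the header, which is the property the note describes.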
© 2013 Springer International Publishing Switzerland
Cite this paper
İlgen, B., Adalı, E., Tantuğ, A.C. (2013). A Comparative Study to Determine the Effective Window Size of Turkish Word Sense Disambiguation Systems. In: Gelenbe, E., Lent, R. (eds) Information Sciences and Systems 2013. Lecture Notes in Electrical Engineering, vol 264. Springer, Cham. https://doi.org/10.1007/978-3-319-01604-7_17
DOI: https://doi.org/10.1007/978-3-319-01604-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01603-0
Online ISBN: 978-3-319-01604-7
eBook Packages: Computer Science (R0)