
1 Introduction

Word Sense Disambiguation (WSD) is one of the most important, complicated, and challenging tasks in computational linguistics research [1]. WSD is the task of finding the correct senses of words that have multiple senses but identical surface forms (i.e., polysemous words). For example, the noun bat denotes not only a chiropteranFootnote 1 but also a racket for squash or a baseball bat. When a polysemous word appears in a sentence, people are able to understand its intended meaning by referring to co-occurring words, but computers are not [8]. In order to process human language automatically, researchers have long been searching for approaches that yield good results, but WSD remains an open problem.

Early approaches started by building corpora in which the senses of polysemous words were manually tagged for a small number of sentences [4]. After these small corpora were expanded and developed, dictionaries were launched to the public. They provided a vast number of definitions for the target language, so people began to apply dictionaries to the WSD task. The most famous dictionary-based approach is the Lesk algorithm, introduced by Michael Lesk in 1986 [2]. However, it has the limitation that the WSD results depend on the dictionaries.

Supervised and unsupervised methods have been applied to overcome this limitation in the WSD task. The supervised method can be framed as a classification task using collocations, bag-of-words, n-gramsFootnote 2, and context words as features [5]. It can employ many kinds of pattern recognition and machine learning approaches, such as decision lists, the Naive Bayes classifier, the k-Nearest Neighbors (kNN) algorithm, and Support Vector Machines (SVMs). However, it requires a corpus that includes sense tags for words, and because this tagging has to be done manually, it costs a great deal of time and money. The unsupervised method is based on the assumption that the same sense of a word will occur in similar contexts. It is therefore often called Word Sense Discrimination; in other words, this method is not able to assign specific senses to the given words of a target sentence [6].

Currently, the most popular and powerful approaches to the WSD task are based on knowledge dictionaries [7] such as WordNetFootnote 3. In particular, the Structural Semantic Interconnections (SSI) algorithm [3] is the best-known approach to the WSD task. This algorithm creates structural specifications of the possible senses for each target word to be disambiguated in a context, and it selects the best hypothesis sense according to a grammar that describes relations between sense specifications. Even though the SSI algorithm is powerful and rests on a strong knowledge base, it has a limitation to overcome. This paper focuses on this limitation and provides proposals to overcome this weakness.

In this paper, we propose a new WSD algorithm, named Low Ambiguity First (LAF), which is based on the hypotheses that the ambiguity of words can be calculated by using WordNet, that the word with the lowest ambiguity degree should be disambiguated first, and that adjacent words are more semantically relevant than words far away. These hypotheses are our proposals for overcoming the weaknesses of the SSI algorithm. We believe that the LAF algorithm can improve the precision of WSD.

The remainder of this paper is organized as follows: In Sect. 2, we describe the SSI algorithm in detail and point out its weaknesses. Section 3 is the main part of this paper, in which we present the Low Ambiguity First algorithm and explain how it works with examples; the word ambiguity measurement is also illustrated there. Finally, Sect. 4 concludes this paper with future work.

2 Related Works

The Structural Semantic Interconnections (SSI) algorithm is a method that disambiguates polysemous words by creating structural specifications of the candidate senses for each word and selecting the most appropriate sense by using a structural grammar. The structural grammar describes the possible relevant relations between structural specifications, i.e., semantic interconnections among sense graphs. The SSI algorithm can be described with the following variables:

  • \(T = [t_1, ..., t_n]\), where T is the list of co-occurring terms to be disambiguated and n is the total number of noun-type words in the given sentence.

  • \(S^t_1, S^t_2, ... , S^t_k\) are the structural specifications of the possible concepts for a given term t, where k is the total number of possible concepts.

  • \(I = [S^{t_{1}}, ... , S^{t_{n}}]\) is the list of disambiguated senses (precisely, the semantic interpretation of T), where \(S^{t_{i}}\) is the chosen sense for \(t_i\) or the null element if \(t_i\) has not yet been disambiguated.

  • \(P = [t_i|S^{t_{i}} = null]\), where P is a list of pending terms to be disambiguated.

  • \(G = (E, N, S_G, P_G)\), where G is a context-free grammar, E is the set of edge labels indicating semantic relations between possible senses, N is the set of paths between concepts, \(S_G\) is the start symbol of G, and \(P_G\) is the set of productions, comprising about 40 productions.

The SSI algorithm only considers noun-type words as terms to be disambiguated. Therefore, the list T is initialized with the noun-type words from the given sentence, and the WordNet senses of each term t are taken as its possible concepts (\(S^t_j\)). If a target term t is monosemousFootnote 4, I is updated with \(S^{t_{1}}\). If there are neither monosemous terms nor initial synsets, the algorithm chooses the most probable sense based on the frequency of word senses. I keeps being updated as long as the SSI algorithm can find semantic relations between the senses in I and the possible senses of the terms in P by using G.
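As a concrete illustration of this initialization step, the following is a minimal sketch (not the original SSI implementation) using NLTK's WordNet interface; the variables T, S, I, and P mirror the definitions above, and the frequency-based fallback for sentences without monosemous terms is omitted for brevity.

from nltk.corpus import wordnet as wn

def initialize(nouns):
    """Build T, the candidate-sense lists S, the interpretation I, and the pending list P."""
    T = list(nouns)                                  # co-occurring noun terms
    S = {t: wn.synsets(t, pos=wn.NOUN) for t in T}   # possible concepts for each term
    I = {t: None for t in T}                         # chosen senses (None = not yet disambiguated)
    for t in T:
        if len(S[t]) == 1:                           # monosemous terms are fixed immediately
            I[t] = S[t][0]
    P = [t for t in T if I[t] is None]               # terms still pending
    return T, S, I, P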

Let us assume that there is a sentence to be disambiguated as follows. Sentence = Retrospective is an exhibition of a representative selection of an artist's life work, and art exhibition is an exhibition of art objects (paintings or statues). The initial values of the variables are then as follows:

  • T = [retrospective, work, object, exhibition, life, statue, artist, selection, representative, painting, art]

  • I = [retrospective#1, -, -, -, -, -, -, -, -, -, -]

  • P = [work, object, exhibition, life, statue, artist, selection, representative, painting, art]

At first, I will be updated with the senses of monosemous words in the list of P as follows:

  • I = [retrospective#1, statue#1, artist#1]

  • P = [work, object, exhibition, life, selection, representative, painting, art]

I is then enriched as long as the senses already in I and the possible senses of the terms in P have semantic interconnections (such as kind-of, has-kind, part-of, and has-part relations). Therefore, the final states of the lists are as follows:

  • I = [retrospective#1, statue#1, artist#1, exhibition#2, object#1, art#1, painting#1, life#12]

  • P = [work, selection, representative]
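The semantic interconnections mentioned above correspond to standard WordNet relations. As a minimal sketch (assuming NLTK's WordNet interface and the sense numbering of the example), they can be queried as follows:

from nltk.corpus import wordnet as wn

statue = wn.synset("statue.n.01")     # statue#1 from the example above
print(statue.hypernyms())             # kind-of  relations (direct hypernyms)
print(statue.hyponyms())              # has-kind relations (direct hyponyms)
print(statue.part_holonyms())         # part-of  relations (wholes this concept belongs to)
print(statue.part_meronyms())         # has-part relations (parts of this concept)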

As we can see, this SSI algorithm has two limitations to overcome. First, there are no criteria for deciding which word needs to be disambiguated earlier than the others. P is a pending list of terms waiting to be disambiguated. However, the ambiguity of each word is different: some words are used mostly in their first sense, while others are uncertain because of their high ambiguity. Therefore, we need a method to measure Word Ambiguity (WA) and decide which word has lower ambiguity than the others, so that we can reduce the possibility of making a wrong choice at each disambiguation step.

The other limitation concerns the hypothesis that adjacent words are semantically relevant to each other; in other words, a nearby word may be more likely to form strong semantic relations with the target than a distant word. If this hypothesis is correct, the SSI algorithm should be altered to consider the structural positions of words in the given sentence.

In this paper, we examine this hypothesis and show that the SSI algorithm can be improved by considering semantic relations among adjacent words. We also present the word ambiguity measurement based on WordNet sense frequencies and propose the Low Ambiguity First (LAF) algorithm in Sect. 3.

3 Low Ambiguity First Algorithm

This section presents the Low Ambiguity First (LAF) algorithm, a knowledge-based Word Sense Disambiguation algorithm built on the hypotheses that word ambiguity can be measured by the frequency of senses in WordNet and that adjacent words are semantically relevant to each other. The variables of the LAF algorithm are the same as those of the SSI algorithm described in Sect. 2; however, the items of the pending list P are ordered by their word ambiguity values.

3.1 A Measurement for Word Ambiguity

Word Ambiguity (WA) is a criterion for measuring the complexity of the senses of the words in the pending list P. It is important to decide which words should be disambiguated earlier than others, because the SSI algorithm disambiguates the senses of the terms in P by using the senses in I that have already been determined in previous steps. In other words, at each step the best senses are chosen according to the current I and P; therefore, the order in which senses are chosen may affect the final result.

Let us assume that we have the following pending list: P = [group, Friday, investigation, Atlanta, primary_election, evidence, irregularity]. According to the SSI algorithm, I is first enriched with the monosemous words from P. Therefore, we obtain I = [-, Friday#1, -, -, primary_election#1, -, -] and P = [group, investigation, Atlanta, evidence, irregularity], where

\(S^{group_{3}}\) = [(2350)group#1, (9)group#2, (3)group#3],

\(S^{investigation_{2}}\) = [(16)investigation#1, (8)investigation#2],

\(S^{Atlanta_{2}}\) = [(7)Atlanta#1, (1)Atlanta#2],

\(S^{evidence_{3}}\) = [(54)evidence#1, (24)evidence#2, (7)evidence#3],

\(S^{irregularity_{4}}\) = [(3)irregularity#1, (2)irregularity#2, (1)irregularity#3, ()irregularity#4]

The word ambiguity can be measured by the following Eq. 1, based on the frequencies of the senses of each term as defined in WordNet.

$$\begin{aligned} WordAmbiguity(t) = \prod ^{k-1}_{i=1}\prod ^{k}_{j=i+1}\log \frac{frequency(S^t_i)}{frequency(S^t_j)+1} \end{aligned}$$
(1)

where \(frequency(S^t_i)\) is the frequency of the i-th sense of the given term t, as listed in the previous paragraph, and k is the number of senses of t. For example, the word ambiguity of the term (t = group) is obtained as follows:

$$\begin{aligned} WordAmbiguity(group) = \log \frac{2350}{10} \times \log \frac{2350}{4} \times \log \frac{9}{4} \approx 2.312 \end{aligned}$$

If the word ambiguity value is close to zero, the given term (t) has high ambiguity (semantic complexity). In contrast, if the value is far from zero, the given term has low ambiguity. In other words, terms with low ambiguity will be disambiguated earlier than other words. Therefore, the pending list P is reordered by word ambiguity, as shown in Fig. 1.
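As a minimal sketch of Eq. 1 (assuming base-10 logarithms, which reproduce the worked value for group, and taking the sense frequencies as a plain list), the measurement could be implemented as follows:

import math
from itertools import combinations

def word_ambiguity(frequencies):
    """Word Ambiguity of a term from its WordNet sense frequencies (Eq. 1)."""
    wa = 1.0
    for f_i, f_j in combinations(frequencies, 2):   # all sense pairs with i < j
        wa *= math.log10(f_i / (f_j + 1))
    return wa

print(round(word_ambiguity([2350, 9, 3]), 3))   # group   -> 2.312 (matches the example above)
print(round(word_ambiguity([7, 1]), 3))         # Atlanta -> 0.544

Sorting the pending terms in descending order of this value then yields the ordering of P used in the remainder of this section.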

Fig. 1. Examples of Word Ambiguity corresponding to terms (t) in the pending list P.

Therefore, the pending list P is reordered as follows: P = [group, Atlanta, investigation, evidence, irregularity]. This is the major difference between the SSI and LAF algorithms: the LAF algorithm disambiguates low-ambiguity words first. The detailed algorithm and explanations are given in Sect. 3.3.

3.2 An Adjacent Word May Have Stronger Relations than Other Words

This section describes and examines the hypothesis that an adjacent word of the target term (t) may have stronger semantic relations with it than words that are far from the target term. In order to find evidence that this hypothesis can be applied to the SSI algorithm, we define the Morphological Distance (MD) and the WordNet Hierarchical Distance (WHD), as shown in Fig. 2. The MD is the positional distance between the target term (t) and its neighboring terms in the sentence, and the WHD is the shortest path distance between the senses of the target term and its neighboring terms in the WordNet hierarchy.

Let us assume that we have the following sentence: Sentence = [The Fulton_County_Grand_Jury said Friday an investigation of Atlanta's recent primary_election produced no evidence that any irregularities took_place], which is the first sentence of the Brown1Footnote 5 part of the SemCorFootnote 6 corpus. After the preprocessing steps, we obtain the terms to be disambiguated as shown in Fig. 2. The MD and WHD between the target term (t = group) and the other terms are then calculated as follows:

  • MD(Group, Friday) = 1, MD(Group, investigation) = 2, ... ,

    MD(Group, irregularity) = 6

  • WHD(Group, Friday) = 8, WHD(Group, investigation) = 9, ... ,

    WHD(Group, irregularity) = 8
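The following is a minimal sketch of how such MD and WHD values could be computed with NLTK's WordNet interface. It is not the authors' implementation: it takes the first-listed noun synset of each term as a stand-in for its sense, so the resulting WHD values need not match Fig. 2 exactly.

from nltk.corpus import wordnet as wn

def morphological_distance(terms, a, b):
    """MD: positional distance between two noun terms of the sentence."""
    return abs(terms.index(a) - terms.index(b))

def wordnet_hierarchical_distance(a, b):
    """WHD: shortest hypernym/hyponym path length between (assumed) first senses."""
    sense_a = wn.synsets(a, pos=wn.NOUN)[0]
    sense_b = wn.synsets(b, pos=wn.NOUN)[0]
    return sense_a.shortest_path_distance(sense_b)   # None if the synsets are unconnected

terms = ["group", "Friday", "investigation", "Atlanta",
         "primary_election", "evidence", "irregularity"]
print(morphological_distance(terms, "group", "investigation"))   # MD = 2
print(wordnet_hierarchical_distance("group", "investigation"))   # WHD under the first-sense assumption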

Fig. 2. Examples of the morphological distance and WordNet hierarchical distance between terms in the pending list.

According to the examples in Fig. 2, the WHD increases as the MD gets larger. However, after a certain point the WHD decreases even though the MD keeps growing. Therefore, we need to find the point at which the tendency of the WHD is reversed, by comparing the MD and WHD over all sentences of the Brown1 corpus. As a result, we obtained the averages of the WHD for each MD, as shown in Table 1.

Table 1. Averages and Standard Deviations of WHD corresponding to MDs between terms in Brown1 corpus.

As we can see in this table, the average WHD increases until the MD is equal to three, but it decreases when the MD is 4. These results indicate that if the morphological distance between the target term (t) and a neighboring word is less than 4, morphologically close words tend to have a high possibility of forming strong semantic relations. Most of the words (more than 80 percent) in the Brown1 corpus fall within an MD of one to six, because it is hard to find a sentence with more than six kinds of nouns in natural human language.

We have thus shown that, up to this point, words adjacent to the target term are likely to have a shorter WHD than words with a higher MD.

3.3 Low Ambiguity First Algorithm

In the previous sections, we demonstrated the two hypotheses that are applied in the LAF algorithm in order to improve the traditional SSI algorithm. Before the LAF algorithm is applied, we need to initialize the variables required for storing the co-occurring terms, the structural specifications of the possible concepts, the list of disambiguated senses, and the pending list, as described in Sect. 2. In the LAF algorithm, however, the pending list P is reordered after measuring the word ambiguity according to Eq. 1, and a weighting function that varies with the morphological distance is applied when calculating the semantic similarity between unknown senses and known senses.

Fig. 3. Initializing processes for executing the LAF algorithm.

The proposed system is implemented in Python, and pseudo-code for the LAF algorithm is given below. The algorithm takes four lists as input, namely the lists initialized after the preprocessing tasks shown in Fig. 3. It updates the lists I and P until no further matching senses can be found between the items of P_C and I, using the shortest path distance based on WordNet. The weighting function is determined by the values of alpha, beta, and theta, which are constant parameters for applying the hypothesis described in Sect. 3.2.

(Pseudo-code of the LAF algorithm.)
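As an illustrative sketch of this main loop under stated assumptions, the following could be written; the constants ALPHA, BETA, and THETA, the step-shaped weighting over the MD, and the scoring by inverse WordNet path length are our reading of Sects. 3.1 and 3.2, not the authors' exact implementation. Here T, S, I, and P are the structures defined in Sect. 2, with P sorted in descending word ambiguity.

from nltk.corpus import wordnet as wn

ALPHA, BETA, THETA = 1.0, 0.5, 0.25        # hypothetical constants (cf. Sect. 3.2)

def weight(md):
    """Give nearer terms (MD < 4) more influence, following the tendency in Table 1."""
    if md <= 1:
        return ALPHA
    if md <= 3:
        return BETA
    return THETA

def laf(T, S, I, P):
    """Disambiguate pending terms, taking low-ambiguity terms first (P pre-sorted)."""
    positions = {t: i for i, t in enumerate(T)}       # term positions, used for the MD
    changed = True
    while changed and P:
        changed = False
        for t in list(P):                             # low-ambiguity terms are tried first
            best_sense, best_score = None, 0.0
            for sense in S[t]:                        # candidate senses of the pending term
                score = 0.0
                for u, chosen in I.items():
                    if chosen is None:
                        continue
                    d = chosen.shortest_path_distance(sense)
                    if d is not None:                 # connected senses contribute
                        md = abs(positions[t] - positions[u])
                        score += weight(md) / (1.0 + d)   # nearer and closer senses score higher
                if score > best_score:
                    best_sense, best_score = sense, score
            if best_sense is not None:
                I[t] = best_sense                     # fix the sense for t
                P.remove(t)                           # t is no longer pending
                changed = True
    return I, P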

The proposed LAF algorithm applies the two hypotheses described in Sects. 3.1 and 3.2 in order to overcome the limitations of the traditional SSI algorithm. In a preliminary experimentFootnote 7 to verify the reliability of the proposed algorithm, we found that it can improve on the SSI algorithm. However, we do not report experimental results in this paper because the amount of test data was not large enough. The major goal of this paper is the proposal of the LAF algorithm and the demonstration of its hypotheses; the experimental results will be presented in future work.

4 Conclusions and Future Works

In this paper, we propose a new approach to overcome the weaknesses of the traditional SSI algorithm, which is the most popular knowledge-based WSD algorithm. Statistical approaches have been applied to disambiguating polysemous words; however, the performance of these methods still requires improvement. Even though knowledge bases, i.e., machine-readable knowledge databases, have been applied to WSD, current knowledge bases still cannot cover all the senses of human language. The richer the knowledge base we hold, the higher the performance we gain for WSD; however, enriching and maintaining a knowledge base requires a great deal of cost and time. Therefore, we proposed a method that uses only the limited current resources, based on two hypotheses. We demonstrated these hypotheses in this paper and proposed the Low Ambiguity First algorithm, which is able to overcome the weaknesses of the SSI algorithm. Experiments are still ongoing; however, we believe that the proposed method can improve the performance of WSD.