
1 Introduction

Word Sense Disambiguation (WSD) is one of the most important, complicated, and challenging tasks in computational linguistics research [1]. WSD is the task of finding the correct senses of words that have multiple senses but identical surface forms (i.e., polysemous words). For example, the noun bat denotes not only a chiropteranFootnote 1 but also a racket for squash or a baseball bat. When a polysemous word appears in a sentence, people are able to understand its intended meaning by referring to co-occurring words, but computers are not [8]. In order to process human language automatically, researchers have long been searching for approaches that yield good results, but WSD remains an open problem.

Early approaches started by building corpora in which the senses of polysemous words were manually tagged for a small number of sentences [4]. After these small corpora were expanded and developed, dictionaries were launched to the public. They provided a vast number of definitions for the target language, so people began to apply dictionaries to the WSD task. The most famous dictionary-based approach is the Lesk algorithm, introduced by Michael Lesk in 1986 [2]. However, it has the limitation that the WSD results depend on the dictionaries.

Supervised and unsupervised methods have been applied to overcome this limitation in the WSD task. The supervised method can be framed as a classification task using collocations, bag-of-words, n-gramsFootnote 2, and context words as features [5]. It can employ many kinds of pattern recognition and machine learning approaches, such as decision lists, the Naive Bayes classifier, the k-Nearest Neighbors (kNN) algorithm, and Support Vector Machines (SVMs). However, it requires a corpus that includes sense tags for words, and because this tagging has to be done manually, it costs a great deal of time and money. The unsupervised method is based on the assumption that the same sense of a word will occur in similar contexts. It is therefore often called Word Sense Discrimination; in other words, this method is not able to assign specific senses to the given words of a target sentence [6].

Currently, the most popular and powerful approaches to the WSD task are based on knowledge dictionaries [7] such as WordNetFootnote 3. In particular, the Structural Semantic Interconnections (SSI) algorithm [3] is the best-known approach to the WSD task. This algorithm creates structural specifications of the possible senses for each target word to be disambiguated in a context, and it selects the best hypothesis sense according to a grammar that describes relations between sense specifications. Even though the SSI algorithm is powerful and rests on a strong knowledge base, it has a limitation to overcome. This paper focuses on this limitation and provides proposals to overcome this weakness.

In this paper, we propose a new WSD algorithm, named Low Ambiguity First (LAF), which is based on the hypotheses that the ambiguity of words can be calculated by using WordNet, that the word with the lowest ambiguity degree should be disambiguated first, and that adjacent words are more semantically relevant than words far away. These hypotheses are our proposals for overcoming the weaknesses of the SSI algorithm. We believe that the LAF algorithm can improve the precision of WSD.

The remainder of this paper is organized as follows: In Sect. 2, we describe the SSI algorithm in detail and point out its weaknesses. Section 3 is the main part of this paper, in which we present the Low Ambiguity First algorithm and explain how it works with examples; the word ambiguity measurement is also illustrated there. Finally, Sect. 4 concludes this paper with future work.

2 Related Works

The Structural Semantic Interconnections (SSI) algorithm is a method that disambiguates polysemous words by creating structural specifications of the candidate senses for each word and selecting the most appropriate sense by using a structural grammar. The structural grammar describes the possible relevant relations between structural specifications, i.e., semantic interconnections among sense graphs. The SSI algorithm can be described with the following variables:

  • \(T = [t_1, ..., t_n]\), where T is the list of co-occurring terms to be disambiguated and n is the total number of noun-type words in the given sentence.

  • \(S^t_1, S^t_2, ... , S^t_k\) are the structural specifications of the possible concepts for a given term t, where k is the total number of possible concepts.

  • \(I = [S^{t_{1}}, ... , S^{t_{n}}]\) is the list of disambiguated senses (precisely, the semantic interpretation of T), where \(S^{t_{i}}\) is the chosen sense for \(t_i\) or the null element if \(t_i\) has not yet been disambiguated.

  • \(P = [t_i|S^{t_{i}} = null]\), where P is a list of pending terms to be disambiguated.

  • \(G = (E, N, S_G, P_G)\), where G is a context-free grammar, E is the set of edge labels indicating semantic relations between possible senses, N is the set of paths between concepts, \(S_G\) is the start symbol of G, and \(P_G\) is the set of productions, comprising about 40 productions.

The SSI algorithm only considers noun-type words as terms to be disambiguated. Therefore, the list T is initialized with the noun-type words from the given sentence, and the WordNet senses of each term t are taken as its possible concepts (\(S^t_j\)). If a target term t is monosemousFootnote 4, I is updated with \(S^{t_{1}}\). If there are neither monosemous terms nor initial synsets, the algorithm chooses the most probable sense based on the frequency of word senses. I keeps being updated as long as the SSI algorithm can find semantic relations between the senses in I and the possible senses of the terms in P by using G.
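As a concrete illustration of this initialization step, the following is a minimal sketch (not the original SSI implementation) using NLTK's WordNet interface; the variables T, S, I, and P mirror the definitions above, and the frequency-based fallback for sentences without monosemous terms is omitted for brevity.

from nltk.corpus import wordnet as wn

def initialize(nouns):
    """Build T, the candidate-sense lists S, the interpretation I, and the pending list P."""
    T = list(nouns)                                  # co-occurring noun terms
    S = {t: wn.synsets(t, pos=wn.NOUN) for t in T}   # possible concepts for each term
    I = {t: None for t in T}                         # chosen senses (None = not yet disambiguated)
    for t in T:
        if len(S[t]) == 1:                           # monosemous terms are fixed immediately
            I[t] = S[t][0]
    P = [t for t in T if I[t] is None]               # terms still pending
    return T, S, I, P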

Let us assume that there is a sentence to be disambiguated as follows. Sentence = Retrospective is an exhibition of a representative selection of an artist's life work, and art exhibition is an exhibition of art objects (paintings or statues). The initial values of the variables are then as follows:

  • T = [retrospective, work, object, exhibition, life, statue, artist, selection, representative, painting, art]

  • I = [retrospective#1, -, -, -, -, -, -, -, -, -, -]

  • P = [work, object, exhibition, life, statue, artist, selection, representative, painting, art]

At first, I will be updated with the senses of monosemous words in the list of P as follows:

  • I = [retrospective#1, statue#1, artist#1]

  • P = [work, object, exhibition, life, selection, representative, painting, art]

I is then enriched as long as the senses already in I and the possible senses of the terms in P have semantic interconnections (such as kind-of, has-kind, part-of, and has-part relations). Therefore, the final states of the lists are as follows:

  • I = [retrospective#1, statue#1, artist#1, exhibition#2, object#1, art#1, painting#1, life#12]

  • P = [work, selection, representative]
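The semantic interconnections mentioned above correspond to standard WordNet relations. As a minimal sketch (assuming NLTK's WordNet interface and the sense numbering of the example), they can be queried as follows:

from nltk.corpus import wordnet as wn

statue = wn.synset("statue.n.01")     # statue#1 from the example above
print(statue.hypernyms())             # kind-of  relations (direct hypernyms)
print(statue.hyponyms())              # has-kind relations (direct hyponyms)
print(statue.part_holonyms())         # part-of  relations (wholes this concept belongs to)
print(statue.part_meronyms())         # has-part relations (parts of this concept)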

As we can see, this SSI algorithm has two limitations to overcome. First, there are no criteria for deciding which word needs to be disambiguated earlier than the others. P is a pending list of terms waiting to be disambiguated. However, the ambiguity of each word is different: some words are used mostly in their first sense, while others are uncertain because of their high ambiguity. Therefore, we need a method to measure Word Ambiguity (WA) and decide which word has lower ambiguity than the others, so that we can reduce the possibility of making a wrong choice at each disambiguation step.

The other limitation concerns the hypothesis that adjacent words are semantically relevant to each other; in other words, a nearby word may be more likely to form strong semantic relations with the target than a distant word. If this hypothesis is correct, the SSI algorithm should be altered to consider the structural positions of words in the given sentence.

In this paper, we examine this hypothesis and show that the SSI algorithm can be improved by considering semantic relations among adjacent words. We also present the word ambiguity measurement based on WordNet sense frequencies and propose the Low Ambiguity First (LAF) algorithm in Sect. 3.

3 Low Ambiguity First Algorithm

This section presents the Low Ambiguity First (LAF) algorithm, a knowledge-based Word Sense Disambiguation algorithm built on the hypotheses that word ambiguity can be measured by the frequency of senses in WordNet and that adjacent words are semantically relevant to each other. The variables of the LAF algorithm are the same as those of the SSI algorithm described in Sect. 2; however, the items of the pending list P are ordered by their word ambiguity values.

3.1 A Measurement for Word Ambiguity

Word Ambiguity (WA) is a criterion for measuring the complexity of the senses of the words in the pending list P. It is important to decide which words should be disambiguated earlier than others, because the SSI algorithm disambiguates the senses of the terms in P by using the senses in I that have already been determined in previous steps. In other words, at each step the best senses are chosen according to the current I and P; therefore, the order in which senses are chosen may affect the final result.

Let us assume that we have the following pending list: P = [group, Friday, investigation, Atlanta, primary_election, evidence, irregularity]. According to the SSI algorithm, I is first enriched with the monosemous words from P. Therefore, we obtain I = [-, Friday#1, -, -, primary_election#1, -, -] and P = [group, investigation, Atlanta, evidence, irregularity], where

\(S^{group_{3}}\) = [(2350)group#1, (9)group#2, (3)group#3],

\(S^{investigation_{2}}\) = [(16)investigation#1, (8)investigation#2],

\(S^{Atlanta_{2}}\) = [(7)Atlanta#1, (1)Atlanta#2],

\(S^{evidence_{3}}\) = [(54)evidence#1, (24)evidence#2, (7)evidence#3],

\(S^{irregularity_{4}}\) = [(3)irregularity#1, (2)irregularity#2, (1)irregularity#3, ()irregularity#4]

The word ambiguity can be measured by the following Eq. 1, based on the frequencies of the senses of each term as defined in WordNet.

$$\begin{aligned} WordAmbiguity(t) = \prod ^{k-1}_{i=1}\prod ^{k}_{j=i+1}\log \frac{frequency(S^t_i)}{frequency(S^t_j)+1} \end{aligned}$$
(1)

where \(frequency(S^t_i)\) is the frequency of the i-th sense of the given term t, as listed in the previous paragraph, and k is the number of senses of t. For example, the word ambiguity of the term (t = group) is obtained as follows:

$$\begin{aligned} WordAmbiguity(group) = \log \frac{2350}{10} \times \log \frac{2350}{4} \times \log \frac{9}{4} \approx 2.312 \end{aligned}$$

If the word ambiguity value is close to zero, the given term (t) has high ambiguity (semantic complexity). In contrast, if the value is far from zero, the given term has low ambiguity. In other words, terms with low ambiguity will be disambiguated earlier than other words. Therefore, the pending list P is reordered by word ambiguity, as shown in Fig. 1.
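As a minimal sketch of Eq. 1 (assuming base-10 logarithms, which reproduce the worked value for group, and taking the sense frequencies as a plain list), the measurement could be implemented as follows:

import math
from itertools import combinations

def word_ambiguity(frequencies):
    """Word Ambiguity of a term from its WordNet sense frequencies (Eq. 1)."""
    wa = 1.0
    for f_i, f_j in combinations(frequencies, 2):   # all sense pairs with i < j
        wa *= math.log10(f_i / (f_j + 1))
    return wa

print(round(word_ambiguity([2350, 9, 3]), 3))   # group   -> 2.312 (matches the example above)
print(round(word_ambiguity([7, 1]), 3))         # Atlanta -> 0.544

Sorting the pending terms in descending order of this value then yields the ordering of P used in the remainder of this section.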

Fig. 1. Examples of Word Ambiguity corresponding to terms (t) in the pending list P.

Therefore, the pending list P is reordered as follows: P = [group, Atlanta, investigation, evidence, irregularity]. This is the major difference between the SSI and LAF algorithms: the LAF algorithm disambiguates low-ambiguity words first. The detailed algorithm and explanations are given in Sect. 3.3.

3.2 An Adjacent Word May Have Stronger Relations than Other Words

This section describes and examines the hypothesis that an adjacent word of the target term (t) may have stronger semantic relations with it than words that are far from the target term. In order to find evidence that this hypothesis can be applied to the SSI algorithm, we define the Morphological Distance (MD) and the WordNet Hierarchical Distance (WHD), as shown in Fig. 2. The MD is the positional distance between the target term (t) and its neighboring terms in the sentence, and the WHD is the shortest path distance between the senses of the target term and its neighboring terms in the WordNet hierarchy.

Let us assume that we have the following sentence: Sentence = [The Fulton_County_Grand_Jury said Friday an investigation of Atlanta's recent primary_election produced no evidence that any irregularities took_place], which is the first sentence of the Brown1Footnote 5 part of the SemCorFootnote 6 corpus. After the preprocessing steps, we obtain the terms to be disambiguated as shown in Fig. 2. The MD and WHD between the target term (t = group) and the other terms are then calculated as follows:

  • MD(Group, Friday) = 1, MD(Group, investigation) = 2, ... ,

    MD(Group, irregularity) = 6

  • WHD(Group, Friday) = 8, WHD(Group, investigation) = 9, ... ,

    WHD(Group, irregularity) = 8
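The following is a minimal sketch of how such MD and WHD values could be computed with NLTK's WordNet interface. It is not the authors' implementation: it takes the first-listed noun synset of each term as a stand-in for its sense, so the resulting WHD values need not match Fig. 2 exactly.

from nltk.corpus import wordnet as wn

def morphological_distance(terms, a, b):
    """MD: positional distance between two noun terms of the sentence."""
    return abs(terms.index(a) - terms.index(b))

def wordnet_hierarchical_distance(a, b):
    """WHD: shortest hypernym/hyponym path length between (assumed) first senses."""
    sense_a = wn.synsets(a, pos=wn.NOUN)[0]
    sense_b = wn.synsets(b, pos=wn.NOUN)[0]
    return sense_a.shortest_path_distance(sense_b)   # None if the synsets are unconnected

terms = ["group", "Friday", "investigation", "Atlanta",
         "primary_election", "evidence", "irregularity"]
print(morphological_distance(terms, "group", "investigation"))   # MD = 2
print(wordnet_hierarchical_distance("group", "investigation"))   # WHD under the first-sense assumption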

Fig. 2. Examples of the morphological distance and WordNet hierarchical distance between terms in the pending list.

According to the examples in Fig. 2, the WHD increases as the MD gets larger. However, after a certain point the WHD decreases even though the MD keeps growing. Therefore, we need to find the point at which the tendency of the WHD is reversed, by comparing the MD and WHD over all sentences of the Brown1 corpus. As a result, we obtained the averages of the WHD for each MD, as shown in Table 1.

Table 1. Averages and Standard Deviations of WHD corresponding to MDs between terms in Brown1 corpus.

As we can see in this table, the average WHD increases until the MD is equal to three, but it decreases when the MD is 4. These results indicate that if the morphological distance between the target term (t) and a neighboring word is less than 4, morphologically close words tend to have a high possibility of forming strong semantic relations. Most of the words (more than 80 percent) in the Brown1 corpus fall within an MD of one to six, because it is hard to find a sentence with more than six kinds of nouns in natural human language.

We have thus shown that, up to this point, words adjacent to the target term are likely to have a shorter WHD than words with a higher MD.

3.3 Low Ambiguity First Algorithm

In the previous sections, we demonstrated the two hypotheses that are applied in the LAF algorithm in order to improve the traditional SSI algorithm. Before the LAF algorithm is applied, we need to initialize the variables required for storing the co-occurring terms, the structural specifications of the possible concepts, the list of disambiguated senses, and the pending list, as described in Sect. 2. In the LAF algorithm, however, the pending list P is reordered after measuring the word ambiguity according to Eq. 1, and a weighting function that varies with the morphological distance is applied when calculating the semantic similarity between unknown senses and known senses.

Fig. 3. Initializing processes for executing the LAF algorithm.

The proposed system is implemented in Python, and pseudo-code for the LAF algorithm is given below. The algorithm takes four lists as input, namely the lists initialized after the preprocessing tasks shown in Fig. 3. It updates the lists I and P until no further matching senses can be found between the items of P_C and I, using the shortest path distance based on WordNet. The weighting function is determined by the values of alpha, beta, and theta, which are constant parameters for applying the hypothesis described in Sect. 3.2.

(Pseudo-code of the LAF algorithm.)
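As an illustrative sketch of this main loop under stated assumptions, the following could be written; the constants ALPHA, BETA, and THETA, the step-shaped weighting over the MD, and the scoring by inverse WordNet path length are our reading of Sects. 3.1 and 3.2, not the authors' exact implementation. Here T, S, I, and P are the structures defined in Sect. 2, with P sorted in descending word ambiguity.

from nltk.corpus import wordnet as wn

ALPHA, BETA, THETA = 1.0, 0.5, 0.25        # hypothetical constants (cf. Sect. 3.2)

def weight(md):
    """Give nearer terms (MD < 4) more influence, following the tendency in Table 1."""
    if md <= 1:
        return ALPHA
    if md <= 3:
        return BETA
    return THETA

def laf(T, S, I, P):
    """Disambiguate pending terms, taking low-ambiguity terms first (P pre-sorted)."""
    positions = {t: i for i, t in enumerate(T)}       # term positions, used for the MD
    changed = True
    while changed and P:
        changed = False
        for t in list(P):                             # low-ambiguity terms are tried first
            best_sense, best_score = None, 0.0
            for sense in S[t]:                        # candidate senses of the pending term
                score = 0.0
                for u, chosen in I.items():
                    if chosen is None:
                        continue
                    d = chosen.shortest_path_distance(sense)
                    if d is not None:                 # connected senses contribute
                        md = abs(positions[t] - positions[u])
                        score += weight(md) / (1.0 + d)   # nearer and closer senses score higher
                if score > best_score:
                    best_sense, best_score = sense, score
            if best_sense is not None:
                I[t] = best_sense                     # fix the sense for t
                P.remove(t)                           # t is no longer pending
                changed = True
    return I, P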

The proposed LAF algorithm applies the two hypotheses described in Sects. 3.1 and 3.2 in order to overcome the limitations of the traditional SSI algorithm. In a preliminary experimentFootnote 7 to verify the reliability of the proposed algorithm, we found that it can improve on the SSI algorithm. However, we do not report experimental results in this paper because the amount of test data was not large enough. The major goal of this paper is the proposal of the LAF algorithm and the demonstration of its hypotheses; the experimental results will be presented in future work.

4 Conclusions and Future Works

In this paper, we propose a new approach to overcome the weaknesses of the traditional SSI algorithm, which is the most popular knowledge-based WSD algorithm. Statistical approaches have been applied to disambiguating polysemous words; however, the performance of these methods still requires improvement. Even though knowledge bases, i.e., machine-readable knowledge databases, have been applied to WSD, current knowledge bases still cannot cover all the senses of human language. The richer the knowledge base we hold, the higher the performance we gain for WSD; however, enriching and maintaining a knowledge base requires a great deal of cost and time. Therefore, we proposed a method that uses only the limited current resources, based on two hypotheses. We demonstrated these hypotheses in this paper and proposed the Low Ambiguity First algorithm, which is able to overcome the weaknesses of the SSI algorithm. Experiments are still ongoing; however, we believe that the proposed method can improve the performance of WSD.