Word Similarity Based on Domain Graph

Konaka, Fumito; Miura, Takao

doi:10.1007/978-3-319-45547-1_27

Conference paper
First Online: 07 September 2016

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9893)

Abstract

In this work we propose a new formalization of word similarity. We assume that each word corresponds to a unit of semantics, called a synset, which carries categorical features, called its domain. For each synset we construct a domain graph, which consists of all hypernyms of the synset that belong to the synset's domain. We take advantage of domain graphs to reflect the semantic aspects of words. Our experiments show how well the domain graph approach captures word similarity. We then extend the approach to sentence similarity (Semantic Textual Similarity) independently of Bag-of-Words.
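The construction described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny hypernym hierarchy and the domain labels are hand-written toy data rather than WordNet, the names `HYPERNYMS`, `DOMAIN`, `domain_graph` and `similarity` are hypothetical, the synset itself is counted as part of its own domain graph, and Jaccard overlap stands in for whatever similarity measure the paper actually defines.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (NOT WordNet): synset -> direct hypernyms.
HYPERNYMS: Dict[str, List[str]] = {
    "dog": ["canine", "domestic_animal"],
    "cat": ["feline", "domestic_animal"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
    "animal": ["organism"],
    "organism": ["entity"],
    "car": ["vehicle"],
    "vehicle": ["artifact"],
    "artifact": ["entity"],
    "entity": [],
}

# Hypothetical domain label per synset (made-up labels, for illustration only).
DOMAIN: Dict[str, str] = {
    "dog": "animal", "cat": "animal", "canine": "animal", "feline": "animal",
    "domestic_animal": "animal", "carnivore": "animal", "mammal": "animal",
    "animal": "animal", "organism": "top", "entity": "top",
    "car": "artifact", "vehicle": "artifact", "artifact": "artifact",
}


def domain_graph(synset: str) -> Set[str]:
    """Collect the synset and all of its transitive hypernyms,
    then keep only the nodes that share the synset's domain."""
    domain = DOMAIN[synset]
    seen: Set[str] = set()
    stack = [synset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(HYPERNYMS.get(node, []))
    return {n for n in seen if DOMAIN[n] == domain}


def similarity(s1: str, s2: str) -> float:
    """Jaccard overlap of two domain graphs (an assumed stand-in measure)."""
    g1, g2 = domain_graph(s1), domain_graph(s2)
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0


if __name__ == "__main__":
    print(sorted(domain_graph("dog")))
    print(similarity("dog", "cat"))
    print(similarity("dog", "car"))
```

In this toy data, `dog` and `cat` share four of the eight nodes in their combined domain graphs, while `dog` and `car` share none because their hypernyms fall into different domains. With real data, the hierarchy and the labels would come from a lexical resource such as WordNet instead of the hand-written dictionaries.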

Notes

1. http://wn-similarity.sourceforge.net/, http://www.nltk.org/.
2. Sometimes this is called a ring.
3. There are 45 Lexicographer Files, organized by syntactic category and logical grouping. They hold the synsets during WordNet development. Another approach is WordNet Domains, a lexical resource created semi-automatically by augmenting WordNet with domain labels. Each synset is annotated by hand with at least one semantic domain label drawn from a set of 200 labels [1].
4. https://code.google.com/archive/p/ws4j/.
5. It includes many short sentences extracted from more than 500 Twitter sites between April 24, 2013 and May 3, 2013. The corpus contains 17,790 pairs of sentences, divided into 13,063 pairs for training and 4,727 pairs for development. A further 972 pairs are included for testing. We use the 13,063 training pairs and the 972 test pairs.
6. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.

References

1. Bentivogli, L., Forner, P., Magnini, B., Pianta, E.: Revising WordNet domains hierarchy: semantics, coverage, and balancing. In: COLING 2004 Workshop on “Multilingual Linguistic Resources”, pp. 101–108 (2004)
2. Chang, C.C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
3. Cohen, E., et al.: Finding interesting associations without support pruning. IEEE Trans. Knowl. Data Eng. 13(1), 64–78 (2001)
4. Das, D., Smith, N.A.: Paraphrase identification as probabilistic quasi-synchronous recognition. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol. 1. Association for Computational Linguistics, pp. 468–476 (2009)
5. Eyecioglu, A., Keller, B.: ASOBEK: Twitter paraphrase identification with simple overlap features and SVMs. In: Proceedings of SemEval (2015)
6. Finkelstein, L., et al.: Placing search in context: the concept revisited. In: Proceedings of the 10th International Conference on World Wide Web. ACM, pp. 406–414 (2001)
7. Finlayson, M.A.: Java libraries for accessing the Princeton WordNet: comparison and evaluation. In: Proceedings of the 7th Global Wordnet Conference, Tartu, Estonia (2014)
8. Guo, W., Diab, M.: Modeling sentences in the latent space. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol. 1. Association for Computational Linguistics, pp. 864–872 (2012)
9. Konaka, F., Miura, T.: Textual similarity for word sequences. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds.) SISAP 2015. LNCS, vol. 9371, pp. 244–249. Springer, Heidelberg (2015). doi:10.1007/978-3-319-25087-8_23
10. Li, Y., et al.: Sentence similarity based on semantic nets and corpus statistics. IEEE Trans. Knowl. Data Eng. 18(8), 1138–1150 (2006)
11. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
12. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. (CSUR) 41(2), 10 (2009)
13. Richens, T.: Anomalies in the WordNet verb hierarchy. In: Proceedings of the 22nd International Conference on Computational Linguistics, vol. 1. Association for Computational Linguistics, pp. 729–736 (2008)
14. Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)
15. Xu, W., Callison-Burch, C., Dolan, W.B.: SemEval-2015 task 1: paraphrase and semantic similarity in Twitter (PIT). In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval) (2015)
16. Yang, D., Powers, D.M.W.: Verb similarity on the taxonomy of WordNet. Masaryk University (2006)

Author information

Authors and Affiliations

Department of Advanced Sciences, Hosei University, 3-7-2 Kajino-cho, Koganei, Tokyo, 184-8584, Japan
Fumito Konaka & Takao Miura

Corresponding author

Correspondence to Fumito Konaka.

Editor information

Editors and Affiliations

LIAS/ISAE-ENSMA, Futuroscope Chasseneuil, France
Ladjel Bellatreche
Department of Information Systems and Computation, Universitat Politècnica de València, Valencia, Spain
Óscar Pastor
University of Almería, Almería, Spain
Jesús M. Almendros Jiménez
IRIT / ENSEIHT, Toulouse, France
Yamine Aït-Ameur

About this paper

Cite this paper

Konaka, F., Miura, T. (2016). Word Similarity Based on Domain Graph. In: Bellatreche, L., Pastor, Ó., Almendros Jiménez, J., Aït-Ameur, Y. (eds) Model and Data Engineering. MEDI 2016. Lecture Notes in Computer Science, vol 9893. Springer, Cham. https://doi.org/10.1007/978-3-319-45547-1_27

DOI: https://doi.org/10.1007/978-3-319-45547-1_27
Published: 07 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45546-4
Online ISBN: 978-3-319-45547-1
eBook Packages: Computer Science, Computer Science (R0)