Semi-automatic Endogenous Enrichment of Collaboratively Constructed Lexical Resources: Piggybacking onto Wiktionary

Sajous, Franck; Navarro, Emmanuel; Gaume, Bruno; Prévot, Laurent; Chudy, Yannick

doi:10.1007/978-3-642-14770-8_37


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6233)

Included in the following conference series:

International Conference on Natural Language Processing


Abstract

The lack of large-scale, freely available and durable lexical resources, and its consequences for NLP, are widely acknowledged, but attempts to cope with the usual bottlenecks preventing their development often end in dead ends. This article introduces a language-independent, semi-automatic and endogenous method for enriching lexical resources, based on collaborative editing and random walks through existing lexical relationships, and shows how this approach overcomes recurrent impediments. It compares the impact of using different data sources and similarity measures on the task of improving synonymy networks. Finally, it defines an architecture for applying the presented method to Wiktionary and explains how it has been implemented.
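
To make the idea of random walks through existing lexical relationships concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the similarity measures, so the uniform short random walk below, the toy synonym pairs, and the function names (build_graph, walk_distribution, suggest_synonyms) are all assumptions made for illustration. Reading "semi-automatic" as meaning that a human contributor validates each suggestion, the ranked candidates would be proposals only, not automatic additions.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (word, word) synonym pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def walk_distribution(graph, start, steps):
    """Probability of being at each node after `steps` uniform random-walk steps."""
    if start not in graph:
        return {}
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            neighbours = graph[node]
            share = prob / len(neighbours)  # move to each neighbour with equal probability
            for neighbour in neighbours:
                nxt[neighbour] += share
        dist = dict(nxt)
    return dist


def suggest_synonyms(graph, word, steps=2, top=3):
    """Rank words reachable by a short walk that are not yet linked to `word`."""
    dist = walk_distribution(graph, word, steps)
    linked = graph.get(word, set())
    candidates = {w: p for w, p in dist.items() if w != word and w not in linked}
    # Highest probability first; ties broken alphabetically for determinism.
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical toy synonymy network (assumed data, not taken from Wiktionary).
TOY_EDGES = [
    ("big", "large"),
    ("large", "huge"),
    ("big", "great"),
    ("great", "huge"),
    ("huge", "enormous"),
]
```

Calling suggest_synonyms(build_graph(TOY_EDGES), "big") ranks "huge" as the only candidate, because both existing synonyms of "big" link to it. The method is endogenous in this sense: suggestions come from the resource's own structure rather than from an external corpus.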



Author information

Authors and Affiliations

CLLE-ERSS, CNRS & Université de Toulouse
Franck Sajous, Bruno Gaume & Yannick Chudy
IRIT, CNRS & Université de Toulouse
Emmanuel Navarro
LPL, CNRS & Université de Provence
Laurent Prévot


Editor information

Editors and Affiliations

School of Computer Science, Reykjavik University, Kringlan 1, 103, Reykjavik, Iceland
Hrafn Loftsson
Department of Icelandic, University of Iceland, Árnagardur v/Sudurgötu, 101, Reykjavik, Iceland
Eiríkur Rögnvaldsson
Arni Magnusson Institute for Icelandic Studies, Neshagi 16, 101, Reykjavik, Iceland
Sigrún Helgadóttir


About this paper

Cite this paper

Sajous, F., Navarro, E., Gaume, B., Prévot, L., Chudy, Y. (2010). Semi-automatic Endogenous Enrichment of Collaboratively Constructed Lexical Resources: Piggybacking onto Wiktionary. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_37


DOI: https://doi.org/10.1007/978-3-642-14770-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science, Computer Science (R0)
