Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Data Cloud

  • Sebastian Hellmann
  • Jonas Brekle
  • Sören Auer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7774)


We present a declarative approach, implemented in a comprehensive open-source framework based on DBpedia, to extract lexical-semantic resources (an ontology about language use) from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, translations and taxonomies (hyponyms, hyperonyms, synonyms, antonyms) for each lexical word. The main focus is on flexibility with respect to Wiktionary’s loose schema and on configurability for its differing language editions. This is achieved by a declarative mediator/wrapper approach. The goal is to allow languages to be added by configuration alone, without programming, so that domain experts can adapt wrappers swiftly and with few resources. The extracted data is as fine-grained as the source data in Wiktionary and additionally follows the lemon model. It enables use cases such as disambiguation or machine translation. By offering a linked data service, we hope to extend DBpedia’s central role in the LOD infrastructure to the world of Open Linguistics.
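The idea of adding a language "just by configuration" can be illustrated with a minimal sketch. It is not the framework's actual configuration format or API: the `CONFIGS` dictionary, the `extract` function, the predicate names and the simplified wikitext are all assumptions made for illustration, and real Wiktionary editions use richer markup. The point is only that the extractor code stays fixed while a per-language configuration tells it which headings carry which meaning.

```python
import re

# Hypothetical per-language configuration: only this data differs between
# Wiktionary editions; the extractor below is shared by all of them.
CONFIGS = {
    "en": {
        "pos_headings": {"Noun": "noun", "Verb": "verb"},
        "synonym_heading": "Synonyms",
    },
    "de": {
        "pos_headings": {"Substantiv": "noun", "Verb": "verb"},
        "synonym_heading": "Synonyme",
    },
}

HEADING = re.compile(r"^=+\s*(.*?)\s*=+$")  # e.g. "===Noun==="
LINK = re.compile(r"\[\[([^\]|]+)")         # e.g. "[[home]]"


def extract(word, wikitext, config):
    """Return (subject, predicate, object) triples, driven only by config."""
    triples = []
    section = None
    for line in wikitext.splitlines():
        match = HEADING.match(line.strip())
        if match:
            section = match.group(1)
            pos = config["pos_headings"].get(section)
            if pos:
                triples.append((word, "partOfSpeech", pos))
            continue
        if section == config["synonym_heading"]:
            for target in LINK.findall(line):
                triples.append((word, "synonym", target))
    return triples


# Simplified, made-up page sources (not the real markup of either edition).
EN_PAGE = "==English==\n===Noun===\n# A building.\n====Synonyms====\n* [[home]], [[dwelling]]"
DE_PAGE = "== Haus ==\n=== Substantiv ===\n==== Synonyme ====\n* [[Gebäude]]"

if __name__ == "__main__":
    print(extract("house", EN_PAGE, CONFIGS["en"]))
    print(extract("Haus", DE_PAGE, CONFIGS["de"]))
```

In this sketch, supporting a further language edition would mean adding one more entry to `CONFIGS`; a domain expert would not need to touch `extract`.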


Keywords: Machine Translation, Triple Pattern, Lexical Resource, Triple Store, SPARQL Endpoint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sebastian Hellmann (1)
  • Jonas Brekle (1)
  • Sören Auer (1)
  1. Institut für Informatik, AKSW, Universität Leipzig, Leipzig, Germany
