RuThes Cloud: Towards a Multilevel Linguistic Linked Open Data Resource for Russian

  • Alexander Kirillovich
  • Olga Nevzorova
  • Emil Gimadiev
  • Natalia Loukachevitch
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 786)


In this paper we present a new multilevel Linguistic Linked Open Data (LLOD) resource for Russian. It covers four linguistic levels: semantic, lexical, morphological, and syntactic. The resource is built on the basis of the well-known RuThes thesaurus and the original, hitherto unpublished Extended Zaliznyak grammatical dictionary. It is represented in terms of the SKOS, Lemon, and LexInfo ontologies and a new custom ontology. While building the resource, we automatically completed the following tasks: merging the source resources on their common lexical entries, decomposing complex lexical entries, and publishing the constructed resource as an LLOD-compatible dataset. We demonstrate a use case in which the developed resource is exploited in an information retrieval (IR) task. We hope that our work can serve as a crystallization point of the LLOD cloud for Russian.


Linguistic Linked Open Data, Linked data, Language resources, Ontology, Thesaurus, Lexicon, Grammatical dictionary, Dependency grammar, RuThes, Russian language



The main part of the reported work was funded by the Russian Science Foundation under research project no. 16-18-02074. The development of the semantic publishing technological platform was funded by the subsidy allocated to Kazan Federal University for the state assignment in the sphere of scientific activities, grant agreement no. 1.2368.2017.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Alexander Kirillovich (1)
  • Olga Nevzorova (2)
  • Emil Gimadiev (1)
  • Natalia Loukachevitch (3)

  1. Kazan Federal University, Kazan, Russia
  2. Institute of Applied Semiotics, Tatarstan Academy of Sciences, Kazan, Russia
  3. Lomonosov Moscow State University, Moscow, Russia
