Cleaning Up After a Party: Post-processing Thesaurus Crowdsourced Data

  • Oksana Antropova
  • Elena Arslanova
  • Maxim Shaposhnikov
  • Pavel Braslavski
  • Mikhail Mukhin
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 930)


The study deals with post-processing a noisy collection of synsets created through crowdsourcing. First, we cluster long synsets in three different ways. Second, we apply four cluster-cleaning techniques based on either word popularity or word embeddings. Evaluation shows that the method based on word embeddings and existing dictionary definitions delivers the best results.
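The abstract names the two families of cleaning signals (word popularity and word embeddings) without giving the procedures. Purely as an illustration of what such filters can look like, the sketch below removes outlier words from one synset cluster in two generic ways. The centroid-similarity rule, the thresholds, the function names, and treating "popularity" as a raw count (e.g. how often crowd workers added a word) are all assumptions, not the authors' method; the dictionary-definition component of the best-performing method is not modelled.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def clean_by_embeddings(cluster: List[str], vectors: Dict[str, Vector],
                        threshold: float = 0.5) -> List[str]:
    """Hypothetical embedding filter: keep words close to the cluster centroid.

    Words with no vector are kept (an assumed policy for out-of-vocabulary words).
    """
    known = [w for w in cluster if w in vectors]
    if not known:
        return list(cluster)
    dim = len(vectors[known[0]])
    # Centroid = component-wise mean of the vectors of in-vocabulary words.
    centroid = [sum(vectors[w][i] for w in known) / len(known) for i in range(dim)]
    return [w for w in cluster
            if w not in vectors or cosine(vectors[w], centroid) >= threshold]


def clean_by_popularity(cluster: List[str], counts: Dict[str, int],
                        min_count: int = 2) -> List[str]:
    """Hypothetical popularity filter: keep words seen at least min_count times."""
    return [w for w in cluster if counts.get(w, 0) >= min_count]


# Toy 2-D vectors and counts, invented for illustration only.
vectors = {"car": [1.0, 0.1], "auto": [0.9, 0.2],
           "automobile": [1.0, 0.0], "banana": [-0.2, 1.0]}
counts = {"car": 5, "auto": 3, "automobile": 4, "banana": 1}
cluster = ["car", "auto", "automobile", "banana"]

print(clean_by_embeddings(cluster, vectors))  # ['car', 'auto', 'automobile']
print(clean_by_popularity(cluster, counts))   # ['car', 'auto', 'automobile']
```

Both filters are deliberately simple: the embedding filter only needs a vector lookup (real vectors could come from a pretrained model), while the popularity filter needs nothing beyond counts collected from the crowdsourcing process.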


Keywords: Crowdsourcing, Thesaurus, Semantic resources



PB was supported by RFH grant #16-04-12019; OA was supported by RFBR under research project No. 18-312-00129.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Oksana Antropova (1)
  • Elena Arslanova (1)
  • Maxim Shaposhnikov (1)
  • Pavel Braslavski (1, 2, 3)
  • Mikhail Mukhin (1)

  1. Ural Federal University, Yekaterinburg, Russia
  2. JetBrains Research, Saint Petersburg, Russia
  3. National Research University Higher School of Economics, Saint Petersburg, Russia
