Using Explicit Semantic Analysis and Word2Vec in Measuring Semantic Relatedness of Russian Paraphrases

  • Conference paper

Digital Transformation and Global Society (DTGS 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 859)

Abstract

In this study we compare two semantic relatedness algorithms: Explicit Semantic Analysis (ESA) and Word2Vec. ESA represents text meaning in a high-dimensional space of concepts derived from Wikipedia. Word2Vec generates distributed vector representations from large text corpora. Experiments were carried out on the Russian paraphrase corpus of news titles and the Russian ParaPlag paraphrase corpus. The paper contains a thorough analysis of the results and the evaluation procedure.
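
To make the contrast between the two representations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy ESA-style index that maps words to weighted concepts and a toy table of dense word embeddings. All data values and the names `CONCEPT_WEIGHTS`, `EMBEDDINGS`, `esa_relatedness` and `w2v_relatedness` are hypothetical. Averaging word vectors to represent a text is likewise an assumption made for illustration; the abstract does not say how texts were composed from word vectors.

```python
import math
from collections import defaultdict

# Hypothetical toy data, NOT taken from the paper.
# ESA-style index: word -> {Wikipedia-like concept: association weight}.
CONCEPT_WEIGHTS = {
    "bank": {"Finance": 0.9, "River": 0.3},
    "loan": {"Finance": 0.8},
    "credit": {"Finance": 0.7},
    "shore": {"River": 0.9},
}

# Word2Vec-style lookup: word -> dense embedding (3 dimensions for brevity).
EMBEDDINGS = {
    "bank": [0.6, 0.1, 0.4],
    "loan": [0.9, 0.1, 0.0],
    "credit": [0.8, 0.2, 0.1],
    "shore": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # empty text or only out-of-vocabulary tokens
    return dot / (norm_u * norm_v)


def esa_vector(tokens):
    """Sum the concept weights of all known tokens (sparse concept vector)."""
    vector = defaultdict(float)
    for token in tokens:
        for concept, weight in CONCEPT_WEIGHTS.get(token, {}).items():
            vector[concept] += weight
    return dict(vector)


def w2v_vector(tokens):
    """Average the embeddings of all known tokens, keyed by dimension index."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return {}
    size = len(known[0])
    return {i: sum(vec[i] for vec in known) / len(known) for i in range(size)}


def esa_relatedness(tokens_a, tokens_b):
    """Relatedness of two tokenized texts in the explicit concept space."""
    return cosine(esa_vector(tokens_a), esa_vector(tokens_b))


def w2v_relatedness(tokens_a, tokens_b):
    """Relatedness of two tokenized texts in the dense embedding space."""
    return cosine(w2v_vector(tokens_a), w2v_vector(tokens_b))


if __name__ == "__main__":
    print(esa_relatedness(["bank", "loan"], ["credit"]))
    print(w2v_relatedness(["bank", "loan"], ["credit"]))
```

In this sketch both measures reduce to a cosine between text vectors; they differ only in how the vector is built. The ESA-style vector has one interpretable dimension per concept, while the embedding dimensions carry no individual meaning. A real setup would build the concept index from a Wikipedia dump and train or load embeddings from a large corpus, after lemmatizing the Russian input.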

Notes

  1. https://metacpan.org/pod/Text::Similarity

  2. https://en.wikipedia.org/wiki/SemEval

  3. http://wn-similarity.sourceforge.net/

  4. http://community.nzdl.org/ELKB/

  5. https://github.com/dkpro/dkpro-similarity

  6. https://github.com/faraday/wikiprep-esa

  7. http://treo.deri.ie/easyesa/

  8. https://code.google.com/archive/p/research-esa/

  9. https://github.com/ticcky/esalib

  10. https://github.com/pvoosten/explicit-semantic-analysis

  11. https://github.com/fozziethebeat/S-Space/

  12. https://github.com/fozziethebeat/S-Space/blob/master/src/main/java/edu/ucla/sspace/esa/ExplicitSemanticAnalysis.java

  13. https://radimrehurek.com/gensim/models/word2vec.html

  14. https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/examples/tutorials/word2vec/word2vec_basic.py

  15. https://github.com/deeplearning4j/deeplearning4j

  16. http://rusvectores.org/ru/

  17. http://pan.webis.de/tasks.html

  18. https://plagevalrus.github.io/

  19. http://ru-eval.ru/plageval/rules.html

Acknowledgements

The authors express their gratitude to anonymous reviewers for their careful reading of the paper, for their critical comments and for giving useful suggestions that helped to improve the work.

The research discussed in the paper is supported by the RFBR grant № 16-06-00529 «Development of a linguistic toolkit for semantic analysis of Russian text corpora by statistical techniques».

Author information

Corresponding author

Correspondence to Anna Kriukova.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kriukova, A., Mitrofanova, O., Sukharev, K., Roschina, N. (2018). Using Explicit Semantic Analysis and Word2Vec in Measuring Semantic Relatedness of Russian Paraphrases. In: Alexandrov, D., Boukhanovsky, A., Chugunov, A., Kabanov, Y., Koltsova, O. (eds) Digital Transformation and Global Society. DTGS 2018. Communications in Computer and Information Science, vol 859. Springer, Cham. https://doi.org/10.1007/978-3-030-02846-6_28

  • DOI: https://doi.org/10.1007/978-3-030-02846-6_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02845-9

  • Online ISBN: 978-3-030-02846-6

  • eBook Packages: Computer Science, Computer Science (R0)
