Abstract
In this study we compare two semantic relatedness algorithms: Explicit Semantic Analysis (ESA) and Word2Vec. ESA represents the meaning of a text in a high-dimensional space of concepts derived from Wikipedia. Word2Vec generates distributed vector representations from large text corpora. Experiments were carried out on the Russian paraphrase corpus of news titles and the Russian ParaPlag paraphrase corpus. The paper presents a thorough analysis of the results and of the evaluation procedure.
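To make the ESA idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical three-article toy collection (`CONCEPTS`) in place of Wikipedia, plain whitespace tokenization, and TF-IDF weighting. The helper names (`build_inverted_index`, `esa_vector`, `cosine`) are illustrative only. Each text is mapped to a weighted vector over concepts, and relatedness is taken as the cosine of two such vectors.

```python
import math
from collections import Counter

# Hypothetical toy "concepts" standing in for Wikipedia articles (assumption).
CONCEPTS = {
    "Economy": "bank money market price rate bank",
    "River": "river bank water flow shore",
    "Football": "match team goal player score",
}


def build_inverted_index(concepts):
    """Map each term to {concept name: TF-IDF weight of the term in that concept}."""
    n = len(concepts)
    tf = {name: Counter(text.split()) for name, text in concepts.items()}
    # Document frequency: in how many concepts each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term])
            index.setdefault(term, {})[name] = freq * idf
    return index


def esa_vector(text, index):
    """Represent a text as the sum of the concept weights of its known terms."""
    vec = Counter()
    for term in text.lower().split():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


index = build_inverted_index(CONCEPTS)
a = esa_vector("money market", index)
b = esa_vector("price rate", index)
c = esa_vector("team goal", index)
print(cosine(a, b), cosine(a, c))
```

In this toy setting the first two texts share no words yet activate the same concept, so they come out as related, while the third text activates a different concept. The `cosine` function is generic over sparse vectors; how texts are turned into vectors with Word2Vec is not specified in the abstract and is not sketched here.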
Acknowledgements
The authors express their gratitude to the anonymous reviewers for their careful reading of the paper, their critical comments, and their useful suggestions, which helped to improve the work.
The research discussed in the paper is supported by the RFBR grant № 16-06-00529 «Development of a linguistic toolkit for semantic analysis of Russian text corpora by statistical techniques».
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kriukova, A., Mitrofanova, O., Sukharev, K., Roschina, N. (2018). Using Explicit Semantic Analysis and Word2Vec in Measuring Semantic Relatedness of Russian Paraphrases. In: Alexandrov, D., Boukhanovsky, A., Chugunov, A., Kabanov, Y., Koltsova, O. (eds) Digital Transformation and Global Society. DTGS 2018. Communications in Computer and Information Science, vol 859. Springer, Cham. https://doi.org/10.1007/978-3-030-02846-6_28
DOI: https://doi.org/10.1007/978-3-030-02846-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02845-9
Online ISBN: 978-3-030-02846-6
eBook Packages: Computer Science (R0)