Abstract
One of the important issues arising in the development of educational courses is maintaining relevance for the intended audience of the course. In practice, this requires course developers to reuse and adapt elements of similar content developed by others. This form of collaboration integrates the experience and viewpoints of multiple authors, which tends to result in better, more relevant content. This article addresses the task of searching for relevant massive open online courses (MOOCs) using a course programme document as a query. As a novel solution to this task, we propose the application of language modelling. We present the results of an experiment comparing several of the most popular vector space representations of text documents: the classical TF-IDF weighting scheme, Latent Semantic Indexing, topic modelling in the form of Latent Dirichlet Allocation, and the popular neural language models word2vec and paragraph vectors. The experiment is carried out on a corpus of courses in Russian collected from several popular MOOC platforms. The effectiveness of the proposed models is evaluated taking into account the opinions of university professors.
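The retrieval setup the abstract describes can be illustrated with the simplest of the compared representations, TF-IDF with cosine similarity. The sketch below is not the authors' implementation; it is a minimal, self-contained illustration in plain Python, with a toy corpus and query standing in for the course descriptions and the course programme document.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus of course descriptions (already tokenized/lemmatized),
# plus a course-programme query, ranked by similarity.
courses = [
    "linear algebra matrices vectors".split(),
    "machine learning neural networks vectors".split(),
    "russian history culture".split(),
]
query = "neural networks machine learning".split()

vecs = tfidf_vectors(courses + [query])     # fit IDF on corpus + query together
query_vec = vecs[-1]
scores = [cosine(query_vec, v) for v in vecs[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)   # index of top-ranked course
```

The denser representations compared in the paper (LSI, LDA, word2vec, paragraph vectors) slot into the same pipeline: only the vectorisation step changes, while ranking by cosine similarity stays the same.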
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Klenin, J., Botov, D., Dmitrin, Y. (2018). Comparison of Vector Space Representations of Documents for the Task of Information Retrieval of Massive Open Online Courses. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_14
DOI: https://doi.org/10.1007/978-3-319-71746-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71745-6
Online ISBN: 978-3-319-71746-3
eBook Packages: Computer Science (R0)