Analyzing Scientific Corpora Using Word Embedding

Segarra-Faggioni, Veronica; Romero-Pelaez, Audrey

doi:10.1007/978-3-030-11890-7_7

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 918)

Included in the following conference series:

International Conference on Information Technology & Systems

Abstract

Bibliographic databases hold the abstracts and citations of scientific articles, and the abstract is the most consulted section of an article. Keywords are used to classify a manuscript's entries in a database's indexing and information retrieval system, although in many cases this information does not achieve wide dissemination. This paper presents an evaluation of the semantic relatedness between the abstracts of scientific papers and their keywords. The analysis uses word2vec, a predictive model, to find the nearest words. The study thus focuses on metadata quality assessment, using the semantic similarity between two words to gauge the accuracy of metadata in scientific databases.
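
The idea of finding the nearest words and scoring how related a keyword is to an abstract can be sketched as follows. This is only an illustrative sketch, not the chapter's implementation: the three-dimensional vectors are hypothetical values chosen by hand (a trained word2vec model would supply learned vectors), the function names are invented for this example, and taking the best match over the abstract's tokens is an assumed way of aggregating similarity.

```python
import math

# Hypothetical hand-picked vectors for illustration only; a trained
# word2vec model would provide learned vectors of much higher dimension.
TOY_VECTORS = {
    "embedding": [0.9, 0.1, 0.0],
    "vector":    [0.8, 0.2, 0.1],
    "semantic":  [0.7, 0.3, 0.0],
    "metadata":  [0.1, 0.9, 0.2],
    "keyword":   [0.2, 0.8, 0.1],
    "banana":    [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(word, vectors, topn=2):
    """Return the topn (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def keyword_relatedness(keyword, abstract_tokens, vectors):
    """Best similarity between a keyword and any in-vocabulary abstract token.

    Returns None when no abstract token has a vector (assumed convention).
    """
    scores = [
        cosine(vectors[keyword], vectors[token])
        for token in abstract_tokens
        if token in vectors
    ]
    return max(scores) if scores else None


if __name__ == "__main__":
    print(nearest_words("embedding", TOY_VECTORS))
    print(keyword_relatedness("metadata", ["keyword", "banana"], TOY_VECTORS))
```

With these toy vectors, "embedding" lies closest to "vector" and "semantic", and the keyword "metadata" scores high against an abstract containing "keyword". A low score for a keyword would flag a possible mismatch between the keyword metadata and the abstract.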

Notes

1. www.scopus.com/
2. Common words such as "the", "at", "which", and others.

Acknowledgments

The research team would like to thank Universidad Técnica Particular de Loja, especially the Tecnologías Avanzadas de la Web y SBC Group.

Author information

Authors and Affiliations

Universidad Técnica Particular de Loja, Loja, Ecuador
Veronica Segarra-Faggioni & Audrey Romero-Pelaez

Corresponding author

Correspondence to Veronica Segarra-Faggioni.

Editor information

Editors and Affiliations

DEI/FCT, Universidade de Coimbra, Coimbra, Portugal
Álvaro Rocha
Facultad de Geografía, Universidad de Santiago de Compostela, Santiago Compostela, La Coruña, Spain
Carlos Ferrás
Departamento de Eléctrica, Electrónica y Telecomunicaciones, Universidad de las Fuerzas Armadas “ESPE”, Sangolqui, Ecuador
Manolo Paredes

About this paper

Cite this paper

Segarra-Faggioni, V., Romero-Pelaez, A. (2019). Analyzing Scientific Corpora Using Word Embedding. In: Rocha, Á., Ferrás, C., Paredes, M. (eds) Information Technology and Systems. ICITS 2019. Advances in Intelligent Systems and Computing, vol 918. Springer, Cham. https://doi.org/10.1007/978-3-030-11890-7_7

DOI: https://doi.org/10.1007/978-3-030-11890-7_7
Published: 29 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11889-1
Online ISBN: 978-3-030-11890-7
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
