A System for Information Extraction from Scientific Texts in Russian

Bruches, Elena; Mezentseva, Anastasia; Batura, Tatiana

doi:10.1007/978-3-031-12285-9_15

Elena Bruches^10,11, Anastasia Mezentseva^11 & Tatiana Batura^10,11 (ORCID: orcid.org/0000-0003-4333-7888)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1620)

Included in the following conference series:

International Conference on Data Analytics and Management in Data Intensive Domains


Abstract

In this paper, we present a system for information extraction from scientific texts in Russian. The system performs three tasks in an end-to-end manner: term recognition, extraction of relations between terms, and linking of terms to entities in the knowledge base. These tasks are important for information retrieval, recommendation systems, and classification. A key advantage of the implemented methods is that they do not require large amounts of labeled data; this reduces the time and effort spent on annotation and makes the system applicable in low- and mid-resource settings. The source code is publicly available and can be used for various research purposes.
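
To make the end-to-end flow of the three tasks more concrete, here is a minimal Python sketch of how such stages can be chained. It is not the authors' implementation: the lexicon-based term recognizer, the single "X is a Y" relation pattern, the exact-match linker, the toy English example, and the names TERM_LEXICON, KNOWLEDGE_BASE, recognize_terms, extract_relations, link_terms and run_pipeline are all hypothetical stand-ins chosen for illustration. A version for Russian text would at least need lemmatization to match inflected forms.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A recognized term with its character offsets in the source text."""
    text: str
    start: int
    end: int


@dataclass(frozen=True)
class Relation:
    """A directed, labeled relation between two recognized terms."""
    head: Term
    tail: Term
    label: str


# Hypothetical toy lexicon standing in for a trained term recognizer.
TERM_LEXICON = {"neural network", "machine learning model", "transformer"}

# Hypothetical toy knowledge base mapping lowercased surface forms to entity IDs.
KNOWLEDGE_BASE = {
    "neural network": "KB:0001",
    "transformer": "KB:0002",
    "machine learning model": "KB:0003",
}


def recognize_terms(text: str) -> list[Term]:
    """Stage 1: find whole-word, case-insensitive lexicon matches, ordered by position."""
    terms = []
    for entry in TERM_LEXICON:
        pattern = r"\b" + re.escape(entry) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            terms.append(Term(match.group(0), match.start(), match.end()))
    # Overlapping matches are not resolved in this toy version.
    return sorted(terms, key=lambda term: term.start)


def extract_relations(text: str, terms: list[Term]) -> list[Relation]:
    """Stage 2: emit an 'is_a' relation when two terms are joined by exactly ' is a '."""
    relations = []
    for head in terms:
        for tail in terms:
            if head.end <= tail.start and text[head.end:tail.start].lower() == " is a ":
                relations.append(Relation(head, tail, "is_a"))
    return relations


def link_terms(terms: list[Term]) -> dict[Term, str | None]:
    """Stage 3: link each term to a KB entity by exact lowercased match (None if absent)."""
    return {term: KNOWLEDGE_BASE.get(term.text.lower()) for term in terms}


def run_pipeline(text: str) -> tuple[list[Term], list[Relation], dict[Term, str | None]]:
    """Run the three stages end to end; each stage consumes the previous stage's output."""
    terms = recognize_terms(text)
    relations = extract_relations(text, terms)
    links = link_terms(terms)
    return terms, relations, links


if __name__ == "__main__":
    terms, relations, links = run_pipeline("A transformer is a neural network.")
    for relation in relations:
        print(relation.head.text, relation.label, relation.tail.text)
    for term, entity_id in links.items():
        print(term.text, "->", entity_id)
```

Running the script prints `transformer is_a neural network`, followed by `transformer -> KB:0002` and `neural network -> KB:0001`. Because the stages communicate only through `Term` objects with character offsets, any single stage could in principle be replaced by a trained model without changing the other two.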

Acknowledgement

The study was funded by the Russian Foundation for Basic Research (RFBR) under research project No. 19-07-01134.

Author information

Authors and Affiliations

A. P. Ershov Institute of Informatics Systems, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
Elena Bruches & Tatiana Batura
Novosibirsk State University, Novosibirsk, Russia
Elena Bruches, Anastasia Mezentseva & Tatiana Batura

Corresponding author

Correspondence to Tatiana Batura.

Editor information

Editors and Affiliations

Space Research Institute of the Russian Academy of Sciences, Moscow, Russia
Alexei Pozanenko
Federal Research Center “Computer Science and Control” of RAS, Moscow, Russia
Sergey Stupnikov
Christian-Albrecht University of Kiel, Kiel, Germany
Bernhard Thalheim
Universidad Carlos III de Madrid, Getafe, Spain
Eva Mendez
A. A. Baikov Institute of Metallurgy and Materials Science of RAS (IMET RAS), Moscow, Russia
Nadezhda Kiselyova

About this paper

Cite this paper

Bruches, E., Mezentseva, A., Batura, T. (2022). A System for Information Extraction from Scientific Texts in Russian. In: Pozanenko, A., Stupnikov, S., Thalheim, B., Mendez, E., Kiselyova, N. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2021. Communications in Computer and Information Science, vol 1620. Springer, Cham. https://doi.org/10.1007/978-3-031-12285-9_15

DOI: https://doi.org/10.1007/978-3-031-12285-9_15
Published: 26 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12284-2
Online ISBN: 978-3-031-12285-9
eBook Packages: Computer Science, Computer Science (R0)
