Capturing Lexical, Grammatical, and Semantic Information with Vecsigrafo

  • Jose Manuel Gomez-Perez
  • Ronald Denaux
  • Andres Garcia-Silva
Chapter

Abstract

Embedding algorithms work by optimizing the distance between a word and its context(s), generating an embedding space that encodes their distributional representation. In addition to single words or word pieces, other features, derived from a deeper analysis of the text, can be used to enrich such representations with additional information. These features depend on the tokenization strategy used to chunk the text and can include not only lexical and part-of-speech information but also annotations with the disambiguated sense of a word according to a structured knowledge graph. In this chapter we analyze the impact that explicitly adding lexical, grammatical, and semantic information during the training of Vecsigrafo has on the resulting representations, and whether or not this enhances their downstream performance. To illustrate this analysis we focus on corpora from the scientific domain, where rich, multi-word expressions are frequent, hence requiring advanced tokenization strategies.
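To make the idea concrete, the following is a minimal sketch of how an annotated corpus can feed such joint training: each token carries a surface form, lemma, part-of-speech tag, and (optionally) a disambiguated knowledge-graph concept, and skip-gram style (target, context) pairs are generated over all of these symbols so that lemmas and concepts are embedded in the same space. The annotation fields, the `kg:` concept identifiers, and the helper functions are hypothetical illustrations, not the actual Vecsigrafo pipeline.

```python
# Hypothetical annotated sentence: each token carries the lexical,
# grammatical, and semantic features discussed in the abstract.
sentence = [
    {"surface": "neural", "lemma": "neural", "pos": "ADJ", "concept": None},
    {"surface": "networks", "lemma": "network", "pos": "NOUN",
     "concept": "kg:ArtificialNeuralNetwork"},
    {"surface": "learn", "lemma": "learn", "pos": "VERB",
     "concept": "kg:Learning"},
    {"surface": "representations", "lemma": "representation", "pos": "NOUN",
     "concept": "kg:KnowledgeRepresentation"},
]

def expand(token):
    """Turn one annotated token into the symbols to embed jointly:
    a lemma|POS symbol plus, when available, a concept symbol."""
    symbols = [token["lemma"] + "|" + token["pos"]]
    if token["concept"]:
        symbols.append(token["concept"])
    return symbols

def training_pairs(tokens, window=2):
    """Generate skip-gram style (target, context) pairs over the
    expanded symbols within a fixed context window."""
    pairs = []
    for i, tok in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            for target in expand(tok):
                for context in expand(tokens[j]):
                    pairs.append((target, context))
    return pairs

pairs = training_pairs(sentence)
```

Because lemma and concept symbols co-occur in the same pairs, the downstream embedding algorithm places disambiguated senses near the words that evoke them, which is the effect the chapter evaluates.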


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Jose Manuel Gomez-Perez (1)
  • Ronald Denaux (1)
  • Andres Garcia-Silva (1)

  1. Expert System, Madrid, Spain
