Syntactic n-grams in Computational Linguistics

  • Grigori Sidorov
Book

Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Table of contents

  1. Front Matter
    Pages i-ix
  2. Vector Space Model in the Analysis of Similarity between Texts

    1. Front Matter
      Pages 1-1
    2. Grigori Sidorov
      Pages 3-4
    3. Grigori Sidorov
      Pages 5-10
    4. Grigori Sidorov
      Pages 41-43
  3. Non-linear Construction of n-grams

    1. Front Matter
      Pages 45-45
    2. Grigori Sidorov
      Pages 47-58
    3. Grigori Sidorov
      Pages 81-84
    4. Grigori Sidorov
      Pages 85-86
  4. Back Matter
    Pages 87-92

About this book

Introduction

This book presents a new approach in computational linguistics: constructing n-grams in a non-linear manner. The traditional approach, by contrast, uses data from the surface structure of texts, i.e., the linear structure.

In this book, we propose and systematize the concept of syntactic n-grams, which allows syntactic information to be used in automatic text processing methods related to classification or clustering. It is an example of applying linguistic information in automatic (computational) methods. Roughly speaking, the suggestion is to follow syntactic trees and construct n-grams based on paths in these trees. There are several types of non-linear n-grams; future work should determine which types of n-grams are more useful in which natural language processing (NLP) tasks.
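
The contrast between linear n-grams and n-grams built from paths in a syntactic tree can be illustrated with a minimal sketch. It is not taken from the book: the toy dependency tree, the sentence, and the function names `linear_ngrams` and `syntactic_ngrams` are assumptions made for illustration, and only the simplest case (unbranched head-to-dependent paths) is shown.

```python
from typing import Dict, List, Tuple

# Hypothetical dependency tree for the sentence "the quick fox jumps".
# Each head word maps to its direct dependents. Using words as keys is a
# simplification that only works when no word repeats in the sentence.
TOKENS: List[str] = ["the", "quick", "fox", "jumps"]
TREE: Dict[str, List[str]] = {
    "jumps": ["fox"],
    "fox": ["the", "quick"],
}


def linear_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Traditional n-grams: n neighbouring words in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(tree: Dict[str, List[str]], n: int) -> List[Tuple[str, ...]]:
    """n-grams read along downward paths (head -> dependent -> ...) in the tree."""
    words = set(tree) | {dep for deps in tree.values() for dep in deps}
    result: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        # A path of n words is one n-gram; otherwise keep descending.
        if len(path) == n:
            result.append(tuple(path))
            return
        for dep in tree.get(path[-1], []):
            extend(path + [dep])

    # Start a path at every word, in sorted order for a stable result.
    for word in sorted(words):
        extend([word])
    return result


print(linear_ngrams(TOKENS, 2))
print(syntactic_ngrams(TREE, 2))
```

In this toy example the linear bigrams include ("the", "quick"), two words that are neighbours on the surface but not directly related in the tree, while the path-based bigrams pair each head with its dependents, such as ("fox", "quick").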

This book is intended for specialists in the field of computational linguistics. However, we have made an effort to explain clearly how to use n-grams and we provide a large number of examples; we therefore believe that the book is also useful for graduate students who already have some background in the field.

Keywords

natural language processing, n-grams, computational linguistics, vector space model, textual similarity

Authors and affiliations

  • Grigori Sidorov
    • 1
  1. Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico

Bibliographic information

  • DOI https://doi.org/10.1007/978-3-030-14771-6
  • Copyright Information The Author(s), under exclusive licence to Springer Nature Switzerland AG 2019
  • Publisher Name Springer, Cham
  • eBook Packages Computer Science
  • Print ISBN 978-3-030-14770-9
  • Online ISBN 978-3-030-14771-6
  • Series Print ISSN 2191-5768
  • Series Online ISSN 2191-5776