Syntactic n-grams: The Concept

  • Grigori Sidorov
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


As we have already mentioned, the main idea behind the formal features used in computational linguistics is tied to the vector space model and the use of n-grams as features in this space; this includes unigrams, i.e., single words. Words are considered in their contexts, and usually the context consists of neighboring words. However, some words that are syntactically related are not neighbors in the text. Hence the idea of using syntactic information to obtain real (syntactically related) neighbors as the context of a word. We therefore suggest obtaining n-grams by following paths in syntactic trees.
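To make the contrast concrete, the following is a minimal Python sketch, not the chapter's own implementation. It assumes a hand-written dependency tree for the invented sentence "She reads very old books" (the tree, the sentence, and all function names are illustrative assumptions), and it only follows non-branching head-to-dependent paths. It also assumes no word is repeated in the sentence, since words double as node identifiers here.

```python
from typing import Dict, List, Tuple

# Assumed toy dependency tree (head -> dependents) for the sentence
# "She reads very old books"; written by hand, not produced by a parser.
TREE: Dict[str, List[str]] = {
    "reads": ["She", "books"],
    "books": ["old"],
    "old": ["very"],
}
SENTENCE: List[str] = ["She", "reads", "very", "old", "books"]


def surface_ngrams(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """Traditional n-grams: n words that are adjacent in the text."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def paths_from(tree: Dict[str, List[str]], node: str, n: int) -> List[Tuple[str, ...]]:
    """All downward (head-to-dependent) paths of exactly n nodes starting at `node`."""
    if n == 1:
        return [(node,)]
    return [
        (node,) + rest
        for child in tree.get(node, [])
        for rest in paths_from(tree, child, n - 1)
    ]


def syntactic_ngrams(tree: Dict[str, List[str]], words: List[str], n: int) -> List[Tuple[str, ...]]:
    """n-grams read along tree paths, trying every word as a starting point."""
    return [path for word in words for path in paths_from(tree, word, n)]


if __name__ == "__main__":
    print("surface bigrams:   ", surface_ngrams(SENTENCE, 2))
    print("syntactic bigrams: ", syntactic_ngrams(TREE, SENTENCE, 2))
    print("syntactic trigrams:", syntactic_ngrams(TREE, SENTENCE, 3))
```

In this toy example the pair ("reads", "books") appears among the syntactic bigrams but not among the surface bigrams, because "very old" separates the two words in the text while the tree links them directly.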



Copyright information

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Grigori Sidorov
    • Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
