
Parsing of Polish in Graph Database Environment

  • Jan Posiadała
  • Hubert Czaja
  • Eliza Szczechla
  • Paweł Susicki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10930)

Abstract

This paper describes the basic concepts and features of the Langusta system. Langusta is a natural language processing environment embedded in a graph database. The paper presents a rule-based syntactic parsing system for the Polish language that uses various linguistic resources, including resources containing semantic information. The advantages of this approach follow directly from the adoption of the graph paradigm, in particular from the assumption that the rules describing the syntax of Polish are valid queries in a graph database query language (Cypher).
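To make the idea of "syntactic rules as graph queries" concrete, here is a minimal sketch in Python. It is not Langusta's code: it simulates, in plain Python, what a single Cypher rule might do when run over a graph of token nodes linked by NEXT edges. The rule (an adjective followed by an agreeing noun forms a noun phrase), the tag values `adj`/`subst` (borrowed loosely from the NKJP tagset), and all names such as `Token`, `Graph` and `apply_adj_noun_rule` are hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Token:
    # A token node with a simplified, hypothetical morphosyntactic annotation.
    form: str
    pos: str
    case: str
    number: str
    gender: str


@dataclass
class Graph:
    tokens: list[Token]
    # NEXT edges link each token to its successor in the text.
    next_edges: list[tuple[int, int]] = field(default_factory=list)
    # Phrase nodes created by rules: (label, indices of member tokens).
    phrases: list[tuple[str, tuple[int, ...]]] = field(default_factory=list)


def build_graph(tokens: list[Token]) -> Graph:
    # Connect consecutive tokens with NEXT edges.
    g = Graph(tokens)
    g.next_edges = [(i, i + 1) for i in range(len(tokens) - 1)]
    return g


# Hypothetical Cypher rule that the function below imitates:
#   MATCH (a:Token)-[:NEXT]->(n:Token)
#   WHERE a.pos = 'adj' AND n.pos = 'subst'
#     AND a.case = n.case AND a.number = n.number AND a.gender = n.gender
#   CREATE (p:Phrase {type: 'NP'})-[:CONTAINS]->(a), (p)-[:CONTAINS]->(n)
def apply_adj_noun_rule(g: Graph) -> Graph:
    for a_idx, n_idx in g.next_edges:
        a, n = g.tokens[a_idx], g.tokens[n_idx]
        # Agreement check: case, number and gender must all match.
        agrees = (a.case, a.number, a.gender) == (n.case, n.number, n.gender)
        if a.pos == "adj" and n.pos == "subst" and agrees:
            # Equivalent of CREATE: add a new NP node over the matched pair.
            g.phrases.append(("NP", (a_idx, n_idx)))
    return g


sentence = [
    Token("mały", "adj", "nom", "sg", "m2"),
    Token("kot", "subst", "nom", "sg", "m2"),
    Token("śpi", "fin", "", "sg", ""),
]
graph = apply_adj_noun_rule(build_graph(sentence))
print(graph.phrases)  # [('NP', (0, 1))]
```

In a graph database the matching loop, the agreement test and the creation of the phrase node would all be done by the single declarative query in the comment; the Python version only spells out those steps so they can be run without a database.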

Keywords

NLP, Graph databases, Cypher, Deep parsing, Corpus analysis, Written corpora, Stand-off annotation


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jan Posiadała (1)
  • Hubert Czaja (1), corresponding author
  • Eliza Szczechla (1)
  • Paweł Susicki (1)

  1. Scott Tiger S.A., Warsaw, Poland
