An Example of a Compatible NLP Toolkit

  • Krzysztof Jassem
  • Roman Grundkiewicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9561)

Abstract

The paper describes an open-source set of linguistic tools whose distinctive features are customisability and compatibility with other NLP toolkits: texts in various natural languages and character encodings may be read from a number of popular data formats; all annotation tools may be run with several options that vary the format of input and output; rule lists used by individual tools may be supplemented or replaced by the user; and external tools (including NLP tools developed at independent research centres) may be incorporated into the toolkit’s environment.

Keywords

PSI-Toolkit, NLP tools, Polish language, Software architecture, Open source

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Adam Mickiewicz University, Poznań, Poland
