Evaluation and Improvements in Punctuation Detection for Czech

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

Punctuation detection and correction is among the hardest automatic grammar checking tasks for Czech. This paper compares the available grammar and punctuation correction programs on several data sets. It also describes a set of improvements to one of the available tools that lead to significantly better recall as well as precision.

Notes

  1. http://openfst.org/

  2. SET is an abbreviation of “syntactic engineering tool”.

  3. http://nlp.fi.muni.cz/trac/set/browser/punct.set

  4. http://nlp.fi.muni.cz/trac/set/browser/punct2.set

References

  1. Behún, D.: Kontrola české gramatiky pro MS Office - konec korektorů v Čechách? (2005). https://interval.cz/clanky/kontrola-ceske-gramatiky-pro-ms-office-konec-korektoru-v-cechach

  2. Boháč, M., Blavka, K., Kuchařová, M., Škodová, S.: Post-processing of the recognized speech for web presentation of large audio archive. In: 2012 35th International Conference on Telecommunications and Signal Processing (TSP), pp. 441–445 (2012)

  3. Holan, T., Kuboň, V., Plátek, M.: A prototype of a grammar checker for Czech. In: Proceedings of the 5th Conference on Applied Natural Language Processing, pp. 147–154. Association for Computational Linguistics (1997)

  4. Horák, A.: Computer Processing of Czech Syntax and Semantics. Librix.eu, Brno (2008)

  5. Jakubíček, M., Horák, A.: Punctuation detection with full syntactic parsing. Res. Comput. Sci. Spec. issue: Nat. Lang. Process. Appl. 46, 335–343 (2010)

  6. Kovář, V.: Partial grammar checking for Czech using the set parser. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2014. LNCS, vol. 8655, pp. 308–314. Springer, Heidelberg (2014)

  7. Kovář, V., Horák, A., Jakubíček, M.: Syntactic analysis using finite patterns: a new parsing system for Czech. In: Vetulani, Z. (ed.) LTC 2009. LNCS, vol. 6562, pp. 161–171. Springer, Heidelberg (2011)

  8. Lingea s.r.o.: Grammaticon (2003). www.lingea.cz/grammaticon.htm

  9. Oliva, K., Petkevič, V., Microsoft s.r.o.: Czech Grammar Checker (2005). http://office.microsoft.com/word

  10. Pala, K.: Pište dopisy konečně bez chyb – Český gramatický korektor pro Microsoft Office. Computer, 13–14 (2005)

  11. Petkevič, V.: Kontrola české gramatiky (český grammar checker). Studie z aplikované lingvistiky-Stud. Appl. Linguist. 5(2), 48–66 (2014)

  12. Sedláček, R., Smrž, P.: A new Czech morphological analyser ajka. In: Matoušek, V., Mautner, P., Mouček, R., Tauser, K. (eds.) TSD 2001. LNCS (LNAI), vol. 2166, pp. 100–107. Springer, Heidelberg (2001)

  13. Suchomel, V., Michelfeit, J., Pomikálek, J.: Text tokenisation using unitok. In: Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, pp. 71–75. Tribun EU, Brno (2014)

  14. Šmerk, P.: Unsupervised learning of rules for morphological disambiguation. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2004. LNCS (LNAI), vol. 3206, pp. 211–216. Springer, Heidelberg (2004)

Acknowledgments

This work has been partly supported by the Grant Agency of CR within the project 15-13277S. The research leading to these results has received funding from the Norwegian Financial Mechanism 2009–2014 and the Ministry of Education, Youth and Sports under Project Contract no. MSMT-28477/2014 within the HaBiT Project 7F14047. This work was also partly supported by Student Grant Scheme 2016 of Technical University of Liberec.

Author information

Corresponding author

Correspondence to Vojtěch Kovář.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kovář, V., Machura, J., Zemková, K., Rott, M. (2016). Evaluation and Improvements in Punctuation Detection for Czech. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_33

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
