Abstract
This paper describes a Czech text corpus (Chyby) containing various kinds of errors: spelling, typographical, grammatical, stylistic, and lexical. We explain how Chyby was built and how the errors in it were discovered, marked, and annotated. The classification of the errors is presented, and statistics on the types of errors are given. The tools for annotating the errors are also described. To the best of our knowledge, this is the first text corpus of this sort prepared for Czech.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pala, K., Rychlý, P., Smrž, P. (2003). Text Corpus with Errors. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_13
DOI: https://doi.org/10.1007/978-3-540-39398-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive