Text Corpus with Errors

Pala, Karel; Rychlý, Pavel; Smrž, Pavel

doi:10.1007/978-3-540-39398-6_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This paper describes a Czech text corpus (Chyby) containing various kinds of errors: spelling, typographical, grammatical, stylistic, and lexical. We explain how Chyby was built and how the errors in it were discovered, marked, and annotated. We present the classification of the errors, give statistics on the error types, and describe the tools used for annotating the errors. To the best of our knowledge, this is the first text corpus of this sort prepared for Czech.

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala, Pavel Rychlý & Pavel Smrž

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek & Pavel Mautner

About this paper

Cite this paper

Pala, K., Rychlý, P., Smrž, P. (2003). Text Corpus with Errors. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_13

DOI: https://doi.org/10.1007/978-3-540-39398-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive
