
Language Corpora: The Czech Case

  • Conference paper
Text, Speech and Dialogue (TSD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Abstract

Against the background of a growing need for information about language, which used to be supplied only in a rather limited way, this paper outlines and discusses the new solution offered by language corpora and the way it has been implemented. For Czech, this solution has materialized in the 100-million-word representative Czech National Corpus (CNC, 2000). What follows is a brief tour through the various stages of its build-up. It characterizes the individual corpora within the CNC and gives some figures on the proportions of the various types of language represented. The last part of the contribution sets out a minimal programme for further research, along with desiderata to be followed generally in this important, international branch of modern science.

References

  1. Burnard, L. 1995. Users' Reference Guide for the British National Corpus. Oxford University Press, Oxford.

  2. Čermák, F. 1995. Jazykový korpus: Prostředek a zdroj poznání. Slovo a slovesnost 56: 119–140. (Language Corpus: Means and Source of Knowledge.)

  3. Čermák, F. 1997. Czech National Corpus: A Case in Many Contexts. International Journal of Corpus Linguistics 2: 181–197.

  4. Čermák, F. 1998. Czech National Corpus: Its Character, Goal and Background. In P. Sojka, V. Matoušek, K. Pala, I. Kopeček (eds.), Text, Speech, Dialogue: Proceedings of the First Workshop on Text, Speech, Dialogue (TSD'98), Brno, Czech Republic, September 1998, 9–14. Masaryk University, Brno.

  5. Čermák, F., Králík, J., Kučera, K. 1997. Recepce současné češtiny a reprezentativnost korpusu. Slovo a slovesnost 58: 117–124. (Reception of Contemporary Czech and the Representativeness of a Corpus.)

  6. Český národní korpus: Úvod a příručka uživatele. 2000. Eds. J. Kocek, M. Kopřivová, K. Kučera. Filozofická fakulta KU, Praha. (Czech National Corpus: An Introduction and User's Manual.)

  7. Kruyt, J. G. 1993. Design Criteria for Corpora Construction in the Framework of a European Corpora Network. Final Report. Institute for Dutch Lexicology (INL), Leiden.

  8. Kučera, K. 1998. Diachronní složka Českého národního korpusu: obecné zásady, kontext a současný stav. Listy filologické 121: 303–313. (Diachronic Component of the Czech National Corpus: General Principles, Context and Current State of Affairs.)

  9. Norling-Christensen, O. 1992. Preparing a Text Corpus: Computational Tools and Methods for Standardizing, Tagging and Structuring Text Data. In R. Kiefer et al. (eds.), Papers in Computational Lexicography COMPLEX'92, 251–259. Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest.

  10. Petkevič, V. 2001. Neprojektivní konstrukce v češtině z hlediska automatické morfologické disambiguace. In Z. Hladká, P. Karlík (eds.), Čeština: Univerzália a specifika 3, 197–206. Masarykova univerzita, Brno. (Nonprojective Constructions in Czech from the Viewpoint of an Automatic Morphological Disambiguation of Czech Texts.)

  11. Šulc, M. 1999. Korpusová lingvistika: První vstup. Karolinum. (Corpus Linguistics: A First Introduction.)

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Čermák, F. (2001). Language Corpora: The Czech Case. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_3

  • DOI: https://doi.org/10.1007/3-540-44805-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42557-1

  • Online ISBN: 978-3-540-44805-1

  • eBook Packages: Springer Book Archive