DESAM — Annotated corpus for Czech

SOFSEM'97: Theory and Practice of Informatics (SOFSEM 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1338))

Abstract

This paper describes DESAM, a disambiguated corpus of Czech. It is a tagged corpus that has been manually disambiguated and can be used in various applications. We discuss the structure of the corpus, the tools used to manage it, its linguistic applications, and the possible use of machine learning techniques that rely on the disambiguated data. We also consider possible ways of developing procedures for fully automatic disambiguation.

Editor information

František Plášil, Keith G. Jeffery

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pala, K., Rychlý, P., Smrž, P. (1997). DESAM — Annotated corpus for Czech. In: Plášil, F., Jeffery, K.G. (eds) SOFSEM'97: Theory and Practice of Informatics. SOFSEM 1997. Lecture Notes in Computer Science, vol 1338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63774-5_134

  • DOI: https://doi.org/10.1007/3-540-63774-5_134

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63774-5

  • Online ISBN: 978-3-540-69645-2

  • eBook Packages: Springer Book Archive
