DESAM — Annotated corpus for Czech

Pala, Karel; Rychlý, Pavel; Smrž, Pavel

doi:10.1007/3-540-63774-5_134

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1338)

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Computer Science

Abstract

This paper describes DESAM, a disambiguated corpus of Czech. It is a tagged corpus that has been manually disambiguated and can be used in various applications. We discuss the structure of the corpus, the tools used to manage it, its linguistic applications, and the possible use of machine learning techniques that rely on the disambiguated data. We also consider possible ways of developing procedures for fully automatic disambiguation.

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala, Pavel Rychlý & Pavel Smrž

Editor information

František Plášil, Keith G. Jeffery

About this paper

Cite this paper

Pala, K., Rychlý, P., Smrž, P. (1997). DESAM — Annotated corpus for Czech. In: Plášil, F., Jeffery, K.G. (eds) SOFSEM'97: Theory and Practice of Informatics. SOFSEM 1997. Lecture Notes in Computer Science, vol 1338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63774-5_134

DOI: https://doi.org/10.1007/3-540-63774-5_134
Published: 29 July 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63774-5
Online ISBN: 978-3-540-69645-2
eBook Packages: Springer Book Archive
