Abstract
This paper describes the development of an open-source tool named Trdlo. Trdlo was developed as part of our effort to build a machine translation system between very closely related languages. Such languages usually lack pre-processed linguistic resources or dictionaries suitable for computer processing, even though bilingual dictionaries have a large impact on translation quality. The methods proposed in this paper attempt to extend existing dictionaries with inferable translation pairs. Our approach requires only 'cheap' resources: a list of lemmata for each language and rules for inferring words in one language from words in the other. It is also possible to use other resources, such as annotated corpora or Wikipedia. Results show that this approach greatly improves the efficiency of building a Czech-Slovak dictionary.
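The idea of inferring translation pairs from two lemma lists and a set of rules can be illustrated with a minimal sketch. It is not Trdlo's actual implementation: the rule format (plain substring substitutions), the function names `candidates` and `infer_pairs`, and the sample word lists are all assumptions made for illustration. A candidate pair is accepted only if the rewritten Czech lemma is present in the Slovak lemma list.

```python
# Hypothetical substitution rules (Czech substring -> Slovak substring).
# These are illustrative assumptions, not rules taken from the paper.
RULES = [("ou", "ú"), ("ř", "r"), ("ě", "e")]


def candidates(lemma, rules):
    """Return the lemma plus every form reachable by optionally applying each rule."""
    forms = {lemma}
    for src, dst in rules:
        # Keep the unchanged forms too, so each rule stays optional.
        forms |= {form.replace(src, dst) for form in forms}
    return forms


def infer_pairs(source_lemmata, target_lemmata, rules):
    """Pair each source lemma with every candidate found in the target lemma list."""
    target = set(target_lemmata)
    pairs = []
    for lemma in source_lemmata:
        for cand in sorted(candidates(lemma, rules)):
            if cand in target:
                pairs.append((lemma, cand))
    return pairs


# Tiny illustrative lemma lists (assumed sample data).
czech = ["moře", "louka", "město", "pes", "kočka"]
slovak = ["more", "lúka", "mesto", "pes", "mačka"]

for cs, sk in infer_pairs(czech, slovak, RULES):
    print(cs, "->", sk)
```

In this sketch "kočka" yields no pair, because no rule rewrites it into "mačka". Pairs of that kind cannot be inferred from rules alone and would still need a dictionary entry or another resource.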
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grác, M. (2009). Trdlo, an Open Source Tool for Building Transducing Dictionary. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2009. Lecture Notes in Computer Science, vol 5729. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04208-9_12
DOI: https://doi.org/10.1007/978-3-642-04208-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04207-2
Online ISBN: 978-3-642-04208-9
eBook Packages: Computer Science (R0)