Abstract
Dictionaries are among the most useful lexical resources. However, most dictionaries today are not available in digital form, which makes them cumbersome for humans to use and impossible to integrate into computer programs. Digitizing an existing traditional dictionary is an expensive and labor-intensive task. In this paper, we present a method for developing Machine Readable Dictionaries from already available resources. A Machine Readable Dictionary consists of simple word-to-word mappings, where a word in the source language can be mapped to several alternative words in the target language. We present a series of experiments in which, using the parallel corpora and open-source Statistical Machine Translation tools at our disposal, we developed an English-Macedonian Machine Readable Dictionary containing 23,296 translation pairs (17,708 English and 18,343 Macedonian terms). A manual evaluation of a subset of the produced dictionary showed an accuracy of 79.8%.
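The abstract does not spell out how translation pairs are extracted from the aligned corpora. As a minimal sketch, assuming the parallel corpus has already been sentence-aligned and word-aligned by an SMT word aligner that emits links in the common `i-j` format (source token position, target token position), a one-to-many dictionary can be built by counting aligned word pairs and keeping only frequent, reliable ones. The function names, the thresholds `min_count` and `min_prob`, and the toy sentences are hypothetical illustrations, not the authors' actual pipeline.

```python
from collections import Counter, defaultdict


def parse_links(links: str) -> list[tuple[int, int]]:
    """Parse word-alignment links in the 'i-j' format, e.g. '0-0 1-2'."""
    pairs = []
    for link in links.split():
        i, j = link.split("-")
        pairs.append((int(i), int(j)))
    return pairs


def build_mrd(corpus, min_count=2, min_prob=0.2):
    """Build a word-to-word dictionary from word-aligned sentence pairs.

    corpus: iterable of (source_sentence, target_sentence, links) triples,
    where sentences are whitespace-tokenized strings. A source word may
    end up mapped to several target words (one-to-many).
    min_count and min_prob are hypothetical filtering thresholds.
    """
    pair_counts = Counter()    # how often (source word, target word) were linked
    source_counts = Counter()  # how many links each source word took part in
    for src, tgt, links in corpus:
        src_tokens = src.lower().split()
        tgt_tokens = tgt.lower().split()
        for i, j in parse_links(links):
            s, t = src_tokens[i], tgt_tokens[j]
            pair_counts[(s, t)] += 1
            source_counts[s] += 1

    mrd = defaultdict(list)
    for (s, t), count in pair_counts.items():
        # Keep a pair only if it is frequent enough and accounts for
        # a large enough share of the source word's alignment links.
        if count >= min_count and count / source_counts[s] >= min_prob:
            mrd[s].append(t)
    # Sort the candidate translations for deterministic output.
    return {s: sorted(ts) for s, ts in mrd.items()}


def dictionary_stats(mrd):
    """Return (translation pairs, distinct source terms, distinct target terms)."""
    pairs = sum(len(ts) for ts in mrd.values())
    source_terms = len(mrd)
    target_terms = len({t for ts in mrd.values() for t in ts})
    return pairs, source_terms, target_terms


# Toy English-Macedonian corpus with hypothetical alignment links.
corpus = [
    ("big house", "голема куќа", "0-0 1-1"),
    ("the house", "куќата", "1-0"),
    ("house", "дом", "0-0"),
    ("big house", "голема куќа", "0-0 1-1"),
    ("house", "дом", "0-0"),
]

mrd = build_mrd(corpus)
print(mrd)                    # {'big': ['голема'], 'house': ['дом', 'куќа']}
print(dictionary_stats(mrd))  # (3, 2, 3)
```

Here "house" keeps two alternative translations, while the single occurrence of "куќата" is filtered out by `min_count`. The statistics function also illustrates why the number of translation pairs (23,296) can exceed both the English (17,708) and Macedonian (18,343) term counts: one word can map to several words in the other language.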
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saveski, M., Trajkovski, I. (2011). Development of an English-Macedonian Machine Readable Dictionary by Using Parallel Corpora. In: Gusev, M., Mitrevski, P. (eds) ICT Innovations 2010. ICT Innovations 2010. Communications in Computer and Information Science, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19325-5_20
DOI: https://doi.org/10.1007/978-3-642-19325-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19324-8
Online ISBN: 978-3-642-19325-5
eBook Packages: Computer Science, Computer Science (R0)