Abstract
In this paper, we present a new method for acquiring bilingual dictionaries from on-line text corpora. The method combines rule-based techniques for extracting dictionaries from structured data, such as paper dictionaries (in electronic form) or on-line glossaries, with methods used by alignment tools such as GIZA. The basic idea is to search for anchor words, such as "abstract" or "keywords", followed by their equivalents in another language. Text fragments that follow anchor words are likely to supply new entries for bilingual lexica.
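The anchor-word idea can be illustrated with a minimal sketch. This is not the authors' implementation: the anchor lists, the Polish anchor "słowa kluczowe", the function names, and the rule of pairing items by position when both listings have the same length are all assumptions made for illustration only.

```python
import re

# Hypothetical anchor phrases per language (assumption: the abstract names
# only "abstract" and "keywords" as examples of anchor words).
ANCHORS = {
    "en": ["keywords", "key words"],
    "pl": ["słowa kluczowe"],
}


def find_listing(text, anchors):
    """Return the items listed after the first anchor found in text, or []."""
    for anchor in anchors:
        # An anchor at the start of a line, followed by ':' or '-', then the listing.
        pattern = re.compile(
            r"^\s*" + re.escape(anchor) + r"\s*[:\-]\s*(.+)$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(text)
        if match:
            items = re.split(r"[;,]", match.group(1))
            return [item.strip(" .\t") for item in items if item.strip(" .\t")]
    return []


def candidate_pairs(text, source="pl", target="en"):
    """Pair keyword listings by position (a naive assumption, not the paper's method)."""
    src = find_listing(text, ANCHORS[source])
    tgt = find_listing(text, ANCHORS[target])
    # Only trust the listings when both exist and have the same number of items.
    if not src or len(src) != len(tgt):
        return []
    return list(zip(src, tgt))


sample = (
    "Słowa kluczowe: tłumaczenie maszynowe, słownik dwujęzyczny\n"
    "Keywords: machine translation, bilingual dictionary\n"
)
pairs = candidate_pairs(sample)
```

Positional pairing is the simplest possible heuristic; listings whose items are reordered or differ in number would need a statistical alignment step, which this sketch leaves out.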
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Graliński, F., Jassem, K., Kurc, R. (2011). Acquiring Bilingual Lexica from Keyword Listings. In: Vetulani, Z. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_33
DOI: https://doi.org/10.1007/978-3-642-20095-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science (R0)