Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6562)

Abstract

In this paper, we present a new method for acquiring bilingual dictionaries from on-line text corpora. The method merges rule-based techniques for obtaining dictionaries from structured data, such as paper dictionaries (in electronic form) or on-line glossaries, with methods used by alignment tools, such as GIZA. The basic idea is to search for anchor words such as "abstract" or "keywords" followed by their equivalents in another language. Text fragments that follow anchor words are likely to supply new entries for bilingual lexica.
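
The anchor-word idea can be illustrated with a minimal sketch. It is not the authors' implementation: the English-Polish language pair, the anchor spellings in `ANCHORS`, and the helper names `split_terms` and `extract_pairs` are all assumptions made for illustration. The sketch also pairs terms purely by position, a simplification standing in for the alignment methods the abstract mentions.

```python
import re

# Hypothetical anchor patterns. The language pair (English/Polish) and the
# exact anchor spellings are assumptions, not taken from the paper.
ANCHORS = {
    "en": re.compile(r"^\s*key\s?words\s*[:.]\s*(.+)$", re.IGNORECASE | re.MULTILINE),
    "pl": re.compile(r"^\s*słowa kluczowe\s*[:.]\s*(.+)$", re.IGNORECASE | re.MULTILINE),
}


def split_terms(listing):
    """Split a keyword listing on commas or semicolons and trim each term."""
    terms = [t.strip(" .\t") for t in re.split(r"[;,]", listing)]
    return [t for t in terms if t]


def extract_pairs(text, src="en", tgt="pl"):
    """Return candidate (source, target) term pairs found after anchor words.

    Terms are paired by position only, and the document is skipped when the
    two listings differ in length. This is a deliberate simplification.
    """
    src_match = ANCHORS[src].search(text)
    tgt_match = ANCHORS[tgt].search(text)
    if not src_match or not tgt_match:
        return []
    src_terms = split_terms(src_match.group(1))
    tgt_terms = split_terms(tgt_match.group(1))
    if len(src_terms) != len(tgt_terms):
        return []
    return list(zip(src_terms, tgt_terms))


sample = (
    "Keywords: machine translation, bilingual dictionary, corpus\n"
    "Słowa kluczowe: tłumaczenie maszynowe, słownik dwujęzyczny, korpus\n"
)
pairs = extract_pairs(sample)
```

Here `pairs` holds three candidate entries, one per keyword. Real keyword listings are often reordered or differ in length between languages, so positional pairing would discard or mispair many documents, which is where statistical alignment would presumably be needed.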

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Graliński, F., Jassem, K., Kurc, R. (2011). Acquiring Bilingual Lexica from Keyword Listings. In: Vetulani, Z. (ed.) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_33

  • DOI: https://doi.org/10.1007/978-3-642-20095-3_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20094-6

  • Online ISBN: 978-3-642-20095-3

  • eBook Packages: Computer Science, Computer Science (R0)
