Mining Parenthetical Translations for Polish-English Lexica

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Abstract

Documents written in languages other than English sometimes include parenthetical English translations, usually for technical and scientific terminology. Techniques have been developed for extracting such translations (as well as transliterations) from large Chinese text corpora. This paper presents methods for mining parenthetical translations in Polish texts. The main difference between translation mining in Chinese and in Polish is that Polish is written in the Latin alphabet, which makes English translations harder to identify in Polish texts. On the other hand, some parenthetically translated terms are preceded by the abbreviation "ang." (= English), a kind of "anchor" that makes it possible to query a Web search engine for such translations.
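
As a rough illustration of the "ang." anchor mentioned in the abstract, the sketch below finds parentheticals of the form "(ang. term)" and lists the words just before the parenthesis as candidate Polish source terms. It is not the method from the paper: the function name `extract_anchored_pairs`, the `max_words` parameter, the regular expressions, and the sample sentence are all assumptions made for this example. Deciding where the Polish term actually begins is left open here, so every left-context span up to `max_words` words is returned as a candidate.

```python
import re

# Matches "(ang. <term>)"; the term itself may not contain parentheses.
ANCHOR_RE = re.compile(r"\(\s*ang\.\s*([^()]+?)\s*\)")
# A word: Unicode letters (covers Polish diacritics), optionally hyphenated.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def extract_anchored_pairs(text, max_words=3):
    """Return (english_term, polish_candidates) for each "(ang. ...)" match.

    polish_candidates holds the last 1..max_words words that precede the
    parenthesis, shortest first. Picking the right one is not attempted;
    this is an illustrative heuristic only.
    """
    results = []
    for match in ANCHOR_RE.finditer(text):
        english = match.group(1)
        left_words = WORD_RE.findall(text[:match.start()])
        candidates = [
            " ".join(left_words[-k:])
            for k in range(1, max_words + 1)
            if k <= len(left_words)
        ]
        results.append((english, candidates))
    return results


# Hypothetical sample sentence, not taken from the paper.
sample = "Metoda wykorzystuje uczenie maszynowe (ang. machine learning) do analizy."
pairs = extract_anchored_pairs(sample)
```

For the sample sentence, the English term is "machine learning" and the intended Polish term "uczenie maszynowe" appears among the candidates; a real system would still need a way to choose between candidates, for instance by alignment or frequency statistics.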

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Graliński, F. (2010). Mining Parenthetical Translations for Polish-English Lexica. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_39

  • DOI: https://doi.org/10.1007/978-3-642-12116-6_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12115-9

  • Online ISBN: 978-3-642-12116-6

  • eBook Packages: Computer Science, Computer Science (R0)
