
A Seed Based Method for Dictionary Translation

  • Robert Krajewski
  • Henryk Rybiński
  • Marek Kozłowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8502)

Abstract

The paper addresses automatic machine translation. The proposed method translates a dictionary by mining text repositories in the source and target languages, without any directly given relationships connecting the two languages. It consists of two stages: (1) translation by lexical similarity, where words are compared graphically, that is, by their written form, and (2) translation by semantic similarity, where the contexts of words are compared. The Polish and English versions of Wikipedia were used as multilingual corpora. The method and its stages are thoroughly analyzed. The results allow implementing this method in human-in-the-middle systems.
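As a rough illustration of the two stages, the following minimal sketch pairs near-identically spelled words to obtain seed translations and then uses those seeds to compare word contexts across languages. It is not the authors' implementation: the function names, the use of `difflib.SequenceMatcher` for graphical comparison, the fixed co-occurrence window, the cosine measure, and the threshold value are all assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def lexical_seeds(source_words, target_words, threshold=0.8):
    """Stage 1 (assumed form): pair words whose spellings are nearly identical."""
    seeds = {}
    for s in source_words:
        best = max(target_words, key=lambda t: SequenceMatcher(None, s, t).ratio())
        if SequenceMatcher(None, s, best).ratio() >= threshold:
            seeds[s] = best
    return seeds


def context_vector(word, sentences, window=2):
    """Count the tokens that co-occur with `word` within a fixed window."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def translate_by_context(word, src_sents, tgt_sents, candidates, seeds):
    """Stage 2 (assumed form): map the source context through the seeds,
    then pick the candidate whose target-language context is closest."""
    mapped = Counter()
    for w, c in context_vector(word, src_sents).items():
        if w in seeds:
            mapped[seeds[w]] += c
    return max(candidates, key=lambda t: cosine(mapped, context_vector(t, tgt_sents)))


# Toy data, invented for this example only.
seeds = lexical_seeds(["radio", "system", "pies"], ["radio", "system", "dog", "cat"])
src = [["pies", "radio", "system"]]
tgt = [["dog", "radio", "system"], ["cat", "music", "food"]]
best = translate_by_context("pies", src, tgt, ["dog", "cat"], seeds)
```

In this toy setup the identically spelled words act as seeds, and the remaining word is translated by matching its seed-mapped context against the contexts of the target candidates. A real system would need tokenization, lemmatization, and a far larger corpus.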

Keywords

Machine translation; dictionary translation; semantic similarity; multilingual corpus



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Robert Krajewski (1)
  • Henryk Rybiński (1)
  • Marek Kozłowski (1)
  1. Warsaw University of Technology, Warsaw, Poland
