Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Abstract

So far, the extraction of translation equivalents from comparable, non-parallel corpora has attracted little research. The main reason is that its results have been poor compared with those obtained from aligned parallel corpora. The method proposed in this paper, which relies on seed patterns generated from external bilingual dictionaries, achieves results similar to those obtained from parallel corpora. In this way, the huge amount of comparable corpora available on the Web can be viewed as a never-ending source of lexicographic information. We describe the experiments performed on a comparable Spanish-Galician corpus.
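
The general idea of dictionary-seeded extraction can be illustrated with a minimal sketch. It is not the authors' implementation. It substitutes plain seed words in a token window for the seed patterns mentioned in the abstract, and the toy sentences, the seed dictionary, and all function names (`context_vectors`, `cosine`, `rank_candidates`) are assumptions made only for illustration. Each word is described by the seed words it co-occurs with, the source-language description is mapped through the bilingual dictionary, and target-language words are ranked by cosine similarity.

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, seeds, window=2):
    """For each non-seed word, count the seed words found within `window` tokens."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            if word in seeds:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in seeds:
                    vectors.setdefault(word, Counter())[tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def rank_candidates(src_word, src_vecs, tgt_vecs, seed_dict):
    """Map the source word's seed contexts through the dictionary, then rank target words."""
    mapped = Counter()
    for seed, count in src_vecs.get(src_word, {}).items():
        mapped[seed_dict[seed]] += count
    scored = [(cosine(mapped, vec), word) for word, vec in tgt_vecs.items()]
    return sorted(scored, reverse=True)

# Hypothetical Spanish -> Galician seed dictionary (illustrative entries only).
seed_dict = {"agua": "auga", "carne": "carne", "ciudad": "cidade", "casa": "casa"}

# Toy comparable (non-parallel) corpora, already tokenized.
es_sentences = [["perro", "bebe", "agua"], ["perro", "come", "carne"],
                ["calle", "de", "ciudad"], ["calle", "de", "casa"]]
gl_sentences = [["can", "bebe", "auga"], ["can", "come", "carne"],
                ["rúa", "de", "cidade"], ["rúa", "da", "casa"]]

es_vecs = context_vectors(es_sentences, set(seed_dict))
gl_vecs = context_vectors(gl_sentences, set(seed_dict.values()))

best_score, best_word = rank_candidates("calle", es_vecs, gl_vecs, seed_dict)[0]
print(best_word)
```

On this toy data the top-ranked Galician candidate for `calle` is `rúa`, because both words occur near the dictionary-linked seeds for "city" and "house". A real system would need much larger corpora, lemmatization, and a weighting scheme in place of raw counts.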

Editor information

Alexander Gelbukh

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gamallo Otero, P., Pichel Campos, J.R. (2008). Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_36

  • DOI: https://doi.org/10.1007/978-3-540-78135-6_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78134-9

  • Online ISBN: 978-3-540-78135-6

  • eBook Packages: Computer Science, Computer Science (R0)
