Using Cognates to Improve Lexical Alignment Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7499)

Abstract

In this paper, we describe a cognate detection module integrated into a lexical alignment system for French and Romanian. The module takes lemmatized, tagged, and sentence-aligned legal parallel corpora as input. As a first step, it applies a set of orthographic adjustments based on orthographic and phonetic similarities between French-Romanian word pairs. Statistical techniques and linguistic information (lemmas, POS tags) are then combined to detect cognates in the corpora. We automatically align the detected cognates and the multiword terms that contain them. We study the impact of cognate detection on a baseline lexical alignment system for French and Romanian and show that integrating cognates into the alignment process improves its results.
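
The pipeline outlined in the abstract (orthographic adjustment first, then a similarity test constrained by linguistic information) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the adjustment rules in FR_ADJUSTMENTS, the longest-common-subsequence ratio used as the similarity measure, the 0.7 threshold, and every function name are hypothetical choices made for the example.

```python
import re
import unicodedata

# Hypothetical French-side rewrite rules; the paper's actual rule set is not given here.
FR_ADJUSTMENTS = [("ph", "f"), ("th", "t")]


def strip_diacritics(word):
    """Lowercase a word and drop combining marks (e.g. 'ă' -> 'a', 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize(lemma, adjustments=()):
    """Apply the orthographic adjustments, then collapse doubled letters."""
    form = strip_diacritics(lemma)
    for source, target in adjustments:
        form = form.replace(source, target)
    return re.sub(r"(.)\1", r"\1", form)


def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Longest common subsequence ratio in [0, 1] (assumed measure)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def is_cognate_candidate(fr_lemma, fr_pos, ro_lemma, ro_pos, threshold=0.7):
    """Accept a lemma pair only if the POS tags match and the adjusted forms are close."""
    if fr_pos != ro_pos:
        return False
    fr_form = normalize(fr_lemma, FR_ADJUSTMENTS)
    ro_form = normalize(ro_lemma)
    return similarity(fr_form, ro_form) >= threshold


if __name__ == "__main__":
    print(is_cognate_candidate("photographie", "NOUN", "fotografie", "NOUN"))
    print(is_cognate_candidate("maison", "NOUN", "casă", "NOUN"))
```

In a sentence-aligned corpus, a test like this would only be applied to lemma pairs drawn from the same aligned sentence pair, which keeps the candidate space small; the threshold would need tuning on held-out data.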




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navlea, M., Todirascu, A. (2012). Using Cognates to Improve Lexical Alignment Systems. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_45

  • DOI: https://doi.org/10.1007/978-3-642-32790-2_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32789-6

  • Online ISBN: 978-3-642-32790-2

  • eBook Packages: Computer Science, Computer Science (R0)
