Using Cognates to Improve Lexical Alignment Systems

Navlea, Mirabela; Todirascu, Amalia

doi:10.1007/978-3-642-32790-2_45

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7499)

Abstract

In this paper, we describe a cognate detection module integrated into a lexical alignment system for French and Romanian. The module takes lemmatized, tagged and sentence-aligned legal parallel corpora as input. As a first step, it applies a set of orthographic adjustments based on orthographic and phonetic similarities between French-Romanian word pairs. Statistical techniques and linguistic information (lemmas, POS tags) are then combined to detect cognates in the corpora. We automatically align the detected cognates and the multiword terms that contain them. We study the impact of cognate detection on the results of a baseline lexical alignment system for French and Romanian, and show that integrating cognates into the alignment process improves those results.
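
The two-step idea in the abstract (orthographic adjustment, then a similarity test combined with linguistic information) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not the authors' method: the rewrite rules in ADJUSTMENTS, the use of the longest common subsequence ratio (LCSR) as the similarity measure, the 0.7 threshold, the POS-equality filter, and all function names are hypothetical.

```python
import unicodedata

# Hypothetical rewrite rules, invented for illustration; not the paper's rule set.
ADJUSTMENTS = [("ph", "f"), ("qu", "c"), ("y", "i")]


def normalize(word: str) -> str:
    """Lowercase, strip diacritics, then apply the illustrative rewrite rules."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for old, new in ADJUSTMENTS:
        stripped = stripped.replace(old, new)
    return stripped


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """LCS length divided by the length of the longer string (0.0 if both are empty)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def is_cognate_candidate(fr: tuple[str, str], ro: tuple[str, str],
                         threshold: float = 0.7) -> bool:
    """Each argument is a (lemma, POS tag) pair; the threshold is an assumed value."""
    fr_lemma, fr_pos = fr
    ro_lemma, ro_pos = ro
    if fr_pos != ro_pos:  # assumed filter: candidates must share a POS tag
        return False
    return lcsr(normalize(fr_lemma), normalize(ro_lemma)) >= threshold
```

For example, is_cognate_candidate(("philosophie", "NOUN"), ("filosofie", "NOUN")) accepts the pair because both lemmas normalize to the same string, whereas a pair with differing POS tags is rejected before any string comparison. A real system would tune the rules and threshold on annotated data and would need extra filtering for false friends.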

Author information

Authors and Affiliations

LILPA, Université de Strasbourg, 22 rue René Descartes, BP 80010, 67084, Strasbourg Cedex, France
Mirabela Navlea & Amalia Todirascu

Editor information

Editors and Affiliations

Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Navlea, M., Todirascu, A. (2012). Using Cognates to Improve Lexical Alignment Systems. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_45

DOI: https://doi.org/10.1007/978-3-642-32790-2_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32789-6
Online ISBN: 978-3-642-32790-2
eBook Packages: Computer Science, Computer Science (R0)