Identification of Bilingual Segments for Translation Generation

Karimbi Mahesh, Kavitha; Gomes, Luís; Lopes, José Gabriel P.

doi:10.1007/978-3-319-12571-8_15

Kavitha Karimbi Mahesh, Luís Gomes & José Gabriel P. Lopes

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8819)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

We present an approach that uses known translation forms in a validated bilingual lexicon to identify bilingual stem and suffix segments. By applying the longest sequence common to a pair of orthographically similar translations, we first induce bilingual suffix transformations (replacement rules). Redundant analyses are discarded by examining the distribution of stem pairs and their associated transformations. Sets of bilingual suffixes that conflate various translation forms are then grouped, and stem pairs sharing similar transformations are clustered; these clusters serve as the basis for the generative approach. The primary motivation behind this work is to eventually improve lexicon coverage by using the correct bilingual entries to suggest translations for out-of-vocabulary (OOV) words. In preliminary results, 90% of the generated translations are correct. This was achieved when both bilingual segments (the bilingual stem and the bilingual suffix) of the bilingual pair being analysed are known to have occurred in the training data set.
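
The pipeline outlined in the abstract (induce suffix transformations, cluster stem pairs by the transformations they share, then generate translation forms) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it approximates the "longest common sequence" by the longest common prefix on each side of two bilingual pairs, it omits the filtering of redundant analyses, and the function names, the `min_stem` threshold, and the English-Portuguese entries are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from os.path import commonprefix  # character-wise longest common prefix


def split_pair(pair_a, pair_b, min_stem=3):
    """Split two bilingual pairs into a shared stem pair and two suffix pairs.

    Assumption: the stem is the longest common prefix on each language side.
    Returns None when either stem is shorter than min_stem (hypothetical cutoff).
    """
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b
    src_stem = commonprefix([src_a, src_b])
    tgt_stem = commonprefix([tgt_a, tgt_b])
    if len(src_stem) < min_stem or len(tgt_stem) < min_stem:
        return None
    suffix_a = (src_a[len(src_stem):], tgt_a[len(tgt_stem):])
    suffix_b = (src_b[len(src_stem):], tgt_b[len(tgt_stem):])
    return (src_stem, tgt_stem), suffix_a, suffix_b


def induce_segments(lexicon, min_stem=3):
    """Map each induced bilingual stem to the set of bilingual suffixes seen with it."""
    stems = defaultdict(set)
    for pair_a, pair_b in combinations(lexicon, 2):
        result = split_pair(pair_a, pair_b, min_stem)
        if result is None:
            continue
        stem, suffix_a, suffix_b = result
        stems[stem].update([suffix_a, suffix_b])
    return dict(stems)


def cluster_stems(stems):
    """Group stem pairs that take exactly the same set of bilingual suffixes."""
    clusters = defaultdict(set)
    for stem, suffixes in stems.items():
        clusters[frozenset(suffixes)].add(stem)
    return dict(clusters)


def generate(stem, suffixes):
    """Concatenate a known stem pair with each known suffix pair."""
    src_stem, tgt_stem = stem
    return [(src_stem + s, tgt_stem + t) for s, t in sorted(suffixes)]


if __name__ == "__main__":
    # Illustrative entries only; they are not taken from the paper's lexicon.
    lexicon = [
        ("ensure", "assegurar"),
        ("ensuring", "assegurando"),
        ("declare", "declarar"),
        ("declaring", "declarando"),
    ]
    stems = induce_segments(lexicon)
    for suffix_set, stem_pairs in cluster_stems(stems).items():
        print(sorted(suffix_set), sorted(stem_pairs))
```

Keying each cluster by the frozen set of suffix pairs is only one possible reading of "stem pairs sharing similar transformations"; a looser similarity criterion would need a different grouping step.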

Author information

Authors and Affiliations

CITI (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2829-516, Caparica, Portugal
Kavitha Karimbi Mahesh, Luís Gomes & José Gabriel P. Lopes
ISTRION BOX-Translation & Revision, Lda., Parkurbis, Covilhã, 6200-865, Portugal
Luís Gomes & José Gabriel P. Lopes
Department of Computer Applications, St. Joseph Engineering College, Vamanjoor, Mangalore, 575 028, India
Kavitha Karimbi Mahesh

Authors

Kavitha Karimbi Mahesh
Luís Gomes
José Gabriel P. Lopes

Editor information

Editors and Affiliations

Department of Computer Science, KU Leuven, 3001, Heverlee, Belgium
Hendrik Blockeel & Matthijs van Leeuwen
Brunel University, UB8 3PH, Uxbridge, UK
Veronica Vinciotti

About this paper

Cite this paper

Karimbi Mahesh, K., Gomes, L., Lopes, J.G.P. (2014). Identification of Bilingual Segments for Translation Generation. In: Blockeel, H., van Leeuwen, M., Vinciotti, V. (eds) Advances in Intelligent Data Analysis XIII. IDA 2014. Lecture Notes in Computer Science, vol 8819. Springer, Cham. https://doi.org/10.1007/978-3-319-12571-8_15

DOI: https://doi.org/10.1007/978-3-319-12571-8_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12570-1
Online ISBN: 978-3-319-12571-8
eBook Packages: Computer Science (R0)
