Issues in Analogical Inference Over Sequences of Symbols: A Case Study on Proper Name Transliteration

Computational Approaches to Analogical Reasoning: Current Trends

Part of the book series: Studies in Computational Intelligence (SCI, volume 548)

Abstract

Formal analogies, that is, proportional analogies involving relations at a formal level (e.g. cordially is to cordial as appreciatively is to appreciative), have a long history in linguistics. They can accommodate a wide variety of linguistic data without resorting to ad hoc representations and are inherently good at capturing long-range dependencies between data. Unfortunately, applying analogical learning on top of formal analogy to current Natural Language Processing (NLP) tasks, which often involve massive amounts of data, is quite challenging. In this chapter, we draw on our previous work and identify some issues that remain to be addressed for formal analogy to stand by itself in the landscape of NLP. As a case study, we monitor our current implementation of analogical learning on the task of transliterating English proper names into Chinese.
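To make the notion of a formal analogy concrete, the following minimal Python sketch solves proportions such as the one quoted above. It is an illustration only and not the solver used in the chapter: it assumes the two first forms differ only by their suffix, which is far more restrictive than the general definition of formal analogy, and the function name solve_suffix_analogy is hypothetical.

```python
from os.path import commonprefix
from typing import Optional


def solve_suffix_analogy(x: str, y: str, z: str) -> Optional[str]:
    """Return t such that x : y :: z : t, or None if no solution is found.

    Simplifying assumption (not from the chapter): x and y share a common
    prefix and differ only by their suffix, and z ends with the suffix of x.
    """
    # Longest common prefix of x and y, compared character by character.
    stem = commonprefix([x, y])
    x_suffix = x[len(stem):]
    y_suffix = y[len(stem):]
    # The suffix removed from x must also be removable from z.
    if not z.endswith(x_suffix):
        return None
    z_stem = z[:len(z) - len(x_suffix)]
    # Apply to z the same suffix substitution that turns x into y.
    return z_stem + y_suffix


if __name__ == "__main__":
    print(solve_suffix_analogy("cordially", "cordial", "appreciatively"))
```

Under this restriction the proportion cordially : cordial :: appreciatively : t is solved by stripping the suffix "ly". A general solver must also handle prefixes, infixes and several alternations at once, and may return more than one candidate solution.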

Notes

  1. We also use \([x : y :: z : t]\) as a predicate.

  2. A solver typically produces several analogical solutions, among which a few are valid.

  3. Anagram forms do not have to be considered separately.

  4. Possibly involving filtering.

  5. Typical values are min = max = 3, \(\kappa = 20\,000\), and \(\eta = 5\,000\).

  6. The time measurements reported in this study were made on an ordinary desktop computer and are provided for illustration purposes only.

  7. http://research.microsoft.com/en-us/um/beijing/projects/letor/.

  8. http://en.wikipedia.org/wiki/Template:Transcription_into_Chinese. We could not segment lessan with this table.

  9. For the English-into-Chinese transliteration task we consider, there is only one reference for each test form.

  10. In the samp-tc configuration, 20 test forms receive several solutions with the same maximum frequency, among which is the sanctioned transliteration. We credit the system with a rank 1 solution in those cases, even if the correct transliteration is not listed first.

  11. We downloaded the script news_evaluation.py at http://translit.i2r.a-star.edu.sg/news2009/evaluation/.

  12. With the exception of the way we broke ties in the infrequent cases where several solutions are produced with the top frequency.

  13. 17 features were considered, based on a greedy search over the feature space, minimizing the training error rate over 100 epochs.

  14. We took this example just because the number of analogies involved is small enough.
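The tie-handling convention described in note 10 can be illustrated with a short Python sketch. It assumes that the candidate transliterations of a test form are available as a flat list in which a candidate appears once per time it was generated; this input format and the function names are hypothetical and are not taken from the chapter.

```python
from collections import Counter
from typing import Iterable, List


def top_frequency_candidates(solutions: Iterable[str]) -> List[str]:
    """Return every candidate that reaches the maximum frequency."""
    counts = Counter(solutions)
    if not counts:
        return []
    best = max(counts.values())
    return [cand for cand, freq in counts.items() if freq == best]


def rank1_credit(solutions: Iterable[str], reference: str) -> bool:
    """Credit a rank 1 solution when the reference is among the candidates
    tied for the top frequency, even if it is not listed first."""
    return reference in top_frequency_candidates(solutions)


if __name__ == "__main__":
    candidates = ["a", "b", "a", "b", "c"]  # "a" and "b" tie at frequency 2
    print(rank1_credit(candidates, "b"))
```

With this convention a tie at the top frequency counts as a success whenever the single reference transliteration is one of the tied candidates.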

Acknowledgments

This work has been partially funded by the Natural Sciences and Engineering Research Council of Canada (NSERC). We are grateful to the anonymous reviewers of the short paper submitted to the 2012 SAMAI workshop, as well as to those who reviewed this article. We found one review in particular especially inspiring.

Author information

Corresponding author

Correspondence to Philippe Langlais.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Langlais, P., Yvon, F. (2014). Issues in Analogical Inference Over Sequences of Symbols: A Case Study on Proper Name Transliteration. In: Prade, H., Richard, G. (eds) Computational Approaches to Analogical Reasoning: Current Trends. Studies in Computational Intelligence, vol 548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54516-0_3

  • DOI: https://doi.org/10.1007/978-3-642-54516-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54515-3

  • Online ISBN: 978-3-642-54516-0

  • eBook Packages: Engineering, Engineering (R0)
