Spoken Language Translation

  • Farzad Ehsani
  • Robert Frederking
  • Manny Rayner
  • Pierrette Bouillon


Researchers in the field of spoken language translation are plagued by a device from popular science fiction. Numerous television series and movies, most notably those in the “Star Trek” franchise, have assumed the existence of a Universal Translator, a device that immediately understands any language (human or alien), translates it into the other person’s language (always correctly), and speaks it fluently, with appropriate prosody. While this is a very useful plot device, avoiding tedious stretches of translation and the need to invent convincing alien languages, it sets up wildly unrealistic expectations on the part of the public [1]. In contrast, anything that is actually possible can only be a disappointment.


Keywords

  • Speech Recognition
  • Language Model
  • Machine Translation
  • Automatic Speech Recognition
  • Speech Synthesis


  1. Adams, D. (1979). The Hitchhiker's Guide to the Galaxy, London: Pan Books.
  2. Levinson, S., Liberman, M. (1981). Speech recognition by computer. Sci. Am., 64-76.
  3. Weinstein, C., McCandless, S., Mondshein, L., Zue, V. (1975). A system for acoustic-phonetic analysis of continuous speech. IEEE Trans. Acoust. Speech Signal Process., 54-67.
  4. Bernstein, J., Franco, H. (1996). Speech recognition by computer. In: Principles of Experimental Phonetics, St. Louis: Mosby, 408-434.
  5. Young, S. (1996). A review of large-vocabulary continuous-speech recognition. IEEE Signal Process. Mag., 45-57.
  6. Ehsani, F., Knodt, E. (1998). Speech technology in computer-aided language learning: strengths and limitations of a new CALL paradigm. Language Learning and Technology, 2, 45-60. Available online, February 2010: http://llt.msu.edu/vol2num1/article3/index.html.
  7. Deng, L., Huang, X. (2004). Challenges in adopting speech recognition. Commun. ACM, 47(1), 69-75.
  8. Nyberg, E., Mitamura, T. (1992). The KANT system: fast, accurate, high-quality translation in practical domains. In: Proc. 14th Conf. on Computational Linguistics, Nantes, France.
  9. Cavalli-Sforza, V., Czuba, K., Mitamura, T., Nyberg, E. (2000). Challenges in adapting an interlingua for bidirectional English-Italian translation. In: Proc. 4th Conf. Assoc. Machine Translation in the Americas on Envisioning Machine Translation in the Information Future, 169-178.
  10. Somers, H. (1999). Review article: Example-based machine translation. Machine Translation, 113-157.
  11. Brown, R. (1996). Example-based machine translation in the Pangloss system. In: Proc. 16th Int. Conf. on Computational Linguistics (COLING-96), Copenhagen, Denmark.
  12. Trujillo, A. (1999). Translation engines: Techniques for machine translation. London: Springer.
  13. Brown, P., Cocke, J., Della Pietra, S., Della Pietra, V., Jelinek, F., Lafferty, J., Mercer, R., Roossin, P. (1990). A statistical approach to machine translation. Comput. Linguistics, 16(2), 79-85.
  14. Berger, A., Della Pietra, V., Della Pietra, S. (1996). A maximum entropy approach to natural language processing. Comput. Linguistics, 22(1), 39-71.
  15. Brown, R., Frederking, R. (1995). Applying statistical English language modeling to symbolic machine translation. In: Proc. 6th Int. Conf. on Theoretical and Methodological Issues in Machine Translation (TMI-95), Leuven, Belgium, 221-239.
  16. Knight, K. (1999). A statistical MT tutorial workbook. Unpublished. Available online, May 2010: http://www.isi.edu/natural-language/mt/wkbk.rtf.
  17. Koehn, P., Knight, K. (2001). Knowledge sources for word-level translation models. In: Proc. EMNLP 2001 Conf. on Empirical Methods in Natural Language Processing, Pittsburgh, PA, 27-35.
  18. Brown, R. (1999). Adding linguistic knowledge to a lexical example-based translation system. In: Proc. TMI-99, Chester, England.
  19. Yamada, K., Knight, K. (2001). A syntax-based statistical translation model. In: Proc. 39th Annual Meeting on Association for Computational Linguistics, Toulouse, France, 523-530.
  20. Alshawi, H., Douglas, S., Bangalore, S. (2000). Learning dependency translation models as collections of finite-state head transducers. Comput. Linguistics, 26(1), 45-60.
  21. Wu, D. (1997). Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Comput. Linguistics, 23(3), 377-403.
  22. Wang, Y. (1998). Grammar inference and statistical machine translation. Ph.D. thesis, Carnegie Mellon University.
  23. Och, F., Tillmann, J., Ney, H. (1999). Improved alignment models for statistical machine translation. In: Proc. Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora, University of Maryland, College Park, MD, 20-28.
  24. Galley, M., Graehl, J., Knight, K., Marcu, D., DeNeefe, S., Wang, W., Thayer, I. (2006). Scalable inference and training of context-rich syntactic translation models. In: Proc. 21st Int. Conf. on Computational Linguistics, Sydney, 961-968.
  25. Venugopal, A. (2007). Hierarchical and Syntax Structured Models, MT Marathon, Edinburgh, Scotland.
  26. Chiang, D. (2007). Hierarchical phrase-based translation. Comput. Linguistics, 33(2), 201-228.
  27. Schultz, T., Black, A. (2006). Challenges with rapid adaptation of speech translation systems to new language pairs. In: Proc. ICASSP2006, Toulouse, France.
  28. Waibel, A. (1996). Interactive translation of conversational speech. Computer, 29(7), 41-48.
  29. Woszczyna, M., Coccaro, N., Eisele, A., Lavie, A., McNair, A., Polzin, T., Rogina, I., Rose, C., Sloboda, T., Tomita, T., Tsutsumi, J., Aoki-Waibel, N., Waibel, A., Ward, W. (1993). Recent advances in JANUS: A speech translation system. In: Proc. Workshop on Human Language Technology, Princeton, NJ.
  30. Wahlster, W. (2002). Verbmobil: Foundations of Speech-to-Speech Translation, Springer, Berlin.
  31. Rayner, M., Alshawi, H., Bretan, I., Carter, D., Digalakis, V., Gambäck, B., Kaja, J., Karlgren, J., Lyberg, B., Price, P., Pulman, S., Samuelsson, C. (1993). A speech to speech translation system built from standard components. In: Proc. 1993 ARPA workshop on Human Language Technology, Princeton, NJ.
  32. Rayner, M., Carter, D. (1997). Hybrid language processing in the spoken language translator. In: Proc. ICASSP'97, Munich, Germany.
  33. Rayner, M., Carter, D., Bouillon, P., Wiren, M., Digalakis, V. (2000). The Spoken Language Translator, Cambridge University Press, Cambridge.
  34. Isotani, R., Yamabana, K., Ando, S., Hanazawa, K., Ishikawa, S., Iso, K. (2003). Speech-to-Speech Translation Software on PDAs for Travel Conversation. NEC Research and Development.
  35. Yasuda, K., Sugaya, F., Toshiyuki, T., Seichi, Y., Masuzo, Y. (2003). An automatic evaluation method of translation quality using translation answer candidates queried from a parallel corpus. In: Proc. Machine Translation Summit VIII, 373-378. Santiago de Compostela, Spain.
  36. Metze, F., McDonough, J., Soltau, H., Waibel, A., Lavie, A., Burger, S., Langley, C., Laskowski, K., Levin, L., Schultz, T., Pianesi, F., Cattoni, R., Lazzari, G., Mana, N., Pianta, E., Besacier, L., Blanchon, H., Vaufreydaz, D., Taddei, L. (2002). The NESPOLE! Speech-to-speech translation system. In: Proc. HLT 2002, San Diego, CA.
  37. Bangalore, S., Riccardi, G. (2000). Stochastic finite-state models for spoken language machine translation. In: NAACL-ANLP 2000 Workshop on Embedded Machine Translation Systems, Seattle, WA, 52-59.
  38. Zhang, Y. (2003). Survey of Current Speech Translation Research. Unpublished. Available online, May 2010: http://projectile.sv.cmu.edu/research/public/talks/speechTranslation/sst-survey-joy.pdf
  39. Agnas, M. S., Alshawi, H., Bretan, I., Carter, D. M., Ceder, K., Collins, M., Crouch, R., Digalakis, V., Ekholm, B., Gambäck, B., Kaja, J., Karlgren, J., Lyberg, B., Price, P., Pulman, S., Rayner, M., Samuelsson, C., Svensson, T. (1994). Spoken language translator: first year report. SRI Technical Report CRC-043.
  40. Digalakis, V., Monaco, P. (1996). Genones: Generalized mixture tying in continuous hidden Markov model-based speech recognizers. IEEE Trans. Speech Audio Process., 4(4), 281-289.
  41. Alshawi, H. (1992). The Core Language Engine. MIT Press, Cambridge, MA.
  42. Alshawi, H., van Eijck, J. (1989). Logical forms in the core language engine. In: Proc. 27th Annual Meeting on Association for Computational Linguistics, Vancouver, British Columbia, Canada, 25-32.
  43. Alshawi, H., Carter, D. (1994). Training and scaling preference functions for disambiguation. Comput. Linguistics, 20(4), 635-648.
  44. Rayner, M., Samuelsson, C. (1994). Grammar Specialisation. In: [39], 39-52.
  45. Samuelsson, C. (1994). Fast natural-language parsing using explanation-based learning. PhD thesis, Royal College of Technology, Stockholm, Sweden.
  46. Frederking, R., Nirenburg, S. (1994). Three heads are better than one. In: Proc. 4th Conf. on Applied Natural Language Processing, Stuttgart, Germany.
  47. Rayner, M., Bouillon, P. (2002). A flexible speech to speech phrasebook translator. In: Proc. ACL Workshop on Speech-to-Speech Translation, Philadelphia, PA.
  48. Rayner, M., Hockey, B. A., Bouillon, P. (2006). Putting Linguistics into Speech Recognition: The Regulus Grammar Compiler. CSLI Press, Stanford, CA.
  49. Rayner, M., Bouillon, P., Santaholma, M., Nakao, Y. (2005). Representational and architectural issues in a limited-domain medical speech translator. In: Proc. TALN 2005, Dourdan, France.
  50. Chatzichrisafis, N., Bouillon, P., Rayner, M., Santaholma, M., Starlander, M., Hockey, B. A. (2006). Evaluating task performance for a unidirectional controlled language medical speech translation system. In: Proc. 1st Int. Workshop on Medical Speech Translation, HLT-NAACL, New York, NY.
  51. Sarich, A. (2004). Development and fielding of the phraselator phrase translation system. In: Proc. 26th Conf. on Translating and the Computer, London.
  52. Frederking, R., Rudnicky, A., Hogan, C., Lenzo, K. (2000). Interactive speech translation in the Diplomat project. Machine Translation J., Special Issue on Spoken Language Translation, 15(1-2), 27-42.
  53. Huang, X., Alleva, F., Hon, H. W., Hwang, M. Y., Lee, K. F., Rosenfeld, R. (1993). The SPHINX-II Speech Recognition System: An overview. Comput. Speech Lang., 2, 137-148.
  54. Lenzo, K., Hogan, C., Allen, J. (1998). Rapid-deployment text-to-speech in the DIPLOMAT system. In: Proc. 5th Int. Conf. on Spoken Language Processing (ICSLP-98), Sydney, Australia.
  55. Frederking, R., Brown, R. (1996). The Pangloss-Lite machine translation system. In: Proc. Conf. Assoc. for Machine Translation in the Americas (AMTA).
  56. Nielsen, J. (1993). Usability Engineering. AP Professional, Boston, MA.
  57. Rudnicky, A. (1995). Language modeling with limited domain data. In: Proc. ARPA Workshop on Spoken Language Technology, Morgan Kaufmann, San Francisco, CA, 66-69.
  58. Gates, D., Lavie, A., Levin, L., Waibel, A., Gavaldá, M., Mayfield, L., Woszczyna, M., Zhan, P. (1996). End-to-end evaluation in JANUS: A speech-to-speech translation system. In: Workshop on Dialogue Processing in Spoken Language Systems. Lecture Notes in Computer Science, Springer, Berlin.
  59. Black, A., Brown, R., Frederking, R., Lenzo, K., Moody, J., Rudnicky, A., Singh, R., Steinbrecher, E. (2002). Rapid development of speech-to-speech translation systems. In: Proc. ICSLP-2002, Denver.
  60. Black, A., Lenzo, K. (2000). Building voices in the Festival speech synthesis system. Unpublished. Available online, May 2010: http://www.festvox.org/festvox/index.html.
  61. Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines. Dissertation, Kluwer.
  62. Klinkenberg, R., Joachims, T. (2000). Detecting concept drift with support vector machines. In: Proc. 17th Int. Conf. on Machine Learning (ICML), Morgan Kaufmann, San Francisco, CA.
  63. Zens, R., Ney, H. (2004). Improvements in phrase-based statistical machine translation. In: Proc. Human Language Technology Conf. (HLT-NAACL), Boston, MA, 257-264.
  64. Tillmann, C., Zhang, T. (2005). A localized prediction model for statistical machine translation. In: Proc. 43rd Annual Meeting on Association for Computational Linguistics, Ann Arbor, MI, 557-564.
  65. Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P. (2007). Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, Cambridge.
  66. Schlenoff et al. (2007). Transtac July 2007 Evaluation Report, NIST Internal Document. Published in September 2007.
  67. Baker, D. W., Parker, R. M., Williams, M. V., Coates, W. C., Pitkin, K. (1996). Use and effectiveness of interpreters in an emergency department. JAMA, 275, 783-788.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Farzad Ehsani (1)
  • Robert Frederking (2)
  • Manny Rayner (3)
  • Pierrette Bouillon (3)

  1. Fluential, Inc., Sunnyvale, USA
  2. Language Technologies Institute/Center for Machine Translation, Carnegie Mellon University, Pittsburgh, USA
  3. ISSCO/TIM, University of Geneva, Geneva 4, Switzerland
