
Audiovisual Alignment in a Face-to-Face Conversation Translation Framework

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5707)

Abstract

Recent improvements in audiovisual alignment for a translating videophone are presented. A method for audiovisual alignment in the target language is proposed, and the process of audiovisual speech synthesis is described. The method has been evaluated in the VideoTRAN translating-videophone environment, in which an H.323 software-client translating videophone enables the transmission and translation of a set of multimodal verbal and nonverbal cues in a multilingual face-to-face communication setting. An extension of the subjective evaluation metrics of fluency and adequacy, which are commonly used in subjective machine-translation evaluation tests, is proposed for use in an audiovisual translation environment.





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gros, J.Ž., Mihelič, A. (2009). Audiovisual Alignment in a Face-to-Face Conversation Translation Framework. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_8


  • Print ISBN: 978-3-642-04390-1

  • Online ISBN: 978-3-642-04391-8

