Multi-lingual and Multi-modal Speech Processing and Applications

Ivanecky, Jozef; Fischer, Julia; Mast, Marion; Kunzmann, Siegfried; Ross, Thomas; Fischer, Volker

doi:10.1007/11550518_19

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3663)

Included in the following conference series:

Joint Pattern Recognition Symposium

Abstract

Over the last decade, voice technologies for telephony and embedded solutions have become much more mature, resulting in applications that provide mobile access to digital information from anywhere. Both a growing demand for voice-driven applications in many languages and the need for improved usability and user experience now drive the exploration of multi-lingual speech processing techniques for recognition, synthesis, and conversational dialog management. In this overview article, we discuss our recent activities on multi-lingual voice technologies and describe the benefits of multi-lingual modeling for the creation of multi-modal mobile and telephony applications.

Author information

Authors and Affiliations

AIM Voice Technologies, IBM Deutschland Entwicklung, Schönaicher Str. 220, D-71072, Böblingen
Jozef Ivanecky, Julia Fischer, Marion Mast, Siegfried Kunzmann, Thomas Ross & Volker Fischer

Editor information

Editors and Affiliations

PRIP, Vienna University of Technology, Austria
Walter G. Kropatsch
Vienna University of Technology, Vienna, Austria
Robert Sablatnig
Pattern Recognition and Image Processing Group, Institute of Computer-Aided Automation, Vienna University of Technology, Favoritenstraße 9/1832, A-1040, Vienna, Austria
Allan Hanbury

About this paper

Cite this paper

Ivanecky, J., Fischer, J., Mast, M., Kunzmann, S., Ross, T., Fischer, V. (2005). Multi-lingual and Multi-modal Speech Processing and Applications. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds) Pattern Recognition. DAGM 2005. Lecture Notes in Computer Science, vol 3663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550518_19

DOI: https://doi.org/10.1007/11550518_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28703-2
Online ISBN: 978-3-540-31942-9
eBook Packages: Computer Science, Computer Science (R0)
