Abstract
In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, yet it should also be possible to use foreign names, which frequently occur in music titles and city names. Previous approaches that extended their systems to cover multilingual input did not address the constraint of preserving main-language performance.
In this paper, we present an approach for recognizing speech in multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization.
We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages to create a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.
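To make the idea of a multilingual codebook concrete, the following is a minimal sketch, not the authors' algorithm, which the abstract does not spell out. It assumes plain Euclidean codewords instead of the Gaussian codebooks of a semi-continuous recognizer, toy 2-D feature vectors, and a hypothetical heuristic (`extend_codebook`) that keeps every main-language codeword unchanged and appends the foreign-language codewords lying farthest from the codebook built so far. All function names and data values are illustrative assumptions.

```python
import math
import random


def nearest(codebook, x):
    """Return (index, distance) of the codeword closest to vector x."""
    dists = [math.dist(c, x) for c in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)
    return idx, dists[idx]


def mean_distortion(codebook, data):
    """Average Euclidean quantization error of data under codebook."""
    return sum(nearest(codebook, x)[1] for x in data) / len(data)


def extend_codebook(main_cb, foreign_cb, n_add):
    """Hypothetical heuristic (an assumption, not the paper's method):
    keep all main-language codewords and append the n_add foreign
    codewords that are farthest from the growing codebook."""
    extended = list(main_cb)
    candidates = list(foreign_cb)
    for _ in range(min(n_add, len(candidates))):
        far = max(candidates, key=lambda c: nearest(extended, c)[1])
        extended.append(far)
        candidates.remove(far)
    return extended


# Toy 2-D "feature vectors"; the values are made up for illustration.
random.seed(0)
main_cb = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
foreign_cb = [(0.1, 0.1), (5.0, 5.0), (5.0, 6.0)]
foreign_data = [
    (5.0 + random.gauss(0, 0.2), 5.5 + random.gauss(0, 0.2))
    for _ in range(50)
]

extended_cb = extend_codebook(main_cb, foreign_cb, n_add=2)
print("main-only distortion:", mean_distortion(main_cb, foreign_data))
print("extended distortion: ", mean_distortion(extended_cb, foreign_data))
```

Because the extended codebook is a superset of the main-language codebook, the quantization error on main-language data cannot increase in this sketch, while foreign-language data that the main codebook covers poorly is quantized more accurately. The foreign codeword near an existing main-language codeword is skipped, which keeps the codebook small.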
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Raab, M., Gruhn, R., Noeth, E. (2008). Codebook Design for Speech Guided Car Infotainment Systems. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_6
DOI: https://doi.org/10.1007/978-3-540-69369-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69368-0
Online ISBN: 978-3-540-69369-7
eBook Packages: Computer Science, Computer Science (R0)