Abstract
Speech-enabled interfaces are increasingly appearing in small devices such as cellular phones, PDAs, car kits, and various other consumer electronics products, resulting in what is now called “embedded speech.” The new generation of small-scale computing devices has severe resource constraints, notably limited CPU power and small memory. This makes the design and efficient implementation of speech interfaces for these devices a challenging task. This chapter first discusses the evolution of spoken language interfaces and evaluates their potential benefits for embedded applications. It then investigates the basic requirements for these kinds of interfaces and the inherent restrictions imposed by low-resource systems. Next, the chapter analyzes current theoretical and practical solutions for adapting speech recognition and synthesis technologies to portable electronic devices. As a concrete example, it describes implementation issues in developing an optimized embedded version of a complete text-to-speech synthesis system.
Copyright information
© 2008 Springer Science + Business Media, LLC
About this chapter
Cite this chapter
Burileanu, D. (2008). Spoken Language Interfaces for Embedded Applications. In: Human Factors and Voice Interactive Systems. Signals and Communication Technology. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-68439-0_5
DOI: https://doi.org/10.1007/978-0-387-68439-0_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-25482-1
Online ISBN: 978-0-387-68439-0
eBook Packages: Engineering, Engineering (R0)