Spoken Language Interfaces for Embedded Applications

Burileanu, Dragos

doi:10.1007/978-0-387-68439-0_5

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Speech-enabled interfaces are increasingly appearing in small devices, such as cellular phones, PDAs, car kits, and various other consumer electronics products, resulting in what is now called “embedded speech.” The new generation of small-scale computing devices has severe resource constraints, notably limited CPU power and small memory footprints. This makes the design and efficient implementation of speech interfaces for these devices a challenging task. This chapter first discusses the evolution of spoken language interfaces and evaluates their potential benefits for embedded applications. It then investigates the basic requirements for these kinds of interfaces and the inherent restrictions imposed by low-resource systems. Next, it analyzes current theoretical and practical solutions for adapting speech recognition and synthesis technologies to portable electronic devices. As a concrete example, it describes implementation issues in developing an optimized embedded version of a complete text-to-speech synthesis system.

Author information

Authors and Affiliations

Speech Technology and Human-Computer Dialogue Laboratory, “Politehnica” University of Bucharest, Bucharest, Romania
Dragos Burileanu


About this chapter

Cite this chapter

Burileanu, D. (2008). Spoken Language Interfaces for Embedded Applications. In: Human Factors and Voice Interactive Systems. Signals and Communication Technology. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-68439-0_5

DOI: https://doi.org/10.1007/978-0-387-68439-0_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-25482-1
Online ISBN: 978-0-387-68439-0
eBook Packages: Engineering, Engineering (R0)
