Spoken Language Interfaces for Embedded Applications

  • Chapter
Human Factors and Voice Interactive Systems

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Speech-enabled interfaces increasingly appear in small devices such as cellular phones, PDAs, car kits, and various other consumer electronics products, resulting in what is now called “embedded speech.” The new generation of small-scale computing devices has severe resource constraints, notably limited CPU power and little memory. This makes the design and efficient implementation of speech interfaces for these devices a challenging task. This chapter first discusses the evolution of spoken language interfaces and evaluates their potential benefits for embedded applications. It then investigates the basic requirements for these kinds of interfaces and the inherent restrictions imposed by low-resource systems. Next, the chapter analyzes current theoretical and practical solutions for adapting speech recognition and synthesis technologies to portable electronic devices. As a concrete example, it describes implementation issues in developing an optimized embedded version of a complete text-to-speech synthesis system.


Copyright information

© 2008 Springer Science + Business Media, LLC

About this chapter

Cite this chapter

Burileanu, D. (2008). Spoken Language Interfaces for Embedded Applications. In: Human Factors and Voice Interactive Systems. Signals and Communication Technology. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-68439-0_5

  • DOI: https://doi.org/10.1007/978-0-387-68439-0_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-25482-1

  • Online ISBN: 978-0-387-68439-0

  • eBook Packages: Engineering (R0)