Efficient Use of Voice Activity Detector and Automatic Speech Recognition in Embedded Platforms for Natural Language Interaction

  • Conference paper
Highlights in Practical Applications of Agents and Multiagent Systems

Abstract

People today interact daily with technological devices built around embedded processors, and both users and industry want that interaction to be natural. Researchers have spent many years working on spoken dialog systems, which are now used in many applications. In these systems, correct speech recognition is crucial. Most research effort focuses on robustness to noise under all kinds of adverse conditions, while response time is often ignored. This paper presents a new approach to the efficient use of voice activity detection and speech recognition in embedded devices for natural language interaction: the response time of the recognition system is adjusted to the requirements of the overall implementation without sacrificing too much accuracy.
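The core idea in the abstract, gating the recognizer with a voice activity detector whose end-of-utterance threshold trades response time against accuracy, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the energy-based VAD, the threshold values, and the function names are all assumptions made for the example.

```python
# Sketch of a VAD gating an ASR front end: an energy-based detector
# decides which frames contain speech, and a tunable trailing-silence
# count decides how quickly the utterance is declared finished.

import math

def frame_energy(frame):
    """Root-mean-square energy of one audio frame (a list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def segment_utterance(frames, energy_threshold=0.1, max_trailing_silence=3):
    """Return the frames judged to belong to the utterance.

    max_trailing_silence is the number of consecutive low-energy frames
    after which the utterance is considered over. Lowering it makes the
    system respond faster but risks truncating slow speakers, which is
    the response-time/accuracy trade-off described in the abstract.
    """
    speech, silence_run, started = [], 0, False
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            started = True
            silence_run = 0
            speech.append(frame)
        elif started:
            silence_run += 1
            if silence_run >= max_trailing_silence:
                break  # end of utterance: hand `speech` to the recognizer
            speech.append(frame)  # keep short pauses inside the utterance
    return speech

# Synthetic example: 4 loud frames, 1 short pause, 2 loud, then silence.
loud = [0.5, -0.5] * 4
quiet = [0.01, -0.01] * 4
frames = [loud] * 4 + [quiet] + [loud] * 2 + [quiet] * 5
utterance = segment_utterance(frames)
print(len(utterance))  # frames that would be passed to the ASR
```

With a smaller `max_trailing_silence`, the short mid-utterance pause above would already trigger end-of-utterance, so the recognizer would receive only the first four frames: a faster but less accurate segmentation.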





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santos-Pérez, M., González-Parada, E., Cano-García, J.M. (2011). Efficient Use of Voice Activity Detector and Automatic Speech Recognition in Embedded Platforms for Natural Language Interaction. In: Pérez, J.B., et al. Highlights in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 89. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19917-2_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-19917-2_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19916-5

  • Online ISBN: 978-3-642-19917-2

  • eBook Packages: Engineering (R0)
