Efficient Use of Voice Activity Detector and Automatic Speech Recognition in Embedded Platforms for Natural Language Interaction

  • Conference paper
Highlights in Practical Applications of Agents and Multiagent Systems

Abstract

People today interact daily with technological devices built around embedded processors, and both users and industry want that interaction to be natural. Researchers have spent many years working on spoken dialog systems, which are now used in many applications. In these systems, correct speech recognition is crucial. Most research effort focuses on robustness to noise under all kinds of adverse conditions, while response time is often ignored. This paper presents a new approach to the efficient use of voice activity detection and speech recognition in embedded devices for natural language interaction: the response time of the recognition system is adjusted to the requirements of the overall implementation without sacrificing too much accuracy.
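The core idea in the abstract, gating the recognizer with a voice activity detector whose end-of-utterance threshold trades response time against accuracy, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the energy-based VAD, the threshold values, and the function names are all assumptions made for the example.

```python
# Sketch of a VAD gating an ASR front end: an energy-based detector
# decides which frames contain speech, and a tunable trailing-silence
# count decides how quickly the utterance is declared finished.

import math

def frame_energy(frame):
    """Root-mean-square energy of one audio frame (a list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def segment_utterance(frames, energy_threshold=0.1, max_trailing_silence=3):
    """Return the frames judged to belong to the utterance.

    max_trailing_silence is the number of consecutive low-energy frames
    after which the utterance is considered over. Lowering it makes the
    system respond faster but risks truncating slow speakers, which is
    the response-time/accuracy trade-off described in the abstract.
    """
    speech, silence_run, started = [], 0, False
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            started = True
            silence_run = 0
            speech.append(frame)
        elif started:
            silence_run += 1
            if silence_run >= max_trailing_silence:
                break  # end of utterance: hand `speech` to the recognizer
            speech.append(frame)  # keep short pauses inside the utterance
    return speech

# Synthetic example: 4 loud frames, 1 short pause, 2 loud, then silence.
loud = [0.5, -0.5] * 4
quiet = [0.01, -0.01] * 4
frames = [loud] * 4 + [quiet] + [loud] * 2 + [quiet] * 5
utterance = segment_utterance(frames)
print(len(utterance))  # frames that would be passed to the ASR
```

With a smaller `max_trailing_silence`, the short mid-utterance pause above would already trigger end-of-utterance, so the recognizer would receive only the first four frames: a faster but less accurate segmentation.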





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santos-Pérez, M., González-Parada, E., Cano-García, J.M. (2011). Efficient Use of Voice Activity Detector and Automatic Speech Recognition in Embedded Platforms for Natural Language Interaction. In: Pérez, J.B., et al. Highlights in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 89. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19917-2_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-19917-2_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19916-5

  • Online ISBN: 978-3-642-19917-2

  • eBook Packages: Engineering (R0)
