Abstract
As the very first step in activating speech interfaces, wake-up word detection aims to provide a fully hands-free experience by detecting a specific word or phrase that activates the speech recognition and understanding modules. The task usually requires a detector that is low-latency, highly accurate, and small-footprint, and that can be easily migrated to power-limited environments. In this paper, we describe the creation of HelloNPU, a publicly available corpus that provides a common testbed to facilitate wake-up word detection research. We also present baseline experimental results on this corpus using the deep KWS approach. We hope the release of this corpus will trigger more studies on small-footprint wake-up word detection.
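As a rough illustration of what a deep KWS baseline does at decision time, the sketch below follows the posterior handling described by Chen, Parada and Heigold (2014). A small DNN emits, for every frame, posteriors over a filler label (index 0) and the keyword's sub-units (indices 1 and up). These posteriors are smoothed with a moving average, and a confidence score is taken as the geometric mean of each keyword label's maximum smoothed posterior inside a sliding window. This is not the HelloNPU authors' implementation: the function names (`smooth_posteriors`, `confidence`, `detect`), the label layout, the default window sizes and the threshold are assumptions chosen for illustration, and the DNN itself is left out.

```python
from typing import List, Optional, Sequence

Frames = Sequence[Sequence[float]]  # frames x labels; label 0 is the filler (assumption)


def smooth_posteriors(posteriors: Frames, w_smooth: int) -> List[List[float]]:
    """Average each label's posterior over the last w_smooth frames, current frame included."""
    smoothed: List[List[float]] = []
    for j in range(len(posteriors)):
        start = max(0, j - w_smooth + 1)  # window is shorter at the start of the utterance
        window = posteriors[start:j + 1]
        smoothed.append([sum(frame[i] for frame in window) / len(window)
                         for i in range(len(posteriors[j]))])
    return smoothed


def confidence(smoothed: Sequence[Sequence[float]], j: int, w_max: int) -> float:
    """Geometric mean over keyword labels of the max smoothed posterior in the last w_max frames."""
    start = max(0, j - w_max + 1)
    num_labels = len(smoothed[j])
    product = 1.0
    for i in range(1, num_labels):  # skip the filler label 0
        product *= max(smoothed[k][i] for k in range(start, j + 1))
    return product ** (1.0 / (num_labels - 1))


def detect(posteriors: Frames, threshold: float,
           w_smooth: int = 30, w_max: int = 100) -> Optional[int]:
    """Return the first frame index whose confidence reaches the threshold, else None.

    The default window sizes (in frames) are illustrative assumptions.
    """
    smoothed = smooth_posteriors(posteriors, w_smooth)
    for j in range(len(smoothed)):
        if confidence(smoothed, j, w_max) >= threshold:
            return j
    return None


if __name__ == "__main__":
    # Toy posteriors for a hypothetical two-unit wake-up word: [filler, unit 1, unit 2].
    toy = [
        [1.0, 0.0, 0.0],
        [0.1, 0.9, 0.0],
        [0.2, 0.0, 0.8],
        [1.0, 0.0, 0.0],
    ]
    print(detect(toy, threshold=0.8, w_smooth=1, w_max=4))
```

With smoothing disabled (`w_smooth=1`), the toy input first reaches a confidence of sqrt(0.9 × 0.8) ≈ 0.85 at frame 2, so the script prints `2`. Using the geometric mean means every keyword unit must have been observed with high probability somewhere in the window, which is what lets a small model reject partial matches without a separate filler model.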
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, S., Hou, J., Xie, L., Hao, Y. (2018). HelloNPU: A Corpus for Small-Footprint Wake-Up Word Detection Research. In: Tao, J., Zheng, T., Bao, C., Wang, D., Li, Y. (eds) Man-Machine Speech Communication. NCMMSC 2017. Communications in Computer and Information Science, vol 807. Springer, Singapore. https://doi.org/10.1007/978-981-10-8111-8_7
DOI: https://doi.org/10.1007/978-981-10-8111-8_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8110-1
Online ISBN: 978-981-10-8111-8
eBook Packages: Computer Science, Computer Science (R0)