HelloNPU: A Corpus for Small-Footprint Wake-Up Word Detection Research

  • Senmao Wang
  • Jingyong Hou
  • Lei Xie
  • Yufeng Hao
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 807)

Abstract

As the very first step in activating a speech interface, wake-up word detection aims to provide a fully hands-free experience by detecting a specific word or phrase that activates the speech recognition and understanding modules. The task usually calls for a detector that is low-latency, highly accurate, small-footprint, and easy to port to power-limited environments. In this paper, we describe the creation of HelloNPU, a publicly available corpus that provides a common testbed to facilitate wake-up word detection research. We also report baseline experimental results on the proposed corpus using the deep KWS approach. We hope the release of this corpus will encourage more studies on small-footprint wake-up word detection.

Keywords

  • Keyword spotting
  • Wake-up word detection
  • Deep neural network

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. School of Computer Science, Northwestern Polytechnical University, Xi’an, China
  2. Beijing Haitian Ruisheng Science Technology Ltd. (Speechocean), Beijing, China
