Deep Learning and Online Speech Activity Detection for Czech Radio Broadcasting

  • Jan Zelinka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

In this paper, enhancements of online speech activity detection (SAD) are presented. The proposed approach combines standard signal processing methods with modern deep-learning methods, which allows simultaneous training of detector parts that are usually trained or designed separately. In our SAD, an NN-based early score computation system, an NN-based score smoothing system, and the proposed online decoding system are incorporated into a single training process. Besides the CNN and DNN, spectral flux and spectral variance features are also investigated. The proposed approach was tested on a Czech Radio broadcasting corpus, which was used to investigate both supervised and semi-supervised machine learning.
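
The abstract names spectral flux and spectral variance as features but does not give their exact definitions. The following is a minimal sketch under stated assumptions, not the paper's implementation: flux is taken as the Euclidean distance between consecutive L2-normalised magnitude spectra, and variance as the variance of one frame's magnitude spectrum across frequency bins. The function names, the frame length and the hop size are all hypothetical choices made for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the one-sided DFT of a real frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(prev_mag, mag):
    """Assumed definition: distance between L2-normalised consecutive spectra."""
    def normalise(m):
        norm = math.sqrt(sum(v * v for v in m)) or 1.0  # guard all-zero frames
        return [v / norm for v in m]

    p, c = normalise(prev_mag), normalise(mag)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, p)))


def spectral_variance(mag):
    """Assumed definition: variance of the magnitudes across frequency bins."""
    mean = sum(mag) / len(mag)
    return sum((v - mean) ** 2 for v in mag) / len(mag)


def frame_features(signal, frame_len=64, hop=32):
    """Return one (flux, variance) pair per frame; the first flux is 0.0."""
    feats = []
    prev = None
    for start in range(0, len(signal) - frame_len + 1, hop):
        mag = magnitude_spectrum(signal[start:start + frame_len])
        flux = 0.0 if prev is None else spectral_flux(prev, mag)
        feats.append((flux, spectral_variance(mag)))
        prev = mag
    return feats
```

Each frame only needs the previous frame's spectrum, so the features can be computed online as audio arrives. A practical system would replace the naive DFT with an FFT and typically apply a window function before the transform.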

Keywords

Speech activity detector; Differentiable decoding; Semi-supervised learning

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Faculty of Applied Sciences, New Technologies for the Information Society, University of West Bohemia, Pilsen, Czech Republic
