
A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8854)

Abstract

The inclusion of two or more microphones in smartphones is becoming quite common. These microphones were originally intended for noise reduction, and little benefit has yet been drawn from this feature for noise-robust automatic speech recognition (ASR). In this paper we propose a novel system to estimate missing-data masks for robust ASR on dual-microphone smartphones. The system is based on deep neural networks (DNNs), which have proven to be a powerful tool for ASR in several ways. To assess the performance of the proposed technique, spectral reconstruction experiments are carried out on a dual-channel database derived from Aurora-2. Our results show that the DNN is better able to exploit the dual-channel information and yields an improvement in word accuracy of more than 6% over state-of-the-art single-channel mask estimation techniques.
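The abstract does not specify the network architecture, its input features, or how the training targets are defined. As an illustration only, the NumPy sketch below frames DNN-based missing-data mask estimation in one common way: an "oracle" binary mask marks time-frequency bins whose local SNR exceeds a threshold, and a feed-forward network maps stacked log-Mel features from both microphones to per-bin reliability probabilities. Every name (`oracle_mask`, `stack_context`, `MaskDNN`, `bce`), the 0 dB threshold, the context width, the layer sizes, and the simulated secondary-microphone signal are hypothetical choices, not the authors' configuration. The network is untrained here, so its outputs carry no meaning until it is fitted, for example by minimizing the binary cross-entropy against oracle masks.

```python
import numpy as np

def oracle_mask(clean_pow, noise_pow, threshold_db=0.0):
    # Binary reliability mask: 1 where the local SNR (dB) of a time-frequency
    # bin exceeds the threshold. threshold_db=0.0 is an illustrative assumption.
    eps = 1e-12
    snr_db = 10.0 * np.log10((clean_pow + eps) / (noise_pow + eps))
    return (snr_db > threshold_db).astype(np.float64)

def stack_context(feats, k):
    # feats: (T, D) -> (T, (2k+1)*D), concatenating frames t-k..t+k;
    # edge frames are replicated at the utterance boundaries.
    T = feats.shape[0]
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return np.concatenate([padded[i:i + T] for i in range(2 * k + 1)], axis=1)

def dual_channel_input(primary_logmel, secondary_logmel, k=2):
    # Join both microphones' features frame by frame, then add temporal context.
    return stack_context(np.concatenate([primary_logmel, secondary_logmel], axis=1), k)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MaskDNN:
    """Hypothetical feed-forward net: sigmoid hidden layers, one sigmoid output per Mel channel."""

    def __init__(self, n_in, n_hidden, n_out, n_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_in] + [n_hidden] * n_layers + [n_out]
        self.W = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        h = x
        for W, b in zip(self.W, self.b):
            h = sigmoid(h @ W + b)
        return h  # (T, n_out): probability that each bin is speech-dominated

def bce(probs, target, eps=1e-7):
    # Binary cross-entropy, a usual training loss for mask estimators.
    p = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

# Usage with synthetic Mel-filterbank powers (illustrative data only).
T, n_mel = 50, 23
rng = np.random.default_rng(1)
clean = rng.exponential(1.0, (T, n_mel))
noise = rng.exponential(1.0, (T, n_mel))
target = oracle_mask(clean, noise)
# Assumed secondary mic: strongly attenuated speech plus similar noise.
x = dual_channel_input(np.log(clean + noise), np.log(0.1 * clean + noise), k=2)
net = MaskDNN(x.shape[1], 256, n_mel)
probs = net.forward(x)
est_mask = (probs > 0.5).astype(np.float64)
loss = bce(probs, target)
```

The idea behind feeding both channels is that, on a handset, the secondary microphone tends to pick up much less speech than the primary one while the noise levels are comparable, so the inter-channel difference per Mel bin is informative about which bins are speech-dominated. A thresholded estimate such as `est_mask` would then drive a missing-data spectral reconstruction stage, which is not shown here.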

This work has been supported by the MICINN TEC2013-46690-P project.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

López-Espejo, I., González, J.A., Gómez, Á.M., Peinado, A.M. (2014). A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_13

  • DOI: https://doi.org/10.1007/978-3-319-13623-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13622-6

  • Online ISBN: 978-3-319-13623-3

  • eBook Packages: Computer Science, Computer Science (R0)