A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition

López-Espejo, Iván; González, José A.; Gómez, Ángel M.; Peinado, Antonio M.

doi:10.1007/978-3-319-13623-3_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8854)

Abstract

The inclusion of two or more microphones in smartphones is becoming quite common. These microphones were originally intended for noise reduction, and little benefit is yet being taken from this feature for noise-robust automatic speech recognition (ASR). In this paper we propose a novel system to estimate missing-data masks for robust ASR on dual-microphone smartphones. The system is based on deep neural networks (DNNs), which have proven to be a powerful tool in the field of ASR in several ways. To assess the performance of the proposed technique, spectral reconstruction experiments are carried out on a dual-channel database derived from Aurora-2. Our results demonstrate that the DNN is better able to exploit the dual-channel information and yields an improvement in word accuracy of more than 6% over state-of-the-art single-channel mask estimation techniques.
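As a rough illustration of what a missing-data mask is, the sketch below labels each time-frequency bin as reliable (1) or unreliable (0) by thresholding the local SNR. This oracle mask is the kind of target a mask estimator could be trained to predict. It is a minimal sketch under stated assumptions, not the method of the paper: the function names, the 0 dB threshold, and the idea of stacking both channels' features into one input vector are hypothetical choices made here for illustration.

```python
import math


def oracle_binary_mask(clean_power, noise_power, snr_threshold_db=0.0):
    """Return a binary mask: 1 where local SNR exceeds the threshold, else 0.

    clean_power and noise_power are lists of frames, each frame a list of
    per-frequency-bin power values. The 0 dB default is an assumption.
    """
    mask = []
    for clean_frame, noise_frame in zip(clean_power, noise_power):
        row = []
        for s, n in zip(clean_frame, noise_frame):
            # Floor both terms to avoid log(0) and division by zero.
            snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if snr_db > snr_threshold_db else 0)
        mask.append(row)
    return mask


def stack_dual_channel(primary_frames, secondary_frames):
    """Concatenate per-frame features of both microphones.

    Hypothetical input layout for a dual-channel mask estimator: each output
    frame holds the primary-channel features followed by the secondary ones.
    """
    return [p + s for p, s in zip(primary_frames, secondary_frames)]


if __name__ == "__main__":
    clean = [[4.0, 0.5], [1.0, 9.0]]
    noise = [[1.0, 2.0], [1.0, 3.0]]
    print(oracle_binary_mask(clean, noise))  # [[1, 0], [0, 1]]
    print(stack_dual_channel([[1.0, 2.0]], [[3.0, 4.0]]))
```

In practice the clean and noise powers are not available at test time, which is why the mask has to be estimated from the noisy observations, here from two channels rather than one.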

This work has been supported by the MICINN TEC2013-46690-P project.

Author information

Authors and Affiliations

Dept. of Signal Theory, Telematics and Communications, University of Granada, Spain
Iván López-Espejo, Ángel M. Gómez & Antonio M. Peinado
Dept. of Computer Science, University of Sheffield, UK
José A. González

Editor information

Editors and Affiliations

ETSIT, Las Palmas de Gran Canaria, Spain
Juan Luis Navarro Mesa, Eduardo Hernández Pérez, Pedro Quintana Morales, Antonio Ravelo García & Iván Guerra Moreno
University of Zaragoza, Spain
Alfonso Ortega
Dep. of Electronics, Telecommunications and Informatics Engineering, University of Aveiro, Portugal
António Teixeira
ATVS Biometric Recognition Group, Universidad Autónoma de Madrid, Spain
Doroteo T. Toledano

About this paper

Cite this paper

López-Espejo, I., González, J.A., Gómez, Á.M., Peinado, A.M. (2014). A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_13

DOI: https://doi.org/10.1007/978-3-319-13623-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science, Computer Science (R0)
