Abstract
This paper addresses the problem of distant speech recognition in reverberant, noisy conditions using a star-shaped microphone array and vector Taylor series (VTS) compensation. First, a beamformer yields an enhanced single-channel signal by applying convex (CVX) optimization over three spatial dimensions, using the spatio-temporal position of the target speaker as prior knowledge. Then, VTS compensation is applied to the speech features extracted from the beamformer's output signal. Finally, the compensated features are used for speech recognition. Because there are no existing German resources for evaluating the proposed enhancement framework, this paper also introduces a new speech database. Specifically, we present a medium-vocabulary German database for microphone arrays, made of embedded clean signals contaminated with real room impulse responses and mixed in a ‘natural’ way with real noises. We show that the proposed enhancement framework outperforms other related systems on the presented database.
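The abstract does not state which VTS variant is applied to the features. The sketch below illustrates the general idea using the classic first-order VTS approach as an assumption, not as the authors' implementation. It works elementwise on log-mel energies with the mismatch function y = x + log(1 + exp(n − x)). It omits the channel term, assumes diagonal covariances, and assumes that a clean-speech GMM and a noise mean/variance (for example, estimated from non-speech frames) are already available. Each clean Gaussian is mapped to a noisy one by linearizing around its mean. The clean features are then estimated as y − Σ_k P(k|y)(μ_y,k − μ_x,k). All function names are hypothetical.

```python
import math

# Hypothetical helpers: an illustrative first-order VTS sketch in the log-mel
# domain (no channel term, diagonal covariances), not the paper's code.

def softplus(a):
    """Numerically stable log(1 + exp(a))."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def sigmoid(a):
    """Numerically stable 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def vts_mismatch(x, n):
    """Noisy log-mel energies y = x + log(1 + exp(n - x)), per channel."""
    return [xi + softplus(ni - xi) for xi, ni in zip(x, n)]

def vts_compensate_gaussian(mu_x, var_x, mu_n, var_n):
    """First-order VTS: map a clean Gaussian and a noise Gaussian to a noisy Gaussian."""
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        g = sigmoid(mx - mn)  # dy/dx = 1 / (1 + exp(n - x)) at the expansion point
        mu_y.append(mx + softplus(mn - mx))
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)  # dy/dn = 1 - g
    return mu_y, var_y

def log_gauss_diag(y, mu, var):
    """Log-density of y under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v
                      for yi, m, v in zip(y, mu, var))

def vts_enhance(y, weights, means, variances, mu_n, var_n):
    """MMSE-style clean estimate: x_hat = y - sum_k P(k|y) * (mu_y_k - mu_x_k)."""
    noisy = [vts_compensate_gaussian(m, v, mu_n, var_n)
             for m, v in zip(means, variances)]
    log_post = [math.log(w) + log_gauss_diag(y, my, vy)
                for w, (my, vy) in zip(weights, noisy)]
    peak = max(log_post)  # shift before exponentiating, for numerical stability
    post = [math.exp(lp - peak) for lp in log_post]
    total = sum(post)
    x_hat = list(y)
    for p, mu_x, (mu_y, _) in zip(post, means, noisy):
        for d in range(len(y)):
            x_hat[d] -= (p / total) * (mu_y[d] - mu_x[d])
    return x_hat
```

For instance, `vts_enhance(frame, weights, means, variances, mu_n, var_n)` would be called once per log-mel frame of the beamformer output. Working in the log-mel domain keeps the mismatch function elementwise. A cepstral-domain variant would wrap the same expressions with the DCT and its pseudo-inverse.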
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Morales-Cordovilla, J.A., Pessentheiner, H., Hagmüller, M., González, J.A., Kubin, G. (2014). CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_16
DOI: https://doi.org/10.1007/978-3-319-13623-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science, Computer Science (R0)