CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array

Morales-Cordovilla, Juan A.; Pessentheiner, Hannes; Hagmüller, Martin; González, José A.; Kubin, Gernot

doi:10.1007/978-3-319-13623-3_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8854)

Abstract

This paper addresses the problem of distant speech recognition in reverberant, noisy conditions using a star-shaped microphone array and vector Taylor series (VTS) compensation. First, a beamformer produces an enhanced single-channel signal by applying convex (CVX) optimization over three spatial dimensions, given the spatio-temporal position of the target speaker as prior knowledge. Then, VTS compensation is applied to the speech features extracted from the beamformer's output signal. Finally, the compensated features are used for speech recognition. Because there were no existing German resources suitable for evaluating the proposed enhancement framework, this paper also introduces a new speech database. In particular, we present a medium-vocabulary German database for microphone arrays, consisting of embedded clean signals contaminated with real room impulse responses and mixed in a 'natural' way with real noises. We show that the proposed enhancement framework outperforms related systems on the presented database.
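The abstract states only that the beamformer weights come from a convex optimization over three spatial dimensions, steered by the known speaker position. As a rough illustration of position-steered 3D beamforming, the sketch below swaps in a different, closed-form technique: a superdirective (MVDR against spherically diffuse noise) beamformer with diagonal loading, a regularizer closely related to the white-noise-gain constraint that robust CVX-based superdirective designs impose. It is not the authors' optimization problem. The star geometry (`star_array`), the speaker position, the speed of sound and the loading value are all hypothetical.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed room-temperature value)


def star_array(radii=(0.05, 0.10), n_arms=3, height=0.0):
    """Hypothetical star geometry: one centre mic plus n_arms arms of mics (metres)."""
    pts = [np.zeros(3)]
    for a in range(n_arms):
        theta = 2 * np.pi * a / n_arms
        for r in radii:
            pts.append([r * np.cos(theta), r * np.sin(theta), height])
    return np.array(pts, dtype=float)


def steering_vector(mic_pos, src_pos, freq):
    """Near-field steering vector for a point source at src_pos (3D, metres)."""
    dists = np.linalg.norm(mic_pos - src_pos, axis=1)
    delays = (dists - dists.min()) / C_SOUND   # relative propagation delays (s)
    gains = dists.min() / dists                 # relative 1/r spherical attenuation
    return gains * np.exp(-2j * np.pi * freq * delays)


def diffuse_coherence(mic_pos, freq):
    """Spherically diffuse noise coherence sin(kd)/(kd), k = 2*pi*f/c."""
    d = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.sinc(2.0 * freq * d / C_SOUND)    # np.sinc(x) = sin(pi x)/(pi x)


def superdirective_weights(mic_pos, src_pos, freq, loading=1e-2):
    """Diagonally loaded MVDR weights for one frequency bin, with w^H d = 1."""
    d = steering_vector(mic_pos, src_pos, freq)
    gamma = diffuse_coherence(mic_pos, freq) + loading * np.eye(len(mic_pos))
    g_inv_d = np.linalg.solve(gamma, d)
    return g_inv_d / (d.conj() @ g_inv_d)       # distortionless normalisation


mics = star_array()
speaker = np.array([1.5, 0.8, 0.4])  # hypothetical speaker position (m)
w = superdirective_weights(mics, speaker, freq=1000.0)
print(abs(w.conj() @ steering_vector(mics, speaker, 1000.0)))  # close to 1.0
```

In use, one weight vector is computed per STFT bin and the single-channel output is `Y[f, t] = w_f^H X[:, f, t]`. Larger `loading` trades directivity for robustness to microphone mismatch, which is the same trade-off that a white-noise-gain constraint controls explicitly in a convex formulation.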

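The VTS step can be illustrated with a minimal sketch of first-order VTS feature compensation in the log-mel domain, following the standard model in which noisy speech is y = x + h + log(1 + exp(n - x - h)) (clean speech x, channel h, additive noise n). Under the assumptions stated here, a diagonal GMM of clean log-mel features is mapped to noisy statistics as mu_y = mu_x + mu_h + log(1 + exp(mu_n - mu_x - mu_h)) and var_y = G^2 var_x + (1 - G)^2 var_n with G = 1 / (1 + exp(mu_n - mu_x - mu_h)). Each frame is then cleaned by subtracting the posterior-weighted mismatch. The GMM, the noise Gaussian (e.g. estimated from non-speech frames), the optional static channel mean and all function names are assumptions. The abstract does not specify the paper's exact VTS variant, feature domain or noise estimator.

```python
import numpy as np


def vts_adapt(mu_x, var_x, mu_n, var_n, mu_h=0.0):
    """First-order VTS: clean log-mel Gaussians (K x D) -> noisy-speech Gaussians."""
    diff = mu_n - mu_x - mu_h
    soft = np.logaddexp(0.0, diff)             # log(1 + exp(diff)), numerically stable
    mu_y = mu_x + mu_h + soft
    G = np.exp(-soft)                          # dy/dx = 1 / (1 + exp(diff))
    var_y = G**2 * var_x + (1.0 - G)**2 * var_n
    return mu_y, var_y


def log_gauss_diag(y, mu, var):
    """Log-density of diagonal Gaussians, summed over the feature axis."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mu) ** 2 / var, axis=-1)


def vts_enhance(y, weights, mu_x, var_x, mu_n, var_n, mu_h=0.0):
    """MMSE-style clean-feature estimate for noisy frames y (T x D)."""
    mu_y, var_y = vts_adapt(mu_x, var_x, mu_n, var_n, mu_h)
    # log p(k) + log N(y_t; mu_y_k, var_y_k) for every frame t and component k
    ll = np.log(weights)[None, :] + log_gauss_diag(y[:, None, :], mu_y[None], var_y[None])
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)    # component posteriors P(k | y_t)
    mismatch = mu_y - mu_x                     # per-component noise/channel bias (K x D)
    return y - post @ mismatch
```

The compensated frames would then replace the raw features at the recognizer input; a model-adaptation variant would instead feed `mu_y` and `var_y` into the acoustic model and leave the features untouched.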


Author information

Authors and Affiliations

Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria
Juan A. Morales-Cordovilla, Hannes Pessentheiner, Martin Hagmüller & Gernot Kubin
Dept. of Computer Science, University of Sheffield, UK
José A. González


Editor information

Editors and Affiliations

ETSIT, Las Palmas de Gran Canaria, Spain
Juan Luis Navarro Mesa, Eduardo Hernández Pérez, Pedro Quintana Morales, Antonio Ravelo García & Iván Guerra Moreno
University of Zaragoza, Spain
Alfonso Ortega
Dep. of Electronics, Telecommunications and Informatics Engineering, University of Aveiro, Portugal
António Teixeira
ATVS Biometric Recognition Group, Universidad Autónoma de Madrid, Spain
Doroteo T. Toledano

About this paper

Cite this paper

Morales-Cordovilla, J.A., Pessentheiner, H., Hagmüller, M., González, J.A., Kubin, G. (2014). CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array. In: Navarro Mesa, J.L., et al. (eds) Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_16

DOI: https://doi.org/10.1007/978-3-319-13623-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science, Computer Science (R0)
