CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array

Conference paper
Advances in Speech and Language Technologies for Iberian Languages

Abstract

This paper addresses the problem of distant speech recognition in reverberant, noisy conditions using a star-shaped microphone array and vector Taylor series (VTS) compensation. First, a beamformer produces an enhanced single-channel signal by applying convex (CVX) optimization over three spatial dimensions, given the spatio-temporal position of the target speaker as prior knowledge. Then, VTS compensation is applied to the speech features extracted from the beamformer's output signal. Finally, the compensated features are used for speech recognition. Because there are no existing German resources for evaluating the proposed enhancement framework, this paper also introduces a new speech database. In particular, we present a medium-vocabulary German database for microphone arrays, made of embedded clean signals contaminated with real room impulse responses and mixed in a 'natural' way with real noises. We show that the proposed enhancement framework outperforms the other related systems evaluated on this database.
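
The VTS step can be illustrated with a minimal feature-domain sketch in the spirit of Moreno's classic formulation. It is not the authors' implementation. The sketch assumes log-Mel features, a diagonal-covariance GMM prior over clean speech, and a single Gaussian noise model whose mean and variance are known. In practice these noise parameters would have to be estimated, for example from non-speech frames. It also ignores the channel term, so the mismatch function reduces to y = x + log(1 + exp(n - x)). The function names vts_adapt and vts_compensate, and every number in the demo, are hypothetical.

```python
import numpy as np


def vts_adapt(mu_x, var_x, mu_n, var_n):
    """First-order VTS: map a clean-speech GMM (K x D) to a noisy-speech GMM.

    Assumed log-Mel mismatch function (channel term ignored):
        y = x + log(1 + exp(n - x))
    """
    g = np.log1p(np.exp(mu_n - mu_x))        # (K, D) additive bias at the expansion point
    mu_y = mu_x + g
    G = 1.0 / (1.0 + np.exp(mu_n - mu_x))    # dy/dx; dy/dn = 1 - G
    var_y = G ** 2 * var_x + (1.0 - G) ** 2 * var_n
    return mu_y, var_y, g


def log_gauss_diag(y, mu, var):
    """Log-likelihood of frames y (T, D) under K diagonal Gaussians -> (T, K)."""
    diff = y[:, None, :] - mu[None, :, :]
    return -0.5 * np.sum(np.log(2.0 * np.pi * var)[None] + diff ** 2 / var[None], axis=2)


def vts_compensate(y, weights, mu_x, var_x, mu_n, var_n):
    """MMSE estimate of clean features: subtract the posterior-weighted VTS bias."""
    mu_y, var_y, g = vts_adapt(mu_x, var_x, mu_n, var_n)
    logp = np.log(weights)[None, :] + log_gauss_diag(y, mu_y, var_y)
    logp -= logp.max(axis=1, keepdims=True)  # numerical stability before exp
    post = np.exp(logp)
    post /= post.sum(axis=1, keepdims=True)  # p(k | y_t)
    return y - post @ g


# Hypothetical demo: synthetic "clean" log-Mel frames corrupted by additive noise.
rng = np.random.default_rng(0)
K, D, T = 4, 23, 100
weights = np.full(K, 1.0 / K)
mu_x = rng.normal(5.0, 1.0, size=(K, D))
var_x = np.full((K, D), 0.5)
mu_n = np.full(D, 4.0)
var_n = np.full(D, 0.1)

x = mu_x[rng.integers(K, size=T)] + rng.normal(0.0, np.sqrt(0.5), size=(T, D))
n = mu_n + rng.normal(0.0, np.sqrt(0.1), size=(T, D))
y = x + np.log1p(np.exp(n - x))              # noisy features from the same mismatch model

x_hat = vts_compensate(y, weights, mu_x, var_x, mu_n, var_n)
print("MSE noisy:", np.mean((y - x) ** 2))
print("MSE compensated:", np.mean((x_hat - x) ** 2))
```

Each frame's clean estimate is obtained by subtracting a posterior-weighted average of the per-component biases log(1 + exp(mu_n - mu_x)). A fuller system would also linearize around a channel term and could re-estimate the noise parameters with EM.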

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Morales-Cordovilla, J.A., Pessentheiner, H., Hagmüller, M., González, J.A., Kubin, G. (2014). CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_16

  • DOI: https://doi.org/10.1007/978-3-319-13623-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13622-6

  • Online ISBN: 978-3-319-13623-3

  • eBook Packages: Computer Science, Computer Science (R0)
