CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array

  • Juan A. Morales-Cordovilla
  • Hannes Pessentheiner
  • Martin Hagmüller
  • José A. González
  • Gernot Kubin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

This paper addresses the problem of distant speech recognition in reverberant, noisy conditions employing a star-shaped microphone array and vector Taylor series (VTS) compensation. First, a beamformer yields an enhanced single-channel signal by applying convex (CVX) optimization over three spatial dimensions, given the spatio-temporal position of the target speaker as prior knowledge. Then, VTS compensation is applied to the speech features extracted from the time-domain signal produced by the beamformer. Finally, the compensated features are used for speech recognition. Because existing German resources are insufficient to evaluate the proposed enhancement framework, this paper also introduces a new speech database. In particular, we present a medium-vocabulary German database for microphone arrays, made of embedded clean signals contaminated with real room impulse responses and mixed in a ‘natural’ way with real noises. We show that the proposed enhancement framework performs better than other related systems on the presented database.
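The abstract does not give the VTS formulation used in the paper, so the following is only a minimal sketch of the textbook approximation that VTS compensation is usually built on, under stated assumptions: a single log-spectral channel, clean speech `x`, additive noise `n`, and a channel term `h`, all in the log domain. The function names `vts_mismatch` and `vts_adapt_gaussian` are hypothetical and are not taken from the paper.

```python
import math


def vts_mismatch(x, n, h=0.0):
    """Noisy log-spectral value predicted from clean speech x, noise n, channel h.

    Standard mismatch function: y = x + h + log(1 + exp(n - x - h)).
    Assumption: this is the textbook model, not necessarily the paper's.
    """
    z = n - x - h
    # Numerically stable log(1 + exp(z)).
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return x + h + softplus


def vts_adapt_gaussian(mu_x, var_x, mu_n, var_n, h=0.0):
    """First-order VTS adaptation of one scalar Gaussian (hypothetical helper).

    The mismatch function is linearised around (mu_x, mu_n); g is dy/dx there.
    """
    mu_y = vts_mismatch(mu_x, mu_n, h)
    # dy/dx = 1 / (1 + exp(mu_n - mu_x - h)), written in an overflow-safe form.
    g = math.exp(mu_x + h - mu_y)
    var_y = g * g * var_x + (1.0 - g) ** 2 * var_n
    return mu_y, var_y


if __name__ == "__main__":
    # Speech and noise at equal log-energy: the Jacobian weights both equally.
    print(vts_adapt_gaussian(0.0, 1.0, 0.0, 1.0))
```

When speech dominates (`mu_x` much larger than `mu_n`), `g` approaches 1 and the adapted statistics stay close to the clean ones; when noise dominates, `g` approaches 0 and the noise statistics take over. How this approximation is used to compensate the features after beamforming is not described in the abstract.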

Keywords

distant speech recognition; cvx-optimized beamforming; vector Taylor series compensation; star-shaped microphone array; reverberant and noisy environment; natural mixing; German database

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Juan A. Morales-Cordovilla (1)
  • Hannes Pessentheiner (1)
  • Martin Hagmüller (1)
  • José A. González (2)
  • Gernot Kubin (1)
  1. Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria
  2. Dept. of Computer Science, University of Sheffield, UK
