Recognition of Distant Voice Commands for Home Applications in Portuguese

  • Miguel Matos
  • Alberto Abad
  • Ramón Astudillo
  • Isabel Trancoso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)


This paper presents a set of exploratory experiments that analyse and evaluate the performance of baseline speech processing components in European Portuguese for distant voice command recognition in domestic environments. The analysis, conducted in a multi-channel, multi-room scenario, shows the importance of adequate room detection and channel selection strategies for obtaining acceptable performance. Two computationally inexpensive channel selection measures were investigated for room detection, channel selection and cluster selection. Experimental results show that strategies based on the envelope-variance measure consistently outperformed the other methods investigated and, in particular, that channel selection strategies can be more convenient than baseline beamforming methods, such as delay-and-sum, in this type of multi-room scenario.


distant speech recognition, multi-microphone processing, beamforming, microphone selection, home control applications
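
The abstract refers to channel selection based on an envelope-variance measure. The idea behind such a measure is that reverberation smooths the sub-band energy envelopes of speech, so the least reverberant microphone tends to show the largest envelope variance. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the equal-width frequency bands (standing in for a mel filter bank), the unweighted sum over bands, and the synthetic test signals are all assumptions made for illustration.

```python
import numpy as np

def subband_envelopes(signal, frame_len=400, hop=160, n_bands=20):
    """Return an (n_frames, n_bands) array of sub-band energy envelopes."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Assumption: equal-width bands stand in for a mel filter bank.
    bands = np.array_split(power, n_bands, axis=1)
    # Small floor avoids log(0) in silent frames.
    return np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-12

def envelope_variance(signal, **kwargs):
    """Per-band variance of the mean-normalised, compressed envelopes."""
    log_env = np.log(subband_envelopes(signal, **kwargs))
    # Subtracting the log-domain mean removes channel gain differences.
    log_env -= log_env.mean(axis=0, keepdims=True)
    compressed = np.exp(log_env / 3.0)  # cube-root compression
    return compressed.var(axis=0)

def select_channel(channels, **kwargs):
    """Return (index of the highest-scoring channel, per-channel scores)."""
    variances = np.stack([envelope_variance(c, **kwargs) for c in channels])
    # Normalise each band by its maximum across channels, then sum over bands.
    scores = (variances / variances.max(axis=0, keepdims=True)).sum(axis=1)
    return int(np.argmax(scores)), scores

# Hypothetical demo: amplitude-modulated noise as a stand-in for speech,
# and a strongly reverberated copy of the same signal.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(2 * fs) / fs
clean = rng.standard_normal(t.size) * 0.5 * (1.0 + np.sin(2 * np.pi * 4 * t))

t_ir = np.arange(fs // 2) / fs
impulse_response = rng.standard_normal(t_ir.size) * np.exp(-t_ir / 0.087)
reverberant = np.convolve(clean, impulse_response)[: clean.size]

best, scores = select_channel([clean, reverberant])
print("selected channel:", best, "scores:", scores)
```

In this toy setup the first channel is the dry signal, so a working measure should prefer it. Unlike delay-and-sum beamforming, this kind of selection needs no time alignment between microphones, which may be one reason it suits widely spaced microphones in different rooms.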





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Miguel Matos (1, 2)
  • Alberto Abad (1, 2)
  • Ramón Astudillo (1)
  • Isabel Trancoso (1, 2)
  1. L2F - Spoken Language Systems Lab, INESC-ID, Lisboa, Portugal
  2. IST - Instituto Superior Técnico, University of Lisbon, Portugal
