Hierarchical Multi-stream Posterior Based Speech Recognition System

  • Hamed Ketabdar
  • Hervé Bourlard
  • Samy Bengio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)


In this paper, we present initial results towards boosting posterior-based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into account acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). These posteriors are estimated based on the “state gamma posterior” definition (typically used in standard HMM training), extended to the case of multi-stream HMMs. This approach provides a new, principled theoretical framework for hierarchical estimation and use of posteriors, multi-stream feature combination, and integration of appropriate context and prior knowledge in posterior estimates. In the present work, we used the resulting gamma posteriors as features for a standard HMM/GMM layer. On the OGI Digits database and on a reduced-vocabulary version (1000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task, this resulted in significant performance improvement compared to state-of-the-art Tandem systems.
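To make the “state gamma posterior” concrete, the sketch below computes it with the textbook scaled forward-backward recursion for a small discrete-state HMM. It is an illustration only, not the system described in the paper: the function name `gamma_posteriors`, the toy numbers, and the choice to combine two streams by multiplying their per-state likelihoods (which assumes the streams are conditionally independent given the state) are all assumptions made for this example.

```python
import numpy as np

def gamma_posteriors(init, trans, emis):
    """State gamma posteriors p(q_t = i | x_1..x_T) via scaled forward-backward.

    init:  (S,)   initial state probabilities
    trans: (S, S) transition probabilities, trans[i, j] = p(j | i)
    emis:  (T, S) emission likelihoods p(x_t | q_t = i)
    """
    T, S = emis.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    scale = np.zeros(T)

    # Forward pass, rescaled at each frame to avoid numerical underflow.
    alpha[0] = init * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    for t in range(T - 2, -1, -1):
        beta[t] = (trans @ (emis[t + 1] * beta[t + 1])) / scale[t + 1]

    # Gamma uses the whole utterance (past via alpha, future via beta).
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Hypothetical two-stream toy example (all values are made up).
init = np.array([1.0, 0.0])
trans = np.array([[0.7, 0.3],
                  [0.0, 1.0]])  # left-to-right topology as prior knowledge
stream_a = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
stream_b = np.array([[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]])

# Assumed combination rule: product of per-stream likelihoods.
gamma = gamma_posteriors(init, trans, stream_a * stream_b)
print(gamma)
```

Each row of `gamma` is a posterior distribution over states for one frame. Because the recursion sees the full utterance and the transition matrix, the estimates reflect both acoustic context and the topological constraint, which is the property the abstract attributes to gamma posteriors.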


Speech Recognition; Automatic Speech Recognition; Speech Recognition System; Word Error Rate; Automatic Speech Recognition System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hamed Ketabdar (1, 2)
  • Hervé Bourlard (1, 2)
  • Samy Bengio (1)

  1. IDIAP Research Institute, Martigny, Switzerland
  2. Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
