How does a dictation machine recognize speech?

  • T. Dutoit
  • L. Couvreur
  • H. Bourlard


There is magic (or is it witchcraft?) in a speech recognizer that transcribes continuous radio speech into text, even when its word accuracy does not exceed 50%. The extreme difficulty of this task, though, is usually not perceived by the general public. This is because we are almost deaf to the infinite acoustic variations that accompany the production of vocal sounds, which arise not only from physiological constraints (coarticulation) but also from the acoustic environment (additive or convolutional noise, Lombard effect) and from the emotional state of the speaker (voice quality, speaking rate, hesitations, etc.). Indeed, we become conscious of speech only after our brain has processed it into a sequence of meaningful units: phonemes and words.


Keywords: Feature Vector, Hidden Markov Model, Gaussian Mixture Model, Automatic Speech Recognition, Probability Density Function





Copyright information

© Springer Science+Business Media New York 2009

Authors and Affiliations

  • T. Dutoit (1)
  • L. Couvreur (1)
  • H. Bourlard (2)
  1. Faculté Polytechnique de Mons, Belgium
  2. Ecole Polytechnique Fédérale de Lausanne, Switzerland
