How does a dictation machine recognize speech?
There is magic (or is it witchcraft?) in a speech recognizer that transcribes continuous radio speech into text, even with a word accuracy of no more than 50%. The extreme difficulty of this task, though, is usually not perceived by the general public. This is because we are almost deaf to the infinite acoustic variations that accompany the production of vocal sounds, which arise not only from physiological constraints (coarticulation) but also from the acoustic environment (additive or convolutional noise, Lombard effect) and from the emotional state of the speaker (voice quality, speaking rate, hesitations, etc.). Indeed, we only become conscious of speech after our brain has processed it into a sequence of meaningful units: phonemes and words.
Keywords: Feature Vector, Hidden Markov Model, Gaussian Mixture Model, Automatic Speech Recognition, Probability Density Function