Introduction and Book Overview
The continuous growth of information technology is having an increasingly large impact on many aspects of our daily lives. Communication via speech between human beings and information-processing machines is likewise becoming more important (Holmes and Holmes, 2001). A common dream is to realize a technology that allows humans to communicate, or hold dialogs, with machines through natural and spontaneous speech, since speech is the most convenient channel in many cases of human-machine communication. It is the most natural modality for humans and thus requires no special user training (Lea, 1986). As a communication channel for human expression, speech also provides the highest capacity. This was demonstrated quantitatively by Pierce and Kerlin (1957) and by Turn (1974), who showed that spontaneous speech has a typical transmission rate of around 2.0 to 3.6 words per second; in contrast, handwriting conveys only about 0.4 words per second, and typing, by a skilled typist, achieves about 1.6 to 2.5 words per second. Speech communication also offers obvious benefits for individuals with a variety of physical disabilities, such as loss of sight or limitations in physical motion and motor skills (Lea, 1986).
A fundamental technology for achieving a speech-oriented interface is automatic speech recognition (ASR): a machine that can automatically recognize naturally spoken words uttered by humans. A speech waveform is produced by a sound source whose output propagates through the vocal tract (from larynx to lips), which has different resonance properties for different sounds (e.g., different formant frequencies for different vowel sounds). Designing an ASR system mostly involves dealing with the acoustic properties of speech sounds and their relationship to the basic units of a human language, including phonemes, words, phrases, and sentences (Juang and Rabiner, 2005).
Figure 1.1 shows an example of a machine that recognizes the speech waveform of a human utterance as "Good night."
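The description of speech production given above (a sound source shaped by vocal tract resonances) can be illustrated with a minimal source-filter sketch. This is not a method taken from the chapter: it assumes an idealized impulse train as the sound source and a cascade of simple two-pole resonators as the vocal tract, and the formant frequencies and bandwidths are rough illustrative values chosen for the example. All function names are hypothetical.

```python
import math


def impulse_train(pitch_hz, duration_s, sample_rate):
    """Idealized voiced source: one unit impulse per pitch period."""
    n_samples = int(round(duration_s * sample_rate))
    period = int(round(sample_rate / pitch_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    theta = 2.0 * math.pi * freq_hz / sample_rate        # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, pitch_hz=100.0, duration_s=0.2, sample_rate=8000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(pitch_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of the signal's Fourier sum evaluated at a single frequency."""
    re = im = 0.0
    for n, x in enumerate(signal):
        angle = 2.0 * math.pi * freq_hz * n / sample_rate
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return math.hypot(re, im)


# Assumed, approximate formant settings (Hz) for two vowel-like sounds.
VOWEL_A = [(700.0, 80.0), (1100.0, 90.0)]
VOWEL_I = [(300.0, 60.0), (2300.0, 100.0)]

if __name__ == "__main__":
    for name, formants in (("a-like", VOWEL_A), ("i-like", VOWEL_I)):
        wave = synthesize_vowel(formants)
        energy_700 = magnitude_at(wave, 700.0, 8000)
        energy_2300 = magnitude_at(wave, 2300.0, 8000)
        print(name, round(energy_700, 1), round(energy_2300, 1))
```

The same source yields spectra with energy concentrated at different frequencies depending only on the resonator settings, which is the sense in which different vowels have different formant frequencies. Real speech involves a more complex source and a time-varying vocal tract, so this sketch should be read only as an illustration of the principle.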
Keywords: Speech Recognition, Speech Signal, Automatic Speech Recognition, Vocal Tract, Knowledge Source