Introduction and Book Overview

  • Sakriani Sakti
  • Satoshi Nakamura
  • Konstantin Markov
  • Wolfgang Minker
Chapter
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 42)

The continuous growth of information technology is having an increasingly large impact on many aspects of our daily lives. The issue of communication via speech between human beings and information-processing machines is also becoming more important (Holmes and Holmes, 2001). A common dream is to realize a technology that allows humans to communicate or have dialogs with machines through natural and spontaneous speech, since speech is in many cases the most convenient mode of human-machine communication. It is the most natural modality for humans and thus requires no special user training (Lea, 1986). As a communication channel for human expression, speech also provides the highest capacity. This has been quantitatively demonstrated by Pierce and Kerlin (1957) and also Turn (1974), where spontaneous speech was shown to have a typical transmission rate of around 2.0 to 3.6 words per second; in contrast, handwriting conveys only about 0.4 words per second, and typing, by a skilled typist, achieves about 1.6 to 2.5 words per second. Speech communication also offers obvious benefits for individuals challenged with a variety of physical disabilities, such as loss of sight or limitations in physical motion and motor skills (Lea, 1986). A fundamental technology for achieving a speech-oriented interface is the development of automatic speech recognition (ASR): a machine that can automatically recognize naturally spoken words uttered by humans. A speech waveform is produced by a sound source that propagates through the vocal tract (from larynx to lips) with different resonance properties (e.g., different formant frequencies for different vowel sounds). Designing an ASR system mostly involves dealing with the acoustic properties of speech sounds and their relationship to the basic sounds of a human language, including phonemes, words, phrases, and sentences (Juang and Rabiner, 2005).
Figure 1.1 shows an example of a machine that recognizes the speech waveform of a human utterance as "Good night."
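The source-filter view of speech production described above can be illustrated with a minimal sketch: a periodic glottal source (an impulse train at the pitch frequency) is passed through second-order resonators that mimic vocal-tract formants. This is an assumption-laden toy illustration, not anything from this chapter; the sample rate, pitch, and formant values (roughly those of the vowel /a/) are placeholders chosen for the example.

```python
import numpy as np

FS = 8000  # sample rate in Hz (assumed for this sketch)

def resonator(x, freq, bw):
    """Second-order IIR resonator approximating one vocal-tract formant.

    freq: formant center frequency in Hz; bw: formant bandwidth in Hz.
    """
    r = np.exp(-np.pi * bw / FS)          # pole radius from bandwidth
    theta = 2 * np.pi * freq / FS         # pole angle from center frequency
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = (x[n]
                + a1 * (y[n - 1] if n >= 1 else 0.0)
                + a2 * (y[n - 2] if n >= 2 else 0.0))
    return y

# Glottal source: an impulse train at a 100 Hz pitch (voiced excitation).
n = np.arange(FS // 10)                   # 100 ms of samples
source = (n % (FS // 100) == 0).astype(float)

# Cascade two formant resonators; 730 Hz and 1090 Hz are rough textbook
# first and second formant values for the vowel /a/ (hypothetical here).
speech = resonator(resonator(source, 730, 90), 1090, 110)
```

The cascade of resonators imposes the vocal-tract "filter" on the flat-spectrum "source," which is why the same excitation shaped by different formant frequencies yields different vowel sounds.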

Keywords

Speech Recognition · Speech Signal · Automatic Speech Recognition · Vocal Tract · Knowledge Source
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Sakriani Sakti (1)
  • Satoshi Nakamura (2)
  • Konstantin Markov (3)
  • Wolfgang Minker (4)
  1. NICT/ATR Spoken Language Communication Research Laboratories, Kyoto, Japan
  2. NICT/ATR Spoken Language Communication Research Laboratories, Kyoto, Japan
  3. NICT/ATR Spoken Language Communication Research Laboratories, Kyoto, Japan
  4. University of Ulm, Ulm, Germany
