Speaker-Independent Word Recognition with Backpropagation Networks

  • Bernd Freisleben
  • Christian-Arved Bohn
Conference paper

Abstract

This paper presents a system that recognizes a limited vocabulary of spoken words in a speaker-independent manner. The system requires only minimal hardware support for acoustic preprocessing. In contrast to other approaches to word-level recognition, it reduces the information content of the speech signals with a compression algorithm before presenting them as inputs to a standard 3-layer backpropagation network. The network learns to recognize the utterances of the speakers in the training set, and the trained network is then used to recognize the spoken words of unknown speakers. Recognition rates of up to 91% were obtained for unknown speakers of the same sex and up to 72% for a mix of male and female speakers. Since training times are short and the system is very cost-effective, the approach is practically feasible for a variety of applications.
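
As an illustration only, the pipeline described above (compress the speech signal, then classify it with a 3-layer backpropagation network) might be sketched as follows. The averaging-based `compress` function, the synthetic two-word data, the layer sizes, and the learning rate are all assumptions made for this sketch; the abstract does not specify the authors' compression algorithm or network parameters.

```python
import math
import random


def compress(signal, n_bins):
    """Assumed compression: mean absolute amplitude in n_bins equal segments.
    Requires len(signal) >= n_bins. Not the paper's algorithm."""
    size = len(signal) / n_bins
    out = []
    for b in range(n_bins):
        start = int(b * size)
        end = max(int((b + 1) * size), start + 1)
        seg = signal[start:end]
        out.append(sum(abs(x) for x in seg) / len(seg))
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerNet:
    """Input, one hidden and one output layer of sigmoid units."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # Each weight row ends with a bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr):
        """One online backpropagation update for squared error."""
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        d_out = [(y[k] - target[k]) * y[k] * (1.0 - y[k])
                 for k in range(len(y))]
        # Hidden deltas use the output weights before they are updated.
        d_hid = [h[j] * (1.0 - h[j])
                 * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

    def predict(self, x):
        _, y = self.forward(x)
        return max(range(len(y)), key=lambda k: y[k])


def synthetic_utterance(word, rng):
    """Hypothetical stand-in for speech: word 0 is loud in the first half
    of the signal, word 1 in the second half; length varies per utterance."""
    n = rng.randint(80, 120)
    sig = []
    for t in range(n):
        loud = (t < n // 2) if word == 0 else (t >= n // 2)
        sig.append((1.0 if loud else 0.1) * rng.uniform(-1.0, 1.0))
    return sig


def run_demo(seed=0, n_bins=8, n_hidden=4, epochs=300, lr=0.5):
    """Train on synthetic utterances, return accuracy on unseen ones."""
    rng = random.Random(seed)
    train = [(compress(synthetic_utterance(w, rng), n_bins), w)
             for _ in range(20) for w in (0, 1)]
    net = ThreeLayerNet(n_bins, n_hidden, 2, rng)
    for _ in range(epochs):
        for x, w in train:
            target = [1.0 if k == w else 0.0 for k in range(2)]
            net.train_step(x, target, lr)
    test = [(compress(synthetic_utterance(w, rng), n_bins), w)
            for _ in range(10) for w in (0, 1)]
    return sum(net.predict(x) == w for x, w in test) / len(test)


if __name__ == "__main__":
    print("accuracy on unseen synthetic utterances:", run_demo())
```

Compressing every utterance to a fixed number of values lets a network with a fixed input layer handle utterances of different lengths, which is one plausible reason for placing a compression step in front of the classifier.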

Keywords

Word Recognition, Recognition Rate, Speech Recognition, Speech Signal, Hide Unit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag/Wien 1993

Authors and Affiliations

  • Bernd Freisleben (1)
  • Christian-Arved Bohn (2)
  1. Dept. of Computer Science, University of Darmstadt, Darmstadt, Germany
  2. Dept. Scientific Visualization of HLRZ, GMD Birlinghoven, Sankt Augustin 1, Germany
