Acoustic-Phonetic Decoding of Speech

Statistical Modeling for Phonetic Recognition
  • Richard M. Schwartz
  • Y. Chow
  • M. Dunham
  • O. Kimball
  • M. Krasner
  • F. Kubala
  • J. Makhoul
  • P. Price
  • S. Roucos
Part of the NATO ASI Series book series (volume 46)

Abstract

Several methods for acoustic-phonetic decoding are reviewed. Emphasis is placed on the need for mathematical methods for speech recognition. Several examples of statistical methods are described. The authors present several techniques for incorporating “speech knowledge” into these statistical models, and provide a simple formalism for using multiple knowledge sources in a coherent speech recognition system.

Keywords

Covariance, Acoustics, Dunham, Diphone

Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • Richard M. Schwartz (1)
  • Y. Chow (1)
  • M. Dunham (1)
  • O. Kimball (1)
  • M. Krasner (1)
  • F. Kubala (1)
  • J. Makhoul (1)
  • P. Price (1)
  • S. Roucos (1)

  1. BBN Laboratories Inc., Cambridge, USA
