Part of the book series: NATO ASI Series ((NATO ASI F,volume 169))

Summary

Major speech production models from the speech science literature and a number of popular statistical “generative” models of speech used in speech technology are surveyed. The strengths and weaknesses of these two styles of speech model are analyzed, pointing to the need to integrate their respective strengths while eliminating their respective weaknesses. As an example, a statistical task-dynamic model of speech production is described, motivated by the original deterministic version of the model and targeted at integrated-multilingual speech recognition applications. Methods for model parameter learning (training) and for likelihood computation (recognition) are described, based on statistical optimization principles integrated in neural network and dynamic system theories.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Deng, L. (1999). Computational Models for Speech Production. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_20

  • DOI: https://doi.org/10.1007/978-3-642-60087-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-64250-0

  • Online ISBN: 978-3-642-60087-6

  • eBook Packages: Springer Book Archive
