Tree-based Dependence Models for Speech Recognition

  • Chapter
Computational Models of Speech Pattern Processing

Part of the book series: NATO ASI Series (NATO ASI F, volume 169)

Summary

The independence assumptions typically used to make speech recognition practical ignore the fact that different sounds in speech are highly correlated. Tree-structured dependence models make it possible to represent cross-class acoustic dependence in recognition when used in conjunction with hidden Markov or other such models. These models have Markov-like assumptions on the branches of a tree, which lead to efficient recursive algorithms for state estimation. This paper will describe general approaches to topology design and parameter estimation of tree-based models and outline more specific solutions for two examples: discrete-state hidden dependence trees and continuous-state multiscale models, drawing analogies to results for time series models. Initial results for both cases will be described, followed by a discussion of questions raised by the experiments.
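
As an illustration of the kind of recursion the summary refers to, the following is a minimal sketch of an upward (leaf-to-root) pass on a discrete-state tree. It is not taken from the chapter: the function name `upward_pass`, the dictionary-based tree layout, and all numbers are assumptions made for the example. Each node's state is assumed to depend only on its parent's state, so the likelihood of the observations and the posterior of the root state can be computed by combining one message per branch.

```python
import numpy as np

def upward_pass(children, prior, trans, evidence, root=0):
    """Likelihood and root posterior for a discrete-state tree (illustrative sketch).

    children: dict mapping a node to the list of its child nodes
    prior:    (K,) distribution over the root state
    trans:    dict mapping a child c to a (K, K) matrix with
              trans[c][i, j] = P(x_c = j | x_parent = i)
    evidence: dict mapping a node to a (K,) vector of P(obs at node | x_node = k);
              nodes without an observation are treated as all-ones
    """
    K = len(prior)

    def beta(node):
        # beta[k] = P(all observations in the subtree of `node` | x_node = k)
        b = np.asarray(evidence.get(node, np.ones(K)), dtype=float).copy()
        for c in children.get(node, []):
            # Sum out the child's state; subtrees are independent given x_node.
            b *= trans[c] @ beta(c)
        return b

    joint = prior * beta(root)      # P(x_root = k, all observations)
    likelihood = joint.sum()
    return likelihood, joint / likelihood

# Hypothetical 3-node tree: root 0 with leaf children 1 and 2, two states per node.
children = {0: [1, 2]}
prior = np.array([0.6, 0.4])
trans = {1: np.array([[0.9, 0.1], [0.2, 0.8]]),
         2: np.array([[0.7, 0.3], [0.4, 0.6]])}
evidence = {1: np.array([0.5, 0.1]), 2: np.array([0.2, 0.9])}

likelihood, root_posterior = upward_pass(children, prior, trans, evidence)
```

The cost grows linearly with the number of nodes (times K squared per branch), whereas enumerating all joint state assignments grows exponentially. A matching downward pass, not shown here, would give posteriors for the non-root nodes.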

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ostendorf, M., Kannan, A., Ronen, O. (1999). Tree-based Dependence Models for Speech Recognition. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_4

  • DOI: https://doi.org/10.1007/978-3-642-60087-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-64250-0

  • Online ISBN: 978-3-642-60087-6

  • eBook Packages: Springer Book Archive
