Tree-based Dependence Models for Speech Recognition

Ostendorf, Mari; Kannan, Ashvin; Ronen, Orith

doi:10.1007/978-3-642-60087-6_4

Part of the book series: NATO ASI Series (NATO ASI F, volume 169)

Summary

The independence assumptions typically used to make speech recognition practical ignore the fact that different sounds in speech are highly correlated. Tree-structured dependence models make it possible to represent cross-class acoustic dependence in recognition when used in conjunction with hidden Markov or other such models. These models have Markov-like assumptions on the branches of a tree, which lead to efficient recursive algorithms for state estimation. This paper will describe general approaches to topology design and parameter estimation of tree-based models and outline more specific solutions for two examples: discrete-state hidden dependence trees and continuous-state multiscale models, drawing analogies to results for time series models. Initial results for both cases will be described, followed by a discussion of questions raised by the experiments.
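
To make the "efficient recursive algorithms" concrete, here is a minimal Python sketch of an upward (leaf-to-root) likelihood recursion for a discrete-state hidden dependence tree. It is an illustration under stated assumptions, not the chapter's own algorithm: the names `tree_likelihood`, `brute_force_likelihood`, `parent`, `prior`, `trans`, and `obs_lik` are hypothetical, every node is assumed to share the same number of hidden states, and nodes are assumed to be numbered so that each parent precedes its children. The Markov assumption on the branches means each child depends on the rest of the tree only through its parent, so the joint probability factors as the root prior times one conditional per edge.

```python
import itertools
import numpy as np

def tree_likelihood(parent, prior, trans, obs_lik):
    """Likelihood of the observations via a single leaf-to-root pass.

    parent[i]  : index of node i's parent (-1 for the root, node 0);
                 assumed ordering: parent[i] < i for all i > 0.
    prior[s]   : P(root state = s).
    trans[i]   : matrix with trans[i][s, t] = P(node i = t | parent = s);
                 trans[0] is unused.
    obs_lik[i] : vector with obs_lik[i][t] = p(observation at i | node i = t).
    """
    n = len(parent)
    # beta[i][s] accumulates p(observations in the subtree of i | node i = s).
    beta = [np.array(obs_lik[i], dtype=float) for i in range(n)]
    for i in range(n - 1, 0, -1):
        # Children have larger indices, so beta[i] is already complete here.
        beta[parent[i]] = beta[parent[i]] * (trans[i] @ beta[i])
    return float(prior @ beta[0])

def brute_force_likelihood(parent, prior, trans, obs_lik):
    """Same quantity by summing over every joint state assignment (for checking)."""
    n, num_states = len(parent), len(prior)
    total = 0.0
    for states in itertools.product(range(num_states), repeat=n):
        p = prior[states[0]] * obs_lik[0][states[0]]
        for i in range(1, n):
            p *= trans[i][states[parent[i]], states[i]] * obs_lik[i][states[i]]
        total += p
    return total

# Toy example with made-up random parameters: 5 nodes, 3 hidden states.
rng = np.random.default_rng(0)
parent = [-1, 0, 0, 1, 1]
num_states = 3
prior = rng.dirichlet(np.ones(num_states))
trans = [None] + [rng.dirichlet(np.ones(num_states), size=num_states)
                  for _ in parent[1:]]
obs_lik = rng.random((len(parent), num_states))

fast = tree_likelihood(parent, prior, trans, obs_lik)
slow = brute_force_likelihood(parent, prior, trans, obs_lik)
print(np.isclose(fast, slow))
```

The recursion visits each edge once, so its cost grows linearly with the number of nodes, whereas the brute-force sum grows exponentially. This mirrors the forward-backward recursions for hidden Markov models, which correspond to the special case where the tree is a chain.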

Author information

Authors and Affiliations

Electrical and Computer Engineering Department, Boston University, 8 St. Mary’s St., Boston, MA 02215, USA
Mari Ostendorf, Ashvin Kannan & Orith Ronen

Editor information

Editors and Affiliations

Speech Research Unit, DERA Malvern, St. Andrew’s Road, WR14 4DT, Great Malvern, Worcs, UK
Keith Ponting

About this chapter

Cite this chapter

Ostendorf, M., Kannan, A., Ronen, O. (1999). Tree-based Dependence Models for Speech Recognition. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_4

DOI: https://doi.org/10.1007/978-3-642-60087-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6
eBook Packages: Springer Book Archive
