Summary
The independence assumptions typically used to make speech recognition practical ignore the fact that different sounds in speech are highly correlated. Used in conjunction with hidden Markov models (HMMs) or similar models, tree-structured dependence models make it possible to represent cross-class acoustic dependence in recognition. These models impose Markov-like assumptions along the branches of a tree, which lead to efficient recursive algorithms for state estimation. This paper describes general approaches to topology design and parameter estimation for tree-based models and outlines more specific solutions for two examples, discrete-state hidden dependence trees and continuous-state multiscale models, drawing analogies to results for time-series models. Initial results for both cases are described, followed by a discussion of questions raised by the experiments.
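The efficient recursive state estimation that tree-structured Markov assumptions allow can be illustrated with a minimal sketch for a discrete-state hidden dependence tree. The representation is an assumption for illustration only, and the names (`children`, `trans`, `emis`, `upward`, `tree_likelihood`) are hypothetical, not taken from the chapter. Every node has the same K hidden states, the root has a prior distribution, each non-root node c has its own transition matrix `trans[c][i][j]` = P(state of c is j | parent state is i), and observation likelihoods `emis[s][i]` are assumed to have been computed already (e.g., from per-state Gaussians). A single upward (leaves-to-root) pass yields the likelihood of all observations in O(N K^2) operations for N nodes, versus K^N terms for direct enumeration, which the sketch also includes as a check.

```python
from itertools import product
from typing import Dict, List

# Hypothetical toy representation (not taken from the chapter):
#   children[s]     -> list of child node ids of node s
#   trans[c][i][j]  -> P(state of child c = j | state of c's parent = i)
#   emis[s][i]      -> likelihood of node s's observation given state i
Children = Dict[int, List[int]]
Matrices = Dict[int, List[List[float]]]
Vectors = Dict[int, List[float]]


def upward(node: int, children: Children, trans: Matrices, emis: Vectors) -> List[float]:
    """beta[i] = p(observations in the subtree rooted at node | node state i)."""
    beta = list(emis[node])  # start from the node's own observation likelihood
    for c in children.get(node, []):
        beta_c = upward(c, children, trans, emis)
        for i in range(len(beta)):
            # Sum out the child's state, conditioned on parent state i.
            beta[i] *= sum(trans[c][i][j] * beta_c[j] for j in range(len(beta_c)))
    return beta


def tree_likelihood(root: int, prior: List[float], children: Children,
                    trans: Matrices, emis: Vectors) -> float:
    """Total observation likelihood: sum_i prior[i] * beta_root[i]."""
    beta_root = upward(root, children, trans, emis)
    return sum(p * b for p, b in zip(prior, beta_root))


def brute_force_likelihood(root: int, prior: List[float], children: Children,
                           trans: Matrices, emis: Vectors) -> float:
    """Check by enumerating all K**N joint state assignments (tiny trees only)."""
    nodes = sorted(emis)
    parent = {c: p for p, cs in children.items() for c in cs}
    total = 0.0
    for states in product(range(len(prior)), repeat=len(nodes)):
        x = dict(zip(nodes, states))
        prob = prior[x[root]]
        for s in nodes:
            if s != root:
                prob *= trans[s][x[parent[s]]][x[s]]
            prob *= emis[s][x[s]]
        total += prob
    return total


# Toy tree: node 0 is the root with children 1 and 2; node 3 hangs below node 1.
A = [[0.9, 0.1], [0.2, 0.8]]
children = {0: [1, 2], 1: [3]}
trans = {1: A, 2: A, 3: A}
emis = {0: [0.5, 0.1], 1: [0.3, 0.7], 2: [0.2, 0.4], 3: [0.6, 0.1]}
prior = [0.6, 0.4]

print(tree_likelihood(0, prior, children, trans, emis))
print(brute_force_likelihood(0, prior, children, trans, emis))
```

The two printed values agree up to floating-point rounding. When every node has exactly one child, the tree is a chain and the upward pass reduces to a form of the HMM backward recursion, one instance of the analogy to time-series models mentioned in the summary. A full EM-style parameter estimation would add a downward (root-to-leaves) pass to obtain posterior state probabilities; that step is omitted here.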
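For topology design, one classical option for discrete variables is the Chow-Liu algorithm (Chow and Liu, 1968). It estimates the mutual information between every pair of variables and keeps the maximum-weight spanning tree, which gives the maximum-likelihood tree-structured approximation to the empirical distribution. The sketch below illustrates that technique under assumed inputs: rows of integer-coded labels with one column per variable (for example, hypothetical quantized acoustic classes). It works on observed labels only, so it is a simplification relative to hidden-state trees, and it is not necessarily the topology-design procedure used in this chapter. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def mutual_information(data: Sequence[Sequence[int]], a: int, b: int) -> float:
    """Empirical mutual information (in nats) between columns a and b."""
    n = len(data)
    count_a = Counter(row[a] for row in data)
    count_b = Counter(row[b] for row in data)
    count_ab = Counter((row[a], row[b]) for row in data)
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def chow_liu_edges(data: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Maximum-weight spanning tree over pairwise MI (Kruskal + union-find)."""
    num_vars = len(data[0])
    scored = sorted(
        ((mutual_information(data, a, b), a, b)
         for a, b in combinations(range(num_vars), 2)),
        reverse=True,  # strongest dependencies first
    )
    root_of = list(range(num_vars))

    def find(v: int) -> int:
        while root_of[v] != v:
            root_of[v] = root_of[root_of[v]]  # path halving
            v = root_of[v]
        return v

    edges: List[Tuple[int, int]] = []
    for _, a, b in scored:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding (a, b) does not create a cycle
            root_of[ra] = rb
            edges.append((a, b))
    return edges


# Toy data: column 1 copies column 0; column 2 is a noisy copy of column 0.
data = [(0, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 0),
        (0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
print(chow_liu_edges(data))
```

Because mutual information is symmetric, the result is an undirected tree. Choosing any node as the root and directing edges away from it yields the parent-child structure needed by recursive (upward/downward) estimation.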
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ostendorf, M., Kannan, A., Ronen, O. (1999). Tree-based Dependence Models for Speech Recognition. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_4
DOI: https://doi.org/10.1007/978-3-642-60087-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6
eBook Packages: Springer Book Archive