Abstract
As previously introduced, the Structured Language Model (SLM) operated with the help of a stack from which less probable sub-parse entries were purged before further words were generated. In this article we generalize the CKY algorithm to obtain a chart that allows the direct computation of language model probabilities, thus rendering the stacks unnecessary. An analysis of the behavior of the SLM leads to a generalization of the Inside–Outside algorithm and thus to rigorous EM-type re-estimation of the SLM parameters. The derived algorithms are computationally expensive, but their demands can be mitigated by appropriate thresholding.
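For orientation, the chart idea can be illustrated with the classic inside pass over a CKY chart for a probabilistic context-free grammar in Chomsky normal form. This is only a minimal sketch of the standard algorithm that the chapter generalizes, not the SLM chart itself. The function name `inside_chart`, the toy grammar, and its probabilities are assumptions made up for the example.

```python
from collections import defaultdict


def inside_chart(words, lexical, binary):
    """Fill chart[(i, j)][A] = P(A derives words[i:j]) for a PCFG in CNF.

    lexical maps (A, word) -> probability of the rule A -> word.
    binary maps (A, B, C) -> probability of the rule A -> B C.
    """
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))

    # Spans of width 1: apply lexical rules A -> word.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][a] += p

    # Wider spans: sum over every split point k and every binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                left = chart[(i, k)]
                right = chart[(k, j)]
                for (a, b, c), p in binary.items():
                    if b in left and c in right:
                        chart[(i, j)][a] += p * left[b] * right[c]
    return chart


# Hypothetical toy grammar, chosen only to make the arithmetic easy to follow.
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

words = ["she", "eats", "fish"]
chart = inside_chart(words, lexical, binary)

# The probability of the whole sentence is the inside probability of the
# start symbol over the full span.
sentence_prob = chart[(0, len(words))]["S"]
print(sentence_prob)
```

Because each cell sums over all analyses of its span, no stack of competing partial parses has to be kept or pruned. The triple loop over spans and split points is also where the cubic cost in sentence length comes from, which is the kind of expense thresholding is meant to reduce.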
Copyright information
© 2004 Springer Science+Business Media New York
About this paper
Cite this paper
Jelinek, F. (2004). Stochastic Analysis of Structured Language Modeling. In: Johnson, M., Khudanpur, S.P., Ostendorf, M., Rosenfeld, R. (eds) Mathematical Foundations of Speech and Language Processing. The IMA Volumes in Mathematics and its Applications, vol 138. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9017-4_3
DOI: https://doi.org/10.1007/978-1-4419-9017-4_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-6484-2
Online ISBN: 978-1-4419-9017-4
eBook Packages: Springer Book Archive