Stochastic Analysis of Structured Language Modeling

Jelinek, Frederick

doi:10.1007/978-1-4419-9017-4_3

Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 138)

Abstract

As previously introduced, the Structured Language Model (SLM) operated with the help of a stack from which less probable sub-parse entries were purged before further words were generated. In this article we generalize the CKY algorithm to obtain a chart that allows the direct computation of language model probabilities, thus rendering the stacks unnecessary. An analysis of the behavior of the SLM leads to a generalization of the Inside–Outside algorithm and thus to rigorous EM-type re-estimation of the SLM parameters. The derived algorithms are computationally expensive, but their demands can be mitigated by the use of appropriate thresholding.
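
For orientation, the classical idea that the abstract builds on can be sketched in a few lines: a CKY chart filled with inside probabilities gives the probability of a sentence directly, and a threshold can discard low-probability chart entries to save work. The sketch below is not the chapter's generalized algorithm for the SLM. It is the textbook inside computation for a probabilistic context-free grammar in Chomsky normal form, and the toy grammar, the function names (`inside_chart`, `sentence_probability`), and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustrative, not from the chapter).
BINARY = {  # (parent, left child, right child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
LEXICAL = {  # (tag, word) -> rule probability
    ("NP", "dogs"): 0.5,
    ("NP", "cats"): 0.5,
    ("V", "chase"): 1.0,
}


def inside_chart(words, threshold=0.0):
    """Fill a CKY chart with inside probabilities.

    chart[(i, j)][A] is the probability that nonterminal A derives words[i:j].
    Entries whose probability does not exceed `threshold` are discarded,
    which makes the result approximate whenever threshold > 0.
    """
    n = len(words)
    chart = defaultdict(dict)
    # Width-1 spans: apply the lexical rules.
    for i, w in enumerate(words):
        for (tag, word), p in LEXICAL.items():
            if word == w and p > threshold:
                chart[(i, i + 1)][tag] = p
    # Wider spans: sum over split points and binary rules.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = defaultdict(float)
            for k in range(i + 1, j):
                left = chart.get((i, k), {})
                right = chart.get((k, j), {})
                for (parent, lc, rc), p in BINARY.items():
                    if lc in left and rc in right:
                        cell[parent] += p * left[lc] * right[rc]
            chart[(i, j)] = {a: p for a, p in cell.items() if p > threshold}
    return chart


def sentence_probability(words, start="S", threshold=0.0):
    """Probability that `start` derives the whole sentence (0.0 if unparsable)."""
    if not words:
        return 0.0
    chart = inside_chart(words, threshold)
    return chart[(0, len(words))].get(start, 0.0)
```

With the toy grammar above, `sentence_probability(["dogs", "chase", "cats"])` multiplies the two noun-phrase probabilities with the rule probabilities on the single available parse. Raising `threshold` trades accuracy for speed, since any pruned entry can no longer contribute to wider spans.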

Author information

Authors and Affiliations

Center for Language and Speech Processing, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD, 21218, USA
Frederick Jelinek

Editor information

Editors and Affiliations

Dept. of Cognitive and Linguistic Studies, Brown University, Providence, RI, 02912, USA
Mark Johnson
Dept. of ECE and Dept. of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
Sanjeev P. Khudanpur
Dept. of Electrical Engineering, University of Washington, Seattle, WA, 98195, USA
Mari Ostendorf
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Roni Rosenfeld

About this paper

Cite this paper

Jelinek, F. (2004). Stochastic Analysis of Structured Language Modeling. In: Johnson, M., Khudanpur, S.P., Ostendorf, M., Rosenfeld, R. (eds) Mathematical Foundations of Speech and Language Processing. The IMA Volumes in Mathematics and its Applications, vol 138. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9017-4_3

DOI: https://doi.org/10.1007/978-1-4419-9017-4_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-6484-2
Online ISBN: 978-1-4419-9017-4
eBook Packages: Springer Book Archive
