Direct Parsing of Text

Jelinek, F.

doi:10.1007/978-1-4612-4056-3_4


Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 80)


Abstract

Traditionally, parsing of text is based on an explicit grammar and an associated parsing procedure. Examples include context-free, context-sensitive, and transformational grammars. Such grammars are specified in a generative mode. A parsing procedure is then designed for the grammar in question (e.g., LR, CYK, or Earley parsing) and is meant to reverse the process: given text, find the particular generative sequence whose result was the text.
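To make the "reverse the generative process" idea concrete, here is a minimal CYK recognizer sketch. It assumes a grammar already converted to Chomsky normal form; the toy grammar, its labels, and the name `cyk_recognize` are hypothetical illustrations, not taken from the chapter. It only decides whether the grammar can generate the sentence; recovering the generative sequence itself would additionally require back-pointers.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (illustrative only).
# Binary rules map a pair of child labels to the set of parent labels.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules map a word to the set of preterminal labels.
LEXICAL = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if the toy grammar can generate `words` from `start`."""
    n = len(words)
    # table[i][j] holds the labels that can derive words[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(LEXICAL.get(w, ()))
    # Fill longer spans from shorter ones, trying every split point k.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return n > 0 and start in table[0][n - 1]
```

For example, `cyk_recognize("the dog saw the cat".split())` accepts the sentence, while a scrambled word order is rejected. Because spans are filled bottom-up by length and every split point is tried, the running time grows cubically with sentence length (times the number of binary rules).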

Parsed text is useful in text understanding or in language translation. In most cases it consists of a tree with labeled nodes and individual words at its leaves. Understanding systems attempt to derive meaning from operations on the structure of the tree; machine translation systems frequently accomplish their task by transforming the source-language tree into a tree of the target language.

The traditional procedure has two major problems: the grammar has to be designed, usually by hand, and parsing text with it yields highly ambiguous results. For some time now, attempts have been made to extract the grammar automatically from data, attach probabilities to its productions, and resolve the parsing ambiguity by selecting the most probable parse. This grammar extraction has been based on treebanks, which are databases consisting of large amounts of parsed text.
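The idea of attaching probabilities to productions and keeping the most probable parse can be sketched with a Viterbi-style CYK parser. This is a minimal illustration under stated assumptions: a probabilistic context-free grammar in Chomsky normal form, with a toy grammar, invented probabilities, and a function name (`viterbi_cyk`) that are all hypothetical and not taken from the chapter. The example sentence has the classic prepositional-phrase attachment ambiguity.

```python
import math

# Hypothetical toy PCFG in Chomsky normal form; probabilities are invented
# for illustration. Rules sharing a left-hand side sum to 1.
BINARY = [  # (parent, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.2),
    ("NP", "Det", "N", 0.5),
    ("PP", "P", "NP", 1.0),
]
LEXICAL = {  # word -> list of (preterminal, probability)
    "she": [("NP", 0.3)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "man": [("N", 0.5)],
    "telescope": [("N", 0.5)],
    "with": [("P", 1.0)],
}


def viterbi_cyk(words, start="S"):
    """Return (probability, tree) of the most probable parse, or (0.0, None)."""
    n = len(words)
    # best[(i, j)][label] = (prob, tree) of the best subtree over words[i:j]
    best = {}
    for i, w in enumerate(words):
        best[(i, i + 1)] = {tag: (p, (tag, w)) for tag, p in LEXICAL.get(w, [])}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            for k in range(i + 1, j):
                left, right = best[(i, k)], best[(k, j)]
                for parent, lc, rc, p in BINARY:
                    if lc in left and rc in right:
                        prob = p * left[lc][0] * right[rc][0]
                        # Keep only the highest-probability way to build `parent`.
                        if prob > cell.get(parent, (0.0, None))[0]:
                            cell[parent] = (prob, (parent, left[lc][1], right[rc][1]))
            best[(i, j)] = cell
    return best.get((0, n), {}).get(start, (0.0, None))


prob, tree = viterbi_cyk("she saw the man with the telescope".split())
```

With these made-up numbers, the verb-attachment reading (VP → VP PP) scores 0.0045 against 0.00225 for the noun-attachment reading (NP → NP PP), so it is the parse returned. Changing the rule probabilities can flip the choice, which is exactly why estimating them from a treebank matters.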

Cooperating researchers at IBM and the University of Pennsylvania have recently realized that since one is interested in parsing and not in generation, one might as well develop parsers directly, without recourse to the painful process of grammar development. Two separate and promising approaches have emerged, one statistical, one rule-based. This talk will describe both, and point out their differences and affinities.


Author information

Authors and Affiliations

Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland, 21218, USA
F. Jelinek


Editor information

Editors and Affiliations

Bell Laboratories, Room 2D-446, 600 Mountain Avenue, Murray Hill, NJ, 07974-0636, USA
Stephen E. Levinson
Bell Laboratories, Room 2C-374, 600 Mountain Avenue, Murray Hill, NJ, 07974-0636, USA
Larry Shepp


About this paper

Cite this paper

Jelinek, F. (1996). Direct Parsing of Text. In: Levinson, S.E., Shepp, L. (eds) Image Models (and their Speech Model Cousins). The IMA Volumes in Mathematics and its Applications, vol 80. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-4056-3_4


DOI: https://doi.org/10.1007/978-1-4612-4056-3_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-8482-6
Online ISBN: 978-1-4612-4056-3
eBook Packages: Springer Book Archive
