Accurate Unlexicalized Parsing for Modern Hebrew

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4629))

Abstract

Many state-of-the-art statistical parsers for English can be viewed as Probabilistic Context-Free Grammars (PCFGs) acquired from treebanks consisting of phrase-structure trees enriched with a variety of contextual, derivational (e.g., markovization) and lexical information. In this paper we empirically investigate the applicability and adequacy of the unlexicalized variety of such parsing models to Modern Hebrew (MH), a Semitic language that differs in structure and characteristics from English. We show that, contrary to experience with parsing the WSJ, the markovized, head-driven unlexicalized variety does not necessarily outperform plain PCFGs for Semitic languages. We demonstrate that enriching unlexicalized PCFGs with morphologically marked agreement features percolated up the parse tree (e.g., definiteness) outperforms plain PCFGs as well as a simple head-driven variation on the MH treebank. We further show that an (unlexicalized) head-driven variety enriched with the same features achieves even better performance. We conclude that morphologically rich languages introduce an additional dimension of parametrization that is orthogonal to the horizontal/vertical dimensions proposed before [1], and that its contribution is essential and complementary.
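
To make the idea of enriching an unlexicalized PCFG more concrete, the following is a minimal sketch and not the authors' implementation. It percolates a definiteness mark up a toy tree, applies order-2 vertical markovization (parent annotation), and estimates rule probabilities by relative frequency. The tuple-based tree encoding, the "+D" mark, the leftmost-head rule, the PERCOLATE_TO set and the transliterated example words are all assumptions made for illustration; horizontal markovization is omitted.

```python
from collections import Counter

# A tree is (label, child, child, ...); a leaf is a plain string (a word).
# The "+D" definiteness mark, the leftmost-head rule and PERCOLATE_TO are
# illustrative assumptions, not the paper's annotation scheme.
PERCOLATE_TO = {"NP"}

def is_leaf(node):
    return isinstance(node, str)

def percolate_def(tree):
    """Copy a '+D' mark from the (assumed leftmost) head child to its parent."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    children = [percolate_def(c) for c in children]
    head = children[0]
    if label in PERCOLATE_TO and not is_leaf(head) and head[0].endswith("+D"):
        label += "+D"
    return (label, *children)

def parent_annotate(tree, parent=None):
    """Vertical markovization (order 2): append the parent label to each node."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, *(parent_annotate(c, label) for c in children))

def rules(tree):
    """Yield (lhs, rhs) productions of a tree, skipping lexical rules."""
    if is_leaf(tree):
        return
    label, *children = tree
    if all(not is_leaf(c) for c in children):
        yield label, tuple(c[0] for c in children)
    for c in children:
        yield from rules(c)

def estimate(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for t in trees for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

# Toy example: a definite noun followed by an agreeing definite adjective.
tree = ("S",
        ("NP", ("NN+D", "habayit"), ("JJ+D", "hagadol")),
        ("VP", ("VB", "nafal")))
enriched = parent_annotate(percolate_def(tree))
probs = estimate([enriched])
```

In this sketch the feature percolation and the parent annotation are independent tree transforms applied before rule extraction, which mirrors the abstract's claim that the morphological dimension is orthogonal to the horizontal/vertical ones: either transform can be switched on or off without touching the other.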

References

  1. Klein, D., Manning, C.: Accurate Unlexicalized Parsing. In: Proceedings of ACL, pp. 423–430 (2003)

  2. Sima’an, K., Itai, A., Winter, Y., Altman, A., Nativ, N.: Building a Tree-Bank of Modern Hebrew Text. In: Traitement Automatique des Langues (2001)

  3. Tsarfaty, R.: Integrated Morphological and Syntactic Disambiguation for Modern Hebrew. In: Proceedings of SRW COLING-ACL (2006)

  4. Bikel, D.: Intricacies of Collins’ Parsing Model. Computational Linguistics 30(4) (2004)

  5. Charniak, E.: Tree-Bank Grammars. In: AAAI/IAAI, vol. 2, pp. 1031–1036 (1996)

  6. Johnson, M.: PCFG Models of Linguistic Tree Representations. Computational Linguistics 24(4), 613–632 (1998)

  7. Collins, M.: Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics (2003)

  8. Dubey, A., Keller, F.: Probabilistic Parsing for German using Sister-Head Dependencies. In: Proceedings of ACL (2003)

  9. Collins, M., Hajic, J., Ramshaw, L., Tillmann, C.: A Statistical Parser for Czech. In: Proceedings of ACL, College Park, Maryland (1999)

  10. Bikel, D., Chiang, D.: Two Statistical Parsing Models Applied to the Chinese Treebank. In: Second Chinese Language Processing Workshop, Hong Kong (2000)

  11. Wintner, S.: Definiteness in the Hebrew Noun Phrase. Journal of Linguistics 36, 319–363 (2000)

  12. Goldberg, Y., Adler, M., Elhadad, M.: Noun Phrase Chunking in Hebrew: Influence of Lexical and Morphological Features. In: Proceedings of COLING-ACL (2006)

  13. Danon, G.: Syntactic Definiteness in the Grammar of Modern Hebrew. Linguistics 39(6), 1071–1116 (2001)

  14. Marcus, M., Kim, G., Marcinkiewicz, M., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: Annotating Predicate-Argument Structure (1994)

  15. Milea, A.: Treebank Annotation Guide. MILA, Knowledge Center for Hebrew Processing (2007)

  16. Hageloh, F.: Parsing using Transforms over Treebanks. Master’s thesis, University of Amsterdam (2007)

  17. Schmid, H.: Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors. In: Proceedings of ACL (2004)

Editor information

Václav Matoušek, Pavel Mautner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsarfaty, R., Sima’an, K. (2007). Accurate Unlexicalized Parsing for Modern Hebrew. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_8

  • DOI: https://doi.org/10.1007/978-3-540-74628-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74627-0

  • Online ISBN: 978-3-540-74628-7

  • eBook Packages: Computer Science, Computer Science (R0)
