Corrective Dependency Parsing

Trends in Parsing Technology

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 43))

Abstract

This chapter presents a discriminative modeling technique which corrects the errors made by an automatic parser. The model is similar to reranking; however, it does not require the generation of k-best lists as in McDonald et al. (2005), McDonald and Pereira (2006), Charniak and Johnson (2005), and Hall (2007). The corrective strategy employed by our technique is to explore a set of candidate parses which are constructed by making structurally local perturbations to an automatically generated parse tree. We train a model which makes local, corrective decisions in order to optimize for parsing performance. The technique is independent of the parser generating the first set of parses. We show in this chapter that the only requirement for this technique is the ability to define a local neighborhood in which a large number of the errors occur.
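As a rough illustration of the corrective strategy described above (the function names, the neighborhood definition, and the toy scorer below are our own assumptions, not the chapter's actual implementation), one can sketch the search as: for each word, consider reattaching it to a governor in a structurally local neighborhood of its current governor, and keep the reattachment the model prefers, provided it keeps the structure a tree:

```python
# Sketch of corrective dependency parsing: given a baseline parse
# (a governor index per word, 0 = imaginary root), propose candidate
# parses by reattaching each word within a local neighborhood and
# keep the best-scoring valid attachment. Purely illustrative.

def creates_cycle(heads, child, new_head):
    """Would attaching `child` to `new_head` create a cycle?"""
    node = new_head
    while node != 0:
        if node == child:
            return True
        node = heads[node]
    return False

def neighborhood(heads, child):
    """Local candidate governors: the grandparent and the siblings of
    the current governor (one notion of a structurally local move)."""
    head = heads[child]
    cands = {heads[head]} if head != 0 else set()
    cands |= {i for i in heads if i != child and heads[i] == head}
    return {c for c in cands if c != child}

def correct(heads, score):
    """Greedy corrective pass: move each word to the best governor in
    its neighborhood whenever the model prefers that attachment."""
    fixed = dict(heads)
    for child in sorted(fixed):
        best, best_s = fixed[child], score(child, fixed[child])
        for cand in neighborhood(fixed, child):
            if not creates_cycle(fixed, child, cand) and score(child, cand) > best_s:
                best, best_s = cand, score(child, cand)
        fixed[child] = best
    return fixed

# Toy example: chain 3 -> 2 -> 1, where word 1 should attach to 3.
baseline = {1: 2, 2: 3, 3: 0}
prefer = lambda child, head: 1.0 if (child, head) == (1, 3) else 0.0
corrected = correct(baseline, prefer)   # word 1 is reattached to 3
```

In the chapter's setting the scoring function would be the trained discriminative model, and the neighborhood is whatever local region captures most of the baseline parser's errors.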


Notes

  1.

    In order to correctly capture the dependency structure, co-indexed movement traces are used in a form similar to Government and Binding theory, GPSG, etc.

  2.

    Exhaustive parsing assumes that the optimal parse under the model has been chosen; this is in contrast to greedy techniques, where the parse may not be optimal under the model.

  3.

    The imaginary root node simplifies notation.

  4.

The dependency structures here are very similar to those described by Mel’čuk (1988); however, the nodes of the dependency trees discussed in this chapter are limited to the words of the sentence and are always ordered according to the surface word-order.

  5.

Node \(w_a\) is said to transitively govern node \(w_b\) if \(w_b\) is a descendant of \(w_a\) in the dependency tree.
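With governors stored as a word-to-head mapping (our own notational sketch, with 0 as the imaginary root), the transitive-governance relation amounts to following governor links upward from \(w_b\):

```python
# Transitive governance (illustrative): w_a transitively governs w_b
# iff w_b is a descendant of w_a, i.e. following governor links from
# w_b eventually reaches w_a. `heads` maps each word index to its
# governor; 0 is the imaginary root.

def transitively_governs(heads, a, b):
    node = heads.get(b, 0)
    while node != 0:
        if node == a:
            return True
        node = heads[node]
    return False

heads = {1: 2, 2: 3, 3: 0}               # 3 governs 2, 2 governs 1
assert transitively_governs(heads, 3, 1)  # via the chain 3 -> 2 -> 1
assert not transitively_governs(heads, 1, 3)
```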

  6.

Bilexical dependencies are components of both the Collins and Charniak parsers and model the types of syntactic subordination that we encode in a dependency tree. (Bilexical models were also proposed by Eisner (1996).) In the absence of lexicalization, both parsers have dependency features that are encoded as head-constituent to sibling features.

  7.

    This information was provided by Eugene Charniak in a personal communication.

  8.

    A cousin is a descendant of an ancestor and not an ancestor itself, which subsumes the definition of sibling.

  9.

    These statistics are for the complete PDT 1.0 dataset.

  10.

    http://sourceforge.net/projects/mstparser

  11.

    The CoNLL07 shared-task data is a subset of the PDT 2.0 data.

  12.

Jack-knife cross-validation is the process of splitting the data into m sets, training on \(m-1\) of them, and applying the trained model to the remaining set. We do this m times, yielding predictions for the entire training set while never using a model trained on the data for which we are making predictions.
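The procedure in the note can be sketched as follows (the `train` and `predict` callables stand in for the parser's actual training and decoding routines, which are not specified here):

```python
# Jack-knife predictions for a training set: split into m folds, train
# on m-1 folds, predict the held-out fold; repeat for each fold, so no
# example is ever predicted by a model that saw it during training.

def jackknife(examples, m, train, predict):
    folds = [examples[i::m] for i in range(m)]
    predictions = []
    for i, held_out in enumerate(folds):
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(rest)                      # model never sees held_out
        predictions.extend(predict(model, held_out))
    return predictions

# Toy check: "training" computes a mean, "prediction" applies it.
train = lambda data: sum(data) / len(data)
predict = lambda model, fold: [model for _ in fold]
preds = jackknife([1, 2, 3, 4, 5, 6], 3, train, predict)
```

Each prediction in `preds` comes from a model fit on the other two folds, which is exactly the property needed when the corrective model is trained on the baseline parser's output.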

  13.

Using held-out development data, we determined that a Gaussian prior parameter setting of 4 worked best. The optimal number of training iterations was chosen on held-out data for each experiment; this was generally on the order of a couple hundred iterations of L-BFGS. The MaxEnt modeling implementation can be found at http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html

  14.

    The MaltEval (http://w3.msi.vxu.se/jni/malteval/) tool was used for evaluation of the dependency-based parsers.

References

  • Attardi, G. and M. Ciaramita (2007). Tree revision learning for dependency parsing. In Proceedings of Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Rochester, NY.

  • Berger, A.L., S.A.D. Pietra, and V.J.D. Pietra (1996). A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71.

  • Böhmová, A., J. Hajič, E. Hajičová, and B.V. Hladká (2002). The Prague Dependency Treebank: three-level annotation scenario. In A. Abeillé (Ed.), Treebanks: Building and Using Syntactically Annotated Corpora. Dordrecht: Kluwer Academic Publishers.

  • Brill, E. (1995). Transformation-based error-driven learning and natural language processing: a case study in part of speech tagging. Computational Linguistics 21(4), 543–565.

  • Caraballo, S. and E. Charniak (1998). New figures of merit for best-first probabilistic chart parsing. Computational Linguistics 24(2), 275–298.

  • Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the 2000 Conference of the North American Chapter of the Association for Computational Linguistics, ACL, New Brunswick, NJ.

  • Charniak, E. (2001). Immediate-head parsing for language models. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France.

  • Charniak, E. and M. Johnson (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI.

  • Collins, M. (2000). Discriminative reranking for natural language parsing. In Proceedings of the 17th International Conference on Machine Learning, Stanford, CA.

  • Collins, M. (2003). Head-driven statistical models for natural language processing. Computational Linguistics 29(4), 589–637.

  • Collins, M., L. Ramshaw, J. Hajič, and C. Tillmann (1999). A statistical parser for Czech. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, pp. 505–512.

  • Dubey, A. and F. Keller (2003). Probabilistic parsing for German using sister-head dependencies. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 96–103.

  • Eisner, J. (1996). Three new probabilistic models for dependency parsing: an exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING), Copenhagen, Denmark, pp. 340–345.

  • Hajič, J. (1998). Building a syntactically annotated corpus: the Prague Dependency Treebank. In Issues of Valency and Meaning. Praha: Karolinum, pp. 106–132.

  • Hajičová, E., J. Havelka, P. Sgall, K. Veselá, and D. Zeman (2004). Issues of projectivity in the Prague Dependency Treebank. Prague Bulletin of Mathematical Linguistics 81, 5–22.

  • Hall, K. (2007). k-best spanning tree parsing. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.

  • Hall, K. and V. Novák (2005). Corrective modeling for non-projective dependency parsing. In Proceedings of the 9th International Workshop on Parsing Technologies, Vancouver, BC, Canada.

  • Harrison, P., S. Abney, D. Flickinger, C. Gdaniec, R. Grishman, D. Hindle, B. Ingria, M. Marcus, B. Santorini, and T. Strzalkowski (1991). Evaluating syntax performance of parser/grammars of English. In Proceedings of the Workshop on Evaluating Natural Language Processing Systems, ACL, Berkeley, CA.

  • Klein, D. and C.D. Manning (2003). Factored A* search for models over sequences and trees. In Proceedings of IJCAI 2003, Acapulco, Mexico.

  • Levy, R. and C. Manning (2004). Deep dependencies from context-free statistical parsers: correcting the surface dependency approximation. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 327–334.

  • Manning, C.D. and H. Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

  • McDonald, R., K. Crammer, and F. Pereira (2005). Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI.

  • McDonald, R., K. Lerman, and F. Pereira (2006). Multilingual dependency parsing with a two-stage discriminative parser. In Conference on Natural Language Learning, New York, NY.

  • McDonald, R. and F. Pereira (2006). Online learning of approximate dependency parsing algorithms. In Proceedings of the Annual Meeting of the European Association for Computational Linguistics, Trento, Italy.

  • McDonald, R., F. Pereira, K. Ribarov, and J. Hajič (2005). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, Canada, pp. 523–530.

  • Mel’čuk, I. (1988). Dependency Syntax: Theory and Practice. Albany, NY: SUNY Press.

  • Nivre, J. (2006). Inductive Dependency Parsing. Text, Speech and Language Technology, vol. 34. New York, NY: Springer.

  • Nivre, J., J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret (2007). The CoNLL 2007 shared task on dependency parsing. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic.

  • Nivre, J. and J. Nilsson (2005). Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI, pp. 99–106.

  • Roark, B. and M. Collins (2004). Incremental parsing with the perceptron algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain.

  • Sgall, P., E. Hajičová, and J. Panevová (1986). The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Boston, MA: Kluwer Academic.

  • Smith, N.A. and J. Eisner (2005). Contrastive estimation: training log-linear models on unlabeled data. In Proceedings of the Association for Computational Linguistics (ACL 2005), Ann Arbor, MI.

  • Tarjan, R. (1977). Finding optimal branchings. Networks 7, 25–35.

Acknowledgements

This work was partially supported by U.S. NSF grants IIS–9982329 and OISE–0530118, by the Czech Ministry of Education grant LC536 and Czech Academy of Sciences grant 1ET201120505.

Author information

Corresponding author

Correspondence to Keith Hall.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Hall, K., Novák, V. (2010). Corrective Dependency Parsing. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_9
