Corrective Dependency Parsing

Trends in Parsing Technology

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 43))

Abstract

This chapter presents a discriminative modeling technique which corrects the errors made by an automatic parser. The model is similar to reranking; however, it does not require the generation of k-best lists as in McDonald et al. (2005), McDonald and Pereira (2006), Charniak and Johnson (2005), and Hall (2007). The corrective strategy employed by our technique is to explore a set of candidate parses which are constructed by making structurally local perturbations to an automatically generated parse tree. We train a model which makes local, corrective decisions in order to optimize for parsing performance. The technique is independent of the parser generating the first set of parses. We show in this chapter that the only requirement for this technique is the ability to define a local neighborhood in which a large number of the errors occur.
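As a rough illustration of the corrective strategy described above (the function names, the neighborhood definition, and the toy scorer below are our own assumptions, not the chapter's actual implementation), one can sketch the search as: for each word, consider reattaching it to a governor in a structurally local neighborhood of its current governor, and keep the reattachment the model prefers, provided it keeps the structure a tree:

```python
# Sketch of corrective dependency parsing: given a baseline parse
# (a governor index per word, 0 = imaginary root), propose candidate
# parses by reattaching each word within a local neighborhood and
# keep the best-scoring valid attachment. Purely illustrative.

def creates_cycle(heads, child, new_head):
    """Would attaching `child` to `new_head` create a cycle?"""
    node = new_head
    while node != 0:
        if node == child:
            return True
        node = heads[node]
    return False

def neighborhood(heads, child):
    """Local candidate governors: the grandparent and the siblings of
    the current governor (one notion of a structurally local move)."""
    head = heads[child]
    cands = {heads[head]} if head != 0 else set()
    cands |= {i for i in heads if i != child and heads[i] == head}
    return {c for c in cands if c != child}

def correct(heads, score):
    """Greedy corrective pass: move each word to the best governor in
    its neighborhood whenever the model prefers that attachment."""
    fixed = dict(heads)
    for child in sorted(fixed):
        best, best_s = fixed[child], score(child, fixed[child])
        for cand in neighborhood(fixed, child):
            if not creates_cycle(fixed, child, cand) and score(child, cand) > best_s:
                best, best_s = cand, score(child, cand)
        fixed[child] = best
    return fixed

# Toy example: chain 3 -> 2 -> 1, where word 1 should attach to 3.
baseline = {1: 2, 2: 3, 3: 0}
prefer = lambda child, head: 1.0 if (child, head) == (1, 3) else 0.0
corrected = correct(baseline, prefer)   # word 1 is reattached to 3
```

In the chapter's setting the scoring function would be the trained discriminative model, and the neighborhood is whatever local region captures most of the baseline parser's errors.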


Notes

  1.

    In order to correctly capture the dependency structure, co-indexed movement traces are used in a form similar to Government and Binding theory, GPSG, etc.

  2.

    Exhaustive parsing assumes that the optimal parse under the model has been chosen; this is in contrast to greedy techniques, where the parse may not be optimal under the model.

  3.

    The imaginary root node simplifies notation.

  4.

The dependency structures here are very similar to those described by Mel’čuk (1988); however, the nodes of the dependency trees discussed in this chapter are limited to the words of the sentence and are always ordered according to the surface word-order.

  5.

Node \(w_a\) is said to transitively govern node \(w_b\) if \(w_b\) is a descendant of \(w_a\) in the dependency tree.
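With governors stored as a word-to-head mapping (our own notational sketch, with 0 as the imaginary root), the transitive-governance relation amounts to following governor links upward from \(w_b\):

```python
# Transitive governance (illustrative): w_a transitively governs w_b
# iff w_b is a descendant of w_a, i.e. following governor links from
# w_b eventually reaches w_a. `heads` maps each word index to its
# governor; 0 is the imaginary root.

def transitively_governs(heads, a, b):
    node = heads.get(b, 0)
    while node != 0:
        if node == a:
            return True
        node = heads[node]
    return False

heads = {1: 2, 2: 3, 3: 0}               # 3 governs 2, 2 governs 1
assert transitively_governs(heads, 3, 1)  # via the chain 3 -> 2 -> 1
assert not transitively_governs(heads, 1, 3)
```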

  6.

Bilexical dependencies are components of both the Collins and Charniak parsers and model the types of syntactic subordination that we encode in a dependency tree. (Bilexical models were also proposed by Eisner (1996).) In the absence of lexicalization, both parsers have dependency features that are encoded as head-constituent to sibling features.

  7.

    This information was provided by Eugene Charniak in a personal communication.

  8.

    A cousin is a descendant of an ancestor and not an ancestor itself, which subsumes the definition of sibling.

  9.

    These statistics are for the complete PDT 1.0 dataset.

  10.

    http://sourceforge.net/projects/mstparser

  11.

    The CoNLL07 shared-task data is a subset of the PDT 2.0 data.

  12.

Jack-knife cross-validation is the process of splitting the data into m sets, training on \(m-1\) of them, and applying the trained model to the remaining set. We do this m times, yielding predictions for the entire training set while never using a model trained on the data for which we are making predictions.
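The procedure in the note can be sketched as follows (the `train` and `predict` callables stand in for the parser's actual training and decoding routines, which are not specified here):

```python
# Jack-knife predictions for a training set: split into m folds, train
# on m-1 folds, predict the held-out fold; repeat for each fold, so no
# example is ever predicted by a model that saw it during training.

def jackknife(examples, m, train, predict):
    folds = [examples[i::m] for i in range(m)]
    predictions = []
    for i, held_out in enumerate(folds):
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(rest)                      # model never sees held_out
        predictions.extend(predict(model, held_out))
    return predictions

# Toy check: "training" computes a mean, "prediction" applies it.
train = lambda data: sum(data) / len(data)
predict = lambda model, fold: [model for _ in fold]
preds = jackknife([1, 2, 3, 4, 5, 6], 3, train, predict)
```

Each prediction in `preds` comes from a model fit on the other two folds, which is exactly the property needed when the corrective model is trained on the baseline parser's output.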

  13.

Using held-out development data, we determined that a Gaussian prior parameter setting of 4 worked best. The optimal number of training iterations was chosen on held-out data for each experiment; this was generally on the order of a couple hundred iterations of L-BFGS. The MaxEnt modeling implementation can be found at http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html

  14.

    The MaltEval (http://w3.msi.vxu.se/jni/malteval/) tool was used for evaluation of the dependency-based parsers.

References

  • Attardi, G. and M. Ciaramita (2007). Tree revision learning for dependency parsing. In Proceedings of Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Rochester, NY.

  • Berger, A.L., S.A.D. Pietra, and V.J.D. Pietra (1996). A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71.

  • Böhmová, A., J. Hajič, E. Hajičová, and B.V. Hladká (2002). The Prague Dependency Treebank: three-level annotation scenario. In A. Abeillé (Ed.), Treebanks: Building and Using Syntactically Annotated Corpora. Dordrecht: Kluwer Academic Publishers.

  • Brill, E. (1995). Transformation-based error-driven learning and natural language processing: a case study in part of speech tagging. Computational Linguistics 21(4), 543–565.

  • Caraballo, S. and E. Charniak (1998). New figures of merit for best-first probabilistic chart parsing. Computational Linguistics 24(2), 275–298.

  • Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the 2000 Conference of the North American Chapter of the Association for Computational Linguistics, ACL, New Brunswick, NJ.

  • Charniak, E. (2001). Immediate-head parsing for language models. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France.

  • Charniak, E. and M. Johnson (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI.

  • Collins, M. (2000). Discriminative reranking for natural language parsing. In Proceedings of the 17th International Conference on Machine Learning, Stanford, CA.

  • Collins, M. (2003). Head-driven statistical models for natural language processing. Computational Linguistics 29(4), 589–637.

  • Collins, M., L. Ramshaw, J. Hajič, and C. Tillmann (1999). A statistical parser for Czech. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, pp. 505–512.

  • Dubey, A. and F. Keller (2003). Probabilistic parsing for German using sister-head dependencies. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 96–103.

  • Eisner, J. (1996). Three new probabilistic models for dependency parsing: an exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING), Copenhagen, Denmark, pp. 340–345.

  • Hajič, J. (1998). Building a syntactically annotated corpus: the Prague Dependency Treebank. In Issues of Valency and Meaning. Praha: Karolinum, pp. 106–132.

  • Hajičová, E., J. Havelka, P. Sgall, K. Veselá, and D. Zeman (2004). Issues of projectivity in the Prague Dependency Treebank. Prague Bulletin of Mathematical Linguistics 81, 5–22.

  • Hall, K. (2007). k-best spanning tree parsing. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.

  • Hall, K. and V. Novák (2005). Corrective modeling for non-projective dependency parsing. In Proceedings of the 9th International Workshop on Parsing Technologies, Vancouver, BC, Canada.

  • Harrison, P., S. Abney, D. Flickinger, C. Gdaniec, R. Grishman, D. Hindle, B. Ingria, M. Marcus, B. Santorini, and T. Strzalkowski (1991). Evaluating syntax performance of parser/grammars of English. In Proceedings of the Workshop on Evaluating Natural Language Processing Systems, ACL, Berkeley, CA.

  • Klein, D. and C.D. Manning (2003). Factored A* search for models over sequences and trees. In Proceedings of IJCAI 2003, Acapulco, Mexico.

  • Levy, R. and C. Manning (2004). Deep dependencies from context-free statistical parsers: correcting the surface dependency approximation. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 327–334.

  • Manning, C.D. and H. Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

  • McDonald, R., K. Crammer, and F. Pereira (2005). Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI.

  • McDonald, R., K. Lerman, and F. Pereira (2006). Multilingual dependency parsing with a two-stage discriminative parser. In Conference on Natural Language Learning, New York, NY.

  • McDonald, R. and F. Pereira (2006). Online learning of approximate dependency parsing algorithms. In Proceedings of the Annual Meeting of the European Association for Computational Linguistics, Trento, Italy.

  • McDonald, R., F. Pereira, K. Ribarov, and J. Hajič (2005). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, Canada, pp. 523–530.

  • Mel’čuk, I. (1988). Dependency Syntax: Theory and Practice. Albany, NY: SUNY Press.

  • Nivre, J. (2006). Inductive Dependency Parsing. Text, Speech and Language Technology, vol. 34. New York, NY: Springer.

  • Nivre, J., J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret (2007). The CoNLL 2007 shared task on dependency parsing. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic.

  • Nivre, J. and J. Nilsson (2005). Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI, pp. 99–106.

  • Roark, B. and M. Collins (2004). Incremental parsing with the perceptron algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain.

  • Sgall, P., E. Hajičová, and J. Panevová (1986). The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Boston, MA: Kluwer Academic.

  • Smith, N.A. and J. Eisner (2005). Contrastive estimation: training log-linear models on unlabeled data. In Proceedings of the Association for Computational Linguistics (ACL 2005), Ann Arbor, MI.

  • Tarjan, R. (1977). Finding optimal branchings. Networks 7, 25–35.

Acknowledgements

This work was partially supported by U.S. NSF grants IIS–9982329 and OISE–0530118, by the Czech Ministry of Education grant LC536 and Czech Academy of Sciences grant 1ET201120505.

Author information

Corresponding author

Correspondence to Keith Hall.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Hall, K., Novák, V. (2010). Corrective Dependency Parsing. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_9
