Delexicalized and Minimally Supervised Parsing on Universal Dependencies

Mareček, David

doi:10.1007/978-3-319-45925-7_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9918)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

In this paper, we compare delexicalized transfer and minimally supervised parsing techniques on 32 different languages from the Universal Dependencies treebank collection. The minimal supervision consists of adding handcrafted universal grammatical rules for POS tags. The rules are incorporated into the unsupervised dependency parser in the form of external prior probabilities. We also experiment with learning these probabilities from other treebanks. The average attachment score of our parser is slightly lower than that of the delexicalized transfer parser; however, our parser performs better for languages from less-resourced language families (non-Indo-European) and is therefore suitable for those languages, for which treebanks often do not exist.
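The abstract mentions learning the external prior probabilities from other treebanks. One plausible way to do this, sketched here as an assumption rather than the paper's exact procedure, is a relative-frequency estimate of p_ext(dependent POS | governor POS, direction) over the head-dependent pairs of a source treebank. The function name `estimate_attach_prior` and the (POS, head) input format are hypothetical.

```python
from collections import Counter, defaultdict

# One sentence: a list of (upos, head) pairs; head is the 1-based index of the
# governor and 0 marks the root (same convention as the CoNLL-U HEAD column).
Sentence = list[tuple[str, int]]


def estimate_attach_prior(treebank: list[Sentence]) -> dict[tuple[str, str], dict[str, float]]:
    """Relative-frequency estimate of p_ext(dep_pos | gov_pos, direction).

    direction is "left" when the dependent precedes its governor, "right" otherwise.
    Root attachments (head == 0) are not counted here.
    """
    counts = defaultdict(Counter)
    for sentence in treebank:
        for position, (dep_pos, head) in enumerate(sentence, start=1):
            if head == 0:
                continue
            gov_pos = sentence[head - 1][0]
            direction = "left" if position < head else "right"
            counts[(gov_pos, direction)][dep_pos] += 1

    # Normalize the counts of each (governor POS, direction) context into a distribution.
    prior = {}
    for context, dep_counts in counts.items():
        total = sum(dep_counts.values())
        prior[context] = {dep: n / total for dep, n in dep_counts.items()}
    return prior


# Toy source "treebank" with one sentence: "She reads books quickly"
toy = [[("PRON", 2), ("VERB", 0), ("NOUN", 2), ("ADV", 2)]]
prior = estimate_attach_prior(toy)
```

On the toy sentence, the left-dependent distribution of VERB is {"PRON": 1.0} and the right-dependent distribution splits evenly between NOUN and ADV. In practice, such estimates would come from many sentences (and possibly several treebanks), and would likely need smoothing for unseen contexts.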

Notes

1. In the fully unsupervised setting, we cannot, for example, simply push verbs to become roots and nouns to become their dependents; this would already be a kind of supervision.
2. universaldependencies.org.
3. We exclude the ‘Ancient Greek-PROIEL’, ‘Finnish-FTB’, ‘Japanese-KTC’, ‘Latin-ITT’, and ‘Latin-PROIEL’ treebanks.
4. Malt parser, in its current version 1.8.1 (http://maltparser.org).
5. http://ufal.mff.cuni.cz/udp.
6. We had to change the original parser code to do this.
7. Note that, for example, \(p^{ext}_{attach}(PUNC|VERB,dir) = 1\) does not mean that all dependents of VERB must be PUNC. Since \(\lambda_{attach}\) is less than one, the value 1 only pushes punctuation to be attached below verbs.
8. The results of different parameter settings for both parsers varied only slightly (by at most 2 % for all languages).
9. We used the Malt parser with its default feature set. Tuning it for this specific delexicalized task would probably yield slightly better results.
10. Danish is the only exception.
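The point of footnote 7 can be made concrete with a small numerical sketch. It assumes that the external prior is combined with the parser's own estimate by linear interpolation, p_attach = (1 - λ_attach) · p_model + λ_attach · p_ext; this excerpt does not give the exact formula the parser uses, and the function name `interpolate`, the model distribution, and λ_attach = 0.3 are hypothetical.

```python
def interpolate(p_model: dict[str, float], p_ext: dict[str, float], lam: float) -> dict[str, float]:
    """Linear mix (1 - lam) * p_model + lam * p_ext over dependent POS tags.

    lam plays the role of lambda_attach (assumed interpolation weight) and must be
    strictly below one, so the model's own estimate is never discarded entirely.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    tags = set(p_model) | set(p_ext)
    return {t: (1 - lam) * p_model.get(t, 0.0) + lam * p_ext.get(t, 0.0) for t in tags}


# Hypothetical model estimate of the dependents of VERB in one direction.
p_model = {"NOUN": 0.5, "ADV": 0.25, "PUNC": 0.25}
# Handcrafted rule p_ext(PUNC | VERB, dir) = 1: all prior mass on punctuation.
p_ext = {"PUNC": 1.0}

mixed = interpolate(p_model, p_ext, lam=0.3)
# NOUN and ADV keep non-zero probability; PUNC is boosted, not made mandatory.
```

With λ_attach = 0.3, the PUNC share rises from 0.25 to 0.475, while NOUN and ADV are only scaled by 0.7, which is why a prior value of 1 pushes punctuation below verbs without forbidding other dependents.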

Acknowledgments

This work has been supported by grant 14-06548P of the Czech Science Foundation.

Author information

Authors and Affiliations

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Malostranské náměstí 25, 118 00, Praha, Czech Republic
David Mareček

Corresponding author

Correspondence to David Mareček.

Editor information

Editors and Affiliations

University of West Bohemia, Plzen, Czech Republic
Pavel Král
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Mareček, D. (2016). Delexicalized and Minimally Supervised Parsing on Universal Dependencies. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_3

DOI: https://doi.org/10.1007/978-3-319-45925-7_3
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45924-0
Online ISBN: 978-3-319-45925-7
eBook Packages: Computer Science, Computer Science (R0)
