Delexicalized and Minimally Supervised Parsing on Universal Dependencies

  • Conference paper
Statistical Language and Speech Processing (SLSP 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9918)

Abstract

In this paper, we compare delexicalized transfer and minimally supervised parsing techniques on 32 different languages from the Universal Dependencies treebank collection. The minimal supervision consists of adding handcrafted universal grammatical rules for POS tags. The rules are incorporated into the unsupervised dependency parser in the form of external prior probabilities. We also experiment with learning these probabilities from other treebanks. The average attachment score of our parser is slightly lower than that of the delexicalized transfer parser; however, it performs better for languages from less-resourced language families (non-Indo-European) and is therefore suitable for those languages, for which treebanks often do not exist.
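As a rough illustration of what "delexicalized" means here: a delexicalized parser is trained on treebanks whose word forms have been removed, so that it relies on POS tags and can be applied to another language. The sketch below blanks the FORM and LEMMA columns of CoNLL-U data, the format used by Universal Dependencies. The function name `delexicalize_conllu` is hypothetical, and the choice to blank only these two columns and to drop comment lines is an assumption; the abstract does not say which columns the paper's setup removed.

```python
def delexicalize_conllu(text: str) -> str:
    """Blank the FORM and LEMMA columns of every token line in a CoNLL-U string.

    Assumption: only columns 2 (FORM) and 3 (LEMMA) carry lexical material
    that has to be hidden; POS tags, heads and relations are kept.
    """
    out = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Comment lines may contain the raw sentence text, so drop them.
            continue
        if not line:
            # Blank lines separate sentences and are preserved.
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) == 10:  # a well-formed CoNLL-U token line has 10 columns
            cols[1] = "_"    # FORM
            cols[2] = "_"    # LEMMA
        out.append("\t".join(cols))
    return "\n".join(out)


sample = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
)
delexicalized = delexicalize_conllu(sample)
```

After this step each token line still carries its POS tag, head index and relation label, which is the information a delexicalized parser is trained on.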

Notes

  1. In the fully unsupervised setting, we cannot, for example, simply push verbs to the roots and make nouns their dependents. This is already a kind of supervision.

  2. universaldependencies.org.

  3. We exclude the ‘Ancient Greek-PROIEL’, ‘Finnish-FTB’, ‘Japan-KTC’, ‘Latin-ITT’, and ‘Latin-PROIEL’ treebanks.

  4. Malt parser, current version 1.8.1 (http://maltparser.org).

  5. http://ufal.mff.cuni.cz/udp.

  6. We had to change the original parser code to do this.

  7. Note that, for example, \(p^{ext}_{attach}(PUNC|VERB,dir) = 1\) does not mean that all the dependents of VERB must be PUNC. Since \(\lambda _{attach}\) is less than one, the value 1 only pushes punctuation to be attached below verbs.

  8. The results of different parameter settings for both parsers varied only slightly (at most a 2 % difference across all the languages).

  9. We used the Malt parser with its default feature set. Tuning it for this specific delexicalized task would probably bring slightly better results.

  10. Danish is the only exception.
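The point of note 7 can be illustrated numerically. This page does not show how the external prior is combined with the parser's own estimate, so the sketch below assumes a simple linear interpolation weighted by \(\lambda _{attach}\). The function `mix_attach_prob` and all numbers are hypothetical and are not taken from the paper.

```python
def mix_attach_prob(p_learned: float, p_ext: float, lam: float) -> float:
    """Combine a learned attachment probability with an external prior.

    Assumption: linear interpolation, (1 - lam) * p_learned + lam * p_ext.
    The paper's actual combination rule is not given on this page.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * p_learned + lam * p_ext


# Hypothetical values: the parser's own estimate that a dependent of VERB
# is PUNC, an external prior of 1 for that attachment, and a weight below one.
p_punc = mix_attach_prob(p_learned=0.1, p_ext=1.0, lam=0.5)

# A competing dependent tag with an external prior of 0 keeps part of its mass.
p_noun = mix_attach_prob(p_learned=0.5, p_ext=0.0, lam=0.5)
```

Under this assumed rule, a weight below one leaves the combined probability for PUNC strictly below 1, and other dependent tags keep non-zero probability. The prior therefore only pushes punctuation toward verbs, as the note describes.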

Acknowledgments

This work has been supported by the grant 14-06548P of the Czech Science Foundation.

Author information

Corresponding author

Correspondence to David Mareček.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Mareček, D. (2016). Delexicalized and Minimally Supervised Parsing on Universal Dependencies. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_3

  • DOI: https://doi.org/10.1007/978-3-319-45925-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45924-0

  • Online ISBN: 978-3-319-45925-7

  • eBook Packages: Computer Science, Computer Science (R0)
