Creating Hybrid Dependency Parsers for Syntax-Based MT

  • Nathan David Green
  • Zdeněk Žabokrtský
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Dependency parsers are almost ubiquitously evaluated on their accuracy scores, but these scores say nothing about the complexity and usefulness of the resulting structures. Because dependency parses are basic structures upon which other systems are built, it seems more reasonable to judge these parsers further down the NLP pipeline. In this chapter, we discuss how different forms and different hybrid combinations of dependency parses affect the overall output of syntax-based machine translation, through both automatic and manual evaluation. We show results from a variety of individual parsers, including dependency and constituent parsers, and describe multiple ensemble parsing techniques along with their overall effect on the machine translation system. We show that parsers’ unlabeled attachment scores (UAS) are more correlated with the NIST evaluation metric than with the BLEU metric, although we see increases in both metrics. To truly see the effect of hybrid dependency parsers on machine translation, we describe and evaluate a combined resource we have released that contains gold-standard dependency trees along with gold-standard translations.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Westfield State University, Westfield, USA
  2. Charles University in Prague, Prague, Czech Republic
