Single Malt or Blended? A Study in Multilingual Parser Optimization

Hall, Johan; Nilsson, Jens; Nivre, Joakim

doi:10.1007/978-90-481-9352-3_2

Part of the book series: Text, Speech and Language Technology (TLTB, volume 43)

Abstract

We describe a two-stage optimization of the MaltParser system for the ten languages in the multilingual track of the CoNLL 2007 shared task on dependency parsing. The first stage consists in tuning a single-parser system for each language by optimizing parameters of the parsing algorithm, the feature model, and the learning algorithm. The second stage consists in building an ensemble system that combines six different parsing strategies, extrapolating from the optimal parameter settings for each language. When evaluated on the official test sets, the ensemble system significantly outperformed the single-parser system and achieved the highest average labeled attachment score of all systems participating in the shared task.
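
The labeled attachment score (LAS) mentioned in the abstract is the percentage of tokens that receive both the correct head and the correct dependency label. The following is a minimal sketch of that computation, not the official shared-task scorer; the function names and the (head, label) tuple representation are assumptions made for illustration, and details such as the treatment of punctuation are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one (head index, dependency label) pair per token.
Arc = Tuple[int, str]


def labeled_attachment_score(gold: List[Arc], predicted: List[Arc]) -> float:
    """Return the percentage of tokens whose head AND label both match gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token sequence")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


def average_las(per_language: Dict[str, float]) -> float:
    """Unweighted mean of per-language scores (an assumed reading of 'average')."""
    if not per_language:
        raise ValueError("no scores to average")
    return sum(per_language.values()) / len(per_language)
```

Averaging here gives every language equal weight regardless of test-set size; a token-weighted average would be a different design choice.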

Notes

1. We have used MaltParser 0.4, which can be downloaded free of charge from the following page: http://w3.msi.vxu.se/users/nivre/research/MaltParser.html
2. Note that this was an internally defined test set, taken out of the training data and distinct from the official test set distributed by the organizers for the final evaluation.
3. Complete specifications of all parameter settings for all languages, for both Single Malt and Blended, are available at http://w3.msi.vxu.se/users/jha/conll07/
4. These numbers refer to the number of multi-valued categorical features, as defined in Fig. 2.2. The number of binarized features fed to the SVM learner is of course much higher and depends on the number of possible values for each categorical feature.
5. For example, the DEPREL of Top row 1 in Fig. 2.2 was removed for the arc-standard version of the Nivre algorithm, because it will always be null.
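
Note 4 distinguishes multi-valued categorical features from the binarized features passed to the SVM learner. A minimal sketch of such a binarization (one indicator per observed feature/value pair) is given below. It is an illustration only, not MaltParser's actual encoding; the feature names and the helper functions are hypothetical.

```python
from typing import Dict, List, Tuple


def build_index(instances: List[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Assign one binary dimension to every (feature, value) pair seen in training."""
    index: Dict[Tuple[str, str], int] = {}
    for instance in instances:
        for name, value in instance.items():
            key = (name, value)
            if key not in index:
                index[key] = len(index)
    return index


def binarize(instance: Dict[str, str], index: Dict[Tuple[str, str], int]) -> List[int]:
    """Map a categorical instance to a 0/1 vector; unseen values are ignored."""
    vector = [0] * len(index)
    for name, value in instance.items():
        position = index.get((name, value))
        if position is not None:
            vector[position] = 1
    return vector


# Hypothetical feature names: two categorical features per parser configuration.
training = [
    {"pos_stack_top": "NN", "pos_next_input": "VB"},
    {"pos_stack_top": "DT", "pos_next_input": "NN"},
]
feature_index = build_index(training)
```

Two categorical features with two observed values each already yield four binary dimensions, which shows why the binarized count grows with the number of possible values per feature.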

Acknowledgements

We want to thank all treebank providers for making the data available for the CoNLL 2007 Shared Task and the (other) organizers for their efforts in organizing it. Special thanks to Ryan McDonald, for fruitful discussions and assistance with the error analysis, and to Kenji Sagae, for showing us how to produce a good blend. Thanks also to Gülşen Eryiğit, Beáta Megyesi, Mattias Nilsson and Markus Saers for helping us with the optimization of the Single Malt parser.

Author information

Authors and Affiliations

Växjö University, Växjö, Sweden
Johan Hall & Jens Nilsson
Uppsala University, Uppsala, Sweden
Joakim Nivre

Corresponding author

Correspondence to Johan Hall.

Editor information

Editors and Affiliations

Tilburg University, Warandelaan 2, Tilburg, 5000 LE, Netherlands
Harry Bunt
Dépt. Linguistique, Université de Genève, rue de Candolle 2, Genève, 1211, Switzerland
Paola Merlo
Pimpstensvägen 16, Uppsala, 752 67, Sweden
Joakim Nivre

About this chapter

Cite this chapter

Hall, J., Nilsson, J., Nivre, J. (2010). Single Malt or Blended? A Study in Multilingual Parser Optimization. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_2

DOI: https://doi.org/10.1007/978-90-481-9352-3_2
Published: 29 September 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9351-6
Online ISBN: 978-90-481-9352-3
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)
