Abstract
Parsers are essential tools for several NLP applications. Here we introduce PassPort, a model for the dependency parsing of Portuguese trained with the Stanford Parser. To develop PassPort, we tested several setups that combined different existing parsing algorithms with different combinations of linguistic information, and observed which approach performed best. PassPort achieved a UAS of 87.55 and a LAS of 85.21 on the Portuguese Universal Dependencies corpus. We also compared the model's performance with that of another parsing system on corpora covering three genres. For that, we annotated randomly selected sentences from these corpora using PassPort and the PALAVRAS parsing system, and then manually evaluated and compared the output of both. The two achieved very similar results for dependency parsing, with a LAS of 85.02 for PassPort against 84.36 for PALAVRAS. In addition, the analysis showed that better part-of-speech tagging could improve our LAS.
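UAS (unlabeled attachment score) is the percentage of scored tokens whose predicted head is correct, and LAS (labeled attachment score) is the percentage whose head and dependency label are both correct. The sketch below computes both from token-aligned gold and predicted analyses; it is a minimal illustration, not the evaluation script used for PassPort. The optional `skip` mask lets the caller exclude tokens such as punctuation, which were not scored in the manual comparison with PALAVRAS. The function name, the `(head, label)` tuple representation and the toy analyses are assumptions made for this example, and the sketch presumes both analyses use the same tokenization.

```python
from typing import List, Optional, Sequence, Tuple

# One token's analysis: (head index, dependency label); head 0 is the root.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc],
                      skip: Optional[Sequence[bool]] = None) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent over the tokens not flagged in `skip`."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must be token-aligned")
    mask: List[bool] = list(skip) if skip is not None else [False] * len(gold)
    if len(mask) != len(gold):
        raise ValueError("skip mask must have one entry per token")
    scored = correct_heads = correct_arcs = 0
    for (g_head, g_label), (p_head, p_label), ignored in zip(gold, pred, mask):
        if ignored:
            continue
        scored += 1
        if g_head == p_head:
            correct_heads += 1          # correct head: counts toward UAS
            if g_label == p_label:
                correct_arcs += 1       # head and label correct: counts toward LAS
    if scored == 0:
        raise ValueError("no tokens left to score")
    return 100.0 * correct_heads / scored, 100.0 * correct_arcs / scored


# Toy example: "Ela comprou livros ." (4 tokens; the analyses are invented).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
is_punct = [False, False, False, True]

uas_all, las_all = attachment_scores(gold, pred)              # 75.0, 50.0
uas_np, las_np = attachment_scores(gold, pred, skip=is_punct)  # 100.0, about 66.67
```

Excluding the punctuation token changes both scores here, which is why it matters to state whether punctuation was scored when comparing systems.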
Supported by the Walloon Region (Projects BEWARE 1510637 and 1610378) and Altissia International.
Notes
- 1.
The parser model, along with the material used in this paper, can be found at https://cental.uclouvain.be/resources/smalla_smille/passport/.
- 2.
- 3.
At the time the experiments in this paper were run, the available version of the PT-UD corpus was 2.1.
- 4.
For instance, the tag DET in the short POS appears as DET or ART in the long POS, while the tag DET in the long POS appears as DET or PRON in the short POS.
- 5.
This modified version of the corpus is available along with the parser model at the PassPort website https://cental.uclouvain.be/resources/smalla_smille/passport/.
- 6.
We detected some fluctuation in the scores during preliminary testing.
- 7.
Zeman et al. [23] argue that larger dimensions may yield better results for parsing.
- 8.
The best system was run five times with randomized train and test sets.
- 9.
Using the most recent PT-UD corpus (version 2.2) in similar setups, we also obtained better performance with long POS information than with short POS.
- 10.
- 11.
Although 30 sentences were selected from each genre, the final sentence counts in the results differ, because each parsing system (PassPort and PALAVRAS) uses its own sentence splitter (for instance, PALAVRAS splits sentences at colons).
- 12.
Selected novels from www.dominiopublico.gov.br.
- 13.
This corpus was compiled within the scope of the PorPopular project (www.ufrgs.br/textecc/porlexbras/porpopular/index.php).
- 14.
We did not evaluate punctuation tokens, since PALAVRAS does not provide a dependency label for them and, in both parsing models, they are simply attached to the root or to the dependency closest to the root.
- 15.
This is not in line with the UD guidelines (universaldependencies.org/u/dep/iobj.html), which state that indirect objects should be labeled obj (if they are the sole object of the verb) or iobj (if there is another obj in the clause). According to the guidelines, obl should only be used for adjuncts, but that is not the case in the PT-UD corpus.
- 16.
The tags also contain a < or > symbol, which indicates the attachment direction.
- 17.
The model, training datasets and evaluation files will be made available with the final version.
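Note 4 states that the short and long POS tagsets do not map one-to-one. One way to surface such mismatches is to collect, for every token, which short tag co-occurs with which long tag, in both directions. The sketch below assumes, for illustration only, that the short POS sits in the UPOS column (column 4) of a CoNLL-U file and the long POS in the XPOS column (column 5); the function names and toy tokens are hypothetical and were chosen only to reproduce the DET/ART/PRON pattern described in that note.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def tag_correspondences(conllu_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each UPOS tag (column 4) to the XPOS tags (column 5) it co-occurs with."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                    # sentence break or comment line
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue                    # multiword-token range or empty node
        mapping[cols[3]].add(cols[4])
    return dict(mapping)


def invert(mapping: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Reverse the correspondence: XPOS tag -> UPOS tags it co-occurs with."""
    inverse: Dict[str, Set[str]] = defaultdict(set)
    for upos, xpos_tags in mapping.items():
        for xpos in xpos_tags:
            inverse[xpos].add(upos)
    return dict(inverse)


# Hypothetical toy tokens (ID, FORM, LEMMA, UPOS, XPOS); other columns are placeholders.
rows = [("1", "o", "o", "DET", "ART"),
        ("1", "este", "este", "DET", "DET"),
        ("1", "esse", "esse", "PRON", "DET")]
lines = ["# toy sample", ""] + ["\t".join(r + ("_", "0", "root", "_", "_")) for r in rows]

short_to_long = tag_correspondences(lines)   # DET -> {ART, DET}; PRON -> {DET}
long_to_short = invert(short_to_long)        # ART -> {DET}; DET -> {DET, PRON}
```

Any tag that maps to more than one tag in either direction marks a place where information is lost when switching tagsets.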
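Notes 6 and 8 mention score fluctuation and five runs of the best system with randomized train and test sets. A minimal sketch of such a repeated-split protocol follows. The `evaluate(train, test)` callback is hypothetical (in practice it would train a parsing model and return, for example, its LAS), and the 10% test fraction and fixed seed are illustrative assumptions, not the paper's settings.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def repeated_random_splits(data: Sequence[T],
                           evaluate: Callable[[List[T], List[T]], float],
                           runs: int = 5,
                           test_fraction: float = 0.1,
                           seed: int = 0) -> Tuple[float, float]:
    """Re-shuffle and re-split `data` `runs` times; return mean and stdev of the scores."""
    if runs < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    rng = random.Random(seed)           # fixed seed so the splits are reproducible
    scores: List[float] = []
    for _ in range(runs):
        shuffled = list(data)
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(evaluate(train, test))   # e.g. train a parser, return its LAS
    return statistics.mean(scores), statistics.stdev(scores)


# Stand-in evaluator: checks that each split is disjoint and scores the test-set size.
test_sets: List[Tuple[int, ...]] = []


def fake_evaluate(train: List[int], test: List[int]) -> float:
    assert not set(train) & set(test)
    test_sets.append(tuple(test))
    return float(len(test))


mean_score, sd_score = repeated_random_splits(list(range(100)), fake_evaluate)
# Every test set has 10 items here, so mean_score is 10.0 and sd_score is 0.0.
```

Reporting the standard deviation alongside the mean makes the fluctuation mentioned in note 6 visible instead of hiding it in a single score.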
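The UD labeling rule paraphrased in note 15 can be expressed as a small function. This is a simplified sketch: the input (a list of `direct`/`indirect` roles for the nominal objects of a single verb) is a hypothetical representation, and identifying those roles in an actual tree is out of scope. It encodes the guideline as stated in the note, not the PT-UD practice of using obl.

```python
from typing import List, Literal

Role = Literal["direct", "indirect"]


def ud_object_labels(roles: List[Role]) -> List[str]:
    """Assign obj/iobj to the objects of one verb, following the rule cited in note 15.

    Simplification: an indirect object becomes iobj whenever a direct object is
    also present, and obj otherwise (i.e. when it is the verb's only object).
    """
    has_direct = "direct" in roles
    labels: List[str] = []
    for role in roles:
        if role == "direct" or not has_direct:
            labels.append("obj")
        else:
            labels.append("iobj")
    return labels


# "Ela lhe deu o livro": lhe (indirect) plus o livro (direct)
ditransitive = ud_object_labels(["indirect", "direct"])
# "Ela lhe escreveu": lhe is the only object
sole_indirect = ud_object_labels(["indirect"])
```

The Portuguese examples are illustrative; the first yields `["iobj", "obj"]` and the second `["obj"]`.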
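Note 16 says that the PALAVRAS function tags carry a < or > symbol marking the attachment direction. Assuming the usual Constraint Grammar convention that the symbol points toward the head (so `@SUBJ>` is a subject whose head lies to the right and `@<ACC` a direct object whose head lies to the left), a tag can be split into its label and direction as sketched below. The function is hypothetical and not part of PALAVRAS.

```python
from typing import Optional, Tuple


def split_cg_function(tag: str) -> Tuple[str, Optional[str]]:
    """Split a PALAVRAS-style function tag into (label, head direction).

    Assumes '>' means the head is to the right and '<' that it is to the left;
    tags without either symbol get direction None.
    """
    if not tag.startswith("@"):
        raise ValueError(f"not a function tag: {tag!r}")
    body = tag[1:]
    has_right, has_left = ">" in body, "<" in body
    if has_right and has_left:
        raise ValueError(f"ambiguous direction in {tag!r}")
    direction = "right" if has_right else "left" if has_left else None
    label = body.replace(">", "").replace("<", "")
    return label, direction


subject = split_cg_function("@SUBJ>")   # ("SUBJ", "right")
direct_obj = split_cg_function("@<ACC")  # ("ACC", "left")
```

Keeping the direction separate from the label is what allows PALAVRAS labels to be compared with UD labels, which carry no direction.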
References
Afonso, S., Bick, E., Santos, D., Haber, R.: Floresta sintá(c)tica: um “treebank” para o português. In: Gonçalves, A., Correia, C.N. (eds.) Actas do XVII Encontro Nacional da Associação Portuguesa de Linguística (APL 2001), Lisboa, 2–4 de Outubro de 2001. APL, Lisboa, Portugal (2001)
Branco, A., Castro, S., Silva, J., Costa, F.: CINTIL DepBank handbook: design options for the representation of grammatical dependencies. Technical report DI-FCUL-TR-11-03, Department of Informatics, University of Lisbon, pp. 86–89 (2011)
Bick, E.: The Parsing System “Palavras”: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus Universitetsforlag (2000)
Buchholz, S., Marsi, E.: CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the Tenth Conference on Computational Natural Language Learning, pp. 149–164. Association for Computational Linguistics (2006)
Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 740–750 (2014)
Gamallo, P.: Dependency parsing with compression rules. In: Proceedings of the 14th International Conference on Parsing Technologies, pp. 107–117 (2015)
McDonald, R., Crammer, K., Pereira, F.: Online large-margin training of dependency parsers. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 91–98. Association for Computational Linguistics (2005)
McDonald, R., Lerman, K., Pereira, F.: Multilingual dependency analysis with a two-stage discriminative parser. In: Proceedings of the Tenth Conference on Computational Natural Language Learning, pp. 216–220. Association for Computational Linguistics (2006)
McDonald, R., Nivre, J.: Analyzing and integrating dependency parsers. Comput. Linguist. 37(1), 197–230 (2011)
McDonald, R., Pereira, F.: Online learning of approximate dependency parsing algorithms. In: 11th Conference of the European Chapter of the Association for Computational Linguistics (2006)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Nivre, J., Hall, J., Nilsson, J.: MaltParser: a data-driven parser-generator for dependency parsing. In: International Conference on Language Resources and Evaluation, vol. 6, pp. 2216–2219 (2006)
Nivre, J., Hall, J., Nilsson, J., Eryiǧit, G., Marinov, S.: Labeled pseudo-projective dependency parsing with support vector machines. In: Proceedings of the Tenth Conference on Computational Natural Language Learning, pp. 221–225. Association for Computational Linguistics (2006)
Nivre, J., et al.: Universal dependencies v1: a multilingual treebank collection. In: International Conference on Language Resources and Evaluation (2016)
Otero, P.G., González, I.: DepPattern: a multilingual dependency parser. In: International Conference on Computational Processing of the Portuguese Language (PROPOR 2012), Coimbra, Portugal, pp. 659–670. Citeseer (2012)
Otero, P.G., López, I.G.: A grammatical formalism based on patterns of part of speech tags. Int. J. Corpus Linguist. 16(1), 45–71 (2011)
Rademaker, A., Chalub, F., Real, L., Freitas, C., Bick, E., de Paiva, V.: Universal dependencies for Portuguese. In: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling), Pisa, Italy, pp. 197–206, September 2017. http://aclweb.org/anthology/W17-6523
Silva, J., Branco, A., Castro, S., Reis, R.: Out-of-the-box robust parsing of Portuguese. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds.) PROPOR 2010. LNCS (LNAI), vol. 6001, pp. 75–85. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12320-7_10
Tiedemann, J.: Finding alternative translations in a large corpus of movie subtitles. In: International Conference on Language Resources and Evaluation (2016)
Wagner Filho, J.A., Wilkens, R., Zilio, L., Idiart, M., Villavicencio, A.: Crawling by readability level. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds.) PROPOR 2016. LNCS (LNAI), vol. 9727, pp. 306–318. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41552-9_31
Wagner Filho, J., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC corpus: a new open resource to aid in the processing of Brazilian Portuguese. In: 11th edition of the Language Resources and Evaluation Conference (LREC) (2018)
Wagner Filho, J.A., Wilkens, R., Villavicencio, A.: Automatic construction of large readability corpora. In: Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), p. 164 (2016)
Zeman, D., et al.: CoNLL 2017 shared task: multilingual parsing from raw text to universal dependencies. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1–19 (2017)
Zhou, H., Zhang, Y., Huang, S., Chen, J.: A neural probabilistic structured-prediction model for transition-based dependency parsing. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1: Long Papers, pp. 1213–1222 (2015)
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zilio, L., Wilkens, R., Fairon, C. (2018). PassPort: A Dependency Parsing Model for Portuguese. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol. 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_48
DOI: https://doi.org/10.1007/978-3-319-99722-3_48
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)