PassPort: A Dependency Parsing Model for Portuguese

Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick

doi:10.1007/978-3-319-99722-3_48

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Parsers are essential tools for several NLP applications. Here we introduce PassPort, a model for the dependency parsing of Portuguese trained with the Stanford Parser. To develop PassPort, we compared several setups that combine different existing parsing algorithms with different kinds of linguistic information, and kept the best-performing one. PassPort achieved a UAS of 87.55 and a LAS of 85.21 on the Universal Dependencies corpus. We also evaluated the model against another parsing system on different corpora covering three genres. For that, we annotated random sentences from these corpora using PassPort and the PALAVRAS parsing system, and then manually evaluated and compared the output of both. They achieved very similar results for dependency parsing, with a LAS of 85.02 for PassPort against 84.36 for PALAVRAS. In addition, the analysis showed that better part-of-speech tagging could improve our LAS.
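
The two scores reported in the abstract can be illustrated with a minimal sketch. UAS is the percentage of tokens that receive the correct head, and LAS the percentage that receive both the correct head and the correct dependency label. The function name, the data layout, and the toy sentence below are illustrative assumptions, not the authors' evaluation code. Skipping tokens labelled `punct` mirrors the manual evaluation described in the notes; whether the corpus-level scores were computed the same way is not stated here, so the set of skipped labels is left configurable.

```python
from typing import FrozenSet, List, Tuple

# One token's annotation: (head index, dependency label); head 0 means the root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc],
                      skip_labels: FrozenSet[str] = frozenset({"punct"})
                      ) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over tokens whose gold label is not skipped."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    scored = uas_hits = las_hits = 0
    for (g_head, g_label), (p_head, p_label) in zip(gold, predicted):
        if g_label in skip_labels:
            continue  # e.g. punctuation, left out of the evaluation
        scored += 1
        if g_head == p_head:
            uas_hits += 1  # correct head counts towards UAS
            if g_label == p_label:
                las_hits += 1  # correct head and label counts towards LAS
    if scored == 0:
        return 0.0, 0.0
    return 100.0 * uas_hits / scored, 100.0 * las_hits / scored


# Toy sentence "O menino leu o livro ." with hypothetical annotations.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (5, "det"), (3, "obj"), (3, "punct")]
pred = [(2, "det"), (3, "nsubj"), (0, "root"), (5, "det"), (3, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=100.00 LAS=80.00
```

In the toy example every head is correct, but one label differs (`obl` instead of `obj`), so only LAS drops. A weaker tagger can hurt LAS in this way without changing UAS.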

Supported by the Walloon Region (Projects BEWARE 1510637 and 1610378) and Altissia International.

Notes

1. The parser model, along with the material used in this paper, can be found at https://cental.uclouvain.be/resources/smalla_smille/passport/.
2. lxcenter.di.fc.ul.pt/services/pt/LXServicesParserDepPT.html.
3. At the time the experiments in this paper were run, the available PT-UD corpus was version 2.1.
4. For instance, the tag DET in the short POS appears as DET or ART in the long POS, while the tag DET in the long POS appears as DET or PRON in the short POS.
5. This modified version of the corpus is available along with the parser model at the PassPort website https://cental.uclouvain.be/resources/smalla_smille/passport/.
6. We detected some fluctuation in the scores during preliminary testing.
7. Zeman et al. [23] argue that larger dimensions may yield better results for parsing.
8. The best system was run five times with randomized train and test sets.
9. Using the most recent PT-UD corpus (version 2.2) in similar setups, we also obtained better performance with long POS information than with short POS.
10. Available at: https://github.com/UniversalDependencies/UD_Portuguese-GSD/tree/master.
11. Although 30 sentences were selected from each genre, both parsing systems (PassPort and PALAVRAS) use their own sentence splitters, so the final sentence counts in the results differ (for instance, PALAVRAS splits sentences at a colon).
12. Selected novels from www.dominiopublico.gov.br.
13. This corpus was compiled in the scope of the project PorPopular (www.ufrgs.br/textecc/porlexbras/porpopular/index.php).
14. We did not evaluate punctuation tokens, since PALAVRAS does not provide dependency labels for them and, in both parsing models, they are simply attached to the root or to the dependency closest to the root.
15. This is not in line with the UD guidelines (universaldependencies.org/u/dep/iobj.html), which indicate that indirect objects should be marked as obj (if they are the sole object of the verb) or as iobj (if there is another obj in the clause). According to the guidelines, obl should only be used for adjuncts, but that is not the case in the PT-UD corpus.
16. The tags also carry a < or > symbol, which indicates the attachment direction.
17. The model, training datasets and evaluation files will be made available with the final version.
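
The notes mention score fluctuation and five runs of the best system with randomized train and test sets. A minimal sketch of such a procedure follows, assuming a hypothetical `train_and_score` callable that trains on one split and returns a score on the other. The split ratio, the seed, and all names are assumptions for illustration, since the notes do not give them.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def repeated_random_splits(sentences: Sequence[str],
                           train_and_score: Callable[[List[str], List[str]], float],
                           runs: int = 5,
                           test_fraction: float = 0.1,  # assumption: ratio not given
                           seed: int = 0) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of the score over `runs` splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = list(sentences)
        rng.shuffle(shuffled)  # a fresh random train/test partition per run
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(train_and_score(train, test))
    return statistics.mean(scores), statistics.pstdev(scores)


# Placeholder scorer so the sketch runs end to end; a real one would train a parser.
def dummy_scorer(train: List[str], test: List[str]) -> float:
    return 100.0 * len(train) / (len(train) + len(test))


mean_score, spread = repeated_random_splits([f"s{i}" for i in range(100)], dummy_scorer)
print(mean_score, spread)
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two setups exceeds run-to-run fluctuation.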

References

Afonso, S., Bick, E., Santos, D., Haber, R.: Floresta sintá(c)tica: um “treebank” para o português. In: Gonçalves, A., Correia, C.N. (eds.) Actas do XVII Encontro Nacional da Associação Portuguesa de Linguística (APL 2001), Lisboa, 2–4 de Outubro de 2001. APL, Lisboa, Portugal (2001)
António, B., Castro, S., Silva, J., Costa, F.: Cintil depbank handbook: design options for the representation of grammatical dependencies. Department of Informatics, University of Lisbon, Technical reports nb. di-fcul-tr-11-03, pp. 86–89 (2011)
Bick, E.: The Parsing System “Palavras”: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus Universitetsforlag (2000)
Buchholz, S., Marsi, E.: CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the Tenth Conference on Computational Natural Language Learning, pp. 149–164. Association for Computational Linguistics (2006)
Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 740–750 (2014)
Gamallo, P.: Dependency parsing with compression rules. In: Proceedings of the 14th International Conference on Parsing Technologies, pp. 107–117 (2015)
McDonald, R., Crammer, K., Pereira, F.: Online large-margin training of dependency parsers. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 91–98. Association for Computational Linguistics (2005)
McDonald, R., Lerman, K., Pereira, F.: Multilingual dependency analysis with a two-stage discriminative parser. In: Proceedings of the Tenth Conference on Computational Natural Language Learning, pp. 216–220. Association for Computational Linguistics (2006)
McDonald, R., Nivre, J.: Analyzing and integrating dependency parsers. Comput. Linguist. 37(1), 197–230 (2011)
McDonald, R., Pereira, F.: Online learning of approximate dependency parsing algorithms. In: 11th Conference of the European Chapter of the Association for Computational Linguistics (2006)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Nivre, J., Hall, J., Nilsson, J.: MaltParser: a data-driven parser-generator for dependency parsing. In: International Conference on Language Resources and Evaluation, vol. 6, pp. 2216–2219 (2006)
Nivre, J., Hall, J., Nilsson, J., Eryiǧit, G., Marinov, S.: Labeled pseudo-projective dependency parsing with support vector machines. In: Proceedings of the Tenth Conference on Computational Natural Language Learning, pp. 221–225. Association for Computational Linguistics (2006)
Nivre, J., et al.: Universal dependencies v1: a multilingual treebank collection. In: International Conference on Language Resources and Evaluation (2016)
Otero, P.G., González, I.: DepPattern: a multilingual dependency parser. In: International Conference on Computational Processing of the Portuguese Language (PROPOR 2012), Coimbra, Portugal, pp. 659–670. Citeseer (2012)
Otero, P.G., López, I.G.: A grammatical formalism based on patterns of part of speech tags. Int. J. Corpus Linguist. 16(1), 45–71 (2011)
Rademaker, A., Chalub, F., Real, L., Freitas, C., Bick, E., de Paiva, V.: Universal dependencies for Portuguese. In: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling), Pisa, Italy, pp. 197–206, September 2017. http://aclweb.org/anthology/W17-6523
Silva, J., Branco, A., Castro, S., Reis, R.: Out-of-the-box robust parsing of Portuguese. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds.) PROPOR 2010. LNCS (LNAI), vol. 6001, pp. 75–85. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12320-7_10
Tiedemann, J.: Finding alternative translations in a large corpus of movie subtitle. In: International Conference on Language Resources and Evaluation (2016)
Wagner Filho, J.A., Wilkens, R., Zilio, L., Idiart, M., Villavicencio, A.: Crawling by readability level. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds.) PROPOR 2016. LNCS (LNAI), vol. 9727, pp. 306–318. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41552-9_31
Wagner Filho, J., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC corpus: a new open resource to aid in the processing of Brazilian Portuguese. In: 11th edition of the Language Resources and Evaluation Conference (LREC) (2018)
Wagner Filho, J.A., Wilkens, R., Villavicencio, A.: Automatic construction of large readability corpora. In: Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), p. 164 (2016)
Zeman, D., et al.: CoNLL 2017 shared task: multilingual parsing from raw text to universal dependencies. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1–19 (2017)
Zhou, H., Zhang, Y., Huang, S., Chen, J.: A neural probabilistic structured-prediction model for transition-based dependency parsing. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1: Long Papers, pp. 1213–1222 (2015)

Author information

Authors and Affiliations

Centre de traitement automatique du langage – CENTAL, Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium
Leonardo Zilio, Rodrigo Wilkens & Cédrick Fairon

Corresponding author

Correspondence to Leonardo Zilio.

Editor information

Editors and Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Aline Villavicencio
Instituto de Informática - UFRGS, Porto Alegre, Brazil
Viviane Moreira
INESC-ID, Lisbon, Portugal
Alberto Abad
UFSCAR, Sao Carlos, Brazil
Helena Caseli
Centro Singular de Investigación en Tecnoloxías, Universidade de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Pablo Gamallo
Université de Toulon, Parc Scientifique Technologique Luminy, Marseille, France
Carlos Ramisch
Centro de Informática e Sistemas, Universidade de Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira
Federal University of Technology, Dois Vizinhos, Paraná, Brazil
Gustavo Henrique Paetzold

Cite this paper

Zilio, L., Wilkens, R., Fairon, C. (2018). PassPort: A Dependency Parsing Model for Portuguese. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_48

DOI: https://doi.org/10.1007/978-3-319-99722-3_48
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)
