Abstract
We describe two new resources for European Portuguese and show how they are used for discourse parsing: the Portuguese subpart of the TED-MDB corpus, a multilingual corpus of TED Talks annotated in the PDTB style, and the Lexicon of Discourse Markers for Portuguese (LDM-PT). Both the lexicon and the corpus are used in a preliminary experiment on identifying discourse connectives in text. In many cases this involves the difficult task of disambiguating between connective and non-connective uses. We annotated the PT-TED-MDB corpus with POS, lemma and syntactic constituency information and focused on the 10 most frequent connectives in the corpus. The best approach combines word form, POS and syntactic annotation and reaches 85% precision.
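As a rough illustration of the kind of experiment summarized in the abstract, the following Python sketch matches tokens against a small connective lexicon, optionally filters the matches by POS tag, and computes precision against gold labels. It is not the authors' implementation: the lexicon entries, the coarse tagset, the `Token` class and the toy sentences are all assumptions made for this example, and the syntactic constituency features mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    pos: str            # coarse POS tag (hypothetical tagset)
    gold: bool = False  # True if annotated as a discourse connective


# Hypothetical lexicon excerpt: connective form -> POS tags under which
# the form is accepted as a connective. Not taken from LDM-PT.
LEXICON = {
    "mas": {"CONJ"},
    "porque": {"CONJ"},
    "como": {"CONJ"},
    "então": {"ADV"},
}


def predict(tokens, use_pos):
    """Return the indices of tokens predicted to be connectives."""
    hits = []
    for i, tok in enumerate(tokens):
        allowed = LEXICON.get(tok.form.lower())
        if allowed is None:
            continue  # form is not in the lexicon
        if use_pos and tok.pos not in allowed:
            continue  # form matches, but the POS tag rules it out
        hits.append(i)
    return hits


def precision(tokens, predicted):
    """Share of predicted connectives that are connectives in the gold data."""
    if not predicted:
        return 0.0
    correct = sum(1 for i in predicted if tokens[i].gold)
    return correct / len(predicted)


# Toy data (invented): "como" is a connective in the first clause
# and a non-connective adverb in the last one.
tokens = [
    Token("Como", "CONJ", gold=True), Token("ele", "PRON"),
    Token("chegou", "V"), Token("tarde", "ADV"),
    Token("mas", "CONJ", gold=True), Token("dormiu", "V"),
    Token("Não", "ADV"), Token("sei", "V"),
    Token("como", "ADV"), Token("fez", "V"),
]

form_only = predict(tokens, use_pos=False)
form_and_pos = predict(tokens, use_pos=True)
```

On this toy input, matching on word form alone also selects the adverbial "como", while adding the POS filter excludes it. Comparing `precision(tokens, form_only)` with `precision(tokens, form_and_pos)` shows how an extra annotation layer can raise precision on ambiguous forms.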
Notes
- 1.
- 2. Nominalizations (e.g., the destruction of the city) can be considered as equivalent to clauses and part of the discourse level, as in the PDTB (although few such cases are actually annotated), so that coordinating conjunctions connecting nominalizations would have to be identified as discourse connectives.
- 3. Also, dependency analysis is not available in the upload interface of PALAVRAS.
Acknowledgments
This work was partially supported by national funds through FCT - Fundação para a Ciência e a Tecnologia (under the project PEst-OE/LIN/UI0214/2013), and some of its developments were implemented in the scope of the COST Action TextLink – Structuring Discourse in Multilingual Europe.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Mendes, A., del Río, I. (2018). Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_22
DOI: https://doi.org/10.1007/978-3-319-99722-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)