Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2018)

Abstract

We describe two new resources prepared for European Portuguese and show how they are used for discourse parsing: the Portuguese subpart of the TED-MDB corpus, a multilingual corpus of TED Talks annotated in the PDTB style, and the Lexicon of Discourse Markers for Portuguese (LDM-PT). Both the lexicon and the corpus are used in a preliminary experiment on identifying discourse connectives in texts. In many cases this includes the difficult task of disambiguating between connective and non-connective uses. We annotated the PT-TED-MDB corpus with POS, lemma and syntactic constituency and focused on the 10 most frequent connectives in the corpus. The best approach combines word form, POS and syntactic annotation and reaches 85% precision.
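
As a rough illustration of the identification task described in the abstract, the following Python sketch matches tokens against a small connective lexicon, keeps only those whose POS tag is compatible with a connective reading, and computes precision against gold labels. The lexicon entries, the POS tag names, and the sample sentence are invented for illustration; they are not taken from LDM-PT or PT-TED-MDB. The sketch also leaves out the syntactic features, so it should not be read as the method used in the paper.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str   # surface word form
    pos: str    # part-of-speech tag (hypothetical tagset)
    gold: bool  # True if annotated as a discourse connective


# Hypothetical mini-lexicon: word form -> POS tags compatible with a connective use.
LEXICON = {
    "mas": {"CONJ"},
    "e": {"CONJ"},
    "então": {"ADV"},
    "como": {"CONJ"},
}


def identify(tokens: list[Token]) -> list[bool]:
    """Predict a connective when the form is in the lexicon and its POS is allowed."""
    return [t.pos in LEXICON.get(t.form.lower(), set()) for t in tokens]


def precision(predicted: list[bool], gold: list[bool]) -> float:
    """True positives divided by all predicted positives (0.0 if none predicted)."""
    true_pos = sum(p and g for p, g in zip(predicted, gold))
    total_pred = sum(predicted)
    return true_pos / total_pred if total_pred else 0.0


# Invented example: "Como pão e queijo mas prefiro fruta".
# "Como" is a verb form here, so the POS filter rejects it.
# "e" joins two nouns: it passes the POS filter but is not a connective in the gold labels.
sentence = [
    Token("Como", "V", False),
    Token("pão", "N", False),
    Token("e", "CONJ", False),
    Token("queijo", "N", False),
    Token("mas", "CONJ", True),
    Token("prefiro", "V", False),
    Token("fruta", "N", False),
]

predictions = identify(sentence)
score = precision(predictions, [t.gold for t in sentence])
```

In this toy sentence the POS filter removes the verbal reading of "como" but still accepts the noun-coordinating "e", which is the kind of case where constituency information could help separate connective from non-connective uses.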

Notes

  1. http://ec2-18-219-79-53.us-east-2.compute.amazonaws.com:8000/ted_mdb/.

  2. Nominalizations (e.g., the destruction of the city) can be considered as equivalent to clauses and part of the discourse level, as in the PDTB (although few such cases are actually annotated), so that coordinating conjunctions connecting nominalizations would have to be identified as discourse connectives.

  3. Also, dependency analysis is not available in the upload interface of PALAVRAS.

Acknowledgments

This work was partially supported by national funds through FCT - Fundação para a Ciência e a Tecnologia (under the project PEst-OE/LIN/UI0214/2013), and some of its developments were implemented in the scope of the COST Action TextLink – Structuring Discourse in Multilingual Europe.

Author information

Corresponding author

Correspondence to Amália Mendes.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Mendes, A., del Río, I. (2018). Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_22

  • DOI: https://doi.org/10.1007/978-3-319-99722-3_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99721-6

  • Online ISBN: 978-3-319-99722-3

  • eBook Packages: Computer Science, Computer Science (R0)
