An Empirical Approach to Discourse Markers by Clustering

  • Conference paper
Topics in Artificial Intelligence (CCIA 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2504)

Abstract

The problem of capturing discourse structure for complex NLP tasks has often been addressed by exploiting surface clues that can yield a partial structure of discourse. Discourse Markers (DMs) are among the most popular of these clues because they are highly informative of discourse structure and have a very low processing cost. However, they present two main problems: first, there is a general lack of consensus about their appropriate characterisation for NLP applications; second, their potential as an inexpensive source of discourse knowledge is weakened by the fact that the information associated with them is usually hand-encoded. In this paper we show how a combination of clustering techniques provides empirical evidence for a characterisation of DMs. This data-driven methodology provides generalisations that help reduce the cost of encoding the information associated with DMs, while increasing the consistency of their characterisation.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alonso, L., Castellón, I., Gibert, K., Padró, L. (2002). An Empirical Approach to Discourse Markers by Clustering. In: Escrig, M.T., Toledo, F., Golobardes, E. (eds) Topics in Artificial Intelligence. CCIA 2002. Lecture Notes in Computer Science, vol 2504. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36079-4_15

  • DOI: https://doi.org/10.1007/3-540-36079-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00011-2

  • Online ISBN: 978-3-540-36079-7

  • eBook Packages: Springer Book Archive
