An Empirical Approach to Discourse Markers by Clustering

Alonso, Laura; Castellón, Irene; Gibert, Karina; Padró, Lluís

doi:10.1007/3-540-36079-4_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2504)

Included in the following conference series:

Catalonian Conference on Artificial Intelligence

Abstract

The problem of capturing discourse structure for complex NLP tasks has often been addressed by exploiting surface clues that can yield a partial structure of discourse. Discourse Markers (DMs) are among the most popular of these clues because they are both highly informative of discourse structure and have a very low processing cost. However, they present two main problems: first, there is a general lack of consensus about their appropriate characterisation for NLP applications; second, their potential as an inexpensive source of discourse knowledge is weakened by the fact that the information associated with them is usually hand-encoded. In this paper we show how a combination of clustering techniques provides empirical evidence for a characterisation of DMs. This data-driven methodology provides generalisations that help reduce the cost of encoding the information associated with DMs while increasing the consistency of their characterisation.

Author information

Authors and Affiliations

CLiC (Centre de Llenguatge i Computació), Department of General Linguistics, Universitat de Barcelona, Spain
Laura Alonso
Department of General Linguistics, Universitat de Barcelona, Spain
Irene Castellón
Department of Statistics and Operational Research, Universitat Politècnica de Catalunya, Spain
Karina Gibert
TALP Research Center, Software Department, Universitat Politècnica de Catalunya, Spain
Lluís Padró

Editor information

Editors and Affiliations

Departament d’Enginyeria i Ciència dels Computadors, Universitat Jaume I, Campus de Riu Sec, 12071, Castellón, Spain
M. Teresa Escrig & Francisco Toledo
Computer Science Department, Universitat Ramon Llull, Passeig Bonanova, 8, 08022, Barcelona, Catalunya, Spain
Elisabet Golobardes

About this paper

Cite this paper

Alonso, L., Castellón, I., Gibert, K., Padró, L. (2002). An Empirical Approach to Discourse Markers by Clustering. In: Escrig, M.T., Toledo, F., Golobardes, E. (eds) Topics in Artificial Intelligence. CCIA 2002. Lecture Notes in Computer Science, vol 2504. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36079-4_15

DOI: https://doi.org/10.1007/3-540-36079-4_15
Published: 24 October 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00011-2
Online ISBN: 978-3-540-36079-7
eBook Packages: Springer Book Archive
