Abstract
The problem of capturing discourse structure for complex NLP tasks has often been addressed by exploiting surface clues that can yield a partial structure of discourse. Discourse Markers (DMs) are among the most popular of these clues because they are both highly informative of discourse structure and have a very low processing cost. However, they present two main problems: first, there is a general lack of consensus about their appropriate characterisation for NLP applications; second, their potential as an inexpensive source of discourse knowledge is weakened by the fact that the information associated with them is usually hand-encoded. In this paper we show how a combination of clustering techniques provides empirical evidence for a characterisation of DMs. This data-driven methodology provides generalisations that help reduce the cost of encoding the information associated with DMs while increasing the consistency of their characterisation.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alonso, L., Castellón, I., Gibert, K., Padró, L. (2002). An Empirical Approach to Discourse Markers by Clustering. In: Escrig, M.T., Toledo, F., Golobardes, E. (eds) Topics in Artificial Intelligence. CCIA 2002. Lecture Notes in Computer Science, vol 2504. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36079-4_15
DOI: https://doi.org/10.1007/3-540-36079-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00011-2
Online ISBN: 978-3-540-36079-7
eBook Packages: Springer Book Archive