Abstract
In this paper, we present RefGen, the main module of a topic detection system designed to improve a search engine through topic indexing. RefGen identifies reference chains by exploiting their genre-specific properties together with Ariel's (1990) accessibility theory. It checks a set of strong and weak constraints (lexical, morphosyntactic and semantic filters) to automatically identify coreference relations between referential expressions. We present the first results obtained by RefGen on a corpus of public reports.
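To make the distinction between strong and weak constraints concrete, the following is a minimal sketch of constraint-based antecedent filtering. It is not RefGen's implementation: the abstract does not list the actual constraints, so the `Mention` fields, the agreement checks used as strong constraints, and the additive score used for weak constraints are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """A referential expression with hypothetical annotations."""
    text: str
    head: str       # lemma of the head word
    gender: str     # "m", "f", or "?" when unknown
    number: str     # "sg", "pl", or "?" when unknown
    sem_class: str  # coarse semantic class, e.g. "org", or "?"


def agree(a: str, b: str) -> bool:
    """Two feature values are compatible if equal or if either is unknown."""
    return a == "?" or b == "?" or a == b


def strong_ok(m: Mention, ant: Mention) -> bool:
    """Strong (morphosyntactic) constraints: a violation rules the pair out."""
    return agree(m.gender, ant.gender) and agree(m.number, ant.number)


def weak_score(m: Mention, ant: Mention) -> int:
    """Weak constraints: each satisfied filter only raises the preference."""
    score = 0
    if m.head == ant.head:  # lexical filter: same head lemma
        score += 1
    if m.sem_class != "?" and m.sem_class == ant.sem_class:  # semantic filter
        score += 1
    return score


def pick_antecedent(m: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Return the best candidate that survives the strong constraints.

    `candidates` is ordered from oldest to most recent; ties on the weak
    score go to the most recent candidate. Returns None if none survives.
    """
    best, best_score = None, -1
    for ant in candidates:
        if not strong_ok(m, ant):
            continue
        s = weak_score(m, ant)
        if s >= best_score:
            best, best_score = ant, s
    return best
```

In this sketch a candidate that fails number or gender agreement is discarded outright, while lexical and semantic matches only rank the survivors; a chain would then be built by repeatedly linking each mention to its selected antecedent.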
References
Ariel, M.: Accessing Noun-Phrase Antecedents. Routledge, London (1990)
Beaver, D.: The optimization of discourse anaphora. Linguistics and Philosophy 27(1), 3–56 (2004)
Biber, D.: Representativeness in corpus design. Linguistica Computazionale, IX–X, Current Issues in Computational Linguistics: in honor of Don Walker (1994)
Bontcheva, K., Dimitrov, M., Maynard, D., Tablan, V., Cunningham, H.: Shallow methods for named entity coreference resolution. In: Proceedings of TALN 2002 (2002)
Charolles, M.: L’encadrement du discours: univers, champs, domaines et espaces, Cahier de Recherche Linguistique 6, LANDISCO, Université Nancy 2, 1–73 (1997)
Choi, F.Y.Y., Wiemer-Hastings, P., Moore, J.: Latent semantic analysis for text segmentation. In: Proceedings of NAACL 2001, pp. 109–117 (2001)
Cornish, F.: Références anaphoriques, références déictiques, et contexte prédicatif et énonciatif. Sémiotiques 8, 31–57 (1995)
Denis, P.: New Learning Models for Robust Reference Resolution. PhD thesis, University of Texas, Austin (2007)
Gegg-Harrison, W., Byron, D.: PYCOT: An Optimality Theory-based Pronoun Resolution Toolkit. In: Proceedings of LREC 2004, Lisbon (2004)
Goutsos, D.: Modeling Discourse Topic: sequential relations and strategies in expository text. Ablex Publishing Corporation, Norwood (1997)
Grosz, B.J., Weinstein, S., Joshi, A.K.: Centering: a framework for modeling the local coherence of discourse. Computational Linguistics 21(2), 203–225 (1995)
Halliday, M., Hasan, R.: Cohesion in English. Longman English Language Series, vol. 9. Longman, London (1976)
Hartrumpf, S.: Coreference Resolution with Syntactico-Semantic Rules and Corpus Statistics. In: Proceedings of CoNLL (Computational Natural Language Learning Workshop) (2001)
Hoste, V.: Optimization Issues in Machine Learning of Coreference Resolution. PhD thesis, p. 246 (2005)
Ide, N., Veronis, J.: MULTEXT (Multilingual Tools and Corpora). In: Proceedings of the 14th International Conference on Computational Linguistics, Kyoto (1994)
Ion, R.: TTL: A portable framework for tokenization, tagging and lemmatization of large corpora. Romanian Academy, Bucharest (2007)
Kleiber, G.: Anaphores et Pronoms. Duculot, Louvain-la-Neuve (1994)
Longo, L., Todirascu, A.: Une étude de corpus pour la détection automatique de thèmes. In: Proceedings of the 6th Journées de Linguistique de Corpus (JLC 2009), Lorient, France (2010)
Manuélian, H.: Annotation des descriptions définies: le cas des reprises par les rôles thématiques. In: Proceedings of RECITAL 2002, Nancy, France, pp. 455–467 (2002)
Manuélian, H.: Descriptions définies et démonstratives: analyse de corpus pour la génération de textes. PhD thesis, Université de Nancy 2, France (2003)
Mitkov, R.: Towards a more consistent and comprehensive evaluation of anaphora resolution algorithms and systems. Applied Artificial Intelligence: An International Journal 15, 253–276 (2001)
Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: Proceedings of the ACL (Association For Computational Linguistics), Morristown, pp. 104–111 (2002)
Popescu-Belis, A.: Modélisation multi-agent des échanges langagiers: application au problème de la référence et à son évaluation. PhD thesis, Université Paris-XI (1999)
Porhiel, S.: Les introducteurs thématiques. Cahiers de Lexicologie 85 (2004)
Salmon-Alt, S.: Référence et Dialogue finalisé: de la linguistique à un modèle opérationnel. PhD thesis, Université H. Poincaré, Nancy (2001)
Schnedecker, C.: Nom propre et chaînes de référence. Recherches Linguistiques, vol. 21. Klincksieck, Paris (1997)
Schnedecker, C.: Les chaînes de référence dans les portraits journalistiques: éléments de description. Travaux de Linguistique 51, 85–133 (2005)
Vonk, W., Hustinx, L., Simons, W.: The use of referential expressions in structuring discourse. Language and Cognitive Processes 7, 301–333 (1992)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Longo, L., Todiraşcu, A. (2011). RefGen: Identifying Reference Chains to Detect Topics. In: Pallotta, V., Soro, A., Vargiu, E. (eds) Advances in Distributed Agent-Based Retrieval Tools. Studies in Computational Intelligence, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21384-7_3
DOI: https://doi.org/10.1007/978-3-642-21384-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21383-0
Online ISBN: 978-3-642-21384-7
eBook Packages: Engineering, Engineering (R0)