RefGen: Identifying Reference Chains to Detect Topics

Advances in Distributed Agent-Based Retrieval Tools

Part of the book series: Studies in Computational Intelligence (SCI, volume 361)

Abstract

In this paper, we present RefGen, the main module of a topic detection system used to improve a search engine through topic indexing. RefGen identifies reference chains using genre-specific properties of reference chains and Ariel's (1990) accessibility theory. It checks several strong and weak constraints (lexical, morphosyntactic and semantic filters) to automatically identify coreference relations between referential expressions. We present the first results obtained by RefGen on a corpus of public reports.

References

  • Ariel, M.: Accessing Noun-Phrase Antecedents. Routledge, London (1990)

  • Beaver, D.: The optimization of discourse anaphora. Linguistics and Philosophy 27(1), 3–56 (2004)

  • Biber, D.: Representativeness in corpus design. Linguistica Computazionale, IX–X, Current Issues in Computational Linguistics: in honor of Don Walker (1994)

  • Bontcheva, K., Dimitrov, M., Maynard, D., Tablan, V., Cunningham, H.: Shallow methods for named entity coreference resolution. In: Proceedings of TALN 2002 (2002)

  • Charolles, M.: L’encadrement du discours: univers, champs, domaines et espaces, Cahier de Recherche Linguistique 6, LANDISCO, Université Nancy 2, 1–73 (1997)

  • Choi, F.Y.Y., Wiemer-Hastings, P., Moore, J.: Latent semantic analysis for text segmentation. In: Proceedings of NAACL 2001, pp. 109–117 (2001)

  • Cornish, F.: Références anaphoriques, références déictiques, et contexte prédicatif et énonciatif. Sémiotiques 8, 31–57 (1995)

  • Denis, P.: New Learning Models for Robust Reference Resolution. PhD thesis, University of Texas, Austin (2007)

  • Gegg-Harrison, W., Byron, D.: PYCOT: An Optimality Theory-based Pronoun Resolution Toolkit. In: Proceedings of LREC 2004, Lisbon (2004)

  • Goutsos, D.: Modeling Discourse Topic: sequential relations and strategies in expository text. Ablex Publishing Corporation, Norwood (1997)

  • Grosz, B.J., Weinstein, S., Joshi, A.K.: Centering: a framework for modeling the local coherence of discourse. Computational Linguistics 21(2), 203–225 (1995)

  • Halliday, M., Hasan, R.: Cohesion in English. Longman English Language Series, vol. 9. Longman, London (1976)

  • Hartrumpf, S.: Coreference Resolution with Syntactico-Semantic Rules and Corpus Statistics. In: Proceedings of CoNLL (Computational Natural Language Learning Workshop) (2001)

  • Hoste, V.: Optimization Issues in Machine Learning of Coreference Resolution. PhD thesis, p. 246 (2005)

  • Ide, N., Veronis, J.: MULTEXT (Multilingual Tools and Corpora). In: Proceedings of the 14th International Conference on Computational Linguistics, Kyoto (1994)

  • Ion, R.: TTL: A portable framework for tokenization, tagging and lemmatization of large corpora. Romanian Academy, Bucharest (2007)

  • Kleiber, G.: Anaphores et Pronoms. Duculot, Louvain-la-Neuve (1994)

  • Longo, L., Todirascu, A.: Une étude de corpus pour la détection automatique de thèmes. In: Proceedings of the 6th Journées de Linguistique de Corpus (JLC 2009), Lorient, France (2010)

  • Manuélian, H.: Annotation des descriptions définies: le cas des reprises par les rôles thématiques. In: Proceedings of RECITAL 2002, Nancy, France, pp. 455–467 (2002)

  • Manuélian, H.: Descriptions définies et démonstratives: analyse de corpus pour la génération de textes. PhD thesis, Université de Nancy 2, France (2003)

  • Mitkov, R.: Towards a more consistent and comprehensive evaluation of anaphora resolution algorithms and systems. Applied Artificial Intelligence: An International Journal 15, 253–276 (2001)

  • Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: Proceedings of the ACL (Association For Computational Linguistics), Morristown, pp. 104–111 (2002)

  • Popescu-Belis, A.: Modélisation multi-agent des échanges langagiers: application au problème de la référence et à son évaluation. PhD thesis, Université Paris-XI (1999)

  • Porhiel, S.: Les introducteurs thématiques. Cahiers de Lexicologie 85 (2004)

  • Salmon-Alt, S.: Référence et Dialogue finalisé: de la linguistique à un modèle opérationnel. PhD thesis, Université H. Poincaré, Nancy (2001)

  • Schnedecker, C.: Nom propre et chaînes de référence. Recherches Linguistiques, vol. 21. Klincksieck, Paris (1997)

  • Schnedecker, C.: Les chaînes de référence dans les portraits journalistiques: éléments de description. Travaux de Linguistique 51, 85–133 (2005)

  • Vonk, W., Hustinx, L., Simons, W.: The use of referential expressions in structuring discourse. Language and Cognitive Processes 7, 301–333 (1992)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Longo, L., Todiraşcu, A. (2011). RefGen: Identifying Reference Chains to Detect Topics. In: Pallotta, V., Soro, A., Vargiu, E. (eds) Advances in Distributed Agent-Based Retrieval Tools. Studies in Computational Intelligence, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21384-7_3

  • DOI: https://doi.org/10.1007/978-3-642-21384-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21383-0

  • Online ISBN: 978-3-642-21384-7

  • eBook Packages: Engineering, Engineering (R0)
