Abstract
We present CROC (Coreference Resolution for Oral Corpus), the first machine learning system for coreference resolution in French. A distinctive aspect of the system is that it has been trained exclusively on transcribed speech, namely ANCOR (ANaphora and Coreference in ORal corpus), the first large-scale French corpus annotated with anaphoric relations. In its current state, the CROC system requires pre-annotated mentions. We detail the features used by the learning algorithms and present a set of experiments with these features. The scores we obtain are close to those of state-of-the-art systems for written English.
Acknowledgments
This work was supported by grant ANR-15-CE38-0008 (“DEMOCRAT” project) from the French National Research Agency (ANR), and by APR Centre-Val-de-Loire region (“ANCOR” project).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Désoyer, A., Landragin, F., Tellier, I., Lefeuvre, A., Antoine, JY., Dinarelli, M. (2018). Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_36
DOI: https://doi.org/10.1007/978-3-319-75477-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)