A First Machine Learning Approach to Pronominal Anaphora Resolution in Basque

Arregi, O.; Ceberio, K.; Díaz de Illarraza, A.; Goenaga, I.; Sierra, B.; Zelaia, A.

doi:10.1007/978-3-642-16952-6_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6433)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

In this paper we present the first machine learning approach to pronominal anaphora resolution in the Basque language. We consider several classifiers in order to find the system that best fits the characteristics of the language under examination. We do not restrict our study to the classifiers typically used for this task; we also consider others, such as Random Forest and VFI, in order to make a general comparison. We determine the feature vector obtained with our linguistic processing system, and we analyze the contribution of different subsets of features as well as the weight of each feature used in the task.
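The comparison described in the abstract (several classifiers, each evaluated on different feature subsets) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the feature names, the toy instances, the two stand-in classifiers (a majority baseline and a 1-nearest-neighbour rule) and the leave-one-out evaluation are not taken from the paper, which names Random Forest and VFI among the classifiers it considers.

```python
# Toy (pronoun, candidate antecedent) instances: (feature dict, label).
# Label 1 means the candidate is the antecedent. All names and values are
# invented for illustration; they are not the paper's feature vector.
DATA = [
    ({"number_agree": 1, "same_sentence": 1, "is_subject": 1}, 1),
    ({"number_agree": 1, "same_sentence": 0, "is_subject": 0}, 1),
    ({"number_agree": 1, "same_sentence": 1, "is_subject": 0}, 1),
    ({"number_agree": 0, "same_sentence": 1, "is_subject": 1}, 0),
    ({"number_agree": 0, "same_sentence": 0, "is_subject": 0}, 0),
    ({"number_agree": 0, "same_sentence": 0, "is_subject": 1}, 0),
]

# Hypothetical feature subsets whose contribution we want to compare.
SUBSETS = {
    "all": ["number_agree", "same_sentence", "is_subject"],
    "agreement_only": ["number_agree"],
    "position_only": ["same_sentence", "is_subject"],
}


def majority_classifier(train):
    """Baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def nearest_neighbour_classifier(train):
    """1-NN: predict the label of the training item that differs in the fewest features."""
    def predict(x):
        def distance(item):
            return sum(x[f] != item[0][f] for f in x)
        return min(train, key=distance)[1]
    return predict


def project(data, subset):
    """Keep only the features listed in `subset`."""
    return [({f: x[f] for f in subset}, y) for x, y in data]


def leave_one_out_accuracy(make_classifier, data):
    """Train on all items but one, test on the held-out item, and average."""
    correct = 0
    for i, (x, y) in enumerate(data):
        predict = make_classifier(data[:i] + data[i + 1:])
        correct += predict(x) == y
    return correct / len(data)


def compare(data, classifiers, subsets):
    """Accuracy for every (classifier, feature subset) combination."""
    return {
        (cname, sname): leave_one_out_accuracy(make, project(data, subset))
        for cname, make in classifiers.items()
        for sname, subset in subsets.items()
    }


CLASSIFIERS = {"majority": majority_classifier, "1-nn": nearest_neighbour_classifier}
RESULTS = compare(DATA, CLASSIFIERS, SUBSETS)

for (cname, sname), accuracy in sorted(RESULTS.items()):
    print(f"{cname:10s} {sname:15s} {accuracy:.2f}")
```

The grid of results makes it possible to read off both which classifier performs best and how much each feature subset contributes. With real data, the toy classifiers would be replaced by those under study and leave-one-out by whatever evaluation protocol the experiment calls for.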

Author information

Authors and Affiliations

University of the Basque Country, Spain
O. Arregi, K. Ceberio, A. Díaz de Illarraza, I. Goenaga, B. Sierra & A. Zelaia

Editor information

Editors and Affiliations

Departamento Académico de Computación, Instituto Tecnológico Autónomo de México, Río Hondo No. 1, 01000, Mexico, D.F., México
Angel Kuri-Morales
Department of Computer Science and Engineering, Universidad Nacional del Sur, Alem 1253, 8000, Bahía Blanca, Argentina
Guillermo R. Simari

About this paper

Cite this paper

Arregi, O., Ceberio, K., Díaz de Illarraza, A., Goenaga, I., Sierra, B., Zelaia, A. (2010). A First Machine Learning Approach to Pronominal Anaphora Resolution in Basque. In: Kuri-Morales, A., Simari, G.R. (eds) Advances in Artificial Intelligence – IBERAMIA 2010. IBERAMIA 2010. Lecture Notes in Computer Science, vol 6433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16952-6_24

DOI: https://doi.org/10.1007/978-3-642-16952-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16951-9
Online ISBN: 978-3-642-16952-6
eBook Packages: Computer Science, Computer Science (R0)