Abstract
In this paper we present preliminary work on the automatic construction of rules for recognising semantic relations between pairs of proper names in Polish texts. Our goal was to check the feasibility of automatic rule construction using an existing inductive logic programming (ILP) system as an alternative or a complement to manual rule creation. We present a set of first-order logic predicates used to represent the semantic relation recognition task. The background knowledge encodes morphological, orthographic and named entity-based features. We applied ILP to the proposed representation to generate rules for relation extraction, using an existing ILP system called Aleph [1]. The performance of the automatically generated rules was compared with that of a set of hand-crafted rules developed on the basis of the training set for 8 categories of relations (affiliation, alias, creator, composition, location, nationality, neighbourhood, origin). Finally, we propose several ways to improve the preliminary results in future work.
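To make the idea of a first-order representation more concrete, the following minimal Python sketch turns a tagged sentence into Prolog-style facts of the kind an ILP system could take as background knowledge. The abstract does not list the predicates actually used, so every predicate name here (`token`, `orth`, `base`, `pos`, `pattern`, `ne`, `next`), the `Token` fields, the tag values and the example sentence are illustrative assumptions rather than the authors' representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One token with hypothetical morphological and named-entity features."""
    orth: str                      # surface form
    base: str                      # lemma
    pos: str                       # part-of-speech tag (assumed tag set)
    ne_type: Optional[str] = None  # named-entity category, if any


def atom(text: str) -> str:
    """Quote a string as a Prolog-style atom, escaping backslashes and quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def orth_pattern(orth: str) -> str:
    """Return a coarse orthographic class for a surface form."""
    if orth.isupper():
        return "all_upper"
    if orth[:1].isupper():
        return "init_upper"
    if orth.isdigit():
        return "digits"
    return "lower"


def sentence_to_facts(sent_id: str, tokens: List[Token]) -> List[str]:
    """Encode each token's features, plus token order, as ground facts."""
    facts: List[str] = []
    for i, tok in enumerate(tokens):
        tid = f"{sent_id}_t{i}"
        facts.append(f"token({tid}).")
        facts.append(f"orth({tid}, {atom(tok.orth)}).")
        facts.append(f"base({tid}, {atom(tok.base)}).")
        facts.append(f"pos({tid}, {atom(tok.pos)}).")
        facts.append(f"pattern({tid}, {atom(orth_pattern(tok.orth))}).")
        if tok.ne_type is not None:
            facts.append(f"ne({tid}, {atom(tok.ne_type)}).")
        if i > 0:
            # Link each token to its predecessor so rules can refer to context.
            facts.append(f"next({sent_id}_t{i - 1}, {tid}).")
    return facts


# Hypothetical example sentence: "Kowalski mieszka w Warszawie"
example_tokens = [
    Token("Kowalski", "kowalski", "subst", "person"),
    Token("mieszka", "mieszkać", "fin"),
    Token("w", "w", "prep"),
    Token("Warszawie", "Warszawa", "subst", "city"),
]

if __name__ == "__main__":
    for fact in sentence_to_facts("s1", example_tokens):
        print(fact)
```

In such a setup, an annotated pair like the person and city tokens above would serve as a positive example for a target predicate (for instance a hypothetical `location/2`), and the ILP system would search for clauses over the background facts that cover the positive examples while excluding the negative ones.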
References
Srinivasan, A.: The Aleph Manual (2006), http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph.html
Linguistic Data Consortium (LDC). ACE (Automatic Content Extraction) English Annotation Guidelines for Relations (2008)
Pyysalo, S., Ohta, T., Tsujii, J.: Overview of the Entity Relations (REL) supporting task of BioNLP Shared Task 2011. In: Proceedings of BioNLP Shared Task 2011 Workshop, June 24, pp. 83–88. Association for Computational Linguistics, Portland (2011)
Marciniak, M., Mykowiecka, A.: Automatic processing of diabetic patients’ hospital documentation. In: Annual Meeting of the ACL (2007)
Patwardhan, S., Riloff, E.: Learning Domain-Specific Information Extraction Patterns from the Web. In: ACL 2006 Workshop on Information Extraction Beyond the Document (2006)
Califf, M.E.: Relational learning techniques for natural language information extraction. PhD thesis, The University of Texas at Austin (1998)
Freitag, D.: Machine learning for information extraction in informal domains. PhD thesis, Carnegie Mellon University (1998)
Wróblewska, A., Woliński, M.: Preliminary Experiments in Polish Dependency Parsing. In: Bouvry, P., Kłopotek, M.A., Leprévost, F., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) SIIS 2011. LNCS, vol. 7053, pp. 279–292. Springer, Heidelberg (2012)
Marcińczuk, M., Janicki, M.: Optimizing CRF-Based Model for Proper Name Recognition in Polish Texts. In: Gelbukh, A. (ed.) CICLing 2012, Part I. LNCS, vol. 7181, pp. 258–269. Springer, Heidelberg (2012)
Broda, B., Marcińczuk, M., Maziarz, M., Radziszewski, A., Wardyński, A.: KPWr: Towards a Free Corpus of Polish. In: Proceedings of the 8th ELRA Conference on Language Resources and Evaluation LREC 2012, Istanbul, Turkey (2012)
Marcińczuk, M., Stanek, M., Piasecki, M., Musiał, A.: Rich Set of Features for Proper Name Recognition in Polish Texts. In: Bouvry, P., Kłopotek, M.A., Leprévost, F., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) SIIS 2011. LNCS, vol. 7053, pp. 332–344. Springer, Heidelberg (2012)
Quinlan, J.R., Cameron-Jones, R.M.: FOIL: A Midterm Report. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 3–20. Springer, Heidelberg (1993)
Muggleton, S., Feng, C.: Efficient induction in logic programs. In: Muggleton, S. (ed.) Inductive Logic Programming, pp. 281–298. Academic Press (1992)
Muggleton, S.: Inverse Entailment and Progol. New Generation Computing Journal 13, 245–286 (1995), http://www.doc.ic.ac.uk/~shm/progol.html
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marcińczuk, M., Ptak, M. (2012). Preliminary Study on Automatic Induction of Rules for Recognition of Semantic Relations between Proper Names in Polish Texts. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_32
DOI: https://doi.org/10.1007/978-3-642-32790-2_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32789-6
Online ISBN: 978-3-642-32790-2
eBook Packages: Computer Science (R0)