Abstract
This paper proposes a method for extracting wordnet lexico-semantic relations from Polish Wikipedia articles. The method relies on a set of hand-written lexico-morphosyntactic extraction patterns that were developed in less than one man-week of work. Two kinds of patterns are proposed: patterns that process encyclopaedia articles as plain text documents, and patterns that utilise information about the structure of a Wikipedia article (including links). Two types of evaluation were applied: manual assessment of the extracted data, and an assessment based on using the extracted data as an additional knowledge source in automatic plWordNet expansion.
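To make the distinction between the two kinds of patterns concrete, the following is a minimal sketch and not the authors' pattern set. It assumes English toy sentences, plain regular expressions instead of morphosyntactic tagging, and two hypothetical helpers, `extract_from_text` and `extract_from_markup`. The first treats the opening sentence as plain text; the second uses wiki markup (bolded title and the first link after the copula) as a cue for the hypernym candidate.

```python
import re
from typing import Optional, Tuple

# Text-only toy pattern (assumption): "<X> is a|an|the <Y>", Y being one word.
TEXT_PATTERN = re.compile(
    r"^(?P<hypo>.+?)\s+is\s+(?:a|an|the)\s+(?P<hyper>\w+)",
    re.IGNORECASE,
)

# Structure-aware toy pattern (assumption): bolded article title, copula,
# then the target of the first wiki link in the same sentence.
LINK_PATTERN = re.compile(
    r"^'''(?P<hypo>[^']+)'''\s+is\s+(?:a|an|the)\s+[^\[\].]*?"
    r"\[\[(?P<hyper>[^\]|]+)(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def extract_from_text(sentence: str) -> Optional[Tuple[str, str]]:
    """Return a (hyponym, hypernym) candidate from a plain-text sentence."""
    m = TEXT_PATTERN.match(sentence)
    return (m.group("hypo"), m.group("hyper")) if m else None


def extract_from_markup(sentence: str) -> Optional[Tuple[str, str]]:
    """Return a (hyponym, hypernym) candidate using bold and link markup."""
    m = LINK_PATTERN.match(sentence)
    return (m.group("hypo"), m.group("hyper")) if m else None
```

In this sketch the markup-based pattern can skip modifiers such as "large" because the link boundary marks the head term, which the text-only pattern cannot do without part-of-speech information. Patterns for Polish would additionally need the morphosyntactic constraints the abstract mentions.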
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piasecki, M., Indyka-Piasecka, A., Kurc, R. (2011). Linguistically Informed Mining Lexical Semantic Relations from Wikipedia Structure. In: Nguyen, N.T., Kim, CG., Janiak, A. (eds.) Intelligent Information and Database Systems. ACIIDS 2011. Lecture Notes in Computer Science, vol 6591. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20039-7_30
DOI: https://doi.org/10.1007/978-3-642-20039-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20038-0
Online ISBN: 978-3-642-20039-7
eBook Packages: Computer Science, Computer Science (R0)