Abstract
We present experiments with a variety of corpus-based measures applied to the problem of constructing semantic similarity functions for Polish nouns. Rich inflection in Polish allows us to acquire useful syntactic features without parsing; morphosyntactic restrictions checked in a large enough window provide sufficiently useful data. A novel feature selection method achieves an accuracy of 86% on the WordNet-based synonymy test, an improvement of 5% over previous results.
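To make the evaluation setting concrete, the following is a minimal sketch of how a synonymy test of this kind can be scored: for each target noun, the similarity function must pick the true synonym from a short list of candidates, and accuracy is the fraction of questions answered correctly. The abstract does not specify the similarity measures or feature format, so everything here is an assumption made for illustration: cosine similarity stands in for the measures studied in the paper, and the feature names (such as `adj:szybki`), the toy counts, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse feature-count vectors."""
    shared = u.keys() & v.keys()
    dot = sum(u[f] * v[f] for f in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_question(vectors, target, candidates):
    """Pick the candidate whose vector is most similar to the target's."""
    return max(candidates, key=lambda c: cosine(vectors[target], vectors[c]))


def synonymy_test_accuracy(vectors, questions):
    """Fraction of (target, candidates, gold) questions answered correctly."""
    correct = sum(
        answer_question(vectors, target, candidates) == gold
        for target, candidates, gold in questions
    )
    return correct / len(questions)


# Toy co-occurrence counts (hypothetical): each noun is described by
# morphosyntactic features such as agreeing adjectives or governing verbs.
vectors = {
    "samochód": Counter({"adj:szybki": 3, "adj:nowy": 2, "verb:jechać": 4}),
    "auto": Counter({"adj:szybki": 2, "adj:nowy": 3, "verb:jechać": 3}),
    "drzewo": Counter({"adj:zielony": 4, "adj:wysoki": 3, "verb:rosnąć": 2}),
    "las": Counter({"adj:zielony": 3, "adj:wysoki": 1, "verb:rosnąć": 3}),
    "dom": Counter({"adj:nowy": 1, "adj:wysoki": 2, "verb:stać": 3}),
}

# Each question: target noun, candidate answers, gold synonym (hypothetical).
questions = [
    ("samochód", ["auto", "drzewo", "dom"], "auto"),
    ("drzewo", ["las", "auto", "dom"], "las"),
]

accuracy = synonymy_test_accuracy(vectors, questions)
print(f"accuracy = {accuracy:.2f}")
```

In a realistic setting the vectors would be built from a large tagged corpus and the questions drawn from a wordnet; the sketch only shows the scoring logic against which an accuracy figure is reported.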
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piasecki, M., Szpakowicz, S., Broda, B. (2007). Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science(), vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_15
DOI: https://doi.org/10.1007/978-3-540-74628-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)