Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns

Piasecki, Maciej; Szpakowicz, Stanisław; Broda, Bartosz

doi:10.1007/978-3-540-74628-7_15

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Abstract

We present experiments with a variety of corpus-based measures for constructing semantic similarity functions for Polish nouns. Rich inflection in Polish allows us to acquire useful syntactic features without parsing: morphosyntactic constraints checked within a large enough window provide sufficiently useful data. A novel feature selection method achieves an accuracy of 86% on the WordNet-based synonymy test, an improvement of 5% over previous results.
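
The abstract does not spell out how the synonymy test is scored, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a multiple-choice item format (a target noun, several candidates, exactly one of which is a WordNet synonym), plain co-occurrence counts as features, and cosine as the similarity function. The feature names, the toy counts, and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_item(target, candidates, vectors):
    """Pick the candidate whose feature vector is most similar to the target's."""
    return max(candidates, key=lambda c: cosine(vectors[target], vectors[c]))


def synonymy_test_accuracy(items, vectors):
    """Fraction of items where the top-ranked candidate is the gold synonym."""
    correct = sum(
        1 for target, candidates, gold in items
        if answer_item(target, candidates, vectors) == gold
    )
    return correct / len(items)


# Hypothetical counts of morphosyntactic features observed near each noun,
# e.g. "adj:szybki" = modified by an agreeing adjective "szybki" (fast).
vectors = {
    "samochód": Counter({"adj:szybki": 4, "subj_of:jechać": 3}),
    "auto": Counter({"adj:szybki": 3, "subj_of:jechać": 2}),
    "drzewo": Counter({"adj:zielony": 5, "subj_of:rosnąć": 2}),
    "dom": Counter({"adj:duży": 3}),
}

# One hypothetical test item: (target, candidates, gold synonym).
items = [("samochód", ["auto", "drzewo", "dom"], "auto")]

accuracy = synonymy_test_accuracy(items, vectors)
```

In a setup like this, feature selection would amount to deciding which keys are allowed into the vectors before similarity is computed; the selection method reported in the paper is not reproduced here.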

Author information

Authors and Affiliations

Institute of Applied Informatics, Wrocław University of Technology, Poland
Maciej Piasecki & Bartosz Broda
School of Information Technology and Engineering, University of Ottawa
Stanisław Szpakowicz
Institute of Computer Science, Polish Academy of Sciences
Stanisław Szpakowicz

Editor information

Václav Matoušek, Pavel Mautner

Cite this paper

Piasecki, M., Szpakowicz, S., Broda, B. (2007). Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_15

DOI: https://doi.org/10.1007/978-3-540-74628-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science; Computer Science (R0)
