Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Abstract

We present experiments with a variety of corpus-based measures applied to the problem of constructing semantic similarity functions for Polish nouns. Rich inflection in Polish allows us to acquire useful syntactic features without parsing: morphosyntactic restrictions checked in a large enough window provide sufficiently useful data. A novel feature selection method achieves an accuracy of 86% on the WordNet-based synonymy test, an improvement of 5% over previous results.
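As a rough illustration of how a synonymy test of this kind can score a similarity function, here is a minimal sketch. It assumes cosine similarity over sparse co-occurrence counts, which is only one of many possible corpus-based measures and is not taken from the paper. The feature names, the counts, and the helpers `cosine`, `answer` and `accuracy` are all hypothetical toy values and names chosen for the example.

```python
import math
from collections import Counter

# Hypothetical toy feature vectors: noun -> counts of syntactic features
# (e.g. a modifying adjective or a governing verb seen in a text window).
VECTORS = {
    "samochód": Counter({"adj:szybki": 4, "adj:nowy": 3, "verb:jechać": 5}),
    "auto": Counter({"adj:szybki": 3, "adj:nowy": 2, "verb:jechać": 4}),
    "dom": Counter({"adj:nowy": 4, "adj:duży": 5, "verb:stać": 3}),
    "kot": Counter({"adj:duży": 1, "verb:spać": 6}),
    "pies": Counter({"adj:duży": 3, "verb:spać": 2, "verb:biec": 4}),
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def answer(question, candidates, vectors):
    """Pick the candidate most similar to the question word."""
    return max(candidates, key=lambda c: cosine(vectors[question], vectors[c]))

def accuracy(items, vectors):
    """Fraction of test items where the chosen candidate is the gold synonym."""
    correct = sum(answer(q, cands, vectors) == gold for q, cands, gold in items)
    return correct / len(items)

# Each item: (question word, candidate answers, gold synonym). Toy data only.
ITEMS = [
    ("samochód", ["auto", "dom", "kot", "pies"], "auto"),
    ("kot", ["pies", "dom", "auto", "samochód"], "pies"),
]

if __name__ == "__main__":
    print(accuracy(ITEMS, VECTORS))
```

In a setup like this, feature selection would amount to deciding which keys are allowed into the vectors before similarity is computed. The accuracy on the test items then gives a single number for comparing alternative feature sets.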

Editor information

Václav Matoušek, Pavel Mautner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Piasecki, M., Szpakowicz, S., Broda, B. (2007). Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_15

  • DOI: https://doi.org/10.1007/978-3-540-74628-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74627-0

  • Online ISBN: 978-3-540-74628-7

  • eBook Packages: Computer Science, Computer Science (R0)
