Classification-Based Filtering of Semantic Relatedness in Hypernymy Extraction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

Manual construction of a wordnet can be facilitated by a system that suggests semantic relations acquired from corpora. Such systems tend to produce many wrong suggestions. We propose a method of filtering a raw list of noun pairs potentially linked by hypernymy, and test it on Polish. The method aims for good recall and sufficient precision. The classifiers work with complex features that give clues on the relation between the nouns. We apply a corpus-based measure of semantic relatedness enhanced with a Rank Weight Function. The evaluation is based on the data in Polish WordNet. The results compare favourably with similar methods applied to English, despite the small size of Polish WordNet.
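
The trade-off between recall and precision mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the noun pairs are English toy data rather than Polish, the scores are invented, and a plain score threshold stands in for the trained classifiers and complex features that the paper uses but the abstract does not detail. The names `filter_pairs` and `precision_recall` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (candidate hyponym, candidate hypernym)


def filter_pairs(scores: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Keep candidate pairs whose score reaches the threshold.

    The threshold is a stand-in for a trained classifier's decision;
    it is NOT the method described in the paper.
    """
    return [pair for pair, score in scores.items() if score >= threshold]


def precision_recall(kept: List[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Compare the kept pairs with gold hypernymy pairs (e.g. from a wordnet)."""
    kept_set = set(kept)
    true_positives = len(kept_set & gold)
    precision = true_positives / len(kept_set) if kept_set else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy scores for a raw list of candidate noun pairs.
toy_scores: Dict[Pair, float] = {
    ("dog", "animal"): 0.9,
    ("oak", "tree"): 0.7,
    ("dog", "cat"): 0.4,   # related nouns, but not hypernymy
    ("car", "road"): 0.2,  # related nouns, but not hypernymy
}

# Invented gold standard: the true hypernymy pairs among the candidates.
toy_gold: Set[Pair] = {("dog", "animal"), ("oak", "tree")}

if __name__ == "__main__":
    for threshold in (0.3, 0.8):
        kept = filter_pairs(toy_scores, threshold)
        precision, recall = precision_recall(kept, toy_gold)
        print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, the lenient threshold keeps every true pair while admitting one wrong suggestion, and the strict threshold keeps only correct pairs but loses one of them. A filter tuned for good recall and sufficient precision, as the abstract puts it, sits closer to the lenient setting.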

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Piasecki, M., Szpakowicz, S., Marcińczuk, M., Broda, B. (2008). Classification-Based Filtering of Semantic Relatedness in Hypernymy Extraction. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_38

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science, Computer Science (R0)