Combining Pattern-Based and Distributional Similarity for Graph-Based Noun Categorization

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9103)

Abstract

We examine the combination of pattern-based and distributional similarity for the induction of semantic categories. Pattern-based methods are precise but sparse, whereas distributional methods have higher recall. Given these complementary properties, we use the predictions of distributional methods as a back-off to pattern-based similarity. Since our pattern-based approach is embedded in a semi-supervised graph clustering algorithm, we also examine how distributional information is best added to that classifier. Our experiments are carried out on five different food categorization tasks.
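The back-off idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary-based score tables, and the rule "fall back only when the pattern-based score is missing or zero" are all assumptions made for illustration, and the sketch ignores the question of whether the two kinds of scores are on a comparable scale.

```python
from typing import FrozenSet, Mapping

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Order-independent key for a pair of nouns (hypothetical helper)."""
    return frozenset((a, b))


def backoff_similarity(
    a: str,
    b: str,
    pattern_sim: Mapping[Pair, float],
    distributional_sim: Mapping[Pair, float],
) -> float:
    """Return the pattern-based score if one exists, else the distributional one.

    Assumption: a missing or zero pattern-based score means the pattern-based
    method has no evidence for this pair (it is sparse), so we back off.
    """
    score = pattern_sim.get(pair(a, b), 0.0)
    if score > 0.0:
        return score
    return distributional_sim.get(pair(a, b), 0.0)


# Toy scores, invented purely for illustration.
pattern_scores = {pair("apple", "pear"): 0.8}
distributional_scores = {
    pair("apple", "pear"): 0.3,
    pair("apple", "quince"): 0.5,
}
```

With these toy tables, the pair (apple, pear) keeps its pattern-based score, while (apple, quince), which the sparse pattern-based table does not cover, receives the distributional score instead.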

Notes

  1. We remove all food items that contain as a suffix another food item that is also contained in our food vocabulary.

  2. That is, in order to establish the label of the sparse compound chocolate-almond cake, one just considers the label of the suffix/head cake. The latter is a more general expression for which a label can be more reliably determined.
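The two notes above can be sketched roughly as follows. This is an illustrative sketch only: it assumes that a "suffix" means a trailing sequence of whitespace-separated tokens, which the notes do not specify (a character-level suffix test would be needed for languages that write compounds as one word), and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def vocabulary_suffix(item: str, vocabulary: Set[str]) -> Optional[str]:
    """Return the longest proper token suffix of `item` that is itself in the
    vocabulary, or None if there is none (token-level suffixes are an assumption)."""
    tokens = item.split()
    for start in range(1, len(tokens)):
        suffix = " ".join(tokens[start:])
        if suffix in vocabulary:
            return suffix
    return None


def remove_compounds(items: Iterable[str]) -> List[str]:
    """Keep only items that do not end in another vocabulary item (Note 1)."""
    vocabulary = set(items)
    return [item for item in vocabulary if vocabulary_suffix(item, vocabulary) is None]


# Toy vocabulary, invented for illustration.
foods = ["cake", "chocolate-almond cake", "almond", "olive oil", "oil"]
kept = sorted(remove_compounds(foods))
head = vocabulary_suffix("chocolate-almond cake", set(foods))
```

In this toy vocabulary the compounds "chocolate-almond cake" and "olive oil" are removed, and the head found for "chocolate-almond cake" is "cake", whose label would then stand in for the label of the compound (Note 2).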

Acknowledgements

This work was supported, in part, by the German Federal Ministry of Education and Research (BMBF) under grant no. 01IC12SO1X and the Information Extraction and Synthesis Lab at the University of Massachusetts. The authors would like to thank Stephanie Köser for annotating the dataset presented in this paper.

Author information

Corresponding author

Correspondence to Michael Wiegand.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wiegand, M., Roth, B., Klakow, D. (2015). Combining Pattern-Based and Distributional Similarity for Graph-Based Noun Categorization. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_5

  • DOI: https://doi.org/10.1007/978-3-319-19581-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19580-3

  • Online ISBN: 978-3-319-19581-0

  • eBook Packages: Computer Science (R0)
