Combining Pattern-Based and Distributional Similarity for Graph-Based Noun Categorization

Wiegand, Michael; Roth, Benjamin; Klakow, Dietrich

doi:10.1007/978-3-319-19581-0_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9103)

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

Abstract

We examine the combination of pattern-based and distributional similarity for the induction of semantic categories. Pattern-based methods are precise but sparse, while distributional methods have higher recall. Given these complementary properties, we use the predictions of distributional methods as a back-off to pattern-based similarity. Since our pattern-based approach is embedded into a semi-supervised graph clustering algorithm, we also examine how distributional information is best added to that classifier. Our experiments are carried out on 5 different food categorization tasks.
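
The back-off idea can be illustrated with a minimal sketch. The abstract does not say how the two similarity sources are stored or combined in detail, so everything below is an assumption made for illustration: the dictionary-based similarity tables, the helper names `_key` and `backoff_similarity`, the zero threshold, and the toy scores are all hypothetical and not taken from the paper.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def _key(a: str, b: str) -> Pair:
    """Order-independent key for a symmetric similarity table (hypothetical helper)."""
    return (a, b) if a <= b else (b, a)


def backoff_similarity(
    a: str,
    b: str,
    pattern_sim: Dict[Pair, float],
    distributional_sim: Dict[Pair, float],
) -> float:
    """Return the pattern-based score if one exists, else the distributional score.

    Assumption: a missing or zero pattern-based score means the pair was never
    observed in a pattern, which is the sparse case the back-off is meant to cover.
    """
    score = pattern_sim.get(_key(a, b), 0.0)
    if score > 0.0:
        return score
    return distributional_sim.get(_key(a, b), 0.0)


# Toy scores, invented purely for illustration.
pattern_sim = {_key("apple", "pear"): 0.8}
distributional_sim = {
    _key("apple", "pear"): 0.6,
    _key("apple", "quince"): 0.4,
}
```

Under these assumptions, the pair (apple, pear) keeps its precise pattern-based score, while (apple, quince), which no pattern covers, falls back to the distributional score. How such scores are then fed into the graph clustering algorithm is a separate question that this sketch does not address.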

Notes

1. We remove all food items that contain as a suffix another food item that is also contained in our food vocabulary.
2. That is, in order to establish the label of the sparse compound chocolate-almond cake, one just considers the label of the suffix/head cake. The latter is a more general expression for which a label can be more reliably determined.
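
The two notes above can be sketched as follows. This is only an illustration under stated assumptions: "suffix" is interpreted here as a plain string suffix, the function names `remove_suffix_compounds` and `head_label` are hypothetical, choosing the longest matching head is an arbitrary tie-breaking choice, and the category label `dessert` is invented for the example.

```python
from typing import Dict, Iterable, Optional, Set


def remove_suffix_compounds(vocabulary: Iterable[str]) -> Set[str]:
    """Drop every item that ends with another item of the same vocabulary.

    Assumption: "suffix" is tested with a plain string suffix check.
    """
    vocab = set(vocabulary)
    return {
        item
        for item in vocab
        if not any(other != item and item.endswith(other) for other in vocab)
    }


def head_label(item: str, labels: Dict[str, str]) -> Optional[str]:
    """Look up the label of a compound via its suffix/head.

    Returns the label of the longest labelled proper suffix, or None if no
    labelled suffix exists (the longest-match rule is an assumption).
    """
    heads = [h for h in labels if h != item and item.endswith(h)]
    if not heads:
        return None
    return labels[max(heads, key=len)]


# Toy vocabulary and labels, invented for illustration.
vocabulary = ["cake", "chocolate-almond cake", "apple"]
labels = {"cake": "dessert"}

filtered = remove_suffix_compounds(vocabulary)
compound_label = head_label("chocolate-almond cake", labels)
```

Here the sparse compound is removed from the vocabulary that is categorized directly, and its label is later recovered from the more general head cake.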


Acknowledgements

This work was supported, in part, by the German Federal Ministry of Education and Research (BMBF) under grant no. 01IC12SO1X and the Information Extraction and Synthesis Lab at the University of Massachusetts. The authors would like to thank Stephanie Köser for annotating the dataset presented in this paper.

Author information

Authors and Affiliations

Spoken Language Systems, Saarland University, Saarbrücken, 66123, Germany
Michael Wiegand & Dietrich Klakow
School of Computer Science, University of Massachusetts, Amherst, MA, USA
Benjamin Roth

Corresponding author

Correspondence to Michael Wiegand.

Editor information

Editors and Affiliations

Technische Universität Darmstadt, Darmstadt, Germany
Chris Biemann
Universität Passau, Passau, Germany
Siegfried Handschuh
Universität Passau, Passau, Germany
André Freitas
University of Salford, Salford, United Kingdom
Farid Meziane
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Wiegand, M., Roth, B., Klakow, D. (2015). Combining Pattern-Based and Distributional Similarity for Graph-Based Noun Categorization. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_5

DOI: https://doi.org/10.1007/978-3-319-19581-0_5
Published: 04 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19580-3
Online ISBN: 978-3-319-19581-0
eBook Packages: Computer Science (R0)
