An Empirical Bayesian Method for Detecting Out of Context Words

Jabbari, Sanaz; Allison, Ben; Guthrie, Louise

doi:10.1007/978-3-540-87391-4_15


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Conference series: International Conference on Text, Speech and Dialogue


Abstract

In this paper, we propose an empirical Bayesian method for determining whether a word is used out of context. We suggest that a word's context can be treated as a multinomially distributed random variable, which leads to a simple and direct Bayesian hypothesis test for the problem. We show that this method outperforms a method based on common practice in the literature. We also show that an empirical Bayes approach, in which the behaviour of other words is used to specify a prior distribution on the model parameters, appreciably improves performance when training data are sparse.
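The abstract does not spell out the model, so the following is only a minimal sketch, in Python, of how the ingredients it names could fit together: a context represented as counts over a context vocabulary, a Dirichlet prior on the word's multinomial context distribution fitted from the behaviour of other words, and a Bayesian test comparing "in context" against an alternative. Everything concrete here is an assumption for illustration, not the authors' formulation: the function names are hypothetical, the prior is fitted by a simple method-of-moments estimate (a maximum-likelihood Dirichlet fit is a common alternative), the alternative hypothesis is a fixed background multinomial, and the decision threshold of 0 on the log Bayes factor is an assumed default.

```python
import math
from typing import List, Sequence


def fit_dirichlet_prior(other_counts: Sequence[Sequence[int]]) -> List[float]:
    """Empirical Bayes step (hypothetical helper): fit Dirichlet parameters
    alpha to the context proportions of *other* words by the method of
    moments. For a Dirichlet with mean m and precision s,
    Var(p_k) = m_k * (1 - m_k) / (s + 1), so s is solved per component."""
    props = [[c / sum(counts) for c in counts]
             for counts in other_counts if sum(counts) > 0]
    if len(props) < 2:
        raise ValueError("need at least two words with observed contexts")
    n, vocab = len(props), len(props[0])
    mean = [sum(p[k] for p in props) / n for k in range(vocab)]
    var = [sum((p[k] - mean[k]) ** 2 for p in props) / (n - 1)
           for k in range(vocab)]
    estimates = [mean[k] * (1 - mean[k]) / var[k] - 1
                 for k in range(vocab) if var[k] > 0 and 0 < mean[k] < 1]
    if not estimates:
        raise ValueError("proportions have no usable variance")
    # Average the per-component precision estimates; floor the results so
    # every alpha_k stays strictly positive, as a Dirichlet requires.
    precision = max(sum(estimates) / len(estimates), 1e-6)
    return [max(precision * m, 1e-6) for m in mean]


def log_dirichlet_multinomial(x: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of one particular token sequence with counts x under a
    multinomial whose parameters are integrated out against Dirichlet(alpha).
    The multinomial coefficient is omitted; it cancels in the Bayes factor."""
    a_sum, n = sum(alpha), sum(x)
    out = math.lgamma(a_sum) - math.lgamma(a_sum + n)
    for a_k, x_k in zip(alpha, x):
        out += math.lgamma(a_k + x_k) - math.lgamma(a_k)
    return out


def log_bayes_factor(context, word_counts, prior, background) -> float:
    """log p(context | in context) - log p(context | background).

    In context: posterior predictive under Dirichlet(prior + word_counts),
    i.e. the prior updated with the target word's own training contexts.
    Background (assumed alternative): a fixed multinomial, which must be
    positive wherever the context has counts."""
    posterior = [a + c for a, c in zip(prior, word_counts)]
    log_in = log_dirichlet_multinomial(context, posterior)
    log_out = sum(x * math.log(p) for x, p in zip(context, background) if x > 0)
    return log_in - log_out


def is_out_of_context(context, word_counts, prior, background, threshold=0.0):
    """Flag the word as out of context when the log Bayes factor falls below
    the (assumed) threshold, i.e. the background explains the context better."""
    return log_bayes_factor(context, word_counts, prior, background) < threshold


if __name__ == "__main__":
    # Toy vocabulary of 4 context words; all counts are invented.
    others = [[8, 1, 1, 0], [1, 8, 0, 1], [0, 1, 8, 1], [1, 0, 1, 8]]
    prior = fit_dirichlet_prior(others)
    target_counts = [6, 4, 0, 0]  # sparse training contexts of the target word
    background = [0.25] * 4
    print(log_bayes_factor([2, 1, 0, 0], target_counts, prior, background))  # > 0
    print(is_out_of_context([0, 0, 2, 1], target_counts, prior, background))  # True
```

In this toy setup the context `[2, 1, 0, 0]` resembles the target word's past contexts and receives a positive log Bayes factor, while `[0, 0, 2, 1]` is flagged. The prior matters most when the target word has few observed contexts; as its counts grow, they increasingly dominate the fitted prior, which is consistent with the abstract's point that the empirical Bayes prior helps where training data are sparse.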



Author information

Authors and Affiliations

Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK
Sanaz Jabbari, Ben Allison & Louise Guthrie


Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala


About this paper

Cite this paper

Jabbari, S., Allison, B., Guthrie, L. (2008). An Empirical Bayesian Method for Detecting Out of Context Words. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol. 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_15


DOI: https://doi.org/10.1007/978-3-540-87391-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)
