Abstract
In this paper, we propose an empirical Bayesian method for determining whether a word is used out of context. We suggest that a word’s context can be treated as a multinomially distributed random variable, which leads to a simple and direct Bayesian hypothesis test for the problem. We show that this method outperforms one based on common practice in the literature. We also show that an empirical Bayes approach, in which the behaviour of other words is used to specify a prior distribution on the model parameters, improves performance appreciably where training data are sparse.
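The abstract does not give the exact form of the test, so the following is only a minimal sketch of one way such a test could look, not the authors' implementation. It assumes that context words are counted over a fixed vocabulary, that the multinomial parameters carry a Dirichlet prior, and that the two hypotheses are "the context was generated by the target word's model" versus "the context was generated by a background model". The function names, the `strength` and `floor` parameters, and the pooled-count prior are all hypothetical; in particular, pooling counts from other words is a simple stand-in for a properly fitted Dirichlet prior.

```python
from math import lgamma


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of a sequence of context words with the given
    per-word counts, integrating the multinomial parameters over Dirichlet(alpha).
    The multinomial coefficient is omitted; it cancels in the Bayes factor."""
    n = sum(counts)
    a0 = sum(alpha)
    result = lgamma(a0) - lgamma(a0 + n)
    for c, a in zip(counts, alpha):
        result += lgamma(a + c) - lgamma(a)
    return result


def pooled_prior(other_word_counts, strength=10.0, floor=1e-3):
    """Hypothetical empirical Bayes prior: Dirichlet pseudo-counts proportional
    to the pooled context counts of other words. `strength` sets the total
    pseudo-count mass; `floor` keeps every pseudo-count strictly positive."""
    vocab_size = len(other_word_counts[0])
    pooled = [sum(w[i] for w in other_word_counts) for i in range(vocab_size)]
    total = sum(pooled)
    if total == 0:
        # No evidence from other words: fall back to a uniform prior.
        return [floor + strength / vocab_size] * vocab_size
    return [floor + strength * p / total for p in pooled]


def log_bayes_factor(context_counts, train_counts, background_counts, prior):
    """Log Bayes factor of 'in context' against 'out of context'.
    Each hypothesis uses the prior updated with its own training counts."""
    alpha_in = [p + t for p, t in zip(prior, train_counts)]
    alpha_out = [p + b for p, b in zip(prior, background_counts)]
    return (log_dirichlet_multinomial(context_counts, alpha_in)
            - log_dirichlet_multinomial(context_counts, alpha_out))
```

Under these assumptions, a positive log Bayes factor favours the hypothesis that the word is used in context and a negative one favours out-of-context use. When `train_counts` is small, the prior dominates `alpha_in`, which is where a prior informed by other words would be expected to matter most.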
© 2008 Springer-Verlag Berlin Heidelberg
Jabbari, S., Allison, B., Guthrie, L. (2008). An Empirical Bayesian Method for Detecting Out of Context Words. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_15
DOI: https://doi.org/10.1007/978-3-540-87391-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)