Discrimination Decisions for 100,000-Dimensional Spaces
Discrimination decisions arise in many natural language processing tasks. Three classical tasks are discriminating texts by their authors (author identification), discriminating documents by their relevance to some query (information retrieval), and discriminating multi-meaning words by their meanings (sense discrimination). Many other discrimination tasks arise regularly, such as determining whether a particular proper noun represents a person or a place, or whether a given word from some teletype text would be capitalized if both cases had been used.
We (Gale, Church, and Yarowsky, 1993) introduced a method designed for the sense discrimination problem. Here we show that the same method is useful in each of the five text discrimination problems mentioned.
We also discuss areas for further research based on observed shortcomings of the method. In particular, an example from the author identification task shows the need for a robust version of the method. In addition, the method makes an independence assumption that is demonstrably false, yet the consequences of this assumption have not been carefully studied.
Keywords: Information Retrieval, Discrimination Problem, Computational Linguistics, Plural Form, Proper Noun
- Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer, “Word Sense Disambiguation Using Statistical Methods,” Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp. 264–270, 1991.
- Church, K.W., “A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Glasgow, 1989.
- Dagan, I., A. Itai, and U. Schwall, “Two Languages are more Informative than One,” Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp. 130–137, 1991.
- Deerwester, S., S. Dumais, G. Furnas, T. Landauer, and R. Harshman, “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science, 41, 1990.
- Dempster, A., N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society (B), 39, 1977, pp. 1–38.
- DeRose, S., “Grammatical Category Disambiguation by Statistical Optimization,” Computational Linguistics, 14, 1, 1988.
- Francis, W., and H. Kučera, Frequency Analysis of English Usage, Houghton Mifflin Company, Boston, 1982.
- Gale, W., and K. Church, “Estimation Procedures for Language Context: Poor Estimates are Worse than None,” pp. 69–74 in Proceedings in Computational Statistics, 1990, K. Momirović and V. Mildner, eds., Physica-Verlag, Heidelberg, 1990.
- Gale, W., and K. Church, “A Program for Aligning Sentences in Bilingual Corpora,” Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, 1991a, pp. 177–184.
- Gale, W., and K. Church, “Identifying Word Correspondences in Parallel Texts,” Proceedings of the DARPA Conference on Speech and Natural Language, 1991b.
- Gale, W., K. Church, and D. Yarowsky, “A Method for Disambiguating Word Senses in a Large Corpus,” Computers and the Humanities, 1993.
- Harris, Z., Mathematical Structures of Language, Wiley, New York, 1968.
- Hearst, M., “Noun Homograph Disambiguation Using Local Context in Large Text Corpora,” Using Corpora, University of Waterloo, Waterloo, Ontario, 1991.
- Leacock, C., G. Miller, T. Towel and E. Voorhees, “Comparative Study of Statistical Methods for Sense Resolution,” Proceedings of the ARPA Workshop on Human Language Technology, 1993.
- Merialdo, B., “Tagging Text with a Probabilistic Model,” Proceedings of the IBM Natural Language ITL, Paris, France, 1990, pp. 161–172.
- Mosteller, Frederick, and David Wallace, Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, MA, 1964.
- Salton, G., Automatic Text Processing, Addison-Wesley, Reading, MA, 1989.
- Yarowsky, D., “Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora,” Proceedings of COLING-92, Nantes, France, 1992.
- Yule, G. U., Statistical Studies of Literary Vocabulary, Cambridge University Press, Cambridge, England, 1944.
- Zernik, U., “Tagging Word Senses in a Corpus: The Needle in the Haystack Revisited,” Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval, P. Jacobs, ed., Lawrence Erlbaum, Hillsdale, NJ, 1992.
- Zipf, G. K., Selected Studies of the Principle of Relative Frequency in Language, Harvard University Press, Cambridge, MA, 1932.