Discrimination Decisions for 100,000-Dimensional Spaces

  • William A. Gale
  • Kenneth W. Church
  • David Yarowsky
Part of the Linguistica Computazionale book series (LICO, volume 9)


Discrimination decisions arise in many natural language processing tasks. Three classical tasks are discriminating texts by their authors (author identification), discriminating documents by their relevance to some query (information retrieval), and discriminating multi-meaning words by their meanings (sense discrimination). Many other discrimination tasks arise regularly, such as determining whether a particular proper noun represents a person or a place, or whether a given word from some teletype text would be capitalized if both cases had been used.

In earlier work (Gale, Church, and Yarowsky, 1993) we introduced a method designed for the sense discrimination problem. Here we show that the same method is useful in each of the five text discrimination problems mentioned above.

We also discuss areas for research based on observed shortcomings of the method. In particular, an example in the author identification task shows the need for a robust version of the method. The method also makes an independence assumption that is demonstrably false, yet the consequences of this assumption have not been carefully studied.
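The abstract does not spell out the method, so the following is only a minimal sketch of what a discrimination decision under an independence assumption can look like. It assumes a classifier that scores a context by summing per-word log probabilities for each class, with equal class priors. The function names (`train`, `log_likelihood`, `discriminate`), the add-one smoothing, and the toy data are assumptions made for illustration and are not taken from the chapter.

```python
import math
from collections import Counter


def train(docs_by_class):
    """Count word occurrences per class.

    docs_by_class maps a class label to a list of token lists.
    """
    counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_by_class.items()
    }
    # The vocabulary is every word seen in any class.
    vocab = set().union(*counts.values())
    return counts, vocab


def log_likelihood(tokens, counts, vocab, label):
    """Score tokens for one class as a sum of per-word log probabilities.

    Summing the logs treats the words as independent given the class;
    this is the assumption the abstract calls demonstrably false.
    """
    total = sum(counts[label].values())
    v = len(vocab)
    # Add-one smoothing (an assumption of this sketch) keeps unseen words finite.
    return sum(math.log((counts[label][t] + 1) / (total + v)) for t in tokens)


def discriminate(tokens, counts, vocab):
    """Return the class with the highest score, assuming equal class priors."""
    return max(counts, key=lambda label: log_likelihood(tokens, counts, vocab, label))


# Hypothetical toy data: two senses of "bank".
training = {
    "river": [["bank", "water", "shore"]],
    "money": [["bank", "loan", "interest"]],
}
counts, vocab = train(training)
decision = discriminate(["water", "bank"], counts, vocab)
```

With a vocabulary on the order of 100,000 words, each word contributes one term to the sum, so a few badly estimated probabilities can dominate the decision. That is one way to read the call for a robust version of the method.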






References

  1. Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer, “Word Sense Disambiguation Using Statistical Methods,” Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp. 264–270, 1991.
  2. Church, K.W., “A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Glasgow, 1989.
  3. Dagan, I., A. Itai, and U. Schwall, “Two Languages are more Informative than One,” Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp. 130–137, 1991.
  4. Deerwester, S., S. Dumais, G. Furnas, T. Landauer, and R. Harshman, “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science, 41, 1990.
  5. Dempster, A., N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society (B), 39, 1977, pp. 1–38.
  6. DeRose, S., “Grammatical Category Disambiguation by Statistical Optimization,” Computational Linguistics, 14, 1, 1988.
  7. Francis, W., and H. Kučera, Frequency Analysis of English Usage, Houghton Mifflin Company, Boston, 1982.
  8. Gale, W., and K. Church, “Estimation Procedures for Language Context: Poor Estimates are Worse than None,” pp. 69–74 in Proceedings in Computational Statistics, 1990, K. Momirović and V. Mildner, eds., Physica-Verlag, Heidelberg, 1990.
  9. Gale, W., and K. Church, “A Program for Aligning Sentences in Bilingual Corpora,” Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, 1991a, pp. 177–184.
  10. Gale, W., and K. Church, “Identifying Word Correspondences in Parallel Texts,” Proceedings of the DARPA Conference on Speech and Natural Language, 1991b.
  11. Gale, W., K. Church, and D. Yarowsky, “A Method for Disambiguating Word Senses in a Large Corpus,” Computers and the Humanities, 1993.
  12. Harman, D., “How Effective is Suffixing?” Journal of the American Society for Information Science, 42, 1991, pp. 7–15.
  13. Harris, Z., Mathematical Structures of Language, Wiley, New York, 1968.
  14. Hearst, M., “Noun Homograph Disambiguation Using Local Context in Large Text Corpora,” Using Corpora, University of Waterloo, Waterloo, Ontario, 1991.
  15. Leacock, C., G. Miller, T. Towel and E. Voorhees, “Comparative Study of Statistical Methods for Sense Resolution,” Proceedings of the ARPA Workshop on Human Language Technology, 1993.
  16. Merialdo, B., “Tagging Text with a Probabilistic Model,” Proceedings of the IBM Natural Language ITL, Paris, France, 1990, pp. 161–172.
  17. Mosteller, Frederick, and David Wallace, Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, MA, 1964.
  18. Salton, G., Automatic Text Processing, Addison-Wesley, Reading, MA, 1989.
  19. Salton, G., and C. Yang, “On the Specification of Term Values in Automatic Indexing,” Journal of Documentation, 29, 1973, pp. 351–372.
  20. Yarowsky, D., “Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora,” Proceedings of COLING-92, Nantes, France, 1992.
  21. Yule, G. U., Statistical Studies of Literary Vocabulary, Cambridge University Press, Cambridge, England, 1944.
  22. Zernik, U., “Tagging Word Senses in a Corpus: The Needle in the Haystack Revisited,” Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval, P. Jacobs, ed., Lawrence Erlbaum, Hillsdale, NJ, 1992.
  23. Zipf, G. K., Selected Studies of the Principle of Relative Frequency in Language, Harvard University Press, Cambridge, MA, 1932.

Copyright information

© Springer Science+Business Media Dordrecht 1994

Authors and Affiliations

  • William A. Gale, AT&T Bell Laboratories, USA
  • Kenneth W. Church, AT&T Bell Laboratories, USA
  • David Yarowsky, AT&T Bell Laboratories, USA
