Linear Discriminant Text Classification in High Dimension

  • András Kornai
  • J. Michael Richards
Part of the Advances in Soft Computing book series (AINSC, volume 14)


Linear Discriminant (LD) techniques are typically used in pattern recognition tasks when there are many (n >> 10^4) datapoints in low-dimensional (d < 10^2) space. In this paper we argue on theoretical grounds that LD is in fact more appropriate when training data is sparse and the dimension of the space is extremely high. To support this conclusion we present experimental results on a medical text classification problem of great practical importance: autocoding of adverse event reports. We trained and tested LD-based systems for a variety of classification schemes widely used in the clinical drug trial process (COSTART, WHOART, HARTS, and MedDRA) and obtained a significant reduction in the rate of misclassification compared both to generic Bayesian machine-learning techniques and to the current generation of domain-specific autocoders based on string matching.
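To make the setting concrete, the following is a minimal sketch of a linear discriminant over sparse word-count vectors, where the dimension is the vocabulary size and each report touches only a few coordinates. It is an illustration only: the abstract does not state the training procedure, so a plain perceptron-style update is swapped in as a simple stand-in, and the toy reports, the two classes, and all function names (`count_vector`, `score`, `train_perceptron`, `classify`) are hypothetical rather than taken from the paper or from any coding dictionary.

```python
from collections import Counter, defaultdict


def count_vector(text):
    """Sparse bag-of-words count vector: token -> count."""
    return Counter(text.lower().split())


def score(weights, bias, vec):
    """Linear discriminant value w.x + b, summed over nonzero coordinates only."""
    return bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in vec.items())


def train_perceptron(examples, epochs=10):
    """Fit w and b with perceptron updates (an assumed stand-in trainer).

    examples: list of (text, label) pairs with label in {+1, -1}.
    """
    weights = defaultdict(float)
    bias = 0.0
    vectors = [(count_vector(text), label) for text, label in examples]
    for _ in range(epochs):
        mistakes = 0
        for vec, label in vectors:
            # Update only when the example is on the wrong side (or on the boundary).
            if label * score(weights, bias, vec) <= 0:
                for tok, cnt in vec.items():
                    weights[tok] += label * cnt
                bias += label
                mistakes += 1
        if mistakes == 0:
            break  # every training example is correctly separated
    return weights, bias


def classify(weights, bias, text):
    """Return +1 or -1 according to the sign of the discriminant."""
    return 1 if score(weights, bias, count_vector(text)) > 0 else -1


# Hypothetical toy data: +1 stands for a "headache" code, -1 for a "rash" code.
TOY = [
    ("severe headache after dose", 1),
    ("mild headache and nausea", 1),
    ("skin rash on both arms", -1),
    ("itchy rash after dose", -1),
]

w, b = train_perceptron(TOY)
print(classify(w, b, "headache after dose"))  # 1
print(classify(w, b, "rash on arms"))         # -1
```

Storing the weights in a dictionary keyed by token means the cost of scoring a report depends on the number of distinct words it contains, not on the size of the vocabulary, which is what makes a linear rule practical in very high dimension.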







Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • András Kornai (1)
  • J. Michael Richards (2)
  1. Northern Light Technology, Cambridge, USA
  2. PPD Informatics/Belmont Research, Cambridge, USA
