Learning Classifier Systems Applied to Knowledge Discovery in Clinical Research Databases

  • John H. Holmes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1813)


A stimulus-response learning classifier system (LCS), EpiCS, was developed from the BOOLE and NEWBOOLE models to address the needs of knowledge discovery in databases used in clinical research. Two specific needs were investigated: the derivation of accurate estimates of disease risk, and the ability to deal with rare clinical outcomes. When risk estimates were used as the primary means of classification, EpiCS showed excellent classification accuracy compared with logistic regression, especially in data with low disease prevalence. EpiCS was designed to apply differential negative reinforcement when the system made false positive or false negative decisions, and this feature was investigated to determine its effect on learning rate and classification accuracy. Across a range of disease prevalences, differentially negatively reinforcing erroneous decisions improved the learning rate but did not affect classification accuracy.
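The abstract does not give EpiCS's update rule, so the following is only a minimal Python sketch of what differential negative reinforcement could look like, assuming a NEWBOOLE-style scheme: classifiers in the match set that advocate the correct decision share a fixed reward, while those advocating the wrong decision lose a fraction of their strength, with a different fraction for false positives than for false negatives. All names (`Classifier`, `reinforce`, `fp_penalty`, `fn_penalty`) and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Classifier:
    """A stimulus-response rule: a ternary condition and a binary decision."""
    condition: str          # characters '0', '1', or '#' (don't care)
    action: int             # hypothetical coding: 1 = disease, 0 = no disease
    strength: float = 10.0  # hypothetical initial strength

    def matches(self, case: str) -> bool:
        # A '#' matches either bit; assumes condition and case have equal length.
        return all(c == "#" or c == b for c, b in zip(self.condition, case))


def reinforce(match_set: List[Classifier], true_class: int,
              reward: float = 1000.0, beta: float = 0.2,
              fp_penalty: float = 0.5, fn_penalty: float = 0.9) -> None:
    """Update strengths after one training case (hypothetical parameters)."""
    correct = [cl for cl in match_set if cl.action == true_class]
    incorrect = [cl for cl in match_set if cl.action != true_class]

    # Classifiers advocating the right decision share the reward and move
    # their strength a fraction beta toward their share.
    if correct:
        share = reward / len(correct)
        for cl in correct:
            cl.strength += beta * (share - cl.strength)

    # Classifiers advocating the wrong decision lose a fraction of strength.
    # If the true class is 0 they predicted disease (a false positive);
    # otherwise they missed it (a false negative). The two error types
    # carry different penalties, which is what makes the scheme differential.
    penalty = fp_penalty if true_class == 0 else fn_penalty
    for cl in incorrect:
        cl.strength *= 1.0 - penalty


# Usage: one training case "10" whose true class is 0 (no disease).
population = [Classifier("1#", 1), Classifier("10", 0), Classifier("0#", 1)]
match_set = [cl for cl in population if cl.matches("10")]
reinforce(match_set, true_class=0)
print([round(cl.strength, 2) for cl in population])  # [5.0, 208.0, 10.0]
```

Setting `fp_penalty` equal to `fn_penalty` gives a uniform penalty, so the same sketch can be used to contrast uniform and differential negative reinforcement.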


Keywords: Receiver Operating Characteristic Curve; Knowledge Discovery; Learning Rate; Classification Performance; Epidemiologic Surveillance





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • John H. Holmes, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, USA
