An Experimental Comparison of Methods for Handling Incomplete Data in Learning Parameters of Bayesian Networks
Missing values of attributes in data sets, also referred to as incomplete data, pose difficulties in learning tasks such as classification, data mining, and learning the structure and numerical parameters of Bayesian networks. Because incomplete data are so common in practice, many methods have been proposed to deal with them, yet few studies compare their performance. The Hepar II project presents an excellent opportunity to test experimentally how these methods perform on a real data set. We briefly review several popular methods for handling incomplete data and then compare them on the task of learning conditional probability distributions of a Bayesian network model, where the comparison criterion is the resulting diagnostic accuracy. While substitution of “normal” values for missing attributes seemed to perform best, we observed only a small difference in performance among the studied methods.
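As a rough illustration of the simplest approach mentioned above, the following sketch substitutes an assumed “normal” value for each missing attribute and then estimates a conditional probability table from relative frequencies. The data, the attribute names, the `MISSING` marker, and the functions `impute_normal` and `learn_cpt` are all hypothetical and are not taken from the Hepar II model or data set; the actual study may estimate parameters differently.

```python
from collections import Counter, defaultdict

MISSING = None  # hypothetical marker for a missing attribute value


def impute_normal(records, normal_values):
    """Replace each missing attribute with its assumed "normal" value."""
    return [
        {attr: (normal_values[attr] if val is MISSING else val)
         for attr, val in rec.items()}
        for rec in records
    ]


def learn_cpt(records, child, parent):
    """Estimate P(child | parent) by relative frequencies over complete records."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[parent]][rec[child]] += 1
    return {
        p: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for p, cnt in counts.items()
    }


# Toy records invented for illustration only.
records = [
    {"disorder": "present", "test": "elevated"},
    {"disorder": "present", "test": MISSING},
    {"disorder": "absent", "test": "normal"},
    {"disorder": "absent", "test": MISSING},
]
# Assumed "normal" value for each attribute.
normal_values = {"disorder": "absent", "test": "normal"}

completed = impute_normal(records, normal_values)
cpt = learn_cpt(completed, child="test", parent="disorder")
print(cpt)
```

In this toy case, imputing "normal" for the missing test result pulls the estimate of P(test = elevated | disorder = present) from 1.0 (using only the observed record) down to 0.5, which shows how the choice of method can shift the learned parameters.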
Keywords: Bayesian Network, Incomplete Data, Conditional Probability Distribution, Bayesian Network Model, Bayesian Network Structure