Discriminant analysis for discrete variables derived from a tree-structured graphical model
The purpose of this paper is to illustrate the potential use of discriminant analysis for discrete variables whose dependence structure is assumed to follow, or can be approximated by, a tree-structured graphical model. This is done by comparing its empirical performance, using estimated error rates for real and simulated data, with that of the well-known Naive Bayes classification rule and of linear logistic regression, neither of which considers any interaction between variables, and with that of models that do consider interactions, such as a decomposable model and the saturated model. The results show that discriminant analysis based on tree-structured graphical models, a simple nonlinear method that includes only some of the pairwise interactions between variables, is competitive with, and sometimes superior to, methods that assume no interactions, and has the advantage over more complex decomposable models that the graph structure can be found quickly and exactly.
Keywords: Discrete variables; Discriminant analysis; Error rates; Minimum weight spanning tree; Multinomial distribution; Sparseness; Structure estimation; Tree-structured graphical models
Mathematics Subject Classification: 62H30, 68T10
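As an illustration of the structure-estimation step named in the keywords, the following is a minimal sketch, assuming a Chow-Liu-type procedure: pairwise empirical mutual information is used as the edge weight, and a maximum weight spanning tree is found with Kruskal's algorithm (equivalent to a minimum weight spanning tree on the negated weights). The function names `mutual_information` and `chow_liu_tree` are hypothetical, and the sketch is not claimed to reproduce the procedure used in the paper.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chow_liu_tree(rows):
    """Edges of a maximum-mutual-information spanning tree over the columns.

    `rows` is a list of equal-length tuples of discrete values
    (one tuple per observation). Assumed interface, for illustration only.
    """
    p = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]

    # Candidate edges sorted by decreasing mutual information.
    weighted = sorted(
        ((mutual_information(cols[i], cols[j]), i, j)
         for i, j in combinations(range(p), 2)),
        reverse=True,
    )

    # Kruskal's algorithm with a simple union-find structure.
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge creates no cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Toy data: column 1 copies column 0; column 2 is independent of both.
rows = [(a, a, b) for a in (0, 1) for b in (0, 1)] * 5
edges = chow_liu_tree(rows)
```

In a discriminant setting, one tree would presumably be estimated from the training observations of each class, and a new observation assigned to the class with the largest product of prior probability and tree-factorized likelihood; that classification step is omitted from the sketch.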
This work was written while GEG was at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, on sabbatical leave from the Faculty of Sciences at the National Autonomous University of Mexico (UNAM); GEG gratefully acknowledges a six-month grant from the program PASPA, DGAPA, UNAM. GPC was a postdoctoral researcher at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, and received a postdoctoral grant (252737) from the National Council of Science and Technology (CONACYT) of Mexico. We are very grateful to Drs. H. Avila Rosas and L. D. Sánchez Velázquez for providing the ICU data, and for helpful discussions concerning the codification and selection of variables.