# Finding Optimal Boolean Classifiers

John Franco

Chapter in the Nonconvex Optimization and Its Applications book series (NOIA, volume 42)

## Abstract

We are concerned with the following well-known problem. Given a partially defined Boolean function as a collection V of m Boolean vectors of dimension n, and a mapping M : V → {1, 0}, determine the “best” completely defined Boolean function that is consistent with V and M. Of course, the meaning of “best” depends on the context in which the question is asked. In oil exploration, suppose the vectors are the results of well logs, a mapping to ‘1’ means oil is found, a mapping to ‘0’ means no oil is found, and relatively few vectors have been assigned values to date. Then “best” means the best predictor of oil when new vectors, with unknown values, emerge from future logs: that is, the “best” completely specified Boolean function is the one that minimizes the probability of error in assigning values to new vectors.
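To make "consistent with V and M" concrete, here is a minimal Python sketch. The data set `M`, the candidate functions, and the helper names are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
from itertools import product

# Hypothetical partially defined Boolean function on n = 3 variables.
# The keys form the collection V; the dictionary itself plays the role of M.
M = {(0, 0, 1): 1, (0, 1, 1): 1, (1, 0, 0): 0, (1, 1, 0): 0}

def is_consistent(f, mapping):
    """True if the completely defined function f agrees with mapping on all of V."""
    return all(f(v) == label for v, label in mapping.items())

def count_consistent_extensions(mapping, n):
    """Number of completely defined functions on {0,1}^n that agree with mapping."""
    unlabeled = sum(1 for v in product((0, 1), repeat=n) if v not in mapping)
    return 2 ** unlabeled  # each unlabeled vector may independently take 0 or 1

# Three hypothetical candidate classifiers.
f_last_bit = lambda v: v[2]
f_not_first_bit = lambda v: 1 - v[0]
f_first_bit = lambda v: v[0]
```

In this toy case both `f_last_bit` and `f_not_first_bit` agree with every labeled vector, while `f_first_bit` does not. Consistency alone therefore does not single out one function, which is why a criterion for "best" is needed.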

Clearly, such probabilities depend on the distribution of data, and determining distributions would be much easier if there existed some natural underlying law governing the generation of data. In fact, theories of data generation have been proposed to deal with this problem. For example, according to one theory, data is “convex.” This is a generalization of the natural intuition that, in reasonably behaved data sets, it is unlikely that vectors ‘00’ and ‘11’ map to ‘1’ while ‘10’ and ‘01’ map to ‘0’. But an analysis of some data sets suggests that something more than convexity is needed.
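The ‘00’/‘11’ example can be checked mechanically. The sketch below assumes one common reading of convexity: whenever two ‘1’ vectors lie within Hamming distance k of each other, no vector "between" them (agreeing with both wherever they agree) is labeled ‘0’. The function names and the parameter `k` are illustrative assumptions, not the chapter's notation.

```python
from itertools import combinations

def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def between(x, y):
    """Yield vectors strictly between x and y in the Boolean hypercube."""
    diff = [i for i in range(len(x)) if x[i] != y[i]]
    # Flip a nonempty proper subset of the differing coordinates of x.
    for r in range(1, len(diff)):
        for flip in combinations(diff, r):
            z = list(x)
            for i in flip:
                z[i] = y[i]
            yield tuple(z)

def convexity_violations(mapping, k):
    """Triples (x, y, z): x, y labeled 1 at distance <= k, z between them labeled 0."""
    ones = [v for v, label in mapping.items() if label == 1]
    found = []
    for x, y in combinations(ones, 2):
        if hamming(x, y) <= k:
            for z in between(x, y):
                if mapping.get(z) == 0:
                    found.append((x, y, z))
    return found

# The data set from the text: '00' and '11' map to 1, '10' and '01' map to 0.
xor_like = {(0, 0): 1, (1, 1): 1, (1, 0): 0, (0, 1): 0}
```

Calling `convexity_violations(xor_like, 2)` reports both ‘10’ and ‘01’ as ‘0’ points lying between the two ‘1’ points, which is the configuration the convexity intuition rules out.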

In this paper we propose some possibly new theories of data generation, based on Hamming distances between vectors, that seem to fit several data sets consistently: that is, the observed properties of the distributions of data seem to match. In particular, we propose a model for data generation based on the notion that convexity is a local property and that data sets “clump” into the ‘1’ category as “vines” and not as “balls,” as convexity would suggest. The vine model allows for the possibility that many pairs of points of opposite value have low Hamming distance between them. Given a theory, we can compute probabilities, and we show the results for a few data sets.
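One simple statistic related to the last observation is the distribution of Hamming distances over pairs of vectors with opposite values. The following sketch computes that histogram for a small hypothetical data set. It only illustrates the quantity involved and should not be read as the chapter's actual model or procedure.

```python
from collections import Counter
from itertools import combinations

def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def opposite_pair_distances(mapping):
    """Histogram of Hamming distances over pairs of vectors with opposite values."""
    hist = Counter()
    for (x, label_x), (y, label_y) in combinations(mapping.items(), 2):
        if label_x != label_y:
            hist[hamming(x, y)] += 1
    return hist

# Hypothetical labeled vectors (not data from the chapter).
sample = {
    (0, 0, 0, 0): 1,
    (0, 0, 0, 1): 1,
    (0, 0, 1, 1): 1,
    (0, 1, 0, 0): 0,
    (0, 1, 1, 1): 0,
    (1, 1, 1, 0): 0,
}

hist = opposite_pair_distances(sample)
for distance in sorted(hist):
    print(distance, hist[distance])
```

A histogram with substantial mass at small distances would be the kind of pattern the vine model permits and a ball-like model makes unlikely.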

## Keywords

Logical analysis, Boolean functions
