An Index for the Data Size to Extract Decomposable Structures in LAD
Logical analysis of data (LAD) is one of the methodologies for extracting knowledge as a Boolean function $f$ from a given pair of data sets $(T, F)$ on an attribute set $S$ of size $n$, in which $T$ (resp., $F$) $\subseteq \{0,1\}^n$ denotes a set of positive (resp., negative) examples for the phenomenon under consideration. In this paper, we consider the case in which the extracted knowledge has a decomposable structure; i.e., $f$ is described in the form $f(x) = g(x[S_0], h(x[S_1]))$ for some $S_0, S_1 \subseteq S$ and Boolean functions $g$ and $h$, where $x[I]$ denotes the projection of vector $x$ on $I$. In order to detect meaningful decomposable structures, it is expected that the sizes $|T|$ and $|F|$ must be sufficiently large. We provide an index for this indispensable number of examples, based on probabilistic analysis. Using $p = |T|/(|T| + |F|)$ and $q = |F|/(|T| + |F|)$, we claim that there exist many deceptive decomposable structures of $(T, F)$ if $|T| + |F| \le \sqrt{2^{n-1}/(pq)}$. The computational results on synthetically generated data sets show that the above index gives a good lower bound on the indispensable data size.
Keywords: logical analysis of data, Boolean functions, decomposable functions, computational learning theory, random graphs, probabilistic analysis
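As a rough illustration of how the index in the abstract might be evaluated, the following Python sketch computes the threshold from $|T|$, $|F|$, and $n$. It assumes the bound reads $|T| + |F| \le \sqrt{2^{n-1}/(pq)}$; the function names `data_size_index` and `may_be_deceptive` are hypothetical and do not come from the paper.

```python
import math


def data_size_index(num_pos: int, num_neg: int, n: int) -> float:
    """Return sqrt(2^(n-1) / (p*q)), the index as read from the abstract.

    num_pos = |T|, num_neg = |F|, n = number of attributes.
    Assumption: the bound uses base 2, matching the domain {0,1}^n.
    """
    if num_pos <= 0 or num_neg <= 0:
        raise ValueError("both T and F must be non-empty")
    total = num_pos + num_neg
    p = num_pos / total  # fraction of positive examples
    q = num_neg / total  # fraction of negative examples
    return math.sqrt(2 ** (n - 1) / (p * q))


def may_be_deceptive(num_pos: int, num_neg: int, n: int) -> bool:
    """True when |T| + |F| is at or below the index, i.e. the regime in
    which the abstract claims many deceptive decomposable structures exist."""
    return num_pos + num_neg <= data_size_index(num_pos, num_neg, n)


if __name__ == "__main__":
    print(data_size_index(50, 50, 9))
    print(may_be_deceptive(50, 50, 9))
    print(may_be_deceptive(10, 10, 9))
```

Under this reading, a balanced data set with $n = 9$ has $p = q = 1/2$, so the index is $\sqrt{2^{8}/0.25} = 32$: 100 examples exceed it, while 20 examples fall below it.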