Approximate Location of Relevant Variables under the Crossover Distribution
Searching for genes involved in traits (e.g. diseases), based on genetic data, is considered from a computational learning perspective. This leads to the problem of learning relevant variables of functions from data sampled from a certain class of distributions generalizing the uniform distribution. The Fourier transform of Boolean functions is applied to translate the problem into searching for local extrema of certain functions of observables. We work out the combinatorial structure of this approach and illustrate its potential use.
Keywords: learning from samples, relevance, Boolean functions, Fourier transform, crossover distribution, genetics, local extrema
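The Fourier approach named in the abstract rests on a standard fact about Boolean functions: writing f: {-1,+1}^n → {-1,+1} as a sum of parity characters χ_S(x) = ∏_{i∈S} x_i, variable i is relevant exactly when some coefficient f̂(S) with i ∈ S is nonzero. Under the uniform distribution this is equivalent to the influence Σ_{S∋i} f̂(S)² being positive. The following is a minimal sketch of that fact for the uniform special case only. It does not model the crossover distribution or the paper's local-extrema search, and the function names `chi`, `fourier_coefficients` and `relevant_variables` are hypothetical.

```python
from itertools import combinations, product
from math import prod


def chi(subset, x):
    """Parity character chi_S(x) = prod_{i in S} x_i; the empty product is 1."""
    return prod(x[i] for i in subset)


def fourier_coefficients(f, n):
    """Exact coefficients f_hat(S) = E_x[f(x) * chi_S(x)], with x uniform on {-1,+1}^n.

    Enumerates all 2^n points, so this is only meant for small n.
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            total = sum(f(x) * chi(subset, x) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs


def relevant_variables(coeffs, n, tol=1e-12):
    """Return (relevant indices, influences).

    The influence of variable i under the uniform distribution is
    sum of f_hat(S)^2 over all S containing i; it is positive iff i is relevant.
    """
    influence = [sum(c * c for s, c in coeffs.items() if i in s) for i in range(n)]
    relevant = [i for i in range(n) if influence[i] > tol]
    return relevant, influence


if __name__ == "__main__":
    # Hypothetical two-locus model: the trait is the parity of loci 0 and 2.
    parity = lambda x: x[0] * x[2]
    coeffs = fourier_coefficients(parity, 4)
    print(relevant_variables(coeffs, 4))
    print(coeffs[(0,)], coeffs[(0, 2)])
```

For the parity example the single-variable coefficients f̂({0}) and f̂({2}) are both zero, so a test that looks at one locus at a time would miss both relevant variables; only the coefficient on the pair {0, 2} reveals them. This is one reason to look at higher-order coefficients rather than marginal correlations alone.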