Approximate Location of Relevant Variables under the Crossover Distribution

  • Peter Damaschke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2264)


Searching for genes involved in traits (e.g. diseases), based on genetic data, is considered from a computational learning perspective. This leads to the problem of learning relevant variables of functions from data sampled from a certain class of distributions generalizing the uniform distribution. The Fourier transform of Boolean functions is applied to translate the problem into searching for local extrema of certain functions of observables. We work out the combinatorial structure of this approach and illustrate its potential use.
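As a minimal illustration of the Fourier-analytic idea mentioned in the abstract, the following sketch computes Fourier coefficients of a small Boolean function by exhaustive enumeration and reads off its relevant variables. It assumes the plain uniform distribution, not the more general crossover distribution the paper treats, and it is not the paper's algorithm. The names `fourier_coefficient`, `relevant_variables`, and `xor_trait` are hypothetical and chosen only for this example. It relies on the standard fact that, under the uniform distribution, a variable is relevant exactly when it occurs in some index set with a nonzero coefficient.

```python
from itertools import combinations, product


def fourier_coefficient(f, n, subset):
    """Uniform-distribution Fourier coefficient of f on the index set `subset`.

    f maps a tuple of n bits to +1 or -1. The coefficient is the average of
    f(x) * chi_S(x) over all x, where chi_S(x) = (-1)^(sum of x_i for i in S).
    """
    total = 0
    for x in product((0, 1), repeat=n):
        parity = sum(x[i] for i in subset) % 2
        total += f(x) * (-1 if parity else 1)
    return total / 2 ** n


def relevant_variables(f, n):
    """Indices that occur in at least one subset with a nonzero coefficient."""
    relevant = set()
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if fourier_coefficient(f, n, subset) != 0:
                relevant.update(subset)
    return relevant


# Hypothetical example: a trait determined by the XOR of "loci" 0 and 2 out of 4.
def xor_trait(x):
    return -1 if (x[0] ^ x[2]) else 1


print(relevant_variables(xor_trait, 4))
```

For this toy function the only nonzero coefficient sits on the index set (0, 2), so the sketch reports variables 0 and 2 as relevant. The enumeration is exponential in n and uses exact function values; working from samples would replace the exact averages with estimates.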


Learning from samples, relevance, Boolean functions, Fourier transform, crossover distribution, genetics, local extrema





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Peter Damaschke (1)
  1. Mathematical and Computing Sciences, Chalmers University, Göteborg, Sweden
