Likelihood-Based Sampling from Databases for Rule Induction Methods

  • Shusaku Tsumoto
  • Shoji Hirano
  • Hidenao Abe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6401)


This paper introduces the log-likelihood ratio as a measure of similarity between generated training samples and the original training samples. The ratio is used as a test statistic to determine whether the statistical information of the generated training samples (S_k) is almost equivalent to that of the original training samples (S_0), denoted by S_0 ≃ S_k. If the test statistic rejects the hypothesis S_0 ≃ S_k, the generated samples are discarded. Otherwise, they are accepted and rule induction methods or statistical methods are applied to them. The method was evaluated on three medical domains. The results show that the proposed method selects training samples that reflect the statistical characteristics of the original training samples, although performance with small samples is not as good.
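The accept-or-reject step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the statistic, so this example assumes the standard G-test (log-likelihood ratio) on a single categorical attribute; the function names `g_statistic` and `accept_sample` are hypothetical, and the default critical value 3.841 is the chi-square 0.05 quantile for one degree of freedom, which suits only a two-valued attribute.

```python
import math
from collections import Counter


def g_statistic(original, sample):
    """Log-likelihood ratio G = 2 * sum(O * ln(O / E)) for one attribute.

    O is the observed count of a value in `sample` (S_k); E is the count
    expected if `sample` followed the value frequencies of `original` (S_0).
    This is an assumed form of the statistic, not the paper's own definition.
    """
    if not original or not sample:
        raise ValueError("both samples must be non-empty")
    n0, nk = len(original), len(sample)
    # Relative frequency of each attribute value in the original sample.
    p0 = {value: count / n0 for value, count in Counter(original).items()}
    g = 0.0
    for value, observed in Counter(sample).items():
        if value not in p0:
            # A value unseen in S_0 has zero expected count: maximal mismatch.
            return math.inf
        expected = nk * p0[value]
        g += observed * math.log(observed / expected)
    return 2.0 * g


def accept_sample(original, sample, critical_value=3.841):
    """Accept S_k unless G exceeds the critical value (assumed threshold)."""
    return g_statistic(original, sample) <= critical_value
```

For instance, with an original sample of 60 "a" and 40 "b" values, a generated sample of 6 "a" and 4 "b" gives G = 0 and is accepted, while a sample of 1 "a" and 9 "b" gives G of about 11 and is discarded. With very small generated samples the statistic has little power to detect a mismatch.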


Training Sample, Acceptance Rate, Medical Domain, Acceptance Ratio, Probabilistic Situation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Shusaku Tsumoto (1)
  • Shoji Hirano (1)
  • Hidenao Abe (1)
  1. Department of Medical Informatics, Faculty of Medicine, Shimane University, Japan
