Model Selection with Small Samples

  • Masashi Sugiyama
  • Hidemitsu Ogawa
Conference paper


Recently, a new model selection criterion called the subspace information criterion (SIC) was proposed. SIC gives an unbiased estimate of the generalization error even when only a finite number of samples is available. In this paper, we evaluate the effectiveness of SIC theoretically and experimentally in comparison with existing model selection techniques. The theoretical evaluation compares the generalization measures, the approximation methods, and the restrictions on model candidates and learning methods. The simulations show that SIC outperforms existing techniques, especially when the number of training examples is small and the noise variance is large.
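To make "unbiased estimate of the generalization error with finite samples" concrete, here is a minimal sketch of an SIC-style estimator for linear least-squares learning. The setting is an assumption for illustration and is not taken from this page: polynomial model candidates, generalization error measured as the squared error of the coefficient vector, a known noise variance σ², and the least-squares fit of the largest model used as an unbiased reference estimator X_u (valid only if the target lies in that model). With θ̂ = Xy and θ̂_u = X_u y, the estimate is ||θ̂ − θ̂_u||² − σ² tr((X − X_u)(X − X_u)^T) + σ² tr(XX^T). If E[θ̂_u] = θ, subtracting the first trace removes the noise contribution to ||θ̂ − θ̂_u||² and leaves the squared bias of θ̂, and the second trace adds back the variance of θ̂, so the expectation equals E||θ̂ − θ||². Mallows' C_P is included for contrast because it targets the error at the training inputs only. All function names are hypothetical.

```python
# A sketch under stated assumptions, not the authors' implementation.
# Helper names (design_matrix, ls_operator, sic, mallows_cp, select_by_sic)
# are hypothetical.
import numpy as np


def design_matrix(x, degree):
    """Monomial basis 1, x, ..., x**degree (a hypothetical choice of model)."""
    return np.vander(x, degree + 1, increasing=True)


def ls_operator(x, degree, p):
    """Least-squares learning matrix for a polynomial model of this degree.

    Maps the sample vector y to a length-p coefficient vector; coefficients
    outside the candidate model are fixed at zero, so every candidate lives
    in the parameter space of the largest model.
    """
    A = design_matrix(x, degree)
    X = np.zeros((p, len(x)))
    X[: degree + 1, :] = np.linalg.pinv(A)
    return X


def sic(X, X_u, y, noise_var):
    """SIC-style estimate of E||X y - theta||^2 in coefficient space.

    Assumes E[X_u y] = theta (X_u is unbiased) and i.i.d. zero-mean noise
    with known variance noise_var.
    """
    D = X - X_u
    return float(
        np.sum((D @ y) ** 2)
        - noise_var * np.trace(D @ D.T)  # remove the noise part of ||D y||^2
        + noise_var * np.trace(X @ X.T)  # add back the variance of X y
    )


def mallows_cp(x, y, degree, noise_var):
    """Mallows' C_P = RSS / sigma^2 - n + 2k (error at training inputs only)."""
    A = design_matrix(x, degree)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return rss / noise_var - len(x) + 2 * (degree + 1)


def select_by_sic(x, y, max_degree, noise_var):
    """Return (best_degree, scores) over degrees 0..max_degree."""
    p = max_degree + 1
    # Unbiased reference: least squares in the largest model, assuming the
    # target function lies in that model.
    X_u = ls_operator(x, max_degree, p)
    scores = [sic(ls_operator(x, d, p), X_u, y, noise_var)
              for d in range(max_degree + 1)]
    return int(np.argmin(scores)), scores


if __name__ == "__main__":
    # Hypothetical toy data: a cubic target observed with Gaussian noise.
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 20)
    theta = np.array([0.5, -1.0, 0.5, 1.0])
    noise_var = 0.1
    y = design_matrix(x, 3) @ theta + rng.normal(0.0, np.sqrt(noise_var), x.size)
    best, scores = select_by_sic(x, y, max_degree=5, noise_var=noise_var)
    print("degree chosen by SIC:", best)
    print("C_P per degree:", [round(mallows_cp(x, y, d, noise_var), 2) for d in range(6)])
```

Embedding every candidate in the largest model's coefficient space is what lets a single reference estimator X_u serve all candidates; the price is the assumption that the largest model contains the target.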



Copyright information

© Springer-Verlag Wien 2001

Authors and Affiliations

  • Masashi Sugiyama (1)
  • Hidemitsu Ogawa (1)
  1. Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
