The Minimum Transfer Cost Principle for Model-Order Selection

  • Mario Frank
  • Morteza Haghir Chehreghani
  • Joachim M. Buhmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)


The goal of model-order selection is to select the model variant that generalizes best from training data to unseen test data. In unsupervised learning without labels, computing the generalization error of a solution poses a conceptual problem, which we address in this paper. We formulate the principle of “minimum transfer costs” for model-order selection. This principle renders the concept of cross-validation applicable to unsupervised learning problems. As a substitute for labels, we introduce a mapping from objects of the training set to objects of the test set, enabling the transfer of training solutions. We explain and investigate our method by applying it to well-known problems such as singular-value decomposition, correlation clustering, Gaussian mixture models, and k-means clustering. Our principle finds the optimal model complexity in controlled experiments and in real-world problems such as image denoising, role mining, and the detection of misconfigurations in access-control data.


Keywords: clustering · generalization error · transfer costs · cross-validation
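To illustrate the transfer-cost idea from the abstract, the sketch below applies it to k-means model-order selection: fit a clustering on the training set for each candidate k, map each test object to a training object, transfer the training assignment, and evaluate the resulting cost on the test set. This is a minimal illustration, not the paper's exact construction; in particular, the nearest-neighbour mapping, the farthest-point initialisation, and all function names (`kmeans`, `transfer_cost`) are assumptions chosen for simplicity.

```python
# Illustrative sketch of transfer costs for k-means model-order selection.
# Assumption: test objects are mapped to their nearest training object
# (the paper defines a general transfer mapping; this is one simple choice).
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with farthest-point initialisation,
    which is robust when clusters are well separated."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), 1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

def transfer_cost(X_train, X_test, k):
    """Fit on training data, transfer the solution to the test set via a
    nearest-neighbour object mapping, and evaluate the test cost."""
    centers, train_labels = kmeans(X_train, k)
    # map each test object to its nearest training object
    nn = np.argmin(((X_test[:, None] - X_train[None]) ** 2).sum(-1), 1)
    test_labels = train_labels[nn]        # transferred assignments
    return ((X_test - centers[test_labels]) ** 2).sum() / len(X_test)

# Synthetic data: three well-separated Gaussian clusters.
rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = true_centers[rng.integers(0, 3, 400)] + rng.normal(scale=0.5, size=(400, 2))
X_train, X_test = X[:200], X[200:]

costs = {k: transfer_cost(X_train, X_test, k) for k in range(1, 7)}
best_k = min(costs, key=costs.get)
```

On this toy problem the transfer cost drops sharply until k reaches the true number of clusters (three) and changes little beyond it, which is the behaviour the principle exploits for model-order selection.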





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mario Frank¹
  • Morteza Haghir Chehreghani¹
  • Joachim M. Buhmann¹
  1. Department of Computer Science, ETH Zurich, Switzerland
