Abstract
Classification of partially labeled data requires linking the unlabeled input distribution P(x) with the conditional distribution P(y|x) obtained from the labeled data. The latter should, for example, vary little in high-density regions. The key problem is to articulate a general principle behind this and other such reasonable assumptions. In this paper we provide a new approach to semi-supervised learning based on the stability of estimated labels for the unlabeled dataset, e.g. a large test set, and on the maximization of the mutual label relation. No clustering assumptions are required, and the approach remains tractable even for continuous marginal class densities. We demonstrate the approach on synthetic examples and on UCI repository datasets.
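As a rough illustration of the label-stability idea only, the sketch below measures how consistently an unlabeled (test) set is labeled when the labeled training set is perturbed by bootstrap resampling. This is not the authors' algorithm, whose objective also involves maximizing the mutual label relation; the choice of a k-NN base classifier, the bootstrap perturbation, and all function names and parameters are our own assumptions for demonstration.

```python
# Illustrative sketch only: label stability on an unlabeled set under
# bootstrap perturbations of the labeled data. NOT the paper's method.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy data: two labeled Gaussian blobs and a larger unlabeled sample.
X_lab = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_lab = np.repeat([0, 1], 10)
X_unl = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])

def label_stability(X_lab, y_lab, X_unl, n_rounds=50):
    """Mean agreement of each unlabeled point's predicted label with its
    majority label across classifiers trained on bootstrap resamples."""
    votes = np.zeros((len(X_unl), 2))
    for _ in range(n_rounds):
        idx = rng.choice(len(X_lab), size=len(X_lab), replace=True)
        clf = KNeighborsClassifier(n_neighbors=3).fit(X_lab[idx], y_lab[idx])
        pred = clf.predict(X_unl)
        votes[np.arange(len(X_unl)), pred] += 1
    # Per-point stability: fraction of rounds matching the majority vote.
    return (votes.max(axis=1) / n_rounds).mean()

print(f"mean label stability: {label_stability(X_lab, y_lab, X_unl):.3f}")
```

Under this toy setup, stability close to 1 indicates that the unlabeled points receive essentially the same labels regardless of how the labeled sample is perturbed, which is the kind of criterion the abstract appeals to.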
References
C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In ICML, pages 19–26. Morgan Kaufmann, San Francisco, CA, 2001.
A. Blum, J. Lafferty, M. R. Rwebangira, and R. Reddy. Semi-supervised learning using randomized mincuts. In ICML, 2004.
O. Chapelle, J. Weston, and B. Schölkopf. Cluster kernels for semi-supervised learning. In NIPS, volume 15, pages 585–592, 2002.
I. Cohen, F. G. Cozman, N. Sebe, M. C. Cirelo, and T. S. Huang. Semi-supervised learning of classifiers: theory, algorithms, and their application to human-computer interaction. PAMI, 26(12):1553–1566, 2004.
R. P. W. Duin. On the choice of the smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers, 25(11):1175–1179, 1976.
T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. In NIPS, volume 12, pages 470–477, 1999.
T. Lissack and K.-S. Fu. Error estimation in pattern recognition via the Lα-distance between posterior density functions. IEEE Transactions on Information Theory, 22(1):34–45, 1976.
E. Parzen. On the estimation of a probability density function and the mode. Annals of Mathematical Statistics, 33:1065–1076, 1962.
S. Roberts, C. C. Holmes, and D. Denison. Minimum entropy data partitioning using reversible jump Markov chain Monte Carlo. PAMI, 23(8):909–914, 2001.
M. Szummer and T. Jaakkola. Partially labeled classification with Markov random walks. In NIPS, volume 14, pages 945–952, 2001.
N. Tishby and N. Slonim. Data clustering by Markovian relaxation and the information bottleneck method. In NIPS, pages 640–646, 2000.
V. N. Vapnik. Statistical learning theory. Wiley, NY, 1998.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Juszczak, P., Duin, R.P.W. (2005). Learning from a Test Set. In: Kurzyński, M., Puchała, E., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems. Advances in Soft Computing, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32390-2_22
DOI: https://doi.org/10.1007/3-540-32390-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25054-8
Online ISBN: 978-3-540-32390-7
eBook Packages: Engineering