Generalizing from Example Clusters

Hu, Pan; Vens, Celine; Verstrynge, Bart; Blockeel, Hendrik

doi:10.1007/978-3-642-40897-7_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8140)

Included in the following conference series:

International Conference on Discovery Science

Abstract

We consider the following problem: given a set of data and one or more examples of clusters, find a clustering of the whole data set that is consistent with the given clusters. This is essentially a semi-supervised clustering problem, but it differs significantly from previously studied semi-supervised clustering settings. Earlier work has shown that none of the existing methods for semi-supervised clustering handle this problem well. We identify two reasons for this: the default metric learning methods do not work well in this situation, and the methods tend to overfit. We investigate the latter in more detail and propose a new method that explicitly guards against overfitting. Experimental results confirm that the new method generalizes much better. Several other problems identified here remain open.
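The abstract leaves the notion of a clustering that is "consistent with the given clusters" informal. As a minimal sketch, assuming one plausible reading that is not taken from the paper, the Python below turns example clusters into the pairwise must-link and cannot-link constraints familiar from constrained clustering, and checks a candidate clustering of the whole data set against the examples. The function names `constraints_from_examples` and `is_consistent` and the `exact` flag are hypothetical, introduced only for illustration.

```python
from itertools import combinations
from typing import Hashable, Sequence, Set, Tuple

Pair = Tuple[int, int]


def constraints_from_examples(
    example_clusters: Sequence[Set[int]],
) -> Tuple[Set[Pair], Set[Pair]]:
    """Turn example clusters (sets of point indices) into pairwise constraints.

    Assumed reading (not from the paper): points in the same example cluster
    must be linked; points from two different example clusters cannot be.
    """
    must_link: Set[Pair] = set()
    cannot_link: Set[Pair] = set()
    for cluster in example_clusters:
        # Every pair inside one example cluster must end up in the same cluster.
        must_link.update(combinations(sorted(cluster), 2))
    for c1, c2 in combinations(example_clusters, 2):
        # Every pair spanning two example clusters must be kept apart.
        for a in c1:
            for b in c2:
                cannot_link.add((min(a, b), max(a, b)))
    return must_link, cannot_link


def is_consistent(
    labels: Sequence[Hashable],
    example_clusters: Sequence[Set[int]],
    exact: bool = False,
) -> bool:
    """Check a full clustering (labels[i] = cluster id of point i) against the examples.

    exact=False: each example cluster lies inside one found cluster, and no two
    example clusters share a found cluster.
    exact=True: additionally, each such found cluster contains no other points.
    """
    used_ids = set()
    for cluster in example_clusters:
        ids = {labels[i] for i in cluster}
        if len(ids) != 1:  # the example cluster was split (or is empty)
            return False
        (cid,) = ids
        if cid in used_ids:  # two example clusters were merged
            return False
        used_ids.add(cid)
        if exact:
            members = {i for i, lab in enumerate(labels) if lab == cid}
            if members != set(cluster):  # extra points joined the example cluster
                return False
    return True


if __name__ == "__main__":
    examples = [{0, 1, 2}, {5, 6}]
    ml, cl = constraints_from_examples(examples)
    print(len(ml), len(cl))  # 4 6
    labels = [0, 0, 0, 1, 2, 2, 2]
    print(is_consistent(labels, examples))              # True
    print(is_consistent(labels, examples, exact=True))  # False: point 4 joined {5, 6}
```

The `exact` flag separates two readings: whether an example cluster only has to stay together and apart from the other examples, or whether it must reappear as a cluster with no additional members. The abstract does not say which one the paper intends.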

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
Pan Hu, Celine Vens, Bart Verstrynge & Hendrik Blockeel
Ecole des Mines, Saint-Etienne, France
Pan Hu
Leiden Institute of Advanced Computer Science, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Hendrik Blockeel

Editor information

Editors and Affiliations

TU Darmstadt, Germany
Johannes Fürnkranz
Philipps-Universität Marburg, Germany
Eyke Hüllermeier
The Institute of Statistical Mathematics, Tokyo, Japan
Tomoyuki Higuchi

About this paper

Cite this paper

Hu, P., Vens, C., Verstrynge, B., Blockeel, H. (2013). Generalizing from Example Clusters. In: Fürnkranz, J., Hüllermeier, E., Higuchi, T. (eds) Discovery Science. DS 2013. Lecture Notes in Computer Science, vol 8140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40897-7_5

DOI: https://doi.org/10.1007/978-3-642-40897-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40896-0
Online ISBN: 978-3-642-40897-7
eBook Packages: Computer Science, Computer Science (R0)