Semi-supervised Clustering by Selecting Informative Constraints

  • Vidyadhar Rao
  • C. V. Jawahar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)


Traditional clustering algorithms use a predefined metric and no supervision to identify the partition. Existing semi-supervised clustering approaches either learn a metric from randomly chosen constraints or actively select informative constraints using a generic distance measure such as the Euclidean norm. We tackle the problem of identifying constraints that are informative for learning an appropriate metric for semi-supervised clustering. We propose an approach that simultaneously finds appropriate constraints and learns a metric to improve clustering performance. We evaluate the clustering quality of our approach, using the learned metric, on the MNIST handwritten digits, Caltech-256 and MSRC2 object image datasets. On these datasets, our results show significant improvements over baseline methods such as MPCK-MEANS.
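The abstract does not name the measure used to evaluate clustering quality. The reference list includes Rand (1971), so, as an assumption, the following minimal Python sketch computes the Rand index between a ground-truth labelling and a clustering. It is not the authors' evaluation code, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Hypothetical helper: fraction of item pairs on which two partitions agree.

    A pair "agrees" if both partitions place the two items in the same
    cluster, or both place them in different clusters (Rand, 1971).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    agree = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # Count the pair when both partitions make the same same/different call.
        agree += same_true == same_pred

    total_pairs = n * (n - 1) // 2
    return agree / total_pairs
```

Because the index compares pairs, it ignores how clusters are labelled: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns `1.0`. The pairwise loop is O(n²); for datasets the size of MNIST, computing the same quantity from a contingency table of cluster co-occurrence counts is the practical choice.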


Keywords: semi-supervised clustering, constraint selection, metric learning


  1. Basu, S., Banerjee, A., Mooney, R.: Semi-supervised clustering by seeding. In: ICML, pp. 19–26 (2002)
  2. Basu, S., Banerjee, A., Mooney, R.J.: Active semi-supervision for pairwise constrained clustering. In: SDM, pp. 333–344 (2004)
  3. Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: ICML, p. 11. ACM (2004)
  4. Davidson, I., Ravi, T.: Clustering with constraints: feasibility issues and the fk-means algorithm. In: SDM, pp. 138–149 (2005)
  5. Davidson, I., Wagstaff, K.L., Basu, S.: Measuring constraint-set utility for partitional clustering algorithms. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 115–126. Springer, Heidelberg (2006)
  6. Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: ICML, pp. 209–216. ACM (2007)
  7. Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset (2007)
  8. Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recognition Letters 31(8), 651–666 (2010)
  9. Jain, A.K., Dubes, R.C.: Algorithms for clustering data. Prentice-Hall, Inc. (1988)
  10. LeCun, Y.: MNIST dataset (2000)
  11. Mallapragada, P.K., Jin, R., Jain, A.K.: Active query selection for semi-supervised clustering. In: ICPR, pp. 1–4. IEEE (2008)
  12. Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(336), 846–850 (1971)
  13. Shotton, J., Winn, J.M., Rother, C., Criminisi, A.: TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 1–15. Springer, Heidelberg (2006)
  14. Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer vision algorithms. In: ACM Multimedia, pp. 1469–1472 (2010)
  15. Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S.: Constrained k-means clustering with background knowledge. In: ICML, pp. 577–584 (2001)
  16. Wagstaff, K.L., Basu, S., Davidson, I.: When is constrained clustering beneficial, and why? Ionosphere 58(60.1), 62–63 (2006)
  17. Xu, R., Wunsch, D.: Clustering, vol. 10. Wiley-IEEE Press (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Vidyadhar Rao, IIIT-Hyderabad, India
  • C. V. Jawahar, IIIT-Hyderabad, India
