A Constraint Acquisition Method for Data Clustering

  • João M. M. Duarte
  • Ana L. N. Fred
  • Fernando Jorge F. Duarte
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8258)


A new constraint acquisition method for pairwise-constrained data clustering based on user feedback is proposed. The method searches for non-redundant intra-cluster and inter-cluster query candidates, ranks the candidates in decreasing order of interest and, finally, presents the most relevant query candidates to the user. A comparison between using the original data representation and using a learned representation (obtained by combining the pairwise constraints with the original data representation) is also performed. Experimental results show that the proposed constraint acquisition method and the data representation learning methodology lead to clustering performance improvements.
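The search, rank and prompt steps can be illustrated with a minimal sketch. The abstract does not state how "interest" is measured, so the scoring below is purely an assumption for illustration: same-cluster pairs that lie far apart and different-cluster pairs that lie close together are treated as the most interesting, and pairs that already carry a constraint are skipped as redundant. The function name `rank_query_candidates` and all of its parameters are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def rank_query_candidates(points, labels, known_pairs, n_queries):
    """Return up to n_queries (pair, kind) tuples, most interesting first.

    Illustrative assumption, not the paper's criterion: interest is a
    normalised Euclidean distance for intra-cluster pairs and one minus
    that value for inter-cluster pairs.
    """
    # Pairs the user has already answered are redundant and are skipped.
    known = {frozenset(pair) for pair in known_pairs}

    distances = {
        (i, j): dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }
    # Guard against an empty or degenerate data set (all distances zero).
    max_d = max(distances.values(), default=0.0) or 1.0

    scored = []
    for (i, j), d in distances.items():
        if frozenset((i, j)) in known:
            continue
        if labels[i] == labels[j]:
            # Intra-cluster candidate: a distant pair may hide a cannot-link.
            kind, interest = "intra", d / max_d
        else:
            # Inter-cluster candidate: a close pair may hide a must-link.
            kind, interest = "inter", 1.0 - d / max_d
        scored.append((interest, kind, (i, j)))

    # Decreasing order of interest; only the top candidates reach the user.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(pair, kind) for _, kind, pair in scored[:n_queries]]
```

In an interactive loop, each returned pair would be shown to the user, the answer stored as a must-link or cannot-link constraint, and the pair added to `known_pairs` before the next round so that it is not asked again.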


Constraint Acquisition, Constrained Data Clustering



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • João M. M. Duarte (1, 2)
  • Ana L. N. Fred (1)
  • Fernando Jorge F. Duarte (2)
  1. Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
  2. GECAD - Knowledge Engineering and Decision-Support Research Center, Institute of Engineering, Polytechnic of Porto, Portugal
