Scalable Active Constrained Clustering for Temporal Data

  • Son T. Mai
  • Sihem Amer-Yahia
  • Ahlame Douzal Chouakria
  • Ky T. Nguyen
  • Anh-Duong Nguyen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10827)

Abstract

In this paper, we introduce a novel interactive framework that handles both instance-level and temporal smoothness constraints when clustering large temporal data. It consists of a constrained clustering algorithm, called CVQE+, which optimizes the clustering quality, the constraint violation, and the historical cost between consecutive data snapshots. At the center of our framework is a simple yet effective active learning technique, named Border, which iteratively selects the most informative pairs of objects to query users about and updates the clustering with the new constraints. These constraints are then propagated inside each data snapshot and between snapshots via two schemes, called constraint inheritance and constraint propagation, to further enhance the results. Experiments show clustering results that are better than or comparable to those of state-of-the-art techniques, as well as high scalability on large datasets.
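
To make the three optimized terms more concrete, the following is a minimal sketch of a cost that sums a clustering-quality term, a constraint-violation term, and a temporal (historical) term. It is not the CVQE+ objective, which the abstract does not spell out. The function names, the unit penalty per violated constraint, and the weights `penalty` and `alpha` are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def illustrative_cost(points, labels, centers, prev_centers,
                      must_link, cannot_link, penalty=1.0, alpha=1.0):
    """Toy cost (an assumption, not the paper's formula).

    points       : list of coordinate tuples in the current snapshot
    labels       : cluster index assigned to each point
    centers      : cluster centers in the current snapshot
    prev_centers : centers of the same clusters in the previous snapshot
    must_link    : pairs (i, j) of point indices that should share a cluster
    cannot_link  : pairs (i, j) of point indices that should be separated
    """
    # Clustering quality: squared distance of each point to its center.
    quality = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    # Constraint violation: count split must-link and merged cannot-link pairs.
    violations = sum(labels[i] != labels[j] for i, j in must_link)
    violations += sum(labels[i] == labels[j] for i, j in cannot_link)
    # Historical cost: how far each center drifted since the last snapshot.
    drift = sum(sq_dist(c, pc) for c, pc in zip(centers, prev_centers))
    return quality + penalty * violations + alpha * drift


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
labels = [0, 0, 1]
centers = [(0.0, 1.0), (10.0, 0.0)]
prev_centers = [(0.0, 0.0), (10.0, 0.0)]
cost = illustrative_cost(points, labels, centers, prev_centers,
                         must_link=[(0, 2)], cannot_link=[(0, 1)],
                         penalty=1.0, alpha=0.5)
print(cost)
```

In this toy setting, a larger `alpha` favors clusterings that stay close to the previous snapshot, while a larger `penalty` favors clusterings that respect the user-supplied pairs.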

Keywords

Semi-supervised clustering, Active learning, Interactive clustering, Incremental clustering, Temporal clustering

Notes

Acknowledgment

This work is supported by the CDP Life Project.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Son T. Mai (1)
  • Sihem Amer-Yahia (1)
  • Ahlame Douzal Chouakria (1)
  • Ky T. Nguyen (1)
  • Anh-Duong Nguyen (2)

  1. CNRS, Univ. Grenoble Alpes, Grenoble, France
  2. University of Rennes 1, Rennes, France
