COBRASTS: A New Approach to Semi-supervised Clustering of Time Series

  • Toon Van Craenendonck
  • Wannes Meert
  • Sebastijan Dumančić
  • Hendrik Blockeel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)

Abstract

Clustering is ubiquitous in data analysis, including analysis of time series. It is inherently subjective: different users may prefer different clusterings for a particular dataset. Semi-supervised clustering addresses this by allowing the user to provide examples of instances that should (not) be in the same cluster. This paper studies semi-supervised clustering in the context of time series. We show that COBRAS, a state-of-the-art active semi-supervised clustering method, can be adapted to this setting. We refer to this approach as COBRASTS. An extensive experimental evaluation supports the following claims: (1) COBRASTS far outperforms the current state of the art in semi-supervised clustering for time series, and thus presents a new baseline for the field; (2) COBRASTS can identify clusters with separated components; (3) COBRASTS can identify clusters that are characterized by small local patterns; (4) actively querying a small amount of semi-supervision can greatly improve clustering quality for time series; (5) the choice of the clustering algorithm matters (contrary to earlier claims in the literature).
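To make the notion of pairwise supervision concrete, the following minimal sketch shows one common way to represent "should be together" (must-link) and "should not be together" (cannot-link) examples, and to check a candidate clustering against them. It is an illustration only, not the COBRAS or COBRASTS procedure; the function name `violated_constraints` and the toy labels and index pairs are assumptions introduced here.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the must-link and cannot-link pairs that a clustering breaks.

    labels[i] is the cluster id assigned to instance i (hypothetical toy data).
    """
    # A must-link pair is violated when its two instances land in different clusters.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when its two instances share a cluster.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Five time series, indexed 0..4, assigned to three clusters.
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (2, 4)]    # user says: same cluster
cannot_link = [(1, 2), (2, 3)]  # user says: different clusters

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print("violated must-link:", broken_ml)
print("violated cannot-link:", broken_cl)
```

In an active setting, such pairs would be obtained by querying the user rather than being fixed in advance.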

Notes

Acknowledgements

We thank Hoang Anh Dau for help with setting up the cDTWSS experiments. Toon Van Craenendonck is supported by the Agency for Innovation by Science and Technology in Flanders (IWT). This research is supported by Research Fund KU Leuven (GOA/13/010), FWO (G079416N) and FWO-SBO (HYMOP-150033).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Toon Van Craenendonck (1, corresponding author)
  • Wannes Meert (1)
  • Sebastijan Dumančić (1)
  • Hendrik Blockeel (1)
  1. Department of Computer Science, KU Leuven, Leuven, Belgium