Datenbank-Spektrum, Volume 18, Issue 2, pp 79–87

Data Change Exploration Using Time Series Clustering

  • Leon Bornemann
  • Tobias Bleifuß
  • Dmitri Kalashnikov
  • Felix Naumann
  • Divesh Srivastava
Focus article

Abstract

Analysis of static data is one of the best-studied research areas. However, data changes over time, and these changes may reveal patterns or groups of similar values, properties, and entities. We study changes in large, publicly available data repositories by modelling them as time series and clustering these series by their similarity. To explore changes in real-world data, we use the publicly available revision history of Wikipedia infoboxes and weekly snapshots of IMDB.
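The abstract does not say which clustering algorithm or similarity measure is used, so the sketch below is only one plausible instance of "clustering these series by their similarity". It assumes k-means with Euclidean distance on z-normalized series, a common baseline for time series clustering. The function names (`z_normalize`, `euclidean`, `kmeans`) and the weekly change counts are hypothetical toy data and are not taken from the paper.

```python
import math
from typing import List, Sequence


def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale a series to mean 0 and standard deviation 1; a constant series maps to all zeros."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(series: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Cluster equal-length series with k-means and return one cluster label per series."""
    # Deterministic farthest-first seeding: start from the first series, then repeatedly
    # add the series that is farthest from all centroids chosen so far.
    centroids = [list(series[0])]
    while len(centroids) < k:
        farthest = max(series, key=lambda s: min(euclidean(s, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [-1] * len(series)
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(s, centroids[j])) for s in series]
        if new_labels == labels:
            break  # converged: no series changed its cluster
        labels = new_labels
        # Update step: move each centroid to the point-wise mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


# Hypothetical weekly change counts for four infobox properties (toy data, not from the paper).
counts = {
    "population": [0, 1, 0, 1, 0, 1, 0, 1],  # regular, periodic edits
    "area": [0, 2, 0, 2, 0, 2, 0, 2],  # same rhythm, different volume
    "mayor": [0, 0, 0, 9, 8, 0, 0, 0],  # one burst of edits
    "leader": [0, 0, 0, 7, 9, 1, 0, 0],  # a similar burst
}
names = list(counts)
labels = kmeans([z_normalize(counts[name]) for name in names], k=2)
for name, label in zip(names, labels):
    print(name, label)
```

Because each series is z-normalized first, the comparison is driven by the shape of the change activity rather than its volume: the two periodic series land in one cluster and the two burst-like series in the other. Other measures, such as dynamic time warping, would be drop-in replacements for `euclidean` under the same assumptions.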

Changes to the data are captured as events, which we call change records. To extract temporal behavior, we count changes per time period and propose a general transformation framework that aggregates groups of changes into numerical time series of different resolutions. We use these time series to study different application scenarios of unsupervised clustering. Our exploratory results show that changes to collaboratively edited data sources can help find characteristic behavior, distinguish entities or properties, and provide insight into the respective domains.
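The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' framework: it assumes a change record carries a timestamp, an entity, a property, and a new value, and the names `ChangeRecord` and `to_time_series` as well as the toy records are hypothetical. A grouping function selects which changes form one series (for example, all changes of one entity or of one property), and the bucket width sets the resolution.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Hashable, Iterable, List


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record: a property of an entity received a new value at a point in time."""
    timestamp: datetime
    entity: str
    prop: str
    value: str


def to_time_series(
    changes: Iterable[ChangeRecord],
    group_by: Callable[[ChangeRecord], Hashable],
    start: datetime,
    end: datetime,
    resolution: timedelta,
) -> Dict[Hashable, List[int]]:
    """Count the changes of each group in consecutive buckets of width `resolution` over [start, end)."""
    n_buckets = -(-(end - start) // resolution)  # ceiling division of two timedeltas
    series: Dict[Hashable, List[int]] = defaultdict(lambda: [0] * n_buckets)
    for change in changes:
        if start <= change.timestamp < end:  # ignore changes outside the observed window
            bucket = (change.timestamp - start) // resolution
            series[group_by(change)][bucket] += 1
    return dict(series)


# Toy change records (placeholder values, not real Wikipedia data).
changes = [
    ChangeRecord(datetime(2017, 1, 2, 10, 30), "Berlin", "population", "v1"),
    ChangeRecord(datetime(2017, 1, 3, 8, 0), "Berlin", "population", "v2"),
    ChangeRecord(datetime(2017, 1, 9, 17, 45), "Berlin", "mayor", "v3"),
    ChangeRecord(datetime(2017, 1, 10, 12, 0), "Potsdam", "population", "v4"),
]
start, end = datetime(2017, 1, 1), datetime(2017, 1, 15)

# The same changes, aggregated per entity or per property and at two resolutions.
daily_by_entity = to_time_series(changes, lambda c: c.entity, start, end, timedelta(days=1))
weekly_by_entity = to_time_series(changes, lambda c: c.entity, start, end, timedelta(weeks=1))
weekly_by_property = to_time_series(changes, lambda c: c.prop, start, end, timedelta(weeks=1))
print(weekly_by_entity)
```

With weekly buckets, Berlin's three changes become the series `[2, 1]` and Potsdam's single change becomes `[0, 1]`; with daily buckets the same window yields 14 values per entity. Keeping the grouping and the resolution as parameters is one way to derive many series of different granularity from a single stream of change records.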

Copyright information

© Springer-Verlag GmbH Deutschland, part of Springer Nature 2018

Authors and Affiliations

  1. Hasso-Plattner-Institut, Universität Potsdam, Potsdam, Germany
  2. AT&T Labs – Research, Bedminster, USA
