Similarity-Based Outlier Detection in Multiple Time Series

  • Grzegorz Gołaszewski
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 945)

Abstract

Outlier analysis is very often the first step in data pre-processing. Since it is performed mostly on raw data, it is crucial that the algorithms used are fast and reliable. These properties are hard to achieve when the analysed data is high-dimensional, as is the case with multiple time series data sets. In this article, various outlier detection methods for numerical data (distance distribution-based methods, angle-based methods, k-nearest neighbour, local density analysis) are presented and adapted to multiple time series data. The study also addresses the problem of choosing an appropriate similarity measure (L-p norms, Dynamic Time Warping, Edit Distance, Threshold Queries based Similarity) and its impact on efficiency in further analysis. The impact of the approach used to apply these measures to multivariate time series data is also examined. To compare the different approaches, a set of tests was performed on synthetic and real data.
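
As a rough illustration of how a similarity measure and an outlier detection method from the abstract can be combined, the sketch below pairs a textbook Dynamic Time Warping distance with a k-nearest-neighbour outlier score on univariate series. It is not the paper's implementation: the function names `dtw_distance` and `knn_outlier_scores`, the absolute-difference local cost, the choice k = 2, and the toy data are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW with absolute-difference local cost (illustrative choice)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def knn_outlier_scores(series, k=2, dist=dtw_distance):
    """Score each series by the distance to its k-th nearest neighbour."""
    scores = []
    for i, s in enumerate(series):
        dists = sorted(dist(s, t) for j, t in enumerate(series) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: three similar bumps and one flat series.
series = [
    [0, 1, 2, 1, 0],
    [0, 0, 1, 2, 1, 0],  # same shape, shifted in time
    [0, 1, 2, 2, 1, 0],
    [5, 5, 5, 5, 5],     # flat and far from the others
]
scores = knn_outlier_scores(series, k=2)
outlier_index = max(range(len(series)), key=scores.__getitem__)
```

The series with the largest score is the outlier candidate. Passing a different `dist` function (for example an L-p norm on equal-length series) changes the similarity measure without touching the scoring step, which is the kind of comparison the abstract describes.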

Notes

Acknowledgment

This work was partially supported by the Faculty of Physics and Applied Computer Science of the AGH University of Science and Technology.

The primary version of this paper was presented at the 3rd Conference on Information Technology, Systems Research and Computational Physics, 2–5 July 2018, Cracow, Poland [14].

References

  1. Achtert, E., Kriegel, H.P., Reichert, L., Schubert, E., Wojdanowski, R., Zimek, A.: Visual evaluation of outlier detection models. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds.) Database Systems for Advanced Applications, pp. 396–399. Springer, Berlin, Heidelberg (2010)
  2. Aggarwal, C.C.: Data Mining: The Textbook. Springer, Heidelberg (2015)
  3. Aggarwal, C.C.: Outlier Analysis, 2nd edn. Springer, Heidelberg (2016)
  4. Aßfalg, J., Kriegel, H.P., Kröger, P., Kunath, P., Pryakhin, A., Renz, M.: Similarity search on time series based on threshold queries. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Boehm, K., Kemper, A., Grust, T., Boehm, C. (eds.) Advances in Database Technology - EDBT 2006, pp. 276–294. Springer, Berlin, Heidelberg (2006)
  5. Ben-Gal, I.: Outlier Detection, pp. 131–146. Springer, Boston (2005)
  6. Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. AAAIWS 1994, pp. 359–370. AAAI Press (1994)
  7. Bouguessa, M.: Modeling outlier score distributions. In: Zhou, S., Zhang, S., Karypis, G. (eds.) Advanced Data Mining and Applications, pp. 713–725. Springer, Heidelberg (2012)
  8. Breunig, M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104. ACM (2000)
  9. Chen, L., Ng, R.: On the marriage of Lp-norms and edit distance. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, vol. 30. VLDB 2004, pp. 792–803. VLDB Endowment (2004)
  10. Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. SIGMOD 2005, pp. 491–502. ACM, New York (2005)
  11. Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., Keogh, E.: Querying and mining of time series data: experimental comparison of representations and distance measures. Proc. VLDB Endow. 1(2), 1542–1552 (2008)
  12. Geurts, P.: Contributions to decision tree induction: bias/variance tradeoff and time series classification, January 2002
  13. Geurts, P.: Pattern extraction for time series classification. In: De Raedt, L., Siebes, A. (eds.) Principles of Data Mining and Knowledge Discovery, pp. 115–127. Springer, Heidelberg (2001)
  14. Gołaszewski, G.: Similarity-based outlier detection in multiple time series. In: Kulczycki, P., Kowalski, P.A., Łukasik, S. (eds.) Contemporary Computational Science, p. 68. AGH-UST Press, Cracow (2018)
  15. Hodge, V.J., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 85–126 (2004)
  16. Itakura, F.: Readings in speech recognition, pp. 154–158. Morgan Kaufmann Publishers Inc., San Francisco (1990)
  17. Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)
  18. Kim, S.W., Park, S., Chu, W.W.: An index-based approach for similarity search supporting time warping in large sequence databases. In: Proceedings of the 17th International Conference on Data Engineering, pp. 607–614. IEEE Computer Society, Washington, DC (2001)
  19. Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the 24th International Conference on Very Large Data Bases. VLDB 1998, pp. 392–403. Morgan Kaufmann Publishers Inc., San Francisco, CA (1998)
  20. Kudo, M., Toyama, J., Shimbo, M.: Multidimensional curve classification using passing-through regions. Pattern Recogn. Lett. 20(11), 1103–1111 (1999)
  21. Kuhnt, S., Pawlitschko, J.: Outlier identification rules for generalized linear models. In: Baier, D., Wernecke, K.D. (eds.) Innovations in Classification, Data Science, and Information Systems, pp. 165–172. Springer, Heidelberg (2005)
  22. Kulczycki, P., Charytanowicz, M., Kowalski, P.A., Łukasik, S.: Identification of atypical (rare) elements-a conditional, distribution-free approach. IMA J. Math. Control Inf. (2017, in press)
  23. Kulczycki, P., Kruszewski, D.: Identification of atypical elements by transforming task to supervised form with fuzzy and intuitionistic fuzzy evaluations. Appl. Soft Comput. 60(C), 623–633 (2017)
  24. Petrovskiy, M.I.: Outlier detection algorithms in data mining systems. Program. Comput. Software 29(4), 228–237 (2003)
  25. Sakoe, H., Chiba, S.: Readings in Speech Recognition, pp. 159–165. Morgan Kaufmann Publishers Inc., San Francisco (1990)
  26. Schubert, E., Zimek, A., Kriegel, H.P.: Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min. Knowl. Disc. 28(1), 190–237 (2014)
  27. Seo, Y.S., Bae, D.H.: On the value of outlier elimination on software effort estimation research. Empirical Software Eng. 18(4), 659–698 (2013)
  28. Shaikh, S.A., Kitagawa, H.: Top-k outlier detection from uncertain data. Int. J. Autom. Comput. 11(2), 128–142 (2014)
  29. Tang, J., Chen, Z., Fu, A.W.C., Cheung, D.W.: Enhancing effectiveness of outlier detections for low density patterns. In: Chen, M.S., Yu, P.S., Liu, B. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 535–548. Springer, Heidelberg (2002)
  30. Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., Keogh, E.: Indexing multidimensional time-series. VLDB J. 15(1), 1–20 (2006)
  31. Yang, H., Yang, T.: Outlier mining based on principal component estimation. Acta Math. Applicatae Sin. 21(2), 303–310 (2005)
  32. Yi, B.K., Jagadish, H.V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In: Proceedings of the Fourteenth International Conference on Data Engineering. ICDE 1998, pp. 201–208. IEEE Computer Society, Washington, DC (1998)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Division for Information Technology and Systems Research, Department of Applied Informatics and Computational Physics, Faculty of Physics and Applied Computer Science, AGH University of Science and Technology, Kraków, Poland