An Improvement of PAA on Trend-Based Approximation for Time Series

  • Chunkai Zhang
  • Yingyang Chen
  • Ao Yin
  • Zhen Qin
  • Xing Zhang
  • Keli Zhang
  • Zoe L. Jiang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11335)

Abstract

Piecewise Aggregate Approximation (PAA) is a competitive basic dimension reduction method for high-dimensional time series mining. In practice, however, it has an obvious limitation: important information is lost, especially the trend. In this paper, we propose two new approaches for time series that use approximate trend feature information. Our first method records the trend relative to the mean value of each segment: it divides each segment into two parts and uses the numerical average of each part to represent the trend. We prove that this method satisfies the lower-bounding property, which guarantees no false dismissals. Our second method uses a binary string, also relative to the mean of each segment, to record the trend. We apply our methods to similarity measurement in classification and anomaly detection; the experimental results show that suitably extracting the trend feature improves accuracy and effectiveness.
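
The ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`paa`, `half_mean_trend`, `binary_trend`), the equal-length segmentation, the split at the segment midpoint, and the one-bit-per-point encoding are all assumptions made for illustration, since the abstract does not give the exact definitions.

```python
from statistics import mean


def _segments(series, n_segments):
    """Split a series into equal-length segments (assumes exact divisibility)."""
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    seg_len = len(series) // n_segments
    return [series[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def paa(series, n_segments):
    """Classic PAA: represent each segment by its mean only."""
    return [mean(seg) for seg in _segments(series, n_segments)]


def half_mean_trend(series, n_segments):
    """Hypothetical reading of the first method: for each segment, keep the
    segment mean plus the mean of each half relative to that segment mean."""
    features = []
    for seg in _segments(series, n_segments):
        if len(seg) < 2:
            raise ValueError("each segment needs at least two points")
        half = len(seg) // 2
        m = mean(seg)
        features.append((m, mean(seg[:half]) - m, mean(seg[half:]) - m))
    return features


def binary_trend(series, n_segments):
    """Hypothetical reading of the second method: one bit per point,
    '1' if the point is at or above its segment mean, else '0'."""
    strings = []
    for seg in _segments(series, n_segments):
        m = mean(seg)
        strings.append("".join("1" if x >= m else "0" for x in seg))
    return strings


# A rising segment followed by a falling one: plain PAA cannot tell them apart.
series = [1, 2, 3, 4, 4, 3, 2, 1]
print(paa(series, 2))              # [2.5, 2.5]
print(half_mean_trend(series, 2))  # [(2.5, -1.0, 1.0), (2.5, 1.0, -1.0)]
print(binary_trend(series, 2))     # ['0011', '1100']
```

Both segments have the same mean, so PAA maps them to the same value, while either trend feature distinguishes the rising segment from the falling one. This is the kind of information loss the abstract refers to.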

Keywords

Time series, Similarity measurement, Trend distance

Notes

Acknowledgment

This study is supported by the Shenzhen Research Council (Grant No. JSGG2017-0822160842949, JCYJ20170307151518535).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Chunkai Zhang (1)
  • Yingyang Chen (1)
  • Ao Yin (1)
  • Zhen Qin (1)
  • Xing Zhang (2)
  • Keli Zhang (2)
  • Zoe L. Jiang (1)

  1. Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
  2. Engineering Laboratory for Big Data Collaborative Security Technology, Beijing, China
