A novel multi-resolution representation for time series sensor data analysis

Abstract

The growth of the IoT has made sensing devices of all kinds common across industrial fields and has driven enormous growth in the volume of sensor data. Because sensor data are high in both volume and dimensionality, in-depth analysis and data mining directly on raw sensor time series are of limited feasibility. To address this problem, we propose a novel dimensionality reduction and multi-resolution representation approach for time series sensor data. The approach selects an appropriate number of important data points (IDPs) from a given sensor time series to produce a corresponding multi-resolution piecewise linear representation (MPLR); we call the approach MPLR-IDP. Theoretical analyses and experiments show that MPLR-IDP reduces dimensionality while preserving the important characteristics of the time series, and that it can represent the data flexibly enough to meet the diverse needs of different users.
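
As a rough illustration of the general idea of a piecewise linear representation at several resolutions, the sketch below ranks the points of a series by a simple greedy vertical-distance criterion (in the spirit of perceptually important points) and reconstructs the series by linear interpolation through the k highest-ranked points. It is a generic sketch under stated assumptions, not the MPLR-IDP algorithm itself: the ranking criterion, the function names `vertical_distance`, `rank_points`, and `plr`, and the sample data are all hypothetical and chosen only for illustration.

```python
from bisect import insort


def vertical_distance(series, left, right, i):
    """Vertical distance from point i to the line joining points left and right."""
    slope = (series[right] - series[left]) / (right - left)
    return abs(series[left] + slope * (i - left) - series[i])


def rank_points(series):
    """Order all indices by importance (assumed criterion, not the paper's).

    The two endpoints come first; each later pick is the point farthest
    (vertically) from the current piecewise linear approximation.
    Requires at least two points.
    """
    n = len(series)
    order = [0, n - 1]    # importance ranking, most important first
    chosen = [0, n - 1]   # same indices, kept sorted by position
    while len(order) < n:
        best_i, best_d = None, -1.0
        for left, right in zip(chosen, chosen[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(series, left, right, i)
                if d > best_d:
                    best_i, best_d = i, d
        insort(chosen, best_i)
        order.append(best_i)
    return order


def plr(series, order, k):
    """Approximate the series by linear interpolation through the top-k points (k >= 2)."""
    keep = sorted(order[:k])
    approx = []
    for left, right in zip(keep, keep[1:]):
        slope = (series[right] - series[left]) / (right - left)
        for i in range(left, right):
            approx.append(series[left] + slope * (i - left))
    approx.append(series[keep[-1]])
    return approx


# Hypothetical sample data: one ranking serves every resolution.
series = [0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0]
order = rank_points(series)
coarse = plr(series, order, 3)  # three points, two segments
fine = plr(series, order, 5)    # five points, four segments
```

Because the ranking is computed once, a coarser or finer approximation is obtained simply by changing `k`, which is the sense in which a single representation can serve users with different accuracy needs.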

Abbreviations

\(TS_n\) :

A time series with length n

PLR:

Piecewise linear representation

MPLR:

Multi-resolution PLR

BMPLR:

The basic multi-resolution PLR

EMPLR:

The extended multi-resolution PLR

PIPs:

Perceptually important points

TPs:

Turning points

IDPs:

Important data points

TSRSs:

Time series representation standards

\(Num_{seg}\) :

The user-specified number of segments

TFE:

The user-specified fitting error of the entire time series

\(MFE_{seg}\):

The user-specified maximum fitting error of a segment

ARI:

Adaptive representation index

SB-Tree:

Specialized binary tree index

OBST:

The optimal binary search tree

LI:

Linear interpolation

LR:

Linear regression

\(seg\langle x,y\rangle\):

Segment object from \(v_x\) to \(v_y\)

\(es_{\langle x,y\rangle}\):

The fitting error of \(seg\langle x,y\rangle\)

\(DS_m\) :

The time series dataset with m time series

\(MN_\mathrm{TP}\) :

The maximum number of TPs

DCR:

Data compression ratio

TSC:

Time series classification

ST:

Shapelet transformation

TDS:

Time series training dataset

SubTS:

The set of all time series subsequences

FSS:

The final set of shapelets

Acknowledgements

The authors would like to thank the anonymous reviewers and the editors for their insightful comments and suggestions, which greatly helped to improve the quality of this paper. This work is supported by the National Natural Science Foundation of China (Nos. 61772310, 61702300, 61702302, 61802231), the Science and Technology Development Funds of Shandong Province (No. 2014GGX101028), and the Project of Qingdao Postdoctoral Applied Research.

Author information

Corresponding author

Correspondence to Xueqing Li.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

About this article

Cite this article

Hu, Y., Ji, C., Zhang, Q. et al. A novel multi-resolution representation for time series sensor data analysis. Soft Comput 24, 10535–10560 (2020). https://doi.org/10.1007/s00500-019-04562-7

Keywords

  • Internet of things
  • Time series
  • Piecewise linear representation
  • Multi-resolution representation