Error-Bounded Approximation of Data Stream: Methods and Theories

  • Qing Xie
  • Chaoyi Pang
  • Xiaofang Zhou
  • Xiangliang Zhang
  • Ke Deng
Chapter
Part of the Studies in Big Data book series (SBD, volume 41)

Abstract

With the development of sensor networks and the Internet of Things, data volumes are increasing rapidly, and streaming data has attracted much attention. Compact data representation plays an important role in processing and exploring data streams efficiently, since many stream mining tasks, such as clustering, classification, and correlation analysis, operate on approximations of the data rather than on the original data items. In this chapter, we focus on maximum error-bounded approximation of data streams, which represents the streaming data so that the approximation error on each data point stays within a given bound. An approximation solution must meet two criteria: self-adaptation over time to a varying error bound, and real-time processing. We review the existing data approximation techniques and summarize some essential theories, such as optimization guarantees. We introduce two optimal linear-time algorithms that construct an error-bounded piecewise linear representation for a data stream. One generates the line segments by convex analysis of the data, and the other works in a transformed space and can be extended to a general model. We theoretically analyze and compare these two spaces, and prove the equivalence between them as well as between the two algorithms.
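
To make the notion of a maximum error bound on each data point concrete, the following is a minimal Python sketch of a one-pass, error-bounded piecewise linear approximation. It is not either of the two optimal algorithms described in the chapter: it is a simplified greedy variant that anchors every segment at its first data point and tracks the interval of feasible slopes, so it may produce more segments than an optimal method. The names `bounded_pla`, `max_error`, and `delta` are hypothetical, and timestamps are assumed to be strictly increasing.

```python
from typing import Iterable, List, Tuple

# (t_start, t_end, y_start, slope); the segment is y = y_start + slope * (t - t_start)
Segment = Tuple[float, float, float, float]

INF = float("inf")


def _pick_slope(lo: float, hi: float) -> float:
    """Choose a slope inside the feasible interval (0.0 for a single-point segment)."""
    if lo == -INF:
        return 0.0
    return (lo + hi) / 2.0


def bounded_pla(points: Iterable[Tuple[float, float]], delta: float) -> List[Segment]:
    """Greedy one-pass segmentation: each point lies within delta of its segment.

    Assumption: every segment is forced through its first point, which keeps
    the state to a single slope interval but gives up optimality.
    """
    segments: List[Segment] = []
    it = iter(points)
    try:
        t0, y0 = next(it)
    except StopIteration:
        return segments
    lo, hi = -INF, INF
    t_last = t0
    for t, y in it:
        dt = t - t0  # positive because timestamps are strictly increasing
        new_lo = max(lo, (y - delta - y0) / dt)
        new_hi = min(hi, (y + delta - y0) / dt)
        if new_lo <= new_hi:
            # The point fits: shrink the feasible slope interval.
            lo, hi, t_last = new_lo, new_hi, t
        else:
            # No single slope covers all points: close the segment, restart here.
            segments.append((t0, t_last, y0, _pick_slope(lo, hi)))
            t0, y0, t_last = t, y, t
            lo, hi = -INF, INF
    segments.append((t0, t_last, y0, _pick_slope(lo, hi)))
    return segments


def max_error(points: Iterable[Tuple[float, float]], segments: List[Segment]) -> float:
    """Largest absolute deviation between the points and their covering segment."""
    worst = 0.0
    for t, y in points:
        for t_start, t_end, y_start, slope in segments:
            if t_start <= t <= t_end:
                worst = max(worst, abs(y_start + slope * (t - t_start) - y))
                break
    return worst
```

For example, `bounded_pla([(0, 0), (1, 1), (2, 2), (3, 10), (4, 11), (5, 12)], 0.5)` splits the input at the jump between `t = 2` and `t = 3`, and `max_error` can then be used to check that no point deviates from its segment by more than `delta`. Each point is processed in constant time, which matches the real-time requirement mentioned above, though without the optimality guarantee of the chapter's algorithms.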

Notes

Acknowledgements

This work is partially supported by Natural Science Foundation of China (Grant No. 61602353), Natural Science Foundation of Hubei Province (Grant No. 2017CFB505), and the Fundamental Research Funds for the Central Universities (Grant No. WUT:2017IVA053 and WUT:2017IVB028).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Wuhan University of Technology, Wuhan, China
  2. Ningbo Institute of Technology, Zhejiang University, Ningbo, China
  3. University of Queensland, Brisbane, Australia
  4. Soochow University, Suzhou, China
  5. CEMSE, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
  6. RMIT University, Melbourne, Australia
