Error-Bounded Approximation of Data Stream: Methods and Theories

Learning from Data Streams in Evolving Environments

Part of the book series: Studies in Big Data ((SBD,volume 41))

Abstract

With the development of sensor networks and the Internet of Things, the volume of data is increasing rapidly, and streaming data has attracted much attention. Compact data representation plays an important role in processing and exploring data streams efficiently, since many stream mining tasks, such as clustering, classification, and correlation analysis, operate on data approximations rather than on the original data items. In this chapter, we focus on maximum error-bounded approximation of data streams, which represents the streaming data so that the approximation error at every data point stays within a given bound. An approximation solution must meet two criteria: it must adapt over time to a varying error bound, and it must run in real time. We review existing data approximation techniques and summarize some essential theory, such as optimality guarantees. We introduce two optimal linear-time algorithms that construct an error-bounded piecewise linear representation of a data stream. One generates the line segments by convex analysis of the data, and the other works in a transformed space and can be extended to a general model. We analyze and compare these two spaces theoretically and prove the equivalence between them, as well as between the two algorithms.
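To make the notion of a maximum error bound concrete, the following is a minimal Python sketch of a simple greedy segmentation. It is not either of the two optimal algorithms introduced in the chapter: it anchors each line at the first point of its segment and only tracks the interval of feasible slopes, so it may produce more segments than an optimal method. The names `greedy_pla`, `max_abs_error`, and `delta` are hypothetical, and the sketch assumes strictly increasing x values.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# (x_start, x_end, slope, intercept); hypothetical layout chosen for this sketch
Segment = Tuple[float, float, float, float]


def greedy_pla(points: List[Point], delta: float) -> List[Segment]:
    """Greedy error-bounded piecewise linear approximation.

    Assumption: x values are strictly increasing. Each line is forced through
    the first point of its segment, which is a simplification and does not
    minimise the number of segments.
    """
    segments: List[Segment] = []
    n = len(points)
    i = 0
    while i < n:
        x0, y0 = points[i]
        lo, hi = float("-inf"), float("inf")  # feasible slope interval
        j = i + 1
        while j < n:
            x, y = points[j]
            dx = x - x0
            # Slopes that keep the line within delta of point j.
            new_lo = max(lo, (y - delta - y0) / dx)
            new_hi = min(hi, (y + delta - y0) / dx)
            if new_lo > new_hi:
                break  # no single line fits point j; close the segment
            lo, hi = new_lo, new_hi
            j += 1
        # A single-point segment has no slope constraint; use a flat line.
        slope = 0.0 if j == i + 1 else (lo + hi) / 2.0
        segments.append((x0, points[j - 1][0], slope, y0 - slope * x0))
        i = j
    return segments


def max_abs_error(points: List[Point], segments: List[Segment]) -> float:
    """Largest absolute deviation between the points and their segments."""
    worst = 0.0
    for x, y in points:
        for xs, xe, slope, intercept in segments:
            if xs <= x <= xe:
                worst = max(worst, abs(slope * x + intercept - y))
                break
    return worst
```

For example, calling `greedy_pla([(0, 0), (1, 1), (2, 2), (3, 0), (4, -2)], 0.5)` splits the stream where the rising trend reverses, and `max_abs_error` can be used to confirm that no point deviates from its segment by more than `delta`.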

Reprinted by permission from Springer Nature: Springer, The VLDB Journal, Maximum error-bounded piecewise linear representation for online stream approximation, Q. Xie et al., © Springer-Verlag Berlin Heidelberg 2014 (https://doi.org/10.1007/s00778-014-0355-0)

Notes

  1. It should be noted that S[i − 1, j] can be δ-representable even if S[i, j] is maximally δ-representable.

  2. Without loss of generality, we assume that x_i < x_j.

Acknowledgements

This work is partially supported by Natural Science Foundation of China (Grant No. 61602353), Natural Science Foundation of Hubei Province (Grant No. 2017CFB505), and the Fundamental Research Funds for the Central Universities (Grant No. WUT:2017IVA053 and WUT:2017IVB028).

Author information

Corresponding authors

Correspondence to Qing Xie or Xiangliang Zhang.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Xie, Q., Pang, C., Zhou, X., Zhang, X., Deng, K. (2019). Error-Bounded Approximation of Data Stream: Methods and Theories. In: Sayed-Mouchaweh, M. (eds) Learning from Data Streams in Evolving Environments. Studies in Big Data, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-319-89803-2_5

  • DOI: https://doi.org/10.1007/978-3-319-89803-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-89802-5

  • Online ISBN: 978-3-319-89803-2

  • eBook Packages: Engineering, Engineering (R0)
