Abstract
With the development of sensor networks and the Internet of Things, data volumes are increasing rapidly, and streaming data has attracted much attention. Compact data representation plays an important role in processing and exploring data streams efficiently, because many stream mining tasks, such as clustering, classification, and correlation analysis, operate on approximations rather than on the original data items. In this chapter, we focus on maximum error-bounded approximation of data streams, which represents the streaming data with a constrained approximation error on each data point. An approximation solution must satisfy two criteria: self-adaptation over time to a varying error bound, and real-time processing. We review the existing data approximation techniques and summarize some essential theories, such as optimization guarantees. Two optimal linear-time algorithms are introduced to construct error-bounded piecewise linear representations for data streams. One generates the line segments by convex analysis of the data, and the other is based on a transformed space and can be extended to a general model. We theoretically analyze and compare these two spaces, and prove the theoretical equivalence between them, as well as between the two algorithms.
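To make the per-point error constraint concrete, the following is a minimal Python sketch of an error-bounded piecewise linear approximation. It is a simple greedy slope-interval filter chosen for brevity, not either of the two optimal algorithms introduced in the chapter, and it will generally produce more segments than an optimal method. The names `greedy_pla` and `_close`, the segment tuple layout, and the assumption of strictly increasing x values are illustrative choices, not taken from the chapter.

```python
from math import inf


def _close(x0, y0, x_last, lo, hi):
    """Build a segment tuple (x_start, x_end, slope, intercept).

    The line passes through the anchor (x0, y0); any slope in [lo, hi]
    keeps every covered point within the error bound, so the midpoint is used.
    """
    slope = 0.0 if lo == -inf else (lo + hi) / 2.0
    return (x0, x_last, slope, y0 - slope * x0)


def greedy_pla(points, delta):
    """Greedy one-pass approximation with maximum error `delta` per point.

    `points` is an iterable of (x, y) pairs with strictly increasing x
    (an assumption of this sketch). Each segment is anchored at its first
    point, and the interval of feasible slopes shrinks as points arrive.
    """
    segments = []
    it = iter(points)
    try:
        x0, y0 = next(it)
    except StopIteration:
        return segments

    lo, hi = -inf, inf  # feasible slope interval for the current segment
    x_last = x0
    for x, y in it:
        dx = x - x0
        new_lo = max(lo, (y - delta - y0) / dx)
        new_hi = min(hi, (y + delta - y0) / dx)
        if new_lo <= new_hi:
            # The new point fits: tighten the interval and extend the segment.
            lo, hi = new_lo, new_hi
            x_last = x
        else:
            # No single line through the anchor fits: emit and restart here.
            segments.append(_close(x0, y0, x_last, lo, hi))
            x0, y0, x_last = x, y, x
            lo, hi = -inf, inf
    segments.append(_close(x0, y0, x_last, lo, hi))
    return segments


if __name__ == "__main__":
    stream = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 0.0), (5, -3.0)]
    for seg in greedy_pla(stream, delta=0.5):
        print(seg)
```

Each point is processed once in constant time, so the sketch runs in linear time and can be applied online. Anchoring every segment at its first data point is what keeps the code short; it is also the restriction that prevents this variant from being optimal in the number of segments.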
Reprinted by permission from Springer Nature: Springer, The VLDB Journal, Maximum error-bounded piecewise linear representation for online stream approximation, Q. Xie et al., © Springer-Verlag Berlin Heidelberg 2014 (https://doi.org/10.1007/s00778-014-0355-0)
Notes
1. It should be noted that S[i − 1, j] can be δ-representable even if S[i, j] is maximally δ-representable.
2. Without loss of generality, we assume that x_i < x_j.
Acknowledgements
This work is partially supported by the Natural Science Foundation of China (Grant No. 61602353), the Natural Science Foundation of Hubei Province (Grant No. 2017CFB505), and the Fundamental Research Funds for the Central Universities (Grant No. WUT:2017IVA053 and WUT:2017IVB028).
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Xie, Q., Pang, C., Zhou, X., Zhang, X., Deng, K. (2019). Error-Bounded Approximation of Data Stream: Methods and Theories. In: Sayed-Mouchaweh, M. (eds) Learning from Data Streams in Evolving Environments. Studies in Big Data, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-319-89803-2_5
DOI: https://doi.org/10.1007/978-3-319-89803-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89802-5
Online ISBN: 978-3-319-89803-2
eBook Packages: Engineering, Engineering (R0)