Abstract
We address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We (i) highlight the limitations of approaches previously described in the literature which make them unsuitable for non-stationary streams, (ii) describe a novel principle for the utilization of the available storage space, and (iii) introduce two novel algorithms which exploit the proposed principle. Experiments on three large real-world data sets demonstrate that the proposed methods vastly outperform the existing alternatives.
Keywords
- Data Stream
- Streaming Data
- Quantile Estimation
- Large Absolute Error
- Uniform Probability Density Function
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Arandjelović, O., Pham, D., Venkatesh, S. (2014). Stream Quantiles via Maximal Entropy Histograms. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8835. Springer, Cham. https://doi.org/10.1007/978-3-319-12640-1_40
DOI: https://doi.org/10.1007/978-3-319-12640-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12639-5
Online ISBN: 978-3-319-12640-1
eBook Packages: Computer Science, Computer Science (R0)