Abstract
Quantile estimation is an important task when mining data streams. However, estimating quantiles is considerably more complex than estimating the mean and the variance, and this added complexity becomes more significant as the size of the data increases. In the context of “infinite” data streams, a computational and space complexity that is linear in the size of the data is clearly not affordable. To reduce this complexity, a very limited number of recent studies have devised incremental quantile estimators [7, 12]. Estimators within this class update the quantile estimate based on the most recent observation(s), which yields updating schemes with a very small computational footprint: a constant-time (i.e., O(1)) complexity. In this article, we pursue this research direction and present an estimator that we refer to as a Higher-Fidelity Frugal [7] quantile estimator. It constitutes a substantial advancement over the family of Frugal estimators introduced in [7]. The highlight of the present scheme is that it works in a discretized space, and it is thus a pioneering algorithm within the theory of discretized algorithms. (The superiority of discretized Learning Automata schemes over their continuous counterparts has been clearly demonstrated in the literature. To our knowledge, this is the first paper that proves the advantages of discretization within the domain of quantile estimation.) Comprehensive simulation results show that our estimator outperforms the original Frugal algorithm in terms of accuracy.
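To make the idea of a constant-time incremental update concrete, the following is a minimal Python sketch in the spirit of the Frugal-1U rule of [7], which is the baseline that the abstract refers to. It is not the Higher-Fidelity estimator proposed in this paper. The function name `frugal_1u` and the parameters `start`, `step`, and `rng` are illustrative assumptions and do not appear in the source.

```python
import random


def frugal_1u(stream, q, start=0.0, step=1.0, rng=None):
    """Sketch of a Frugal-1U style estimate of the q-th quantile.

    Assumptions (not from the source): the estimate starts at `start`
    and moves by a fixed `step`; `rng` is any object with a `random()`
    method returning a float in [0, 1).
    """
    rng = rng or random.Random()
    m = start  # the single stored value: the current quantile estimate
    for x in stream:
        u = rng.random()
        if x > m and u > 1.0 - q:
            # Observation above the estimate: move up with probability q.
            m += step
        elif x < m and u > q:
            # Observation below the estimate: move down with probability 1 - q.
            m -= step
    return m


if __name__ == "__main__":
    gen = random.Random(0)
    data = (gen.randint(0, 1000) for _ in range(50_000))
    print(frugal_1u(data, q=0.5, rng=random.Random(1)))
```

Each observation triggers at most one comparison and one addition, and only the current estimate is stored, which is what gives this class of estimators its O(1) time and memory footprint. At equilibrium the upward and downward moves balance when the fraction of observations below the estimate equals q.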
B. John Oommen—Chancellor’s Professor; Fellow: IEEE and Fellow: IAPR. This author is also an Adjunct Professor with the University of Agder in Grimstad, Norway.
Notes
1. With some insight, one sees that this elegant median estimation procedure is similar to the Boyer and Moore algorithm [2] for computing the majority item in a stream, using only a single pass.
2. Clearly, though, such an approach would not be able to handle the case of non-stationary quantile estimation as the positions of the markers would be affected by stale data points.
3. Throughout this paper, there is an implicit assumption that the true quantile lies in [a, b]. However, this is not a limitation of our scheme; the proof is valid for any bounded and probably non-bounded function.
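For readers unfamiliar with the Boyer and Moore algorithm [2] mentioned in the first note, the following is a minimal Python sketch of its single-pass majority vote. The function name `majority_candidate` is an illustrative assumption. The sketch returns only a candidate: it is the majority item if one exists, and confirming that would need a second pass over the data.

```python
def majority_candidate(stream):
    """Single-pass Boyer-Moore majority vote (sketch).

    Keeps one candidate and one counter, so memory use is constant.
    """
    candidate = None
    count = 0
    for item in stream:
        if count == 0:
            # No current candidate: adopt this item.
            candidate, count = item, 1
        elif item == candidate:
            count += 1  # a vote for the candidate
        else:
            count -= 1  # a vote against the candidate
    return candidate


if __name__ == "__main__":
    print(majority_candidate([1, 2, 1, 3, 1, 1, 2]))
```

The resemblance noted above lies in the structure of the update: a single stored value is nudged up or down by each incoming item, and no past observations are retained.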
References
Arandjelovic, O., Pham, D.S., Venkatesh, S.: Two maximum entropy-based algorithms for running quantile estimation in nonstationary data streams. IEEE Trans. Circuits Syst. Video Technol. 25(9), 1469–1479 (2015)
Boyer, R.S., Moore, J.S.: MJRTY-a fast majority vote algorithm. In: Boyer, R.S. (ed.) Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–117. Springer, Netherlands (1991). doi:10.1007/978-94-011-3488-0_5
Cao, J., Li, L.E., Chen, A., Bu, T.: Incremental tracking of multiple quantiles for network monitoring in cellular networks. In: Proceedings of the 1st ACM Workshop on Mobile Internet Through Cellular Networks, pp. 7–12. ACM (2009)
Chambers, J.M., James, D.A., Lambert, D., Wiel, S.V.: Monitoring networked applications with incremental quantile estimation. Stat. Sci. 21(4), 463–475 (2006)
Chen, F., Lambert, D., Pinheiro, J.C.: Incremental quantile estimation for massive tracking. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 516–522. ACM (2000)
Jain, R., Chlamtac, I.: The P2 algorithm for dynamic calculation of quantiles and histograms without storing observations. Commun. ACM 28(10), 1076–1085 (1985)
Ma, Q., Muthukrishnan, S., Sandler, M.: Frugal streaming for estimating quantiles. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds.) Space-Efficient Data Structures, Streams, and Algorithms. LNCS, vol. 8066, pp. 77–96. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40273-9_7
Oommen, B.J.: Stochastic searching on the line and its applications to parameter learning in nonlinear optimization. IEEE Trans. Syst. Man Cybern. Part B 27(4), 733–739 (1997)
Schmeiser, B.W., Deutsch, S.J.: Quantile estimation from grouped data: the cell midpoint. Commun. Stat. Simul. Comput. 6(3), 221–234 (1977)
Tierney, L.: A space-efficient recursive procedure for estimating a quantile of an unknown distribution. SIAM J. Sci. Stat. Comput. 4(4), 706–711 (1983)
Yazidi, A., Granmo, O.-C., Oommen, B.J.: A stochastic search on the line-based solution to discretized estimation. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds.) IEA/AIE 2012. LNCS, vol. 7345, pp. 764–773. Springer, Heidelberg (2012). doi:10.1007/978-3-642-31087-4_77
Yazidi, A., Hammer, H.: Quantile estimation using the theory of stochastic learning. In: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems, pp. 7–14. ACM (2015)
Yazidi, A., Oommen, B.J.: Novel discretized weak estimators based on the principles of the stochastic search on the line problem. IEEE Trans. Cybern. 46(12), 2732–2744 (2016)
Yazidi, A., Oommen, B.J., Horn, G., Granmo, O.C.: Stochastic discretized learning-based weak estimation: a novel estimation method for non-stationary environments. Pattern Recognit. 60(C), 430–443 (2016)
Yazidi, A., Hammer, H.L., Oommen, B.J.: Higher-fidelity frugal and accurate quantile estimation using a novel incremental (2017, to be submitted for publication). Journal version
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yazidi, A., Hammer, H.L., John Oommen, B. (2017). A Higher-Fidelity Frugal Quantile Estimator. In: Cong, G., Peng, WC., Zhang, W., Li, C., Sun, A. (eds) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_6
DOI: https://doi.org/10.1007/978-3-319-69179-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69178-7
Online ISBN: 978-3-319-69179-4
eBook Packages: Computer Science, Computer Science (R0)