An \(\Omega(\frac{1}{\varepsilon} \log \frac{1}{\varepsilon})\) Space Lower Bound for Finding ε-Approximate Quantiles in a Data Stream

  • Regant Y. S. Hung
  • Hingfung F. Ting
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6213)


This paper studies the space complexity of the ε-approximate quantiles problem, which asks for a data structure that, after reading an entire data stream, lets us determine a φ-quantile of the stream (for any 0 ≤ φ ≤ 1) to within an error bound ε. The best known algorithms for the problem use either \(O(\frac{1}{\varepsilon}\log \varepsilon N)\) words, where N is the total number of items in the stream, or \(O(\frac{1}{\varepsilon}\log |U|)\) words, where U is the set of possible items. The known space lower bound for the problem is \(\Omega(\frac{1}{\varepsilon})\) words, and improving it has proved elusive.
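To make the problem statement concrete, the following minimal sketch checks whether a given item qualifies as an ε-approximate φ-quantile of a stream. It assumes one common convention, not stated in the abstract: an item qualifies if one of its ranks in sorted order lies within [(φ − ε)N, (φ + ε)N]. The function names are hypothetical, and the check sorts the whole stream, so it is an offline reference check rather than a small-space streaming algorithm.

```python
from bisect import bisect_left, bisect_right


def rank_range(sorted_items, x):
    """Return (lo, hi), the 1-based ranks x may occupy in sorted_items.

    Duplicates give a range of ranks; if x is absent, hi < lo.
    """
    lo = bisect_left(sorted_items, x) + 1
    hi = bisect_right(sorted_items, x)
    return lo, hi


def is_eps_approx_quantile(stream, x, phi, eps):
    """Check x against the assumed definition of an eps-approximate phi-quantile.

    Assumed convention: x must appear in the stream and have some rank r
    with (phi - eps) * N <= r <= (phi + eps) * N, where N = len(stream).
    """
    items = sorted(stream)
    n = len(items)
    lo, hi = rank_range(items, x)
    if hi < lo:
        return False  # x does not occur in the stream
    target = phi * n
    # The rank interval [lo, hi] must intersect the allowed window.
    return lo <= target + eps * n and hi >= target - eps * n


if __name__ == "__main__":
    stream = list(range(1, 101))  # N = 100, item i has rank i
    print(is_eps_approx_quantile(stream, 45, phi=0.5, eps=0.1))
    print(is_eps_approx_quantile(stream, 70, phi=0.5, eps=0.1))
```

With N = 100, φ = 0.5 and ε = 0.1, the allowed rank window is [40, 60], so item 45 is accepted and item 70 is rejected. A streaming algorithm has to answer such queries without keeping the sorted stream, which is where the space bounds come in.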

In this paper, we prove that any comparison-based algorithm for finding ε-approximate quantiles needs \(\Omega(\frac{1}{\varepsilon} \log \frac{1}{\varepsilon})\) words.


Keywords (added by machine, not by the authors): Data Stream, Space Complexity, Item Memory, General Memory, Distinct Item





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Regant Y. S. Hung (1)
  • Hingfung F. Ting (1)
  1. The University of Hong Kong, Hong Kong
