Effective Order Preserving Estimation Method

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9877)

Abstract

Order preserving estimation is an estimation method that retains the original order of the population parameters of interest. It is an important tool in many applications, such as data visualization. In this paper, we focus on the population mean as our primary estimation function and propose an effective query processing strategy that keeps the estimated order correct with probabilistic guarantees. We define the cost function as the number of samples taken across all groups, and our goal is to make the total sample size as small as possible. We compare our methods with the state-of-the-art near-optimal algorithm in the literature and achieve up to an \(80\,\%\) reduction in the total sample size.
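
To make the problem setting concrete, the following is a minimal sketch of a generic round-based baseline for order preserving estimation of group means. It is not the strategy proposed in this paper. It assumes all values lie in a known range, uses a Hoeffding confidence interval as a stand-in bound, and splits the failure probability across groups and rounds with a union bound. The names `hoeffding_radius` and `estimate_order` and the parameters `delta`, `value_range`, and `batch` are hypothetical and chosen only for illustration.

```python
import math
import random


def hoeffding_radius(n, delta, value_range):
    """Half-width of a two-sided Hoeffding interval for the mean of n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def estimate_order(groups, delta=0.05, value_range=1.0, batch=50, seed=0):
    """Return (order, cost): group indices sorted by estimated mean, and the
    total number of samples drawn. Assumes every group is non-empty and every
    value lies in an interval of width value_range (both are assumptions)."""
    rng = random.Random(seed)
    k = len(groups)
    # Shuffle each group once; reading a growing prefix of the shuffled list
    # is progressive sampling without replacement.
    pools = [rng.sample(list(g), len(g)) for g in groups]
    taken = [0] * k
    sums = [0.0] * k
    round_no = 0
    while True:
        round_no += 1
        # Union bound: sum over rounds r of 1 / (r * (r + 1)) equals 1, so the
        # per-interval failure budgets over k groups add up to at most delta.
        delta_r = delta / (k * round_no * (round_no + 1))
        for i in range(k):
            new = pools[i][taken[i]:taken[i] + batch]
            sums[i] += sum(new)
            taken[i] += len(new)
        means = [sums[i] / taken[i] for i in range(k)]
        # A fully scanned group has an exact mean, so its radius is zero.
        radii = [
            0.0 if taken[i] == len(pools[i])
            else hoeffding_radius(taken[i], delta_r, value_range)
            for i in range(k)
        ]
        order = sorted(range(k), key=lambda i: means[i])
        # If neighbouring intervals (in sorted order) are disjoint, then all
        # intervals are pairwise disjoint and the order is settled.
        separated = all(
            means[a] + radii[a] < means[b] - radii[b]
            for a, b in zip(order, order[1:])
        )
        exhausted = all(taken[i] == len(pools[i]) for i in range(k))
        if separated or exhausted:
            return order, sum(taken)
```

The second return value corresponds to the cost function in the abstract, the total number of samples taken over all groups. This sketch samples every group in every round, which is deliberately naive; a more careful strategy would stop sampling groups whose position is already resolved.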

Notes

  1. The order is induced by the progressive sampling-without-replacement process, i.e., \(Rank(X_i) < Rank(X_j)\) if \(i < j\).

  2. As before, we use \(g_i\) to denote the group with rank \(i\) in a candidate order throughout the rest of Sect. 4, to keep the notation clean.

Author information

Corresponding author

Correspondence to Chen Chen.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Chen, C., Wang, W., Wang, X., Yang, S. (2016). Effective Order Preserving Estimation Method. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_29

  • DOI: https://doi.org/10.1007/978-3-319-46922-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46921-8

  • Online ISBN: 978-3-319-46922-5

  • eBook Packages: Computer Science, Computer Science (R0)
