Effective Order Preserving Estimation Method

Chen, Chen; Wang, Wei; Wang, Xiaoyang; Yang, Shiyu

doi:10.1007/978-3-319-46922-5_29

Conference paper
First Online: 21 September 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9877)

Abstract

Order preserving estimation is an estimation method that retains the original order of the population parameters of interest. It is an important tool in many applications, such as data visualization. In this paper, we focus on the population mean as our primary estimation function and propose an effective query processing strategy that preserves the correct order of the estimates with probabilistic guarantees. We define the cost function as the total number of samples taken across all groups, and our goal is to make this sample size as small as possible. We compare our methods with the state-of-the-art near-optimal algorithm in the literature and achieve up to \(80\,\%\) reduction in the total sample size.
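To make the problem setting concrete, the following is a minimal Python sketch of a generic baseline, not the strategy proposed in this paper. It assumes values are bounded within a known range, samples each group without replacement, and keeps sampling a group while its Hoeffding-style confidence interval overlaps another group's interval. The names `radius` and `estimate_order`, the way the failure probability `delta` is split across groups and steps, and the stopping rule are all illustrative assumptions.

```python
import math
import random


def radius(n, size, value_range, delta_group):
    """Hoeffding half-width after reading n of `size` values (assumed bound)."""
    if n >= size:
        return 0.0  # the whole group has been read, so its mean is exact
    # Spend delta_group / (n * (n + 1)) at step n; these shares sum to at most delta_group.
    delta_n = delta_group / (n * (n + 1))
    return value_range * math.sqrt(math.log(2.0 / delta_n) / (2.0 * n))


def estimate_order(groups, value_range, delta=0.05, seed=0):
    """Return (group indices sorted by estimated mean, total number of samples taken)."""
    rng = random.Random(seed)
    # Shuffle each group once; reading a prefix is sampling without replacement.
    pools = [rng.sample(g, len(g)) for g in groups]
    k = len(pools)
    taken, sums, means = [0] * k, [0.0] * k, []
    active = set(range(k))
    while active:
        for i in active:  # one more sample for every group that is still ambiguous
            sums[i] += pools[i][taken[i]]
            taken[i] += 1
        means = [sums[i] / taken[i] for i in range(k)]
        rad = [radius(taken[i], len(pools[i]), value_range, delta / k) for i in range(k)]
        lo = [m - r for m, r in zip(means, rad)]
        hi = [m + r for m, r in zip(means, rad)]
        # A group stays active while it has unread values and its interval overlaps another.
        active = {
            i for i in range(k)
            if taken[i] < len(pools[i])
            and any(j != i and lo[i] <= hi[j] and lo[j] <= hi[i] for j in range(k))
        }
    order = sorted(range(k), key=lambda i: means[i])
    return order, sum(taken)


if __name__ == "__main__":
    # Hypothetical data: three groups with well-separated means in [0, 1].
    data = [[0.0, 0.2] * 1000, [0.4, 0.6] * 1000, [0.8, 1.0] * 1000]
    order, cost = estimate_order(data, value_range=1.0)
    print(order, cost)
```

The second return value corresponds to the cost function in the abstract, the total number of samples taken over all groups. A baseline of this kind stops early only when group means are well separated, which is the sort of cost a more careful strategy aims to reduce.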

Notes

1. The order is induced by the progressive sampling-without-replacement process, i.e., \(Rank(X_i) < Rank(X_j)\) if \(i < j\).
2. As before, we use \(g_i\) to denote the group with rank \(i\) in a candidate order throughout the rest of Sect. 4, to keep the notation clean.

Author information

Authors and Affiliations

The University of New South Wales, Sydney, Australia
Chen Chen, Wei Wang & Shiyu Yang
University of Technology, Sydney, Australia
Xiaoyang Wang

Corresponding author

Correspondence to Chen Chen.

Editor information

Editors and Affiliations

Monash University, Clayton, Australia
Muhammad Aamir Cheema
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Wenjie Zhang
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang

About this paper

Cite this paper

Chen, C., Wang, W., Wang, X., Yang, S. (2016). Effective Order Preserving Estimation Method. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_29

DOI: https://doi.org/10.1007/978-3-319-46922-5_29
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46921-8
Online ISBN: 978-3-319-46922-5
eBook Packages: Computer Science, Computer Science (R0)
