Estimating Sufficient Sample Sizes for Approximate Decision Support Queries

  • Conference paper
Enterprise Information Systems (ICEIS 2013)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 190)

Abstract

Sampling schemes for approximate processing of highly selective decision support queries must retrieve a sufficient number of records to provide reliable results within acceptable error limits. The k-MDI tree is an innovative index structure that supports drawing rich samples of relevant records for a given set of dimensional attribute ranges. This paper describes a method for estimating sufficient sample sizes for decision support queries based on inverse simple random sampling without replacement (SRSWOR). Combined with a k-MDI tree index, this method is shown to offer a reliable approach to approximate query processing for decision support.
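To make the idea of inverse SRSWOR concrete, the following is a minimal sketch of the textbook scheme: records are drawn without replacement until a preset number `k` of relevant records has been seen, and the number of draws `n` is then the realized sample size. It is not the chapter's own estimation procedure, which is not reproduced in this preview, and it does not use a k-MDI tree. The function name `inverse_srswor`, the `is_relevant` predicate, and the toy population are assumptions made for illustration. The estimate `(k - 1) / (n - 1)` is the standard unbiased estimator of the proportion of relevant records under inverse sampling.

```python
import random


def inverse_srswor(population, is_relevant, k, rng=None):
    """Draw without replacement until k relevant records are found.

    Returns (n, p_hat): n is the number of draws needed, and
    p_hat = (k - 1) / (n - 1) is the standard unbiased estimate of the
    proportion of relevant records under inverse sampling.
    Hypothetical helper for illustration, not the chapter's algorithm.
    """
    if k < 2:
        raise ValueError("k must be at least 2 for the (k-1)/(n-1) estimator")
    rng = rng or random.Random()

    # Visiting records in a uniformly random order is equivalent to
    # drawing them one at a time without replacement.
    order = list(range(len(population)))
    rng.shuffle(order)

    hits = 0
    for n, idx in enumerate(order, start=1):
        if is_relevant(population[idx]):
            hits += 1
            if hits == k:
                return n, (k - 1) / (n - 1)

    raise ValueError("population holds fewer than k relevant records")


if __name__ == "__main__":
    # Toy population: 5% of the records satisfy the (assumed) query predicate.
    records = [1] * 50 + [0] * 950
    n, p_hat = inverse_srswor(records, lambda r: r == 1, k=10,
                              rng=random.Random(0))
    print(f"draws needed: {n}, estimated selectivity: {p_hat:.4f}")
```

In this scheme the sample size is an outcome rather than an input: a highly selective query (few relevant records) forces more draws before `k` hits accumulate, which is why a plain random scan becomes expensive and an index that steers sampling toward relevant records is attractive.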

Author information

Corresponding author

Correspondence to Raj P. Gopalan.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rudra, A., Gopalan, R.P., Achuthan, N.R. (2014). Estimating Sufficient Sample Sizes for Approximate Decision Support Queries. In: Hammoudi, S., Cordeiro, J., Maciaszek, L., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2013. Lecture Notes in Business Information Processing, vol 190. Springer, Cham. https://doi.org/10.1007/978-3-319-09492-2_6

  • DOI: https://doi.org/10.1007/978-3-319-09492-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09491-5

  • Online ISBN: 978-3-319-09492-2

  • eBook Packages: Computer Science, Computer Science (R0)
