Data Sampling

Zhang, Qing

doi:10.1007/978-1-4899-7993-3_535-2

Definition

Repeatedly choosing random numbers according to a given distribution is generally referred to as sampling. It is a popular technique for data reduction and approximate query processing. It allows a large set of data to be summarized as a much smaller data set, the sampling synopsis, which usually provides an estimate of the original data with provable error guarantees. One advantage of the sampling synopsis is that it is easy and efficient to construct. The cost of constructing such a synopsis is proportional only to the synopsis size, which makes the sampling complexity potentially sublinear in the size of the original data. The other advantage is that the sampling synopsis is itself a part of the original data. Thus, many query processing and data manipulation techniques that are applicable to the original data can be applied directly to the synopsis.
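
As a rough illustration of the idea above, the following Python sketch builds a uniform random sample as a synopsis and uses it to approximate a count query. It is a minimal sketch under stated assumptions, not the method of this entry: the function names build_sample_synopsis and estimate_count, the synthetic data, and the sample size are all hypothetical, and uniform sampling without replacement is only one of many possible sampling schemes.

```python
import random


def build_sample_synopsis(data, k, seed=None):
    """Draw k items uniformly at random, without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    # Choosing k random positions touches only k items, assuming `data`
    # supports len() and random access by index.
    positions = rng.sample(range(len(data)), k)
    return [data[i] for i in positions]


def estimate_count(synopsis, population_size, predicate):
    """Scale the fraction of matching sample items up to the full data set."""
    matches = sum(1 for item in synopsis if predicate(item))
    return population_size * matches / len(synopsis)


# Synthetic stand-in for a large column of values (an assumption for the demo).
data = range(1_000_000)
synopsis = build_sample_synopsis(data, k=1_000, seed=42)

# The same predicate that would run on the full data runs on the synopsis.
approx = estimate_count(synopsis, len(data), lambda x: x < 250_000)
print(f"estimated rows with x < 250000: {approx:.0f} (exact: 250000)")
```

Because the synopsis holds actual items from the data, the predicate is evaluated on it exactly as it would be on the original; only the final scaling step is specific to the sample. The estimate varies with the random seed and tightens as k grows.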

Historical Background

The notion of representing large data sets through small samples dates back to the end of the nineteenth century and has led to...

Author information

Authors and Affiliations

The Australian e-health Research Center, Brisbane, QLD, Australia
Qing Zhang

Corresponding author

Correspondence to Qing Zhang.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, Georgia, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, Ontario, Canada
M. Tamer Özsu

Section Editor information

School of Information Technology and Electrical Engineering, University of Queensland, St Lucia Campus, 4072, Brisbane, QLD, Australia
Xiaofang Zhou

About this entry

Cite this entry

Zhang, Q. (2017). Data Sampling. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_535-2

DOI: https://doi.org/10.1007/978-1-4899-7993-3_535-2
Received: 01 August 2014
Accepted: 14 July 2017
Published: 21 September 2017
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4899-7993-3
Online ISBN: 978-1-4899-7993-3
eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering
