Abstract
In contrast to the traditional problem facing statisticians, that sample sizes are small, the data sets encountered in data mining are extremely large. It is common in data mining to deal with data sets of gigabytes or even terabytes. A whole data set of such size simply cannot be stored in the central memory of a computer. However, certain statistical procedures, for instance the computation of a quantile, require the whole data set to be processed at the same time in the central memory. Data reduction therefore becomes a necessary step when applying those procedures to huge data sets. A desirable data reduction procedure should discard the data with low information content and retain the data with high information content. Ranked set sampling (RSS) is essentially a data selection procedure that selects only those data which have high information content. The notion of RSS can therefore be applied naturally to data reduction.
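To illustrate the idea of RSS as a data selection step, the following is a minimal sketch of a generic balanced ranked set sample drawn from a data stream. It is not the chapter's specific reduction procedure. The function name `ranked_set_sample` and the parameter `set_size` are hypothetical, and the sketch assumes that the values arrive in random order, so that each block of `set_size` values behaves like a random set.

```python
def ranked_set_sample(stream, set_size):
    """Reduce an iterable of numbers to a balanced ranked set sample.

    The stream is read in blocks of `set_size` values. From the i-th block
    of each cycle only the i-th smallest value is kept, so one value in
    every `set_size` survives. Only one block is held in memory at a time.
    Assumption: the stream is in random order. A trailing incomplete block
    is discarded.
    """
    kept = []
    block = []
    rank = 0  # zero-based rank to keep from the current block
    for value in stream:
        block.append(value)
        if len(block) == set_size:
            block.sort()
            kept.append(block[rank])
            rank = (rank + 1) % set_size  # cycle through ranks 0..set_size-1
            block = []
    return kept
```

Calling `ranked_set_sample(values, set_size=5)` on any iterable `values` returns roughly one fifth of the data, while the memory needed for the ranking step stays bounded by a single block of five values.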
Copyright information
© 2004 Springer Science+Business Media New York
About this chapter
Cite this chapter
Chen, Z., Bai, Z., Sinha, B.K. (2004). Ranked Set Sampling as Data Reduction Tools. In: Ranked Set Sampling. Lecture Notes in Statistics, vol 176. Springer, New York, NY. https://doi.org/10.1007/978-0-387-21664-5_7
DOI: https://doi.org/10.1007/978-0-387-21664-5_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-40263-5
Online ISBN: 978-0-387-21664-5
eBook Packages: Springer Book Archive