Ranked set sampling (RSS) is a statistical technique that uses auxiliary ranking information on unmeasured sample units in an attempt to select a more representative sample, one that yields better estimates of population parameters than simple random sampling. However, RSS can be hampered by the requirement that a complete ranking of the units in each set be specified. Recently, to allow ties to be declared as needed, Frey (Environ Ecol Stat 19(3):309–326, 2012) proposed a modification of RSS that simply breaks ties at random, so that a standard ranked set sample is obtained, while recording the tie structure for use in estimation. Under this RSS variation, several mean estimators were developed and their performance was compared via simulation, with a focus on continuous outcome variables. We extend the work of Frey (2012) to binary outcomes and investigate three nonparametric and three likelihood-based proportion estimators (with and without the use of tie information), of which four are direct extensions of existing estimators and two are novel. Under different tie-generating mechanisms, we compare the performance of these estimators and draw conclusions based on both simulation and a data example on breast cancer prevalence. We also offer general suggestions on the choice of proportion estimator.
Keywords: Imperfect ranking; Isotonic estimation; Maximum likelihood; Nonparametric estimation; Ranking tie; Relative efficiency
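The sampling scheme summarized in the abstract (rank each set on an auxiliary variable, break ties at random, and record the tie structure) can be illustrated with a minimal simulation sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the hypothetical population in `draw_unit` (a coarse auxiliary score `x` in 0..4 with success probability `0.1 + 0.2 * x`), the function names, and the choice to store the tie structure as the range of ranks shared with the selected unit. The estimator shown is only the plain sample proportion, which ignores the recorded ties; it is not one of the six estimators studied in the paper.

```python
import random


def draw_unit(rng):
    """Hypothetical population: coarse auxiliary score x (0..4) and binary outcome y."""
    x = rng.randrange(5)
    y = 1 if rng.random() < 0.1 + 0.2 * x else 0
    return x, y


def rss_with_ties(set_size, cycles, rng):
    """Balanced RSS in which ties on x are broken at random and the ties are recorded."""
    sample = []
    for _ in range(cycles):
        for rank in range(1, set_size + 1):
            # In practice only x is observed at this stage; y is generated for
            # every unit here purely for simulation convenience.
            units = [draw_unit(rng) for _ in range(set_size)]
            # Random tie-breaking: shuffle, then rely on the stable sort on x.
            rng.shuffle(units)
            units.sort(key=lambda u: u[0])
            x, y = units[rank - 1]  # only this unit is "measured"
            # Range of 1-based ranks shared by all units tied with the selected one.
            below = sum(1 for u in units if u[0] < x)
            tied = sum(1 for u in units if u[0] == x)
            sample.append({"rank": rank, "y": y, "tied_ranks": (below + 1, below + tied)})
    return sample


def sample_proportion(sample):
    """Plain RSS proportion estimate that ignores the recorded tie information."""
    return sum(obs["y"] for obs in sample) / len(sample)


rng = random.Random(2018)
sample = rss_with_ties(set_size=3, cycles=200, rng=rng)
p_hat = sample_proportion(sample)
print(p_hat)
```

Under this hypothetical population the true proportion is 0.5, so the printed estimate should land near that value. An estimator that uses tie information would additionally read the `tied_ranks` field of each observation.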
We thank Professor Johan Lim for his comments on an earlier version of this paper, Professor Jesse Frey for sharing his R code, and the UCI Machine Learning Repository for making its data available online. We are also thankful to two anonymous referees and an associate editor for their valuable comments, which improved an earlier version of this paper.
Compliance with ethical standards
Conflict of interest
No potential conflict of interest was reported by the authors.
Halls LK, Dell TR (1966) Trial of ranked-set sampling for forage yields. For Sci 12:22–26
Hatefi A, Jafari Jozani M (2017) An improved procedure for estimation of malignant breast cancer prevalence using partially rank ordered set samples with multiple concomitants. Stat Methods Med Res 26(6):2552–2566
Howard RW, Jones SC, Mauldin JK, Beal RH (1982) Abundance, distribution, and colony size estimates for Reticulitermes spp. (Isoptera: Rhinotermitidae) in Southern Mississippi. Environ Entomol 11:1290–1293
Kvam PH (2003) Ranked set sampling based on binary water quality data with covariates. J Agric Biol Environ Stat 8:271–279
Lichman M (2013) UCI machine learning repository. School of Information and Computer Science, University of California, Irvine, CA. http://archive.ics.uci.edu/ml. Accessed 14 Feb 2018
Nussbaum BD, Sinha BK (1997) Cost effective gasoline sampling using ranked set sampling. In: Proceedings of the section on statistics and the environment, pp 83–87. American Statistical Association
Ozturk O, Bilgin O, Wolfe DA (2005) Estimation of population mean and variance in flock management: a ranked set sampling approach in a finite population setting. J Stat Comput Simul 75:905–919