Abstract
Post-stratification is used when the size of each stratum is known but the frame of each stratum is unknown. The strata are assumed to contain clusters of unequal size; a random sample of clusters is drawn and post-stratified according to the existing strata of the population. This paper considers the mean estimation problem under this post-stratified cluster sampling setup. A modified weight structure is proposed to combine the different cluster means. An attempt is made to obtain the optimum variance and an estimate of the variance. The efficiency comparison of the estimator is supported numerically by a database study. All clustering-related results were obtained from the Web Accessible Genome cluster open dataset. Mean estimation under the post-stratified cluster sampling scheme shows an overall accuracy of 92%.
1 Introduction
In sample surveys, when stratified sampling cannot be applied for estimation because the frame of each stratum is unknown, post-stratification is a well-recognized alternative [1,2,3,4].
In a stratified setup, stratum sizes are usually known, but lists of stratum units are often hard to obtain. Moreover, stratum frames may be incomplete or overlapping, or some population units may fall into more than one stratum during classification. Under these circumstances, post-stratification is a useful sampling design. According to Sukhatme et al., post-stratification is as precise as stratified sampling with proportional allocation, provided the sample is large. Jagar et al. argued that post-stratification with respect to relevant criteria may improve the estimation strategy over the sample mean or the ratio estimator [5,6,7,8].
Singh and Singh [23] proposed a class of estimators in cluster sampling. Consider a stratified population in which every stratum contains clusters of unequal size; a random sample of clusters is post-stratified according to the stratum structure of the population [9, 10]. This constitutes a post-stratified cluster design, which is useful and closer to real-life survey situations. This paper considers the estimation problem under this design.
2 Notation
Let a finite population of \( N \) clusters of unequal size be divided into \( k \) strata, where the ith stratum contains \( N_i \) clusters, and let a random sample of \( n \) clusters (\( n < N \)) be drawn from the population using SRSWOR [21]. The sample is post-stratified so that \( n_i \) of the sampled clusters come from the \( N_i \) clusters of the ith stratum. The notation is as follows:
- \( Y \): variable under study.
- \( Y_{ijl} \): lth value of the jth cluster of the ith stratum in the population.
- \( M_{ij} \): size of the jth cluster of the ith stratum.
- \( y_{ij} \): mean of the jth cluster of the ith stratum included in the sample.
- \( \bar{y}_{i} \): sample mean of the ith stratum.
- \( \bar{\bar{Y}}_{i} \): mean of cluster means in the population.
- \( W_{i} = \frac{N_{i}}{N} \): population proportion of clusters in the ith stratum.
- \( p_{i} = \frac{n_{i}}{n} \): sample proportion of clusters from the ith stratum.
- \( \bar{Y}_{i} \): population mean of the ith stratum.
- \( \bar{Y} \): grand mean of the population.
- \( \bar{\bar{Y}}_{N_{i}} \): mean of cluster means in the population of the ith stratum.
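To make the notation concrete, the sketch below computes \( W_i \), \( p_i \), cluster means and the per-stratum mean of cluster means (\( \bar{\bar{Y}}_{N_i} \)) for a tiny made-up population. The toy data, the dictionary layout and all function names are hypothetical illustrations, not part of the paper.

```python
from collections import Counter

# Hypothetical toy population: stratum label -> list of clusters; each cluster
# is a list of Y values, and cluster lengths (M_ij) deliberately differ.
population = {
    1: [[4.0, 6.0], [5.0, 7.0, 9.0], [3.0]],
    2: [[10.0, 12.0, 14.0], [11.0, 13.0]],
}


def cluster_mean(values):
    """Mean of the Y_ijl values in one cluster (l = 1..M_ij)."""
    return sum(values) / len(values)


def stratum_weights(pop):
    """W_i = N_i / N, the population proportion of clusters in stratum i."""
    n_clusters = sum(len(clusters) for clusters in pop.values())
    return {i: len(clusters) / n_clusters for i, clusters in pop.items()}


def mean_of_cluster_means(pop):
    """Mean of the cluster means in the population of each stratum."""
    return {
        i: sum(cluster_mean(c) for c in clusters) / len(clusters)
        for i, clusters in pop.items()
    }


def sample_proportions(sampled_labels):
    """p_i = n_i / n, from the stratum label of each sampled cluster."""
    n = len(sampled_labels)
    return {i: n_i / n for i, n_i in Counter(sampled_labels).items()}


W = stratum_weights(population)             # {1: 0.6, 2: 0.4}
Ybb = mean_of_cluster_means(population)     # {1: 5.0, 2: 12.0}
p = sample_proportions([1, 2, 2, 1, 2])     # {1: 0.4, 2: 0.6}
```

Note that \( W_i \) is fixed by the population, while \( p_i \) is random because the stratum labels of the sampled clusters are only observed after sampling.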
3 Proposed Estimator
Assuming that the probability of \( n_i \) being zero is negligible, the usual post-stratified estimator of the mean is [11, 12]
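As a minimal sketch, assuming the standard textbook form of the usual post-stratified estimator, \( \bar{y}_{ps} = \sum_{i=1}^{k} W_i \bar{y}_i \), where \( \bar{y}_i \) is the mean of the sampled cluster means that fall in post-stratum i, the computation looks like this. The function name and input layout are hypothetical; the explicit error for an empty post-stratum mirrors the assumption that \( n_i = 0 \) has negligible probability.

```python
from collections import defaultdict


def post_stratified_mean(weights, sample):
    """Usual post-stratified estimator  sum_i W_i * ybar_i.

    weights: stratum label -> W_i = N_i / N (known from the population).
    sample:  (stratum_label, cluster_mean) pairs for the n sampled clusters;
             the labels are only observed after sampling (post-stratification).
    """
    groups = defaultdict(list)
    for label, y in sample:
        groups[label].append(y)
    empty = [i for i in weights if len(groups[i]) == 0]
    if empty:
        # ybar_i is undefined when n_i = 0; the text assumes this is negligible.
        raise ValueError(f"no sampled clusters fell in strata {empty}")
    return sum(w * sum(groups[i]) / len(groups[i]) for i, w in weights.items())


estimate = post_stratified_mean(
    {1: 0.6, 2: 0.4},
    [(1, 5.0), (2, 12.0), (1, 7.0)],
)
# ybar_1 = 6.0 and ybar_2 = 12.0, so the estimate is about 0.6*6 + 0.4*12 = 8.4
```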
Agarwal and Panda [25] suggested utilizing the proportion
To design a new weight structure for combining the different stratum means, along the lines of [22, 23], we propose the estimator
where PSC stands for “post-stratified cluster design” [24, 25] and
\( y_{i}^{*} \) is an unbiased estimator of \( \bar{Y}_{i} \) for given \( n_i \).
The estimator \( \bar{Y}_{\text{PSC}} \) is unbiased for \( \sum_{i=1}^{k} W_{i} \bar{Y}_{i} \), with variance
To evaluate the variance expression, the following standard results are used.
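The specific standard results are not reproduced in this text. One result commonly used in post-stratification variance derivations, assumed here purely for illustration, is the second-order approximation \( E(1/n_i) \approx \frac{1}{nW_i} + \frac{1-W_i}{n^2 W_i^2} \), obtained from a Taylor expansion of \( 1/n_i \) about \( nW_i \). The sketch treats \( n_i \) as Binomial(\( n \), \( W_i \)), which ignores the finite population correction of SRSWOR, and compares the exact conditional expectation with the approximation; the function names are hypothetical.

```python
from math import comb


def exact_inverse_moment(n, w):
    """E[1/n_i | n_i > 0] when n_i ~ Binomial(n, w).

    The binomial model is a large-N stand-in for the hypergeometric count
    produced by SRSWOR; it ignores the finite population correction.
    """
    p_zero = (1 - w) ** n
    total = sum(
        comb(n, m) * w**m * (1 - w) ** (n - m) / m for m in range(1, n + 1)
    )
    return total / (1 - p_zero)


def approx_inverse_moment(n, w):
    """Second-order Taylor approximation 1/(n w) + (1 - w)/(n w)^2."""
    return 1 / (n * w) + (1 - w) / (n * w) ** 2


n, w = 200, 0.3
exact = exact_inverse_moment(n, w)
approx = approx_inverse_moment(n, w)
first_order = 1 / (n * w)
print(exact, approx, first_order)
```

With an expected post-stratum size of 60 clusters, the second-order term removes most of the error left by the naive value \( 1/(nW_i) \).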
Expanding the first term on the RHS of Eq. (16), we have
Adding Eqs. (17) and (18), we obtain the final expression
The minimum variance of the estimator is
The optimum value of \( \alpha \) is obtained by differentiating the variance expression with respect to \( \alpha \) and equating the derivative to zero. This gives \( \alpha_{\text{opt}} = (1 + P)^{-1} \), whose substitution yields the required result [13,14,15,16].
The term \( P \) is a ratio of two quantities; therefore, in repeated surveys of relatively short duration on the same characteristics, it should be a stable quantity that an experienced survey practitioner could guess [17,18,19,20].
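The variance expression itself is not reproduced here, so the following is only a sketch under a hypothetical form. Suppose the variance is \( V(\alpha) = \alpha^2 V_1 + (1-\alpha)^2 V_2 \) (two uncorrelated unbiased components) and \( P = V_1 / V_2 \). Setting \( dV/d\alpha = 2\alpha V_1 - 2(1-\alpha) V_2 = 0 \) gives \( \alpha = V_2/(V_1 + V_2) = (1+P)^{-1} \) and \( V_{\min} = V_1 V_2/(V_1+V_2) \). The code checks the closed form against a brute-force grid search; the values of \( V_1 \), \( V_2 \) are made up.

```python
def variance(alpha, v1, v2):
    """Hypothetical V(alpha) = alpha^2 V1 + (1 - alpha)^2 V2, i.e. the variance
    of alpha*t1 + (1 - alpha)*t2 for uncorrelated unbiased t1, t2."""
    return alpha**2 * v1 + (1 - alpha) ** 2 * v2


def alpha_opt(p):
    """Closed-form minimiser (1 + P)^-1 with P = V1 / V2."""
    return 1 / (1 + p)


v1, v2 = 3.0, 1.5
P = v1 / v2                       # P = 2.0
a_star = alpha_opt(P)             # 1/3
v_min = variance(a_star, v1, v2)  # V1*V2/(V1 + V2) = 1.0 (up to rounding)

# Brute-force check: scan alpha over a fine grid on [0, 1].
grid = [k / 10_000 for k in range(10_001)]
a_grid = min(grid, key=lambda a: variance(a, v1, v2))
```

Only the ratio \( P \) enters \( \alpha_{\text{opt}} \), which is why a stable guess of \( P \) from earlier surveys suffices.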
4 Estimate of Variance
An unbiased estimator of
4.1 Robustness of Estimator
Consider a very small quantity ε and replace P by
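For intuition about this robustness check, the sketch below assumes the same hypothetical variance form \( V(\alpha) = \alpha^2 V_1 + (1-\alpha)^2 V_2 \) with \( V_1 = P V_2 \), and a misspecified guess \( \hat{P} = P(1+\varepsilon) \) (an assumption, since the exact replacement is not shown here). Using \( \alpha = (1+\hat{P})^{-1} \) instead of \( (1+P)^{-1} \) inflates the variance by the relative amount \( (P-\hat{P})^2 / \{P(1+\hat{P})^2\} \), which is second order in \( \varepsilon \). The function names are hypothetical.

```python
def variance(alpha, v1, v2):
    """Hypothetical V(alpha) = alpha^2 V1 + (1 - alpha)^2 V2 for uncorrelated,
    unbiased components with variances V1 and V2."""
    return alpha**2 * v1 + (1 - alpha) ** 2 * v2


def relative_loss(p, eps, v2=1.0):
    """Relative variance increase from using the guess P(1 + eps) instead of P.

    Assumes V1 = P * V2, so the optimum is alpha = 1/(1 + P); the guessed
    weight is 1/(1 + P(1 + eps)).
    """
    v1 = p * v2
    v_min = variance(1 / (1 + p), v1, v2)
    v_guess = variance(1 / (1 + p * (1 + eps)), v1, v2)
    return v_guess / v_min - 1


for eps in (0.01, 0.1, -0.1, 0.5):
    print(f"eps = {eps:+.2f}: variance up by {100 * relative_loss(2.0, eps):.3f}%")
```

Under this assumed form, a 10% error in guessing \( P = 2 \) raises the variance by only about 0.2%.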
In the initial phase of the analysis, the data set is divided into six partitions, each of which is further subdivided into two parts. The partitions are compared on four parameters, all of which relate to the population under consideration. These population parameters are used to calculate the percentage gain, which plays an important role in the analysis.
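The formula for the percentage gain is not stated here. A common definition in the sampling literature, assumed for this sketch, compares the variance of a baseline estimator (for example the usual post-stratified mean) with that of the proposed one: \( \text{PG} = 100\,(V_0 - V)/V \), alongside the percentage relative efficiency \( \text{PRE} = 100\,V_0/V \). The variances below are made-up illustrations, not values from the paper's tables.

```python
def percentage_gain(v_baseline, v_proposed):
    """Percentage gain of the proposed estimator over a baseline:
    100 * (V_baseline - V_proposed) / V_proposed."""
    return 100 * (v_baseline - v_proposed) / v_proposed


def percentage_relative_efficiency(v_baseline, v_proposed):
    """PRE = 100 * V_baseline / V_proposed (values above 100 favour the proposal)."""
    return 100 * v_baseline / v_proposed


# Illustrative (made-up) variances, not values from the paper's tables.
print(percentage_gain(2.0, 1.6))                  # about 25.0
print(percentage_relative_efficiency(2.0, 1.6))   # about 125.0
```

Under these definitions the two measures carry the same information, since PRE = 100 + PG.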
In the second phase of the analysis, the data set is partitioned into six groups, which are compared on four parameters; these parameters are used for the efficiency comparisons.
Table 3 presents the robustness results together with the efficiency comparison. Post-stratification is used here because the size of every stratum is known but the frame of every stratum is unknown. Tables 4, 5, and 6 present the second phase of the results. In this paper, strata are formed from clusters of different sizes. Mean estimation and the post-stratified sample form the basis of this study, and precision and recall are improved by the modified weight structure.
5 Conclusion and Future Work
Table 1 reports the percentage gain, computed with the parameter \( \bar{y}_{ps} \). In the initial phase, the database is divided into six groups, which are further divided into strata of different sizes. These strata are analyzed with the estimated parameters N, X, Y, and S. Table 2 reports the efficiency evaluation in terms of the \( V \) and \( V_{\min} \) parameters, with the database divided into six clusters. Table 3 reports the robustness and efficiency comparison of the estimators over several parameters, with the data set divided into eight clusters. Table 4 reports the population parameters used in calculating the estimators in phase 1, and Table 5 reports those used in phase 2. In these two phases, the data set is classified into six clusters. Table 6 reports the efficiency comparison of the estimated parameters in phase 3, with the data set classified into eight groups.
Figure 1 shows the population parameters for different values of the intensity parameter. Figure 2 shows the efficiency comparison of the contraction factor against the number of individual populations for different forms of accuracy. Finally, Fig. 3 shows the sample size and sampling errors used to assess the robustness and performance of the proposed estimator. The proposed estimator shows higher classification accuracy than classical estimators such as the shrunken estimator, which gives 88%. Mean estimation under the post-stratified cluster sampling scheme shows an overall accuracy of 92%. As future work, these results can be compared with estimators based on probability distributions.
References
[1] Raja Sekar, M., et al., "Tongue Image Analysis for Hepatitis Detection Using GA-SVM", Indian Journal of Computer Science and Engineering, Vol. 8, No. 4, August 2017.
[2] Raja Sekar, M., et al., "Mammogram Images Detection Using Support Vector Machines", International Journal of Advanced Research in Computer Science, Vol. 8, No. 7, pp. 329–334, July–August 2017.
[3] Raja Sekar, M., et al., "Areas Categorization by Operating Support Vector Machines", ARPN Journal of Engineering and Applied Sciences, Vol. 12, No. 15, pp. 4639–4647, August 2017.
[4] Raja Sekar, M., "Diseases Identification by GA-SVMs", International Journal of Innovative Research in Science, Engineering and Technology, Vol. 6, No. 8, pp. 15696–15704, August 2017.
[5] Raja Sekar, M., "Classification of Synthetic Aperture Radar Images Using Fuzzy SVMs", International Journal for Research in Applied Science & Engineering Technology (IJRASET), Vol. 5, No. 8, pp. 289–296, August 2017.
[6] Raja Sekar, M., "Breast Cancer Detection Using Fuzzy SVMs", International Journal for Research in Applied Science & Engineering Technology (IJRASET), Vol. 5, No. 8, pp. 525–533, August 2017.
[7] Raja Sekar, M., "Software Metrics in Fuzzy Environment", International Journal of Computer & Mathematical Sciences (IJCMS), Vol. 6, No. 9, September 2017.
[8] Raja Sekar, M., "Interactive Fuzzy Mathematical Modeling in Design of Multi-Objective Cellular Manufacturing Systems", International Journal of Engineering Technology and Computer Research (IJETCR), Vol. 5, No. 5, pp. 74–79, September–October 2017.
[9] Raja Sekar, M., "Optimization of the Mixed Model Processor Scheduling", International Journal of Engineering Technology and Computer Research (IJETCR), Vol. 5, No. 5, pp. 74–79, September–October 2017.
[10] Raja Sekar, M., "Fuzzy Approach to Productivity Improvement", International Journal of Computer & Mathematical Sciences, Vol. 6, No. 9, pp. 145–149, September 2017.
[11] Raja Sekar, M., et al., "An Effective Atlas-Guided Brain Image Identification Using X-rays", International Journal of Scientific & Engineering Research, Vol. 7, No. 12, pp. 249–258, December 2016.
[12] Raja Sekar, M., "Fractional Programming with Joint Probability Constraints", International Journal of Innovations & Advancement in Computer Science, Vol. 6, No. 9, pp. 338–342, September 2017.
[13] Raja Sekar, M., "Solving Mathematical Problems by Parallel Processors", Current Trends in Technology and Science, Vol. 6, No. 4, pp. 734–738.
[14] Seth, G.S., Ghosh, S.K., Proc. Math. Soc. BHU, Vol. 11, pp. 111–120, 1995.
[15] Seth, G.S., Matho, N., Sigh, S.K., presented at the National Seminar on Advances in Mathematical, Statistical and Computational Methods in Science and Technology, Dept. of Applied Mathematics, I.S.M. Dhanbad, Nov. 29–30.
[16] Fonseca, "Discrete Wavelet Transform and Support Vector Machine Applied to Pathological Voice Signals Identification", in IEEE International Symposium, pp. 367–374.
[17] Dork, "Selection of Scale Invariant Parts for Object Class Recognition", in IEEE International Conference on Computer Vision, Vol. 1, pp. 634–639.
[18] Ganesh Kumar, P., "Design of Fuzzy Expert Systems for Microarray Data Classification Using a Novel Genetic Swarm Algorithm", Expert Systems with Applications, pp. 1811–1812, 2012.
[19] Furey, T.S., et al., "Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data", Bioinformatics, pp. 906–914, 2000.
[20] Lin, C.-F., Wang, S.-D., "Fuzzy Support Vector Machines", IEEE Transactions on Neural Networks, pp. 13–22, 2002.
[21] Sukhatme, P.V., et al., "Sampling Theory of Surveys with Applications", Iowa State University Press, Ames, Iowa, USA, pp. 34–39, 1984.
[22] Smith, T.M.F., "Post-stratification", The Statistician, pp. 31–39, 1991.
[23] Singh, R., Singh, H.P., "A Class of Unbiased Estimators in Cluster Sampling", Jour. Ind. Soc. Ag. Stat., pp. 290–297, 1999.
[24] Singh, D., Choudhary, F.S., "Theory and Analysis of Sample Survey Designs", Wiley Eastern Limited, New Delhi, 1986.
[25] Agarwal, M.C., Panda, K.B., "An Efficient Estimator in Post-stratification", Journal of Statistics, pp. 45–48, 2001.
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Raja Sekar, M., Sandhya, N. (2019). Mean Estimation Under Post-stratified Cluster Sampling Scheme. In: Bapi, R., Rao, K., Prasad, M. (eds) First International Conference on Artificial Intelligence and Cognitive Computing . Advances in Intelligent Systems and Computing, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-13-1580-0_27
DOI: https://doi.org/10.1007/978-981-13-1580-0_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1579-4
Online ISBN: 978-981-13-1580-0
eBook Packages: Intelligent Technologies and Robotics (R0)