Mean Estimation Under Post-stratified Cluster Sampling Scheme

Raja Sekar, M.; Sandhya, N.

doi:10.1007/978-981-13-1580-0_27

Conference paper
First Online: 05 November 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 815)

Abstract

Post-stratification is used when the size of each stratum is known but the frame of each stratum is unknown. The strata are assumed to contain clusters of unequal size; a random sample of clusters is drawn and post-stratified according to the existing strata of the population. This paper considers the mean estimation problem under the post-stratified cluster sampling setup. A modified weight structure is proposed to combine different cluster means. An attempt is made to obtain the optimum variance and an estimate of the variance. The efficiency comparison of the estimator is supported numerically by a database study. All clustering-related results were obtained from the Web Accessible Genome cluster open dataset. Mean estimation under the post-stratified cluster sampling scheme shows an overall accuracy of 92%.

1 Introduction

In sample surveys, when stratified sampling cannot be applied for estimation because the frame of each stratum is unknown, the use of post-stratification is well recognized [1,2,3,4].

In a stratified setup, stratum sizes are manageable, but a list of stratum units is often hard to get. Moreover, stratum frames may be incomplete or overlapping, or several population units may fall under multiple strata during classification. Under these circumstances, post-stratification is a useful sampling design. According to Sukhatme et al., post-stratification is as precise as stratified sampling with proportional allocation, provided the sample is large. Jagar et al. advocated that post-stratification with respect to relevant criteria may improve the estimation strategy over the sample mean or ratio estimator [5,6,7,8].

Singh and Singh [23] proposed a class of estimators in cluster sampling. Consider a stratified population in which every stratum contains clusters of unequal size; a random sample of clusters is post-stratified according to the stratum structure of the population [9, 10]. This makes the post-stratified cluster design useful and close to real-life survey situations. This paper considers the estimation problem under this design.

2 Notation

Consider a finite population of N clusters of unequal size, divided into k strata. The ith stratum contains N_i clusters, and a random sample of n clusters is drawn from the population (n < N) using simple random sampling without replacement (SRSWOR) [21]. The sample is post-stratified such that n_i of the sampled clusters come from the N_i clusters of the ith stratum. The notation used in what follows is:

$ Y $ :: variable under study.
$ Y_{ijl} $ :: lth value of the jth cluster of the ith stratum in the population.
$ M_{ij} $ :: size of the jth cluster of the ith stratum.
$ \bar{y}_{ij} $ :: mean of the jth cluster of the ith stratum included in the sample.
$ \bar{y}_{i} $ :: sample mean of the ith stratum.
$ \bar{\bar{Y}}_{i} $ :: mean of cluster means in the population.
$ W_{i} $ :: $ \frac{{N_{i} }}{N} $, population proportion of clusters in the ith stratum.
$ p_{i} $ :: $ \frac{{n_{i} }}{n} $, sample proportion of clusters from the ith stratum.
$ \bar{Y}_{i} $ :: population mean of the ith stratum.
$ \bar{Y} $ :: grand mean of the population.
$ \bar{\bar{Y}}_{{N_{i} }} $ :: mean of cluster means in the population of the ith stratum.
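The sampling design described at the start of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input format (a list giving the stratum label of each cluster), and the toy population are hypothetical and do not come from the paper.

```python
import random
from collections import Counter

def draw_post_stratified_sample(stratum_of_cluster, n, seed=None):
    """Draw n clusters by SRSWOR and count how many fall in each stratum.

    stratum_of_cluster[c] is the stratum label of cluster c (hypothetical
    input format). Returns the sampled cluster indices and the counts n_i.
    """
    rng = random.Random(seed)
    N = len(stratum_of_cluster)
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    sampled = rng.sample(range(N), n)  # SRSWOR over all N clusters
    counts = Counter(stratum_of_cluster[c] for c in sampled)
    return sampled, dict(counts)

# Toy population: N = 10 clusters in 3 strata (N_1 = 4, N_2 = 3, N_3 = 3).
strata = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
sampled, n_i = draw_post_stratified_sample(strata, n=5, seed=1)

W = {i: c / len(strata) for i, c in Counter(strata).items()}  # W_i = N_i / N
p = {i: n_i.get(i, 0) / 5 for i in W}                         # p_i = n_i / n
```

The stratum counts n_i are random because stratum membership is only observed after the clusters are drawn; a stratum may even receive no sampled clusters, which is the case the estimator below assumes away.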

3 Proposed Estimator

Assuming that the probability of n_i being zero is negligible, the usual post-stratified estimator [11, 12] of the population mean

$$ \sum\limits_{i = 1}^{k} {W_{i} \bar{Y}_{i} } $$

(1)

is

$$ \bar{y}_{ps} = \sum\limits_{i = 1}^{k} {W_{i} \bar{y}_{i}^{*} } $$

(2)

Agarwal and Panda [25] suggested utilizing the sample proportion

$$ p_{i} = n_{i} /n $$

(3)

in addition to $ W_{i} $ in order to design a new weight structure for combining different strata means [22, 23]:

$$ W_{i}^{*} = [\alpha p_{i} + (1 - \alpha )W_{i} ],\quad \alpha \,{\text{being}}\,{\text{a}}\,{\text{constant}} $$

(4)

we propose an estimator

$$ \overline{{y_{\text{PSC}} }} = \sum\limits_{i = 1}^{k} {W_{i}^{*} \overline{{y_{i}^{*} }} } $$

(5)

where PSC stands for “post-stratified cluster design” [24, 25] and

$$ \bar{y}_{i}^{*} = \frac{1}{{n_{i} \bar{M}_{i} }}\sum\limits_{j = 1}^{{n_{i} }} {M_{ij} \bar{y}_{ij} } $$

(6)

Here $ \bar{M}_{i} $ is taken to be the mean cluster size in the ith stratum. $ \bar{y}_{i}^{*} $ is an unbiased estimator of $ \bar{Y}_{i} $ for a given n_i.
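As a rough illustration of Eqs. (4) to (6), the following Python sketch combines the stratum estimates with the modified weights. The function name, the dictionary-based input layout, and the toy numbers are assumptions made for this example; in particular, the mean cluster size of each stratum is assumed to be known and is passed in as `M_bar`.

```python
def psc_mean(sample, N_i, M_bar, alpha):
    """Post-stratified cluster estimate: sum over strata of W_i^* * ybar_i^*.

    sample: {stratum: [(M_ij, ybar_ij), ...]} for the sampled clusters
    N_i:    {stratum: number of clusters in that population stratum}
    M_bar:  {stratum: mean cluster size, assumed known}
    alpha:  constant in W_i^* = alpha * p_i + (1 - alpha) * W_i
    """
    N = sum(N_i.values())
    n = sum(len(clusters) for clusters in sample.values())
    estimate = 0.0
    for i, clusters in sample.items():
        n_i = len(clusters)
        if n_i == 0:
            # The text assumes every stratum receives at least one cluster.
            raise ValueError(f"stratum {i} has no sampled clusters")
        W_i = N_i[i] / N
        p_i = n_i / n
        W_star = alpha * p_i + (1 - alpha) * W_i                     # Eq. (4)
        y_star = sum(M * y for M, y in clusters) / (n_i * M_bar[i])  # Eq. (6)
        estimate += W_star * y_star                                  # Eq. (5)
    return estimate

# Toy data (hypothetical): two strata, three sampled clusters.
sample = {"A": [(10, 2.0), (30, 4.0)], "B": [(20, 5.0)]}
N_i = {"A": 6, "B": 4}
M_bar = {"A": 20.0, "B": 20.0}

usual = psc_mean(sample, N_i, M_bar, alpha=0.0)   # alpha = 0 uses W_i only
mixed = psc_mean(sample, N_i, M_bar, alpha=0.5)
```

With alpha = 0 the weights reduce to W_i, so the function returns the usual post-stratified estimate of Eq. (2); other values of alpha shift weight toward the observed sample proportions p_i.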

$ \bar{y}_{\text{PSC}} $ is unbiased for $ \sum\limits_{i = 1}^{k} {W_{i} \bar{Y}_{i} } $, with variance

$$ {\text{Var}}(\overline{{y_{\text{PSC}} }} ) = \left( {\frac{1}{n} - \frac{1}{N}} \right)\sum\limits_{j = 1}^{{n_{i} }} {M_{ij} \overline{{y_{ij} }} /n_{i} } \overline{{M_{i} }} $$

(7)

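$$ E(W_{i}^{*} ) = E\{ E(W_{i}^{*} \mid n_{i} )\} $$

(8)

$$ = E\left[ {E\left\{ {\alpha \left( {\frac{{n_{i} }}{n}} \right) + (1 - \alpha )\frac{{N_{i} }}{N} \,\Big|\, n_{i} } \right\}} \right] = \alpha W_{i} + (1 - \alpha )W_{i} = W_{i} $$

(9)

$$ E(\bar{y}_{\text{PSC}} ) = E\left[ {E\left\{ {\sum\limits_{i = 1}^{k} {W_{i}^{*} \bar{y}_{i}^{*} } \,\Big|\, n_{i} } \right\}} \right] = E\left[ {\sum\limits_{i = 1}^{k} {W_{i}^{*} \bar{Y}_{i} } } \right] = \sum\limits_{i = 1}^{k} {W_{i} \bar{Y}_{i} } $$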

(10)

To evaluate the variance expression, the following standard results are used.

$$ E\left( {\frac{1}{{n_{i} }}} \right) = \frac{1}{{nW_{i} }} + \frac{{(N - n)(1 - W_{i} )}}{{(N - 1)n^{2} W_{i}^{2} }} $$

(11)

$$ E\left( {\frac{{n_{i} }}{n}} \right)^{2} = \frac{(N - n)}{(N - 1)} \cdot \frac{{W_{i} (1 - W_{i} )}}{n} + W_{i}^{2} $$

(12)

$$ V\left( {\frac{{n_{i} }}{n}} \right) = \frac{(N - n)}{(N - 1)} \cdot \frac{{W_{i} (1 - W_{i} )}}{n} $$

(13)

$$ Cov\left( {\frac{{n_{i} }}{n},\frac{{n_{j} }}{n}} \right) = - \frac{(N - n)}{(N - 1)} \cdot \frac{{W_{i} W_{j} }}{n},\quad i \ne j $$

(14)
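The moments of the sample proportion can be checked numerically. Under SRSWOR the count n_i follows a hypergeometric distribution, so the sketch below enumerates its probability mass function exactly and compares the result with the closed form in Eq. (13). The function name and the numbers N = 50, N_i = 20, n = 10 are chosen only for illustration.

```python
from math import comb

def moments_of_sample_proportion(N, N_i, n):
    """Exact mean and variance of n_i / n when n clusters are drawn by SRSWOR."""
    lo, hi = max(0, n - (N - N_i)), min(n, N_i)
    total = comb(N, n)
    mean = 0.0
    second = 0.0
    for k in range(lo, hi + 1):
        prob = comb(N_i, k) * comb(N - N_i, n - k) / total  # hypergeometric pmf
        mean += prob * k / n
        second += prob * (k / n) ** 2
    return mean, second - mean ** 2

N, N_i, n = 50, 20, 10
W_i = N_i / N
mean, var = moments_of_sample_proportion(N, N_i, n)
closed_form = (N - n) / (N - 1) * W_i * (1 - W_i) / n  # right-hand side of Eq. (13)

# Because E(n_i / n) = W_i, the modified weight has expectation W_i for any alpha.
alpha = 0.3
expected_W_star = alpha * mean + (1 - alpha) * W_i
```

The enumerated mean equals W_i and the enumerated variance agrees with Eq. (13) up to floating-point error, which is what makes the modified weight unbiased for W_i regardless of the constant alpha.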

$$ S^{2} = \frac{1}{N - 1}\sum\limits_{i = 1}^{k} {\sum\limits_{j = 1}^{{N_{i} }} {\left( {\frac{{M_{ij} }}{{\bar{M}_{i} }}Y_{ij} - \bar{Y}_{i} } \right)^{2} } } $$

(15)

$$ S_{{b_{i} }}^{2} = \frac{1}{{N_{i} - 1}}\sum\limits_{j = 1}^{{N_{i} }} {\left( {\frac{{M_{ij} }}{{\bar{M}_{i} }}Y_{ij} - \bar{Y}_{i} } \right)^{2} } $$

(16)

By the law of total variance, $ V(\bar{y}_{\text{PSC}} ) = E[V(\bar{y}_{\text{PSC}} \mid n_{i} )] + V[E(\bar{y}_{\text{PSC}} \mid n_{i} )] $. Expanding the first term on the RHS of this decomposition, we have

$$ E[V(\bar{y}_{\text{PSC}} \mid n_{i} )] = E\left[ {V\left( {\sum\limits_{i = 1}^{k} {W_{i}^{*} \bar{y}_{i}^{*} } \,\Big|\, n_{i} } \right)} \right] $$

(17)

and the second term is

$$ V\left[ {E\left( {\bar{y}_{\text{PSC}} \mid n_{i} } \right)} \right] = V\left[ {\sum\limits_{i = 1}^{k} {W_{i}^{*} \bar{Y}_{i} } } \right] $$

(18)

Adding Eqs. (17) and (18) gives the final variance expression. The minimum variance of the estimator is

$$ V_{mn} (\overline{{y_{\text{PSC}} }} ) = \left( {\frac{1}{n} - \frac{1}{N}} \right)V\left[ {\sum\limits_{i = 1}^{k} {W_{i} *\bar{Y}_{i} } } \right] + \left\{ {\sum\limits_{i = 1}^{l} {n_{i} } } \right\}_{j} $$

(19)

$$ T1 = \left( {\frac{1}{n} - \frac{1}{N}} \right)\left( {S^{2} - \sum\limits_{i = 1}^{k} {s_{i} } } \right)^{2} $$

(20)

The optimum value of α can be obtained by differentiating the variance expression with respect to α and equating the derivative to zero. The resulting equation gives α_opt = (1 + P)⁻¹, whose substitution provides the required results [13,14,15,16].

The term P is a ratio of two quantities; therefore, in repeated surveys of relatively short duration over the same characteristics, it would be a stable quantity and could be guessed by an expert survey practitioner [17,18,19,20].

4 Estimate of Variance

Unbiased estimator of

$$ V(\overline{{y_{\text{PSC}} }} ) = [n(N - 1) - (1 - \alpha )^{2} (N - n)^{ - 1} ] $$

(21)

$$ V\left( {\overline{{y_{\text{PSC}} }} } \right) = \frac{{(1 - \alpha )^{2} (N - n)}}{n(N - 1)} + \left[ {n(N - 1) - (1 - \alpha )^{2} (N - n)^{ - 1} } \right] $$

(22)

$$ V(y_{\text{PSC}} ) = \frac{{(1 - \alpha )^{2} (N - n)}}{n(N - 1)}\sum\limits_{i = 1}^{k} {W_{i} (Y_{i} - \bar{Y})^{2} + [n(N - 1) - (1 - \alpha )^{2} (N - n)^{ - 1} ]} $$

(23)

$$ V(\overline{y_{\text{PSC}}} ) = \frac{{(1 - \alpha )^{2} (N - n)}}{n(N - 1)}\sum\limits_{i = 1}^{k} {W_{i} (Y_{i} - \bar{Y})^{2} + S^{2} } $$

(24)

4.1 Robustness of Estimator

Consider a very small quantity ε and replace P by $ P \pm \varepsilon $; then

$$ (\alpha_{\text{opt}} )_{\varepsilon } = (1 + (P + \varepsilon ))^{ - 1} $$

and

$$ [V_{\min } (\bar{y}_{\text{PSC}} )]_{\varepsilon } = \left( {\frac{1}{n} - \frac{1}{N}} \right)\sum\limits_{i = 1}^{k} {W_{i} S_{bi}^{2} + \frac{(N - n)}{{(N - 1)n^{2} }}} $$

(25)
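The sensitivity of the optimum constant to a misjudged P can be illustrated with a short Python sketch. It uses only the relation α_opt = (1 + P)⁻¹ stated in the text; the function names and the guessed value P = 1.5 are hypothetical.

```python
def alpha_opt(P):
    """Optimum weight constant alpha_opt = 1 / (1 + P), as stated in the text."""
    if P <= -1:
        raise ValueError("P must exceed -1 for alpha_opt to be finite and positive")
    return 1.0 / (1.0 + P)

def alpha_shift(P, eps):
    """Change in alpha_opt when P is replaced by P + eps."""
    return alpha_opt(P + eps) - alpha_opt(P)

P_guess = 1.5  # hypothetical value guessed by a survey practitioner
for eps in (-0.1, -0.01, 0.01, 0.1):
    print(f"eps={eps:+.2f}  alpha={alpha_opt(P_guess + eps):.4f}  "
          f"shift={alpha_shift(P_guess, eps):+.4f}")
```

To first order the shift is approximately −ε/(1 + P)², so a small error in the guessed P moves α_opt by an even smaller amount whenever P is positive.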

$$ {\text{Percentage}}\,{\text{gain}} = \frac{{V(\overline{{y_{PS} }} )}}{{V(y_{PS} )}} \times 100 $$

(26)

In the initial phase of analysis, the data set is divided into six partitions, and again, these partitions are subdivided into two parts. All these partitions are compared with four parameters. All these parameters are related to population under consideration. These population parameters are used to calculate percentage gain which plays an important role in the analysis.

In the second phase of the analysis, the data set is partitioned into six groups and these groups are compared with four parameters. Efficiency comparisons can be performed with the help of these four parameters.

Results in Table 3 describe the robustness and efficiency of the estimator. Post-stratification is used when the size of every stratum is known but the frame of every stratum is unknown. Tables 4, 5, and 6 present the second phase of the results. In the context of this paper, strata are formed from clusters of different sizes. Two elements, mean estimation and the post-stratified sample, form the basis for this study. Precision and recall are improved by a modified weight structure.

5 Conclusion and Future Work

Table 1 describes the results associated with percentage gain, in which the parameter y_ps is used. In the initial phase, the database is divided into six groups, and these groups are further divided into strata of different sizes. These strata are analyzed with the estimated parameters N, X, Y, and S. Table 2 describes the evaluation of the efficiency parameter with the help of the v and v_min parameters, in which the database is divided into six clusters. Table 3 describes the robustness and efficiency comparison of estimators with several parameters, in which the data set is divided into eight clusters. Table 4 presents the calculation of population parameters, which play an important role in the calculation of the estimator in phase 1. Table 5 presents the evaluation of population parameters, which play an important role in the calculation of estimators in phase 2. In these two phases, the data set is classified into six clusters. Table 6 describes the efficiency comparison of estimated parameters in phase 3, in which the data set is classified into eight groups.

Table 1 Population parameters

Table 2 Efficiency comparison

Table 3 Robustness and efficiency comparison

Table 4 Evaluation of population parameters in phase1

Table 5 Efficiency comparison of phase2

Table 6 Robustness and efficiency comparison in phase3

Figure 1 describes the results of population parameters for different values of the intensity parameter. Figure 2 explains the efficiency comparison of the contraction factor with the number of individual population for different forms of accuracy. Finally, Fig. 3 describes the sample size and sampling errors, which are used to estimate the robustness and performance of the proposed estimator. The proposed estimator shows significant classification accuracy compared to classical estimators such as the shrunken estimator, which gives 88%. Mean estimation under the post-stratified cluster sampling scheme shows an overall accuracy of 92%. These results can be compared with estimators related to probability distributions as part of future work.

References

1. M. Raja Sekar et al., “Tongue Image Analysis for Hepatitis Detection Using GA-SVM,” Indian Journal of Computer Science and Engineering, Vol. 8, No. 4, August 2017.
2. M. Raja Sekar et al., “Mammogram Images Detection Using Support Vector Machines,” International Journal of Advanced Research in Computer Science, Vol. 8, No. 7, pp. 329–334, July–August 2017.
3. M. Raja Sekar et al., “Areas categorization by operating support Vector machines,” ARPN Journal of Engineering and Applied Sciences, Vol. 12, No. 15, pp. 4639–4647, August 2017.
4. M. Raja Sekar, “Diseases Identification by GA-SVMs,” International Journal of Innovative Research in Science, Engineering and Technology, Vol. 6, Issue 8, pp. 15696–15704, August 2017.
5. M. Raja Sekar, “Classification of Synthetic Aperture Radar Images using Fuzzy SVMs,” International Journal for Research in Applied Science & Engineering Technology (IJRASET), Vol. 5, Issue 8, pp. 289–296, August 2017.
6. M. Raja Sekar, “Breast Cancer Detection using Fuzzy SVMs,” International Journal for Research in Applied Science & Engineering Technology (IJRASET), Vol. 5, Issue 8, pp. 525–533, August 2017.
7. M. Raja Sekar, “Software Metrics in Fuzzy Environment,” International Journal of Computer & Mathematical Sciences (IJCMS), Vol. 6, Issue 9, September 2017.
8. M. Raja Sekar, “Interactive Fuzzy Mathematical Modeling in Design of Multi-Objective Cellular Manufacturing Systems,” International Journal of Engineering Technology and Computer Research (IJETCR), Vol. 5, Issue 5, pp. 74–79, September–October 2017.
9. M. Raja Sekar, “Optimization of the Mixed Model Processor Scheduling,” International Journal of Engineering Technology and Computer Research (IJETCR), Vol. 5, Issue 5, pp. 74–79, September–October 2017.
10. M. Raja Sekar, “Fuzzy Approach to Productivity Improvement,” International Journal of Computer & Mathematical Sciences, Vol. 6, Issue 9, pp. 145–149, September 2017.
11. M. Raja Sekar et al., “An Effective Atlas-guided Brain image identification using X-rays,” International Journal of Scientific & Engineering Research, Vol. 7, Issue 12, pp. 249–258, December 2016.
12. M. Raja Sekar, “Fractional Programming with Joint Probability Constraints,” International Journal of Innovations & Advancement in Computer Science, Vol. 6, Issue 9, pp. 338–342, September 2017.
13. M. Raja Sekar, “Solving Mathematical Problems by Parallel Processors,” Current Trends in Technology and Science, Vol. 6, Issue 4, pp. 734–738.
14. Seth, G.S., and S.K. Ghosh, Proc. Math. Soc. BHU, Vol. 11, pp. 111–120, 1995.
15. Seth, G.S., N. Matho and Sigh, S.K., presented at the National Seminar on Advances in Mathematical, Statistical and Computational Methods in Science and Technology, Nov. 29–30, Dept. of Applied Maths., I.S.M. Dhanbad.
16. Fonseca, “Discrete wavelet transform and support vector machine applied to pathological voice signals identification,” in IEEE International Symposium, pp. 367–374.
17. Dork, “Selection of scale invariant parts for object class recognition,” in IEEE International Conference on Computer Vision, Vol. 1, pp. 634–639.
18. P. Ganesh Kumar, “Design of Fuzzy Expert Systems for Microarray Data Classification using Novel Genetic Swarm Algorithm,” Expert Systems with Applications, pp. 1811–1812, 2012.
19. T.S. Furey et al., “Support Vector classification and validation of cancer tissue sampling using micro array expression data,” Bioinformatics, pp. 906–914, 2000.
20. Chun-Fu Lin and Sheng-De Wang, “Fuzzy Support Vector Machines,” IEEE Transactions on Neural Networks, pp. 13–22, 2002.
21. Sukhatme et al., “Sampling Theory of Surveys with Applications,” Iowa State University Press, Ames, Iowa, USA, pp. 34–39, 1984.
22. Smith, T.M.F., “Post-stratification,” The Statistician, pp. 31–39, 1991.
23. Singh, Rajesh and Singh, H.P., “A class of unbiased estimators in cluster sampling,” Jour. Ind. Soc. Ag. Stat., pp. 290–297, 1999.
24. Singh, D. and Choudhary, F.S., “Theory and analysis of sample survey designs,” Wiley Eastern Limited, New Delhi, 1986.
25. Agarwal, M.C. and Panda, K.B., “An efficient estimator in post-stratification,” Journal of Statistics, pp. 45–48, 2001.

Author information

Authors and Affiliations

Department of CSE, VNRVJIET, Hyderabad, India
M. Raja Sekar & N. Sandhya

Corresponding author

Correspondence to M. Raja Sekar.

Editor information

Editors and Affiliations

School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India
Raju Surampudi Bapi
Department Computer Science and Engineering, MLR Institute of Technology, Hyderabad, Telangana, India
Koppula Srinivas Rao
IDRBT, Hyderabad, Telangana, India
Munaga V. N. K. Prasad

About this paper

Cite this paper

Raja Sekar, M., Sandhya, N. (2019). Mean Estimation Under Post-stratified Cluster Sampling Scheme. In: Bapi, R., Rao, K., Prasad, M. (eds) First International Conference on Artificial Intelligence and Cognitive Computing. Advances in Intelligent Systems and Computing, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-13-1580-0_27

DOI: https://doi.org/10.1007/978-981-13-1580-0_27
Published: 05 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1579-4
Online ISBN: 978-981-13-1580-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
