1 Introduction

In sample surveys, when stratified sampling cannot be applied for estimation because the frame of each stratum is unknown, post-stratification is a well-recognized alternative [1,2,3,4].

In a stratified setup, the stratum sizes are often available, but the list of units in each stratum is hard to obtain. Moreover, stratum frames may be incomplete or overlapping, or some population units may fall into more than one stratum during classification. Under these circumstances, post-stratification is a useful sampling design. According to Sukhatme et al., post-stratification is as precise as stratified sampling with proportional allocation, provided the sample is large. Jagers et al. advocated that post-stratification on relevant criteria can improve the estimation strategy over the sample mean or the ratio estimator [5,6,7,8].

Singh and Singh [23] proposed a class of estimators in cluster sampling. Consider a stratified population in which every stratum contains clusters of unequal size, and suppose a random sample of clusters is post-stratified according to the stratum structure of the population [9, 10]. This constitutes a post-stratified cluster design, which is useful and closer to real-life survey situations. This paper considers the estimation problem under this design.

2 Notation

Consider a finite population of N clusters of unequal sizes, divided into k strata, where the ith stratum contains \( N_{i} \) clusters. A random sample of n clusters (n < N) is drawn from the population by simple random sampling without replacement (SRSWOR) [21]. The sample is then post-stratified, so that \( n_{i} \) of the sampled clusters belong to the ith stratum. The notation used is as follows:

\( Y \): variable under study.

\( Y_{ijl} \): value of the lth unit in the jth cluster of the ith stratum in the population.

\( M_{ij} \): size of the jth cluster of the ith stratum.

\( \bar{y}_{ij} \): mean of the jth sampled cluster of the ith stratum.

\( \bar{y}_{i} \): sample mean of the ith stratum.

\( \bar{\bar{Y}}_{i} \): mean of the cluster means in the population.

\( W_{i} = \frac{N_{i}}{N} \): population proportion of clusters in the ith stratum.

\( p_{i} = \frac{n_{i}}{n} \): sample proportion of clusters from the ith stratum.

\( \bar{Y}_{i} \): population mean of the ith stratum.

\( \bar{Y} \): grand mean of the population.

\( \bar{\bar{Y}}_{N_{i}} \): mean of the cluster means in the population of the ith stratum.

3 Proposed Estimator

Assuming that the probability of \( n_{i} = 0 \) is negligible, the usual post-stratified estimator [11, 12] of the population mean

$$ \sum\limits_{i = 1}^{k} {W_{i} \bar{Y}_{i} } $$
(1)

is

$$ \overline{y_{ps}} = \sum\limits_{i = 1}^{k} W_{i} \overline{y_{i}^{*}} $$
(2)

Agarwal and Panda [25] suggested using the sample proportion

$$ p_{i} = n_{i} /n $$
(3)

in addition to \( W_{i} \). Defining a new weight structure for combining the different stratum means, as in [22, 23],

$$ W_{i}^{*} = [\alpha p_{i} + (1 - \alpha )W_{i} ],\quad \alpha \,{\text{being}}\,{\text{a}}\,{\text{constant}} $$
(4)

we propose an estimator

$$ \overline{{y_{\text{PSC}} }} = \sum\limits_{i = 1}^{k} {W_{i}^{*} \overline{{y_{i}^{*} }} } $$
(5)

where PSC stands for “post-stratified cluster design” [24, 25] and

$$ \overline{y_{i}^{*}} = \frac{1}{n_{i} \bar{M}_{i}}\sum\limits_{j = 1}^{n_{i}} M_{ij} \bar{y}_{ij} $$
(6)

Given \( n_{i} \), \( \overline{y_{i}^{*}} \) is an unbiased estimator of \( \bar{Y}_{i} \).
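As an illustration, Eqs. (4) to (6) can be computed as in the following minimal Python sketch. It assumes that every sampled cluster is fully enumerated and reads \( \bar{M}_{i} \) as the population mean cluster size of the ith stratum; the function name `psc_estimate` and the toy numbers are hypothetical and are not data from this study. Setting α = 0 recovers the usual post-stratified estimator of Eq. (2).

```python
from typing import Dict, List, Sequence, Tuple

# Each sampled cluster is (M_ij, ybar_ij): cluster size and cluster mean.
Cluster = Tuple[float, float]


def psc_estimate(
    W: Sequence[float],
    sample: Dict[int, List[Cluster]],
    M_bar: Sequence[float],
    alpha: float,
) -> float:
    """Hypothetical implementation of ybar_PSC = sum_i W*_i ybar*_i (Eqs. 4-6).

    W      -- population proportions N_i / N of clusters in each stratum
    sample -- post-stratified sample: stratum index -> list of sampled clusters
    M_bar  -- assumed population mean cluster size of each stratum
    alpha  -- constant mixing p_i = n_i / n with W_i
    """
    n = sum(len(clusters) for clusters in sample.values())
    estimate = 0.0
    for i, w_i in enumerate(W):
        clusters = sample.get(i, [])
        n_i = len(clusters)
        if n_i == 0:
            # The method assumes P(n_i = 0) is negligible.
            raise ValueError(f"stratum {i} received no sampled clusters")
        p_i = n_i / n
        w_star = alpha * p_i + (1 - alpha) * w_i  # Eq. (4)
        y_star = sum(m * ybar for m, ybar in clusters) / (n_i * M_bar[i])  # Eq. (6)
        estimate += w_star * y_star  # Eq. (5)
    return estimate


# Hypothetical toy sample: two strata, three sampled clusters in total.
W = [0.6, 0.4]
sample = {0: [(4, 10.0), (6, 12.0)], 1: [(5, 20.0)]}
M_bar = [5.0, 5.0]
print(psc_estimate(W, sample, M_bar, alpha=0.0))  # usual post-stratified estimate
print(psc_estimate(W, sample, M_bar, alpha=0.5))
```

Raising an error for an empty post-stratum makes the \( n_{i} = 0 \) assumption explicit rather than silently dividing by zero.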

The estimator \( \overline{y_{\text{PSC}}} \) is unbiased for \( \sum\limits_{i = 1}^{k} {W_{i} \bar{Y}_{i} } \), with variance

$$ {\text{Var}}(\overline{{y_{\text{PSC}} }} ) = \left( {\frac{1}{n} - \frac{1}{N}} \right)\sum\limits_{j = 1}^{{n_{i} }} {M_{ij} \overline{{y_{ij} }} /n_{i} } \overline{{M_{i} }} $$
(7)
$$ E(W_{i}^{*} ) = E\{ E(W_{i}^{*} \mid n_{i} )\} $$
(8)
$$ E(W_{i}^{*} ) = \alpha E\left( \frac{n_{i}}{n} \right) + (1 - \alpha )\frac{N_{i}}{N} = \alpha W_{i} + (1 - \alpha )W_{i} = W_{i} $$
(9)
$$ E(\overline{y_{\text{PSC}}} ) = E\left\{ \sum\limits_{i = 1}^{k} W_{i}^{*} E(\overline{y_{i}^{*}} \mid n_{i} ) \right\} = \sum\limits_{i = 1}^{k} E(W_{i}^{*} )\bar{Y}_{i} = \sum\limits_{i = 1}^{k} W_{i} \bar{Y}_{i} $$
(10)
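The unbiasedness argument in Eqs. (8) to (10) can be checked numerically by enumerating every SRSWOR sample of a tiny population. This sketch rests on assumptions: the population values are hypothetical, whole clusters are observed, \( \bar{M}_{i} \) is taken as the population mean cluster size of stratum i, and the design (two strata of 3 clusters each, n = 4) guarantees \( n_{i} \ge 1 \) in every sample, which sidesteps the \( n_{i} = 0 \) case. Exact rational arithmetic from Python's `fractions` module makes the comparison exact rather than approximate.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: (stratum i, cluster size M_ij, cluster mean Ybar_ij).
POP = [
    (0, 2, Fraction(10)), (0, 4, Fraction(13)), (0, 3, Fraction(7)),
    (1, 5, Fraction(20)), (1, 1, Fraction(16)), (1, 2, Fraction(25)),
]
K, N, n = 2, len(POP), 4

N_i = [sum(1 for s, _, _ in POP if s == i) for i in range(K)]
M_i = [sum(m for s, m, _ in POP if s == i) for i in range(K)]  # units per stratum
W = [Fraction(N_i[i], N) for i in range(K)]
M_bar = [Fraction(M_i[i], N_i[i]) for i in range(K)]  # mean cluster size
# Element-level stratum means Ybar_i.
Y_bar = [Fraction(sum(m * y for s, m, y in POP if s == i), M_i[i]) for i in range(K)]


def y_psc(sample, alpha):
    """ybar_PSC for one sample (Eqs. 4-6), with whole clusters observed."""
    total = Fraction(0)
    for i in range(K):
        clusters = [(m, y) for s, m, y in sample if s == i]
        n_i = len(clusters)
        w_star = alpha * Fraction(n_i, n) + (1 - alpha) * W[i]
        y_star = sum(m * y for m, y in clusters) / (n_i * M_bar[i])
        total += w_star * y_star
    return total


samples = list(combinations(POP, n))  # all C(6, 4) = 15 equally likely samples
target = sum(W[i] * Y_bar[i] for i in range(K))
for alpha in (Fraction(0), Fraction(1, 2), Fraction(1)):
    expectation = sum(y_psc(s, alpha) for s in samples) / len(samples)
    print(alpha, expectation, expectation == target)
```

The check holds for every α because, given the post-stratum counts, the sampled clusters in each stratum form a simple random sample from that stratum.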

To evaluate the variance expression, the following standard results are used.

$$ E\left( {\frac{1}{{n_{i} }}} \right) = \frac{1}{{nW_{i} }} + \frac{{(N - n)(1 - W_{i} )}}{{(N - 1)n^{2} W_{i}^{2} }} $$
(11)
$$ E\left( {\frac{{n_{i} }}{n}} \right)^{2} = \frac{(N - n)}{(N - 1)} \cdot \frac{{W_{i} (1 - W_{i} )}}{n} + W_{i}^{2} $$
(12)
$$ V\left( {\frac{{n_{i} }}{n}} \right) = \frac{(N - n)}{(N - 1)} \cdot \frac{{W_{i} (1 - W_{i} )}}{n} $$
(13)
$$ Cov\left( {\frac{{n_{i} }}{n},\frac{{n_{j} }}{n}} \right) = - \frac{(N - n)}{(N - 1)} \cdot \frac{{W_{i} W_{j} }}{n} $$
(14)
$$ S^{2} = \frac{1}{N - 1}\sum\limits_{i = 1}^{k} \sum\limits_{j = 1}^{N_{i}} \left( \frac{M_{ij}}{\bar{M}_{i}}\bar{Y}_{ij} - \bar{Y}_{i} \right)^{2} $$
(15)
$$ S_{b_{i}}^{2} = \frac{1}{N_{i} - 1}\sum\limits_{j = 1}^{N_{i}} \left( \frac{M_{ij}}{\bar{M}_{i}}\bar{Y}_{ij} - \bar{Y}_{i} \right)^{2} $$
(16)
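The standard results on \( n_{i}/n \) rest on the fact that, under SRSWOR, the counts \( (n_{i}, n_{j}) \) follow a multivariate hypergeometric distribution. The sketch below (the function name `sample_proportion_moments` is hypothetical) enumerates that distribution exactly and compares its moments with the closed forms for the mean, variance, second moment and covariance; the covariance of the two proportions is negative, because a cluster drawn into one stratum cannot also be drawn into another. Result (11) is an approximation that also ignores \( n_{i} = 0 \), so it is not checked here.

```python
from fractions import Fraction
from math import comb


def sample_proportion_moments(N, N_i, N_j, n):
    """Exact E, V and Cov of p_i = n_i/n and p_j = n_j/n under SRSWOR.

    (n_i, n_j, rest) is multivariate hypergeometric: n clusters drawn without
    replacement from N, of which N_i lie in stratum i and N_j in stratum j.
    """
    rest = N - N_i - N_j
    total = comb(N, n)
    e_i = e_j = e_ii = e_ij = Fraction(0)
    for a in range(min(N_i, n) + 1):
        for b in range(min(N_j, n - a) + 1):
            # math.comb returns 0 when n - a - b exceeds rest.
            prob = Fraction(comb(N_i, a) * comb(N_j, b) * comb(rest, n - a - b), total)
            p_i, p_j = Fraction(a, n), Fraction(b, n)
            e_i += p_i * prob
            e_j += p_j * prob
            e_ii += p_i * p_i * prob
            e_ij += p_i * p_j * prob
    return e_i, e_j, e_ii, e_ii - e_i**2, e_ij - e_i * e_j


N, N_i, N_j, n = 12, 5, 4, 6
W_i, W_j = Fraction(N_i, N), Fraction(N_j, N)
fpc = Fraction(N - n, N - 1)  # finite population correction (N - n)/(N - 1)

e_i, e_j, e_ii, var_i, cov_ij = sample_proportion_moments(N, N_i, N_j, n)
print(e_i == W_i, e_j == W_j)                       # E(p_i) = W_i
print(var_i == fpc * W_i * (1 - W_i) / n)           # variance of n_i / n
print(e_ii == fpc * W_i * (1 - W_i) / n + W_i**2)   # second moment of n_i / n
print(cov_ij == -fpc * W_i * W_j / n)               # negative covariance
```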

Writing the variance as \( V(\overline{y_{\text{PSC}}}) = E[V(\overline{y_{\text{PSC}}} \mid n_{i})] + V[E(\overline{y_{\text{PSC}}} \mid n_{i})] \) and expanding the two terms on the right-hand side, we have

$$ E[V(\overline{y_{\text{PSC}}} \mid n_{i} )] = E\left[ V\left( \sum\limits_{i = 1}^{k} W_{i}^{*} \overline{y_{i}^{*}} \mid n_{i} \right) \right] $$
(17)
$$ V[E(\overline{y_{\text{PSC}}} \mid n_{i} )] = V\left[ \sum\limits_{i = 1}^{k} W_{i}^{*} \bar{Y}_{i} \right] $$
(18)

Adding Eqs. (17) and (18) gives the final variance expression, and the minimum variance of the estimator is

$$ V_{mn} (\overline{{y_{\text{PSC}} }} ) = \left( {\frac{1}{n} - \frac{1}{N}} \right)V\left[ {\sum\limits_{i = 1}^{k} {W_{i} *\bar{Y}_{i} } } \right] + \left\{ {\sum\limits_{i = 1}^{l} {n_{i} } } \right\}_{j} $$
(19)
$$ T1 = \left( {\frac{1}{n} - \frac{1}{N}} \right)\left( {S^{2} - \sum\limits_{i = 1}^{k} {s_{i} } } \right)^{2} $$
(20)

The optimum value of α is obtained by differentiating the variance expression with respect to α and equating the derivative to zero. Solving this equation gives \( \alpha_{\text{opt}} = (1 + P)^{-1} \), and substituting it into the variance gives the required minimum variance [13,14,15,16].
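The form of \( \alpha_{\text{opt}} \) can be illustrated under an assumption: suppose the α-dependent part of the variance is \( A\alpha^{2} + B(1 - \alpha)^{2} \) with A, B > 0 and P = A/B, which matches the description of P as a ratio of two quantities. Setting the derivative \( 2A\alpha - 2B(1 - \alpha) \) to zero gives \( \alpha = B/(A + B) = (1 + P)^{-1} \), with minimum value \( AB/(A + B) \). The coefficients A and B and the function names are hypothetical placeholders, not the paper's expressions; the sketch only confirms the calculus against a grid search.

```python
def alpha_part(alpha: float, A: float, B: float) -> float:
    """Assumed alpha-dependent part of Var(ybar_PSC): A*alpha^2 + B*(1 - alpha)^2."""
    return A * alpha**2 + B * (1 - alpha) ** 2


def alpha_opt(A: float, B: float) -> float:
    """Closed-form minimiser (1 + P)^-1, with P = A / B (hypothetical definition)."""
    P = A / B
    return 1 / (1 + P)


A, B = 2.0, 6.0  # hypothetical coefficients
grid = [k / 1000 for k in range(1001)]
best_on_grid = min(grid, key=lambda a: alpha_part(a, A, B))
print(alpha_opt(A, B), best_on_grid)                       # both approximately 0.75
print(alpha_part(alpha_opt(A, B), A, B), A * B / (A + B))  # both approximately 1.5
```

At α = 0 the weights reduce to \( W_{i} \), so `alpha_part(0.0, A, B)` equals B, the corresponding term for the usual post-stratified estimator.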

The term P is a ratio of two quantities; therefore, in repeated surveys of relatively short duration on the same characteristics, it is expected to be stable and can be guessed by an experienced survey practitioner [17,18,19,20].

4 Estimate of Variance

Unbiased estimator of

$$ V(\overline{{y_{\text{PSC}} }} ) = [n(N - 1) - (1 - \alpha )^{2} (N - n)^{ - 1} ] $$
(21)
$$ V\left( {\overline{{y_{\text{PSC}} }} } \right) = \frac{{(1 - \alpha )^{2} (N - n)}}{n(N - 1)} + \left[ {n(N - 1) - (1 - \alpha )^{2} (N - n)^{ - 1} } \right] $$
(22)
$$ V(y_{\text{PSC}} ) = \frac{{(1 - \alpha )^{2} (N - n)}}{n(N - 1)}\sum\limits_{i = 1}^{k} {W_{i} (Y_{i} - \bar{Y})^{2} + [n(N - 1) - (1 - \alpha )^{2} (N - n)^{ - 1} ]} $$
(23)
$$ V(\overline{y_{\text{PSC}}} ) = \frac{{(1 - \alpha )^{2} (N - n)}}{n(N - 1)}\sum\limits_{i = 1}^{k} {W_{i} (Y_{i} - \bar{Y})^{2} + S^{2} } $$
(24)

4.1 Robustness of Estimator

Consider a small quantity ε and replace P by P + ε, where ε may be positive or negative. Then

$$ (\alpha_{\text{opt}} )_{\varepsilon } = (1 + P + \varepsilon )^{ - 1} $$

and
$$ [V_{\hbox{min} } (\overline{{y_{\text{PSC}} }} )]_{\varepsilon } = \left( {\frac{1}{n} - \frac{1}{N}} \right)\sum\limits_{i = 1}^{k} {W_{i} S_{bi}^{2} + \frac{(N - n)}{{(N - 1)n^{2} }}} $$
(25)
$$ {\text{Percentage}}\,{\text{gain}} = \frac{V(\overline{y_{ps}} )}{V(\overline{y_{\text{PSC}}} )} \times 100 $$
(26)
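To illustrate the robustness argument, assume (hypothetically) that the variance is \( V(\alpha) = C + A\alpha^{2} + B(1 - \alpha)^{2} \) with P = A/B, so that α = 0 gives the usual post-stratified estimator. Because V is quadratic in α, \( V(\alpha) = V_{\min} + (A + B)(\alpha - \alpha_{\text{opt}})^{2} \), and since \( (\alpha_{\text{opt}})_{\varepsilon} - \alpha_{\text{opt}} \approx -\varepsilon/(1 + P)^{2} \), a mis-guessed P inflates the variance only by a term of order \( \varepsilon^{2} \). The sketch reads the percentage gain as \( V(\overline{y_{ps}})/V(\overline{y_{\text{PSC}}}) \times 100 \); the components A, B, C and the function names are hypothetical.

```python
def variance(alpha: float, A: float, B: float, C: float) -> float:
    """Assumed variance model V(alpha) = C + A*alpha^2 + B*(1 - alpha)^2."""
    return C + A * alpha**2 + B * (1 - alpha) ** 2


def percentage_gain(alpha: float, A: float, B: float, C: float) -> float:
    """V(ybar_ps) / V(ybar_PSC) * 100, where alpha = 0 gives ybar_ps."""
    return variance(0.0, A, B, C) / variance(alpha, A, B, C) * 100


A, B, C = 2.0, 6.0, 10.0  # hypothetical variance components
P = A / B
alpha_opt = 1 / (1 + P)
for eps in (0.0, 0.01, 0.1, -0.1):
    alpha_eps = 1 / (1 + P + eps)  # optimum computed from a mis-guessed P
    excess = variance(alpha_eps, A, B, C) - variance(alpha_opt, A, B, C)
    gain = percentage_gain(alpha_eps, A, B, C)
    print(f"eps={eps:+.2f}  excess={excess:.2e}  gain={gain:.3f}%")
```

Under this model the gain falls only slightly as |ε| grows, which is the sense in which the estimator is robust to an imperfect guess of P.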

In the first phase of the analysis, the data set is divided into six partitions, each of which is further subdivided into two parts. The partitions are compared using four parameters of the population under consideration, and these population parameters are used to calculate the percentage gain, which plays an important role in the analysis.

In the second phase of the analysis, the data set is partitioned into six groups, which are again compared using four parameters; these parameters support the efficiency comparisons.

Table 3 reports the robustness results. Post-stratification is efficient when the size of every stratum is known but the frame of every stratum is unknown. Tables 4, 5, and 6 report the results of the second phase. In this paper, strata are formed from clusters of different sizes, and the study rests on two elements: mean estimation and the post-stratified sample. The modified weight structure improves precision and recall.

5 Conclusion and Future Work

Table 1 describes the results associated with the percentage gain, where the estimator \( \overline{y_{ps}} \) is used. In the initial phase, the database is divided into six groups, which are further divided into strata of different sizes. These strata are analyzed with the estimated parameters N, X, Y, and S. Table 2 describes the efficiency evaluation using the \( V \) and \( V_{\min} \) parameters, with the database divided into six clusters. Table 3 describes the robustness and efficiency comparison of the estimators over several parameters, with the data set divided into eight clusters. Table 4 presents the population parameters used to calculate the estimators in phase 1, and Table 5 presents those used in phase 2; in these two phases, the data set is classified into six clusters. Table 6 describes the efficiency comparison of the estimated parameters in phase 3, in which the data set is classified into eight groups.

Table 1 Population parameters
Table 2 Efficiency comparison
Table 3 Robustness and efficiency comparison
Table 4 Evaluation of population parameters in phase 1
Table 5 Efficiency comparison of phase 2
Table 6 Robustness and efficiency comparison in phase 3

Figure 1 shows the population parameters for different values of the intensity parameter. Figure 2 shows the efficiency comparison of the contraction factor against the number of population units for different forms of accuracy. Finally, Fig. 3 shows the sample size and sampling errors used to estimate the robustness and performance of the proposed estimator. The proposed estimator shows significantly higher classification accuracy than classical estimators such as the shrunken estimator, which gives 88%; mean estimation under the post-stratified cluster sampling scheme achieves an overall accuracy of 92%. As future work, these results can be compared with estimators based on probability distributions.

Fig. 1 Population parameters’ graph

Fig. 2 Efficiency comparison graph

Fig. 3 Robustness and performance graph