Introduction

The frequentist approach to statistical inference assumes that a parameter of interest is a fixed and unobservable quantity. The goal is to make inference about this fixed value, given an assumed sampling distribution of the data. For example, one might estimate the prevalence of disease in a population and calculate a confidence interval about the estimate to reflect the statistical uncertainty associated with the estimation; or test a hypothesis about the value of the prevalence and report a p-value to determine significance. The attributes of these methods are judged a priori, or before observing any data. For example, a 95% confidence interval will capture the true parameter value on average 95% of the time. Similarly, a hypothesis test is designed to a certain power function, which determines the potential errors. Yet, once the data have been observed, a posteriori the probability that the true parameter lies within that interval is zero or one and the result of the hypothesis test is correct or not; unfortunately, we do not know which.

It is common for novice statisticians to make such statements as "the probability that the prevalence lies within the confidence interval is 95%", which is, of course, incorrect in the frequentist framework. Indeed, such statements would be attractive and desirable, if only they were correct. There is a vehicle for making probability statements about distributions of unknown parameters: Bayesian inference [1]. In the Bayesian framework, the unknown parameter is treated not as a constant, but as a random quantity, which varies according to some probability distribution. At the core of this theory is Bayes' theorem, which states that for two events, A and B, where Pr(B) > 0, the conditional probability of A given B is

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}.$$

In practice, A and B are replaced by the unobservable parameters, ϕ, and the observable data, X, respectively. Then the expression becomes

$$\Pr(\phi \mid X) = \frac{\Pr(X \mid \phi)\,\Pr(\phi)}{\Pr(X)}.$$

The relevant pieces of this expression are the likelihood, Pr(X|ϕ), and the prior distribution, Pr(ϕ), which, together with Pr(X), yield the posterior distribution Pr(ϕ|X). Both ingredients must be specified for valid conclusions in this context. The posterior distribution probabilistically describes the behavior of the unknown parameter given the prior and observed data, and serves as the basis for Bayesian analysis. Certain applications naturally lend themselves to a Bayesian approach. Consider monitoring the prevalence of acute malnutrition amongst children 6-59 months of age and within a particular area. At any given time, there may be a true value of the prevalence of acute malnutrition (i.e. the number of children acutely malnourished in the area divided by the total number of children in the area). However, if one were to consider the prevalence over a six month period, this value would fluctuate as children age, thereby entering or exiting the cohort, or their nutritional statuses change. Thus, it may be more realistic to model the prevalence of malnutrition as a random quantity over time rather than a fixed quantity.
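To make the roles of the likelihood, the prior and Pr(X) concrete, the following is a minimal numerical sketch of the posterior calculation on a discrete grid of candidate prevalences. It is illustrative only: the function name `grid_posterior`, the grid, the flat prior weights and the data (14 cases in a sample of 200) are assumptions chosen for the example and are not taken from any survey.

```python
import math

def grid_posterior(y, n, prior_weights, grid):
    """Posterior Pr(phi | X) on a discrete grid of candidate prevalences."""
    # Likelihood Pr(X | phi): binomial probability of y cases in n children.
    likelihood = [math.comb(n, y) * p**y * (1 - p)**(n - y) for p in grid]
    joint = [lik * w for lik, w in zip(likelihood, prior_weights)]
    evidence = sum(joint)  # Pr(X), the normalizing constant
    return [j / evidence for j in joint]

# Hypothetical inputs: candidate prevalences 0.01, ..., 0.99 with equal prior weight.
grid = [i / 100 for i in range(1, 100)]
flat_prior = [1 / len(grid)] * len(grid)

# Hypothetical data: 14 malnourished children in a sample of 200.
posterior = grid_posterior(y=14, n=200, prior_weights=flat_prior, grid=grid)
mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
print(f"posterior mode on the grid: {mode:.2f}")
```

Replacing `flat_prior` with weights concentrated on low prevalences would pull the posterior toward those values, which is the mechanism exploited in the rest of the paper.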

Deitchler et al found Lot Quality Assurance Sampling (LQAS) to be a useful tool to monitor the prevalence of Global Acute Malnutrition, defined as Weight-for-Height-Z-score < -2 standard deviations, in emergency situations [2,3]. With any LQAS application, the goal is to classify the population prevalence as above or below predefined thresholds by comparing the number of failures in a random sample to a specific decision rule. For example, one might be interested in determining with a high degree of confidence whether the prevalence of acute malnutrition is greater than 10% in a given population of children less than 5 years of age. To accomplish this goal, randomly sample 200 children from within the population. If more than thirteen children exhibit signs of acute malnourishment, then classify the prevalence of acute malnutrition as greater than 10% in that population. The choice of sample size and decision rule determine the degree to which one can rely on this classification. The LQAS procedure is discussed in depth in the next section. Recently, Bilukha [4] and Bilukha and Blanton [5] substantially criticized these designs in this journal. The main problem with their criticism is exemplified in Bilukha and Blanton's suggestion to use as an alternative measure of risk, "the statistical probability of the true population value's exceeding the threshold," conditional on the number of malnourished children in a sample [5]. When the prevalence is treated as a constant, as it is in the model in their paper, this is a measure with little meaning. The authors fall short of specifying the necessary assumptions to make what is clearly a Bayesian statement; namely, no mention is made of the prior distribution. The reader is left to assume that the authors did not consider this aspect in their calculations.

It is attractive to have the ability to make a probability statement about the prevalence of malnutrition, even though it does require that more structure be imposed on the model. The Deitchler et al LQAS designs serve as a natural place to begin this investigation. Classical LQAS has generally relied on frequentist statistical principles, particularly in its application in health (see [6] for over 800 examples, all of which take a frequentist approach). Further, nowhere in the health literature have Bayesian considerations been incorporated into the LQAS procedure, to the best of our knowledge. Yet, Bayes-LQAS (B-LQAS) is well-established in the industrial literature, where it is known as Bayesian Acceptance Sampling (see [7], and references therein). As early as the 1960s, Brush examined classical and Bayesian risks for a variety of sampling plans [7]. Brush, and Sharma and Bhutani [8], emphasize the importance of examining both the classical and Bayes risks when deciding upon a sampling design. Fan [9] and Sheng and Fan [10] consider B-LQAS for binomial testing and outline an approach to choosing a prior based on historical data using an empirical Bayes approach.

More recently, Moskowitz considers B-LQAS under quadratic and step-loss functions [11]. Fitzgerald looks at B-LQAS plans under an assumed mixture prior [12], and he bases B-LQAS designs on Bayes OC curves and average OC curves. These curves were first introduced by Easterling [13]. Now, B-LQAS has made a transition into the economics and operations research literature. For example, in 1996 Lattimore et al used B-LQAS to monitor drug use in Illinois. In that application, the sampling plan was determined by minimizing expected cost, as advocated by Moskowitz et al [14,15].

In this paper, we discuss the potential benefits of using B-LQAS in health applications. As a running example, we discuss an application to acute malnutrition, motivated by LQAS designs proposed by Deitchler et al to classify the prevalence of malnutrition [2,3]. We show how to approach the classification problem from a Bayesian perspective, show some of the advantages of this approach, and discuss the parallels between the classical and Bayesian approach.

Discussion

A Brief Review of LQAS and B-LQAS

LQAS

Classical LQAS is primarily a classification procedure [16]. In its simplest form, the goal is to classify the unknown prevalence of a binary indicator as greater than or equal to some critical threshold, p*, or less than this threshold. To do so, the number of cases, Y, in a simple random sample of size n is compared to a predefined decision rule, d. If fewer than d cases are observed, then the prevalence is classified as low (p < p*). Otherwise, it is classified as high (p ≥ p*). The sample size and decision rule are chosen to achieve error probabilities of α and β, where the former is the maximum acceptable probability of a false negative (or a false low) over some range of prevalences and the latter is the maximum acceptable probability of a false positive (or false high) over some other range of prevalences.

To define these ranges, LQAS uses upper and lower thresholds, p_U and p_L. The sample size and decision rule are chosen so that the probability of a false negative when the true prevalence is greater than or equal to p_U is less than or equal to α. Likewise, the probability of a false positive when the true prevalence is less than or equal to p_L is less than or equal to β. In the industrial literature, these errors are referred to as the consumer and producer risks. In health applications, generally p_U is chosen to be equal to the critical threshold p* and p_L is chosen to reflect the desired detectable deviation from that threshold [16]. However, some have suggested using p* = p_L [17], which might be appropriate depending on the application. When deciding on how to implement the procedure, it is important that the investigator keep in mind what p represents, particularly if p is the prevalence of an undesirable outcome.

The Operating Characteristic (OC) Curve completely summarizes any LQAS design. For a given value of p, this is defined as

$$OC(p \mid d, n) = \Pr(Y < d \mid p), \tag{1}$$

where Y is assumed to be binomially distributed with parameters n and p. Plotting this quantity for the entire range of p yields the desired curve. A satisfactory LQAS design will have the following properties:

$$OC(p_U \mid d, n) \le \alpha, \qquad 1 - OC(p_L \mid d, n) \le \beta.$$

For example, in Figure 1, we see an OC curve with n = 200 and d = 14, which corresponds to the LQAS design used by Deitchler et al [2,3] to classify the prevalence of malnutrition with a critical threshold p* = 0.10 and with p_U = 0.10 and p_L = 0.05. In that application, α was set to 0.10 and β to 0.20. We see that the OC curve is less than α = 0.10 at the upper threshold and greater than 1 - β = 0.80 at the lower threshold, and thus meets the design requirements.

Figure 1. Classical Operating Characteristic Curve with sample size n = 200, decision rule d = 14, upper threshold p_U = 0.10, lower threshold p_L = 0.05, and maximum tolerated errors α = 0.10 and β = 0.20.
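The design check described for Figure 1 can be reproduced with a few lines of code. The sketch below evaluates equation (1) directly from the binomial distribution; the function name `oc` and the variable names are our own choices for illustration, while the numerical settings are those of the Deitchler et al design quoted in the text.

```python
import math

def oc(p, d, n):
    """Classical OC curve: Pr(Y < d | p) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, y) * p**y * (1 - p)**(n - y) for y in range(d))

n, d = 200, 14
alpha, beta = 0.10, 0.20
p_upper, p_lower = 0.10, 0.05

false_low = oc(p_upper, d, n)        # probability of classifying low when p = p_U
false_high = 1 - oc(p_lower, d, n)   # probability of classifying high when p = p_L

print(f"OC(p_U) = {false_low:.3f} (target <= {alpha})")
print(f"1 - OC(p_L) = {false_high:.3f} (target <= {beta})")
```

Evaluating `oc` over a fine grid of p values would trace the full curve shown in the figure.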

B-LQAS

B-LQAS is similar to classical LQAS in that the final goal is to decide whether to classify the prevalence as greater than or equal to some threshold p*, or less than this threshold, so that appropriate action can be taken. However, in the Bayesian context, we allow a prior distribution of the parameter, p, to be part of the analysis. As with classical LQAS, we choose the sample size, n, and the decision rule, d, to achieve certain criteria when performing the classification before observing the data. In contrast to classical LQAS, these criteria are based on posterior properties of our decision, or what we believe after observing the data. For example, it might be important to know the probability that we have made the correct decision, or classification, and we have the choice of two probabilities, depending on which decision we make.

To get a better understanding of the intuition behind this approach, consider Figure 2, where we have plotted a hypothetical prior distribution of the prevalence of malnutrition. This distribution has mean 8.5% with 77% of its mass between 5% and 10%. Overlaid on this plot are two OC curves. The solid curve corresponds to a classical OC curve with d = 14, or the decision rule that we would choose using classical considerations. When we consider the prior distribution, we might argue that it makes less sense to use this decision rule, which prioritizes error above 10% and below 5%, since we seldom expect to see a prevalence as high or as low. The dashed line corresponds to a classical OC curve with d = 28. With the chosen prior, this appears to be a better design as it prioritizes the region of largest prior mass when choosing a decision rule. That is, the design prioritizes correct classification of prevalences which are most likely given our prior beliefs. Hence, prior beliefs about the parameter of interest should play a vital role in determining an appropriate design, and in explaining its properties.

Figure 2. Hypothetical prior distribution of acute malnutrition with mean 8.5% (**) and candidate OC curves for LQAS classification with d = 14 (-) and d = 28 (--).

For the sake of illustration, in this paper we assume the conjugate Beta prior on the prevalence, p ~ Beta(a,b), to demonstrate the effect on the OC curves. That is, we let the prior distribution, π(p), take the structural form

$$\pi(p) = \frac{p^{a-1}(1-p)^{b-1}}{B(a, b)},$$

where a, b > 0 and B(a,b) is the beta function [18]. The Beta provides a rich family of distributions, allowing for a range of flexible prior shapes. Further, there is some precedent for its use [10]. The parameters a and b control the shape of the prior distribution. To aid interpretation, we might think about a and b as the prior number of successes and failures, respectively. Therefore, a large value of b relative to a yields a distribution with its mass concentrated toward the left, that is, at low prevalences.
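As a quick way to explore how a and b shape the prior, the sketch below evaluates the Beta density and the prior mass below a chosen prevalence by simple numerical integration. The helper names `beta_pdf` and `beta_mass_below`, the midpoint rule and the grid size are assumptions made for illustration; the values a = 2 and b = 10 are used only as an example of a prior concentrated at low prevalences.

```python
import math

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at 0 < p < 1, computed on the log scale."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta)

def beta_mass_below(x, a, b, grid=2000):
    """Prior probability Pr(p < x), approximated by the midpoint rule on [0, x]."""
    h = x / grid
    return sum(beta_pdf((i + 0.5) * h, a, b) for i in range(grid)) * h

a, b = 2, 10
print(f"prior mean: {a / (a + b):.3f}")
print(f"prior mass below 30%: {beta_mass_below(0.30, a, b):.3f}")
```

The same helpers can be pointed at any other (a, b) pair to see how quickly the mass moves toward high prevalences as a grows relative to b.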

For pedagogical reasons, we choose these parameters to reflect a variety of potential prior beliefs (see Figure 3A). For example, when a = 1 and b = 1, the prior is completely flat, which might correspond to a lack of prior knowledge or possibly prior indifference. When a = 2 and b = 10, this corresponds to a prior density with most of its mass below 30%, which is a realistic assumption as malnutrition prevalence is rarely as high as 30%. For example, in the CE-DAT global database of over 1400 malnutrition surveys conducted in emergency situations, only 59 reported a prevalence as high as 30% in children 6-59 months [19]. However, we also look at the case where a = 4 and b = 2 and where a = b = 5, reflecting a prior belief that the prevalence is in fact quite high, even though this condition is probably quite unlikely in the present context. The properties of a B-LQAS design can once again be formalized in the OC curves, although we now focus our attention on the Bayes OC curves. A key difference between classical LQAS and B-LQAS is the reliance on not one, but two curves to determine appropriate designs, since we need to condition on either a high or low classification. In this paper, we define the following Bayes OC curves

Figure 3. (A) Various Beta distributions to describe a range of potential prior beliefs. We assume that p ~ Beta(a,b). (B) Bayes OC curves assuming n = 200 and d = 14. The dashed lines (---) represent 1 - BOC_F and the solid lines (-) represent 1 - BOC_P.

$$BOC_P(x \mid n, d) = \Pr(p < x \mid Pass), \tag{2}$$
$$BOC_F(x \mid n, d) = \Pr(p < x \mid Fail), \tag{3}$$

where the event Pass = {Y ≥ d} and Fail = {Y < d}, and where Y is the number of "successes" in a sample of size n. Plotting (2) and (3) as a function of x yields the desired curves. We can write

$$1 - BOC_F(p_U \mid n, d) = \frac{\int_{p_U}^{1} \sum_{y=0}^{d-1} f(y \mid p)\,\pi(p)\,dp}{\int_{0}^{1} \sum_{y=0}^{d-1} f(y \mid p)\,\pi(p)\,dp},$$
$$BOC_P(p_L \mid n, d) = \frac{\int_{0}^{p_L} \sum_{y=d}^{n} f(y \mid p)\,\pi(p)\,dp}{\int_{0}^{1} \sum_{y=d}^{n} f(y \mid p)\,\pi(p)\,dp},$$

where π(p) is the prior distribution of p and f(y|p) is the sampling distribution of Y given the parameter, p. As a result, the Bayes OC curves are less straightforward to calculate than the classical OC curves, as some integration over the unknown parameter is required. In some cases, these integrals can be analytically intractable, in which case one would have to appeal to numerical methods to evaluate the expressions. In any single application we ultimately take only a single action, but we need to consider both Bayes OC curves. Note that in the case of malnutrition, if Y ≥ d, this indicates a high burden of malnutrition. Therefore, the use of the word Pass is not instinctual. However, we might think of Pass as "qualifying for humanitarian aid" to facilitate the interpretation. We continue with this notation to provide a unified framework.
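The integrals above lend themselves to straightforward numerical evaluation. The following sketch approximates both Bayes OC curves by a midpoint rule over p, assuming a Beta(a, b) prior and a binomial sampling distribution. The function names, the grid size and the choice of integration rule are assumptions made for illustration and are not the method used to produce the figures; the threshold x should fall on a grid boundary (a multiple of 1/grid) for the split of the integral to be exact.

```python
import math

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at 0 < p < 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta)

def bayes_oc(x, n, d, a, b, grid=2000):
    """Return (BOC_P(x), BOC_F(x)) = (Pr(p < x | Pass), Pr(p < x | Fail))."""
    h = 1.0 / grid
    coef = [math.comb(n, y) for y in range(d)]
    fail_below = fail_total = pass_below = pass_total = 0.0
    for i in range(grid):
        p = (i + 0.5) * h
        weight = beta_pdf(p, a, b) * h
        # Pr(Fail | p) = Pr(Y < d | p); clipped at 1 to guard against rounding.
        pr_fail = min(sum(c * p**y * (1 - p)**(n - y) for y, c in enumerate(coef)), 1.0)
        fail_total += pr_fail * weight
        pass_total += (1 - pr_fail) * weight
        if p < x:
            fail_below += pr_fail * weight
            pass_below += (1 - pr_fail) * weight
    return pass_below / pass_total, fail_below / fail_total

n, d, a, b = 200, 14, 2, 10
_, boc_f_upper = bayes_oc(0.10, n, d, a, b)
boc_p_lower, _ = bayes_oc(0.05, n, d, a, b)
print(f"Pr(p >= 0.10 | Fail) = {1 - boc_f_upper:.4f}")
print(f"Pr(p < 0.05 | Pass) = {boc_p_lower:.4f}")
```

Because the Beta prior is conjugate to the binomial, these quantities could also be written as mixtures of Beta distribution functions; the grid approach is shown because it carries over unchanged to non-conjugate priors.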

The interpretation of each of these curves allows us to make probabilistic statements about the parameter of interest, given the results of our diagnostic procedure; such as statements like those made by Bilukha and Blanton [5], which in their context are incorrect. Intuitively, it would be desirable to control the probability that the prevalence is low when we say it is high (or declare a Pass), for example, and vice-versa. Using the Bayes OC curves, we can choose n and d so that the Bayes classification errors are controlled. For an analogue to classical LQAS, we can enforce the following:

$$1 - BOC_F(p_U \mid n, d) \le \alpha, \tag{4}$$
$$BOC_P(p_L \mid n, d) \le \beta, \tag{5}$$

where p_U = p* and p_L is some lower critical threshold. However, it is also possible to choose p_L = p_U = p*, which might be more appealing to some practitioners. This latter case is discussed at length in the context of Phase II clinical trials by Wang et al [20]. Ultimately, the choice depends on the application and the priorities of the investigators. We discuss this issue further in the next section.

In Figure 3B, we see the Bayes OC curves plotted as a function of the prevalence threshold (x in equations (2) and (3)) with n = 200 and d = 14. When α = 0.10 and β = 0.20, we see that the constraints posed in (4) and (5) for p_U = 0.10 and p_L = 0.05 are met for all considered priors. That is,

$$1 - BOC_F(p_U \mid n, d) = \Pr(p \ge p_U \mid Fail) \le \alpha = 0.10,$$
$$BOC_P(p_L \mid n, d) = 1 - \Pr(p \ge p_L \mid Pass) \le \beta = 0.20.$$

Hence, when we choose d = 14, which corresponds to the classical solution, we achieve reasonable Bayesian properties as well.

Note that when we let p_L = p_U = 0.10, the error at the upper and lower thresholds increases slightly as compared to the case when p_U = 0.10 and p_L = 0.05 for the considered priors. That is, 1 - BOC_P(p = 0.10 | n, d) decreases from essentially one to just over 0.80 in the case when a = 2 and b = 10, which is still within the design constraints. This is important, however, as it highlights the tradeoff between classification precision and accuracy in these applications. Namely, by classifying as above p_U = p* or below p_L, we are asking for slightly imprecise results, but doing so with high accuracy. However, if we classify as above or below p_U = p*, we are asking for highly precise results, but doing so with less accuracy. We revisit this notion in more depth below.

Maximizing the Figure of Merit

In general, the above outlined approach is feasible yet time consuming, as it might require an investigator to look at a range of sample sizes and decision rules to arrive at a given design. A more automated design selection is achieved by using the following Figure of Merit (FOM),

$$FOM(n, d) = \int_{0}^{p_L} \Pr(Y < d \mid p)\,\pi(p)\,dp + \int_{p_U}^{1} \Pr(Y \ge d \mid p)\,\pi(p)\,dp, \tag{6}$$

and for a given sample size, one might choose the decision rule to maximize this quantity. In the decision theoretic literature, this quantity is known as the Bayes Risk of a zero-one utility (negative loss) function [1]. When p_U = p_L, the FOM of a given design is the average probability of correct classification given a prior π(p). When p_L < p_U, dividing the above quantity by a factor of 1 - Pr(p_L < p < p_U) yields the average probability of correct classification of priority locales. This scaling of the FOM does not affect the maximization, but can help with interpreting the result. Interestingly, (6) can be rewritten as

$$FOM(n, d) = \Pr(Y < d)\,BOC_F(p_L \mid n, d) + \Pr(Y \ge d)\,\bigl[1 - BOC_P(p_U \mid n, d)\bigr],$$

which shows that an optimal design is one that maximizes a weighted average of the Bayes OC curves, where the weights are the marginal probabilities of passing and failing the procedure. The marginal distribution of Y is also referred to as the prior predictive distribution, that is, the predictive distribution of Y given only our prior assumptions. Hence, we weight more heavily the Bayes OC curve which has the greater prior predictive probability of occurring.
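The search for the decision rule that maximizes (6) is easy to automate. The sketch below is one possible implementation under a Beta(a, b) prior, again using a midpoint rule; the function name `fom_curve`, the cap `d_max` on the rules examined and the grid size are assumptions made for illustration. It uses the fact that moving a single outcome y from Pass to Fail changes the FOM by the integral of f(y|p)π(p) over p < p_L minus the same integral over p ≥ p_U.

```python
import math

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at 0 < p < 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta)

def fom_curve(n, a, b, p_low, p_up, d_max, grid=2000):
    """Return [FOM(n, 0), ..., FOM(n, d_max)] under a Beta(a, b) prior."""
    h = 1.0 / grid
    coef = [math.comb(n, y) for y in range(d_max)]
    gain = [0.0] * d_max   # gain[y]: change in FOM when outcome y joins Fail
    upper_mass = 0.0       # Pr(p >= p_up), which equals FOM(n, 0)
    for i in range(grid):
        p = (i + 0.5) * h
        if p_low <= p < p_up:
            continue       # prevalences between the thresholds earn no credit
        w = beta_pdf(p, a, b) * h
        if p >= p_up:
            upper_mass += w
        sign = 1.0 if p < p_low else -1.0
        for y in range(d_max):
            gain[y] += sign * coef[y] * p**y * (1 - p)**(n - y) * w
    foms = [upper_mass]
    for y in range(d_max):
        foms.append(foms[-1] + gain[y])
    return foms

# Flat prior with p_L = p_U = 0.10 and n = 200.
foms = fom_curve(n=200, a=1, b=1, p_low=0.10, p_up=0.10, d_max=40)
best_d = max(range(len(foms)), key=foms.__getitem__)
print(f"optimal d = {best_d}, FOM = {foms[best_d]:.3f}")
```

Calling `fom_curve` with other values of a, b and p_low gives the curves needed to compare designs across priors and thresholds.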

Continuing with our example, Figure 4A shows the plot of the FOM as a function of d for the situations when n = 200, p_U = 0.10, p_L = 0.05 and 0.10, and four prior distributions. When p_L = 0.05, the optimal decision rule hovers around d = 14, or the same as the classical LQAS solution. Yet, it is important to note that both when a = 5, b = 5 and when a = 4, b = 2, the optimal decision rules are less than 14. Further, even though the curves are very nearly flat in the displayed range of rules, these are true maxima due to the fact that the prior mass below p_L is non-zero. Scaling the maximum FOM appropriately reveals that the average probability of correct classification of priority locales is close to 100%, indicating the appropriateness of the design for detecting extremes (i.e. areas where p ≤ p_L or p ≥ p_U).

Figure 4. (A) The Figure of Merit plotted as a function of d where p ~ Beta(a,b) and assuming n = 200, p_U = 0.1 and both p_L = 0.05 (-) and p_L = 0.1 (---). Solid vertical lines indicate the maximum FOM when p_L = 0.05 and dashed vertical lines indicate maximum FOM when p_L = 0.10. (B) Maximum FOM as a function of the precision with p_U = 0.1 and p_L varying from 0.05 to 0.1 for various assumed Beta priors.

When p_L = p_U = 0.10, the maximum FOM decreases, albeit slightly, and the optimal decision rule increases to d = 17 or d = 20, depending on the prior. Therefore, if our prior belief is that the malnutrition prevalence is low, we require a greater number of malnourished children in our sample to be convinced otherwise. But if we believe that the malnutrition prevalence is high, we will need fewer malnourished children in our sample to be convinced that the prevalence is indeed high, which could trigger an earlier intervention. This is a consequence of incorporating prior information into our analysis. In either case, it is important to realize that the optimal design does a good job of classifying areas. That is, in both cases, the maximum FOM is greater than or equal to 90%. Therefore, with a sample of size n = 200, the probability that we correctly classify an area is at least 0.90.

It is interesting to note that when a = 1 and b = 1, the indifference prior, the optimal decision rule is equal to 20, or np*. Both Bilukha and Blanton [5] and Rhoda et al [17] have suggested using d ≈ np* for the decision rule in the classical setting. The use of a flat prior with p_L = p_U = p* gives a Bayesian justification for such a choice, although the use of a flat prior does not ordinarily make sense for this application, as it is uncommon for the prevalence of acute malnutrition to reach as high as 30%, much less 80% or 90% [19]. We discuss this in more depth in the following sections.

Balancing Accuracy and Precision

The FOM measures the overall accuracy of the B-LQAS procedure, and it is attractive to constrain our procedure to achieve at least a minimum FOM. Namely, we constrain the FOM so that

$$FOM(n, d) \ge 1 - \delta, \tag{7}$$

where the parameter δ controls the overall level of accuracy in the procedure. This might be considered a more appealing design metric than α and β. Of course, when p_L = p_U, constraints (4) and (5) imply

$$FOM(n, d) \ge (1 - \alpha)\,\Pr(Y \ge d) + (1 - \beta)\,\Pr(Y < d).$$

Hence, in this special case, the optimal design which meets (7) might be chosen as that design for which a weighted average of the producer and consumer risks, weighted according to the prior belief of passing or failing, is greater than or equal to 1 - δ. If α = β, then we have that α = β = δ, further simplifying the parameterization.

The precision demanded of the procedure impacts the accuracy. That is, the choice of p_U and p_L affects the properties of the design. Formally, define the precision as 1 - |p_U - p_L|. When p_U = p_L, the precision is equal to one. But as p_L deviates from p_U, the precision decreases. Indeed, when at their maximal difference, the precision is zero. In our example, p_U = 0.10 and p_L ranges from 0.05 to 0.10, so that the precision ranges from 0.95 to 1.00.

In Figure 4B, we plot the maximum average probability of correct classification of priority locales (or the appropriately scaled FOM) as a function of the precision, fixing p_U = 0.10 and allowing p_L to vary from 0.05 to 0.10. Therefore, when the precision is equal to 0.95, this corresponds to p_L = 0.05 and p_U = 0.10. When the precision is equal to one, then p_L = p_U = 0.10. Assume that we want a design that achieves an overall accuracy of 0.95 (1 - δ = 0.95). We see that for three of the four considered priors, the maximum FOM is well above 0.95 for all considered precisions, and therefore we should on average correctly classify over 95% of locales with these procedures. However, for the situation when a = 2 and b = 10, which is likely the more realistic prior for this application, the maximum FOM drops below 0.95 as p_L approaches p_U, or the precision approaches one. Hence, it is not always possible to achieve the desired level of accuracy for all precisions, short of increasing the sample size, illustrating the tradeoff between the two.
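The relationship between precision and the maximum scaled FOM can be traced numerically. The sketch below is an illustration under stated assumptions rather than the code behind Figure 4B: it assumes a Beta(a, b) prior, a midpoint rule over p, a cap `d_max` on the decision rules searched, and the function name `max_scaled_fom` is our own. For each lower threshold it returns the precision 1 - |p_U - p_L|, the maximizing decision rule, and the maximum FOM divided by 1 - Pr(p_L < p < p_U).

```python
import math

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at 0 < p < 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta)

def max_scaled_fom(n, a, b, p_low, p_up, d_max=60, grid=2000):
    """Return (precision, best d, maximum FOM scaled to priority locales)."""
    h = 1.0 / grid
    coef = [math.comb(n, y) for y in range(d_max)]
    gain = [0.0] * d_max               # change in FOM when outcome y joins Fail
    upper_mass = middle_mass = 0.0
    for i in range(grid):
        p = (i + 0.5) * h
        w = beta_pdf(p, a, b) * h
        if p_low <= p < p_up:
            middle_mass += w           # Pr(p_L < p < p_U): excluded from the FOM
            continue
        if p >= p_up:
            upper_mass += w
        sign = 1.0 if p < p_low else -1.0
        for y in range(d_max):
            gain[y] += sign * coef[y] * p**y * (1 - p)**(n - y) * w
    best_d, best_fom, fom = 0, upper_mass, upper_mass
    for y in range(d_max):
        fom += gain[y]                 # fom is now FOM(n, y + 1)
        if fom > best_fom:
            best_d, best_fom = y + 1, fom
    precision = 1 - abs(p_up - p_low)
    return precision, best_d, best_fom / (1 - middle_mass)

results = {}
for p_low in (0.05, 0.10):
    results[p_low] = max_scaled_fom(n=200, a=2, b=10, p_low=p_low, p_up=0.10)
    precision, d, scaled = results[p_low]
    print(f"precision = {precision:.2f}: best d = {d}, scaled FOM = {scaled:.3f}")
```

Repeating the loop over a finer set of p_low values, a larger n, or other priors shows how much additional sample is needed to hold a target accuracy as the precision approaches one.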

Conclusion

In this paper, we describe the basic framework for performing Bayes-LQAS, using as an example an application to acute malnutrition. The benefits of using such a method include the ability to incorporate mild or strong prior beliefs about the underlying distribution, based either on historical data or even expert opinion, and the provision of a principled framework for accumulating data, which can be used in subsequent surveys to inform decision making.

Further, B-LQAS allows for the investigator to make probabilistic statements about the prevalence itself, given the outcome of the classification procedure, which classical LQAS does not. Using the FOM allows for the selection of a design with optimal a priori probabilities of correct classification.

We also see the inherent tradeoff between accuracy and precision. This tradeoff is not unique to the Bayesian framework, of course. Indeed, it is this very tradeoff that motivates the use of upper and lower thresholds to evaluate error in the classical LQAS framework. This is due to the fact that it is impossible to make completely accurate classifications for all values of p, barring an infinite sample or a complete census. An important aspect of this tool which we have not discussed is its potential as a routine tool for monitoring population health. Indeed, the Figure of Merit approach can be easily adapted to incorporate historical or routine data. The above formulation is simple by construction, as we wish only to illustrate the potential of B-LQAS. More complex modeling is required to exploit the full utility of this method for monitoring health programs over time. For use with panel data, or repeated cross-sectional surveys over regular intervals, the extension of the above method needs investigating.

Clearly, the choice of prior distribution is an important element of B-LQAS. One alternative to complete specification of the prior is to let the data influence its shape via empirical Bayes procedures (see [21], pg. 122-126 for further discussion). Regardless, the prior can have minor or major influence on the chosen design, depending on the situation. In the example we present, the sample size for the survey is relatively large. However, it is not uncommon to use much smaller sample sizes when performing LQAS (e.g. n = 19) [16]. In this case, the prior distribution will impact the choice of design more heavily. Most importantly, the prior should accurately reflect prior beliefs and should not be chosen to subvert the classification procedure.