# ICE preference maps: nonlinear generalizations of net benefit and acceptability

## Abstract

The Net Benefit (NB) approach to Incremental Cost-Effectiveness (ICE) statistical inference uses a linear function (map) to assign a real valued, numerical preference score to every point on the 2-dimensional ICE plane. We argue that coherent ICE preferences satisfy four intuitive axioms and propose a 2-parameter family of maps that satisfy these axioms and provide highly realistic generalizations of NB. For example, nonlinear maps do not require that returns-to-scale be linear (constant) or that willingness-to-pay (WTP) and willingness-to-accept (WTA) are both equal to the shadow price of health, λ. In fact, all of our maps have the property that \( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} }. \) With λ held fixed, this geometric mean relationship shows that WTA must decrease when WTP increases and vice versa. This relationship thus provides not only a polar angular measure of the size of “Bernie’s Kink,” WTP < WTA, but also the theoretical basis for Buckingham’s ALICE curve generalization of acceptability. Finally, we argue that uncertainty about economic preferences expressed by varying λ can totally swamp the statistical uncertainty in patient level data expressed by a wedge-shaped, bootstrap ICE confidence region that does not depend upon λ in the sense that it is equivariant under changes in λ.

## Keywords

ICE outcome pairs, Linear standardization, Preference map, Returns-to-scale, Willingness-to-pay/accept

## 1 Introduction

Cost-effectiveness analysis has been a hot topic in health services and outcomes research for at least the last decade. The simplest case, where two treatments are being compared head-to-head, is Incremental Cost-Effectiveness (ICE) inference. The tutorial of Briggs and Fenn (1998) provides a good introduction to and review of the extremely wide variety of methodologies that have been proposed to quantify ICE uncertainty. A synopsis of ICE statistical inference is that it addresses a complex 2-sample, 2-variable problem. Specifically, ICE inference examines differences between two treatment groups using data on two types of outcomes (cost and effectiveness) that may be correlated.

As an aid to visualization in 2-dimensional, real Euclidean space, Black (1990) proposed that the incremental difference (new treatment minus standard treatment) in mean effectiveness, ΔE, be plotted horizontally while the corresponding difference in mean cost, ΔC, is plotted vertically. To depict uncertainty in a (ΔE, ΔC) point estimate (a bivariate statistic), its bootstrap distribution under resampling of patient outcome pairs with replacement within treatment groups is displayed as a scatter of points on the ICE plane; see Briggs and Fenn (1998, pp. 731–734) for a summary of ICE bootstrapping methodology.
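The within-group resampling described above can be sketched with the Python standard library alone. The function names and the data layout (one list of (effectiveness, cost) pairs per treatment group) are illustrative assumptions, not the interface of any published ICE software.

```python
import random

def mean_pair(pairs):
    """Return (mean effectiveness, mean cost) for a list of (e, c) pairs."""
    n = len(pairs)
    return (sum(e for e, _ in pairs) / n, sum(c for _, c in pairs) / n)

def bootstrap_ice(new, std, reps=1000, seed=0):
    """Resample patients with replacement within each treatment group and
    return a list of (dE, dC) replicates, each computed as new minus standard."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(reps):
        e_new, c_new = mean_pair(rng.choices(new, k=len(new)))
        e_std, c_std = mean_pair(rng.choices(std, k=len(std)))
        replicates.append((e_new - e_std, c_new - c_std))
    return replicates
```

Plotting the returned replicates with dE on the horizontal axis and dC on the vertical axis gives the scatter on the ICE plane.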

The bivariate bootstrap distribution of uncertainty in (ΔE, ΔC) estimates yields a (univariate) confidence interval for the unknown true expected value of any scalar valued ICE summary statistic. Due to the inherently bivariate nature of the health and financial outcomes being compared, substantial challenges to the use and interpretation of the ICE ratio, ΔC/ΔE, have been noted (Chaudhary and Stearns 1996; Fan and Zhou 2007; Heitjan et al. 1999a, b; Laupacis et al. 1992). Although closely related to the ICE ratio, the Net Benefit (NB) approach (Laska et al. 1999; Stinnett 1999; Stinnett and Mullahy 1998) may have avoided similar criticism because it attempts to quantify overall incremental preference or utility. In NB, constant preference contours (indifference curves) are straight lines on the ICE plane with positive slope, λ. Within the North East (NE) quadrant, this slope can be interpreted as the willingness-to-pay (WTP) a higher cost in return for increased effectiveness; within the South West (SW) quadrant, this same slope is interpreted as the willingness-to-accept (WTA) a less effective treatment in return for lower cost. In other words, NB collects outcomes on the ICE plane into linear equivalence classes (straight lines of constant incremental preference) ordered by a scalar index, the NB.

Unfortunately, the assumption of uniform slope for iso-preference curves is unrealistic in the sense that linear utility has been consistently contradicted in empirical studies (O’Brien et al. 2002; O’Brien and Sculpher 2000; Willan et al. 2001). O’Brien et al. (2002) provide a good introduction to WTP and WTA concepts as well as a highly readable summary of relevant literature. Here we describe a logical foundation for ICE preference quantification (Obenchain 2000, 2001) including a family of two-parameter models that admit complex economic behaviors, such as non-equivalent WTP and WTA. Our axioms are quite general, and our arguments motivating them are both simple and intuitive.

The linear NB map does satisfy all four basic axioms. But our two-parameter family of simple “models” for preference variation across the entire ICE plane provides *nonlinear* generalizations of NB. For example, see Fig. 2a–c of Sect. 3.6 and the discussions of their interpretation in subsequent sections.

Section 2 introduces basic notation and demonstrates that all commonly considered transformations of the ICE plane, including both treatment re-labeling (new versus standard) and axis rescaling (changes in the shadow price of health), are simple linear transformations with a *fixed point* at the ICE origin. With λ held fixed, treatment differences in outcome are first standardized by expressing both (ΔE, ΔC) differences in identical units, either both in cost units or else both in effectiveness units. Section 3 then discusses our four basic axioms expressed in standardized units and introduces our 2-parameter family of “signed-power” maps. Section 4 discusses the returns-to-scale properties of these maps. Section 5 defines how either WTP or WTA is related to the slope of the iso-preference contour that passes through any given point on the ICE plane. Section 6 then discusses how “Bernie’s Kink” (O’Brien et al. 2002; Willan et al. 2001), WTP < WTA, is related to the “Gray Areas” in the seminal work of Laupacis et al. (1992) and quantifies the angular size of this kink. Section 7 discusses important distinctions between “ALICE” curves and traditional, linear measures of acceptability using a numerical example. Section 8 then shows that economic uncertainty about shadow price, λ, can totally swamp the statistical uncertainty about the true location of (ΔE, ΔC) within the ICE plane. Finally, Sect. 9 discusses the advantages and practical implications of nonlinear preferences as well as the need for greater consensus on ICE vocabulary and methodology.

## 2 Basic ICE notation and terminology

### 2.1 ICE outcomes

An ICE outcome is represented by a pair of expected treatment differences, usually expressed in Cartesian coordinates as (ΔE, ΔC). Here, ΔE is a difference in average treatment effectiveness of the form “new” treatment minus “standard” treatment. The underlying effectiveness measurement needs to be defined in such a way that larger (more positive) values of ΔE are unambiguously more favorable to the new treatment. The corresponding difference in average per-patient cost, ΔC, must be such that smaller (more negative) values are unambiguously more favorable to the new treatment.

### 2.2 ICE transformations

Transformations of ICE outcome coordinates occur quite naturally. For example, interchanging the labels (new and standard) on the two treatments being compared would multiply both ΔE and ΔC by minus one.

Next, let λ denote society’s fixed “shadow price” for one unit of effectiveness. In other words, λ is a strictly positive *substitution rate* expressed in units of cost per unit of effectiveness. For any specified value of λ, a cost difference of y = ΔC is re-expressed in effectiveness units by dividing it by λ. Alternatively, the corresponding effectiveness difference of x = ΔE would be expressed in cost units by multiplying it by λ. An arbitrary ICE outcome (ΔE, ΔC) thus gets transformed into either (x, y) = (λΔE, ΔC) in cost units or else into (x, y) = (ΔE, ΔC/λ) in effectiveness units. Either choice represents a *standardized*, *canonical form* for expected, overall treatment differences that clearly depends upon choice of the fixed numerical value of λ.
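As a minimal sketch of this standardization step, the helper below converts an (ΔE, ΔC) pair into either cost units or effectiveness units. The function name, the `units` argument and the numbers in the usage note are hypothetical.

```python
def standardize(dE, dC, lam, units="cost"):
    """Return the standardized pair (x, y) for a fixed shadow price lam > 0.

    units="cost":          (x, y) = (lam * dE, dC)
    units="effectiveness": (x, y) = (dE, dC / lam)
    """
    if lam <= 0:
        raise ValueError("the shadow price lam must be strictly positive")
    if units == "cost":
        return (lam * dE, dC)
    if units == "effectiveness":
        return (dE, dC / lam)
    raise ValueError("units must be 'cost' or 'effectiveness'")
```

For example, with a hypothetical λ of 50 cost units per unit of effectiveness, the outcome (ΔE, ΔC) = (2, 100) standardizes to (100, 100) in cost units and to (2, 2) in effectiveness units; in both cases x = y.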

## 3 ICE preference maps

An ICE preference map, P(x, y), assigns a real-valued preference score to *standardized* ICE outcomes. Our three primary interpretation conventions (assumptions) for P(x, y) will be as follows:

P(x, y) = 0 means that the (x, y) pair of treatment differences correspond to no preference whatsoever, either for the new treatment over the standard treatment or vice-versa.

P(x, y) > 0 means that the treatment currently called new is preferred over the treatment currently called standard. Strictly positive P(x, y) values are at least ordinal measures of strength of preference for the new treatment over the standard treatment.

P(x, y) < 0 means that the treatment currently called standard is preferred over the treatment currently called new. The absolute values of negative P(x, y) values are at least ordinal measures of strength of preference for the standard treatment over the new treatment.

Note that the *linear* preference map, NB(x, y) = x − y (Stinnett and Mullahy 1998), clearly satisfies all four of the axioms listed below.

Four axioms of ICE preference

| Axiom | Requirement |
|---|---|
| Indifference and direction of preference | P(x, y) = 0 when x = y, P(x, y) > 0 when x > y, and P(x, y) < 0 when x < y |
| Monotonicity | P(x, y) ≥ P(x_{0}, y_{0}) for all x ≥ x_{0} and all y ≤ y_{0} |
| Re-labeling | P(x, y) = −P(−x, −y) |
| Symmetry and anti-symmetry | P(x, y) = P(−y, −x) = −P(y, x) |

Note that the re-labeling, symmetry and anti-symmetry axioms represent additional restrictions on ICE preferences only when x ≠ y. After all, the P(x, x) = 0 property of the first axiom renders the implications of all other axioms moot for all outcomes with x = y.
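The four axioms lend themselves to a simple numerical spot-check. The sketch below tests a candidate map on a finite grid of standardized outcomes; the helper names are illustrative, and passing on a grid is only a necessary condition, not a proof that a map satisfies the axioms everywhere.

```python
def nb(x, y):
    """Linear Net Benefit map on standardized outcomes."""
    return x - y

def sign(v):
    """Return +1, 0 or -1."""
    return (v > 0) - (v < 0)

def satisfies_axioms(P, points, tol=1e-9):
    """Check the four axioms for map P on a finite list of (x, y) points."""
    for x, y in points:
        p = P(x, y)
        # Axiom 1: indifference on x = y and direction of preference elsewhere.
        if sign(p) != sign(x - y):
            return False
        # Axiom 3: re-labeling.
        if abs(p + P(-x, -y)) > tol:
            return False
        # Axiom 4: symmetry and anti-symmetry.
        if abs(p - P(-y, -x)) > tol or abs(p + P(y, x)) > tol:
            return False
        # Axiom 2: monotonicity against every other grid point.
        for x0, y0 in points:
            if x >= x0 and y <= y0 and p < P(x0, y0) - tol:
                return False
    return True

grid = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
print(satisfies_axioms(nb, grid))
```

A map such as P(x, y) = x + y fails the first axiom immediately, since it is not zero along the x = y diagonal.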

The next 5 sub-sections discuss the meaning and interpretation of these four necessary axioms. The final sub-section then introduces our 2-parameter family of preference maps sufficient to satisfy the axioms and provide realistic, nonlinear generalizations of NB.

### 3.1 ICE indifference and the direction of preference

When x = y, society receives exactly the difference in effectiveness for which it pays, no more and no less. Therefore, there is no compelling reason to prefer one treatment relative to the other, i.e., P(x, y) = 0. If the new treatment is more effective, it is commensurably more costly. If the new treatment is less costly, it is commensurably less effective.

When x > y, the difference in effectiveness of the new treatment compared to the standard exceeds the cost difference between the treatments. Society thus receives a level of incremental effectiveness worth more than its incremental cost. Therefore, the new treatment is preferred over the standard, i.e., P(x, y) > 0.

When x < y, the difference in cost is larger than the difference in effectiveness. Society thus receives a level of incremental effectiveness worth less than its incremental cost. The standard treatment is then preferred over the new treatment, i.e., P(x, y) < 0.

This first axiom is by far the most restrictive of the four considered here. It dictates an infinite, linear interface of standardized slope 1 separating positive from negative ICE preferences. In original units, this is the line ΔC/ΔE = λ with slope determined by the shadow price of health, which ICE analysts may wish to deliberately vary to perform sensitivity analyses that turn out to be anything but subtle.

### 3.2 ICE monotonicity

Complete preference orderings of all outcomes of the ICE plane are subject to ongoing research and debate. However, a fundamental property of all sensible preference maps is that P(x, y) ≥ P(x_{0}, y_{0}) for all x ≥ x_{0} and all y ≤ y_{0}. If the effectiveness of a new treatment is increased at the same time its ultimate cost is decreased, preference for that new product over a fixed standard treatment certainly cannot decrease. Remember that we are assuming that x has been defined so that larger (more positive) values of standardized effectiveness are more favorable to the treatment currently called new. Similarly, y must be defined so that smaller (more negative) values of standardized cost are more favorable to the treatment currently called new.

### 3.3 ICE re-labeling

One meaning of P(x, y) = −P(−x, −y) is that, when reversing treatment labels (new and standard) on a single pair of treatments, the direction of preference is reversed while the strength of preference is preserved. The implications of this axiom are broader in the sense that this same preference equality must also hold when a fixed new treatment is either preferred or not preferred to a fixed standard by a fixed, specified amount. This axiom imposes a form of *fairness* or *even-handedness* upon head-to-head ICE treatment comparisons.

### 3.4 ICE symmetry

Axiom 4 can be expressed in two equivalent ways. Starting with the third axiom plus either form of the fourth axiom, the other form of the fourth axiom follows immediately by simple algebra. For example, the re-labeling property, P(x, y) = −P(−x, −y), can be combined with the symmetry property, P(x, y) = P(−y, −x), to yield P(x, y) = −P(y, x), which is the anti-symmetry property.

The symmetry form of the fourth axiom requires that preferences for the outcomes (x, y) and (−y, −x) be *identical*, P(x, y) = P(−y, −x). In other words, for any (x, y) outcome, the alternative outcome of (−y, −x) must yield the exact same strength of preference in the same direction (new over standard or vice versa). The “named” wedge-shaped segments of the ICE plane discussed in Laupacis et al. (1992) and illustrated here in Fig. 1 were apparently the first depictions of ICE preference symmetry.

Suppose now that (x_{o}, y_{o}) is any fixed point within the NE quadrant, 0 < x_{o} and 0 < y_{o}. As a result, (−y_{o}, −x_{o}) is then a fixed point within the SW quadrant, as illustrated in Fig. 1. Denoting the standardized ICE ratio (slope) corresponding to (x_{o}, y_{o}) by s_{o} = y_{o}/x_{o} > 0, it follows that the standardized slope corresponding to the outcome pair (x, y) = (−y_{o}, −x_{o}) is s = y/x = −x_{o}/−y_{o} = +1/s_{o} > 0. In other words, one immediate implication of the ICE symmetry axiom is that all (x_{o}, y_{o}) and (−y_{o}, −x_{o}) pairings with *equivalent preferences* have *standardized* ICE ratios, s = y/x, that are *numerical reciprocals* or *inverses*.

This inverse relationship has considerable intuitive appeal. Within the NE quadrant, s = y/x is a positive “loss over gain” ratio; the positive numerator represents an undesirable additional cost (loss) while the positive denominator represents a desirable increase in effectiveness (gain). Meanwhile, within the SW quadrant, s = y/x is a positive “gain over loss” ratio; the negative numerator represents a desirable cost reduction (gain) while the negative denominator represents an undesirable reduction in effectiveness (loss).

In other words, numerically *small* and positive standardized ICE ratios are desirable within the NE quadrant where they represent loss/gain ratios, while numerically *large* and positive standardized ICE ratios are desirable within the SW quadrant (Heitjan et al. 1999a, b) where they represent gain/loss ratios. In Fig. 1, these two regions are represented by the wedges labeled “Favorable (B)” that are colored yellow-green. By assuring that outcomes within the NE and SW quadrants that yield *equivalent preferences* also yield standardized ICE ratios that are *numerical reciprocals* (y_{o}/x_{o} and −x_{o}/−y_{o} = x_{o}/y_{o}), the ICE symmetry axiom simply *formalizes basic intuition*.

Within the South East (SE) quadrant [dark green, labeled “Highly Favorable (A)” in Fig. 1], s = y/x is a negative “gain over gain” ratio; the negative numerator represents a desirable cost reduction while the positive denominator represents a desirable increase in effectiveness for the new treatment. All ICE ratios for outcome differences within the SE quadrant thus represent a distinct preference for the new treatment over the standard.

Finally, within the North West (NW) quadrant [dark red, labeled “Highly Unfavorable (E)” in Fig. 1], s = y/x is a negative “loss over loss” ratio; the positive numerator represents an undesirable added cost while the negative denominator represents an undesirable reduction in effectiveness for the new treatment. All NW quadrant outcome differences thus represent a distinct preference for the standard treatment over the new treatment.

Next, note that the *linear* preference map, NB(x, y) = x – y, possesses a purely *optional* property that is much stronger and more restrictive than P(x, y) = P(−y, −x). This linear NB preference is constant everywhere on the straight line passing through the points (x, y) and (−y, −x). Again, when one’s preference map is linear, preference is assumed constant on all straight lines (x – y = constant) that are parallel to the lower-left to upper-right diagonal (x = y) of the ICE plane.

Finally, note that the ICE preference symmetry property, P(x, y) = P(−y, −x), does impose an additional restriction besides reciprocal ICE ratios, y/x and −x/−y = x/y, for the corresponding outcome pairings. Namely, all such outcome pairs clearly also have the same ICE radius, \( {\text{r}} = {\sqrt {{\text{x}}^{2} + {\text{y}}^{2} } }. \)

### 3.5 ICE anti-symmetry

As previously noted, the ICE anti-symmetry axiom can be viewed as following directly from the re-labeling and symmetry restrictions. In its own right, the anti-symmetry requirement that P(x, y) = −P(y, x) is quite intuitive. It requires symmetry in *strength* of preferences about the x = y diagonal. However, this property is called anti-symmetry here because the *direction* of preferences is *reversed* on the two different sides of the x = y diagonal. After all, when pairs of outcomes of the form (x, y) and (y, x) are not on the x = y diagonal, they are symmetrically located relative to this x = y diagonal.

### 3.6 Two-parameter ICE preference maps

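Our 2-parameter family of signed-power ICE preference maps is

\( P(x, y) = (x^{2} + y^{2})^{(\beta - \gamma)/2} \, \{ x - y \}^{\gamma}, \quad (1) \)

where the power parameters β > 0 and γ > 0 are subject to the restriction

\( 1/\Upomega \le \gamma/\beta \le \Upomega = 3 + 2{\sqrt 2 }, \quad (2) \)

and {z}^{γ} denotes a “signed-power.” Specifically, {z}^{γ} denotes the product of sign(z) [which is +1, 0 or −1] times the absolute value of z raised to the power γ. Special care needs to be taken in Eq. 1 because non-integer powers of negative real numbers are generally imaginary; ICE preferences need to be expressed as *real numbers*, with possibly only ordinal measures of strength.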

Figure 2 displays *indifference curves* (level curves, iso-preference contours) drawn using the contourplot() function within the “lattice” graphics package for R (The R Project for Statistical Computing 2007). Rather *round* nonlinear maps like the one in Fig. 2a result when γ < β; the linear map, NB(x, y) = x – y, of Fig. 2b results when γ = β = 1; and *highly directional* nonlinear maps like those of Fig. 2c–d result when γ > β. Note that the γ/β ratio of 0.25 for the map of Fig. 2a is well above the 1/Ω ≈ 0.1716 lower limit allowed under restriction (2), while γ/β = 4 of Fig. 2c is well below the Ω ≈ 5.828 upper limit for maps possessing ICE monotonicity.
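A signed-power map is straightforward to evaluate numerically. The sketch below assumes that Eq. 1 has the form P(x, y) = (x² + y²)^((β − γ)/2) {x − y}^γ; this form is inferred from the properties stated in the text (signed powers, returns-to-scale governed by β, linearity when γ = β = 1) and should be treated as an assumption, as should the function names.

```python
import math

OMEGA = 3 + 2 * math.sqrt(2)  # approx 5.828, upper limit on gamma / beta

def signed_power(z, g):
    """sign(z) * |z|**g, which stays real for negative z and non-integer g."""
    if z == 0:
        return 0.0
    return math.copysign(abs(z) ** g, z)

def ice_pref(x, y, beta=1.0, gamma=1.0):
    """Assumed form of Eq. 1 on standardized outcomes (x, y)."""
    r = math.hypot(x, y)  # ICE radius
    if r == 0.0:
        return 0.0  # no preference at the ICE origin
    return r ** (beta - gamma) * signed_power(x - y, gamma)

def within_restriction(beta, gamma):
    """True when 1/OMEGA <= gamma/beta <= OMEGA, i.e. restriction (2)."""
    return 1 / OMEGA <= gamma / beta <= OMEGA

print(ice_pref(3.0, 1.0))                      # linear NB: x - y
print(ice_pref(3.0, 1.0, beta=1.0, gamma=4.0))  # a "highly directional" map
```

Evaluating `ice_pref` on a grid and passing the values to any contour-plotting routine would give level curves of the kind described for Fig. 2.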

## 4 Returns-to-scale

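Returns-to-scale describes how preference changes when both standardized outcome differences are multiplied by a common, strictly positive factor, *f*. In other words, the observed effectiveness difference of x becomes *f* times x, while the observed cost difference of y becomes *f* times y. The resulting new value of preference in Eq. 1 is then

\( P(fx, fy) = f^{\beta} P(x, y). \quad (3) \)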
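The scaling behaviour can be verified in a few lines, assuming that Eq. 1 has the signed-power form \( P(x, y) = (x^{2} + y^{2})^{(\beta - \gamma)/2} \{x - y\}^{\gamma} \). This form is inferred from the surrounding text rather than quoted, so the derivation is a sketch under that assumption.

```latex
\begin{aligned}
P(fx, fy) &= \left(f^{2}x^{2} + f^{2}y^{2}\right)^{(\beta-\gamma)/2}\,\{f(x - y)\}^{\gamma} \\
          &= f^{\beta-\gamma}\,(x^{2}+y^{2})^{(\beta-\gamma)/2}\cdot f^{\gamma}\,\{x - y\}^{\gamma} \qquad (f > 0) \\
          &= f^{\beta}\,P(x, y).
\end{aligned}
```

Under this form the γ exponents cancel, so β alone determines how preference scales with the common factor *f*; β = 1 gives constant (linear) returns-to-scale.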

## 5 Willingness-to-pay or accept

WTP or WTA at any point on the ICE plane is assumed here to be determined by the iso-preference contour that passes through that given point. In fact, we define a *standardized* “willingness” rate (of WTP/λ within the NE quadrant or WTA/λ within the SW quadrant) as being equal to the dy/dx slope of the tangent to the iso-preference contour at the point of interest. This is fully consistent with NB analysis in which iso-preference contours are straight lines of slope WTP = WTA = λ. For example, the standardized value for all three of these quantities is +1 at all points in Fig. 2b.

The *standardized* willingness rate at (x, y) for all signed-power ICE preference maps of form (1) is

\( w(x, y) = \frac{1 + (\eta - 1)s + \eta s^{2}}{\eta + (\eta - 1)s + s^{2}}, \quad (4) \)

where η = γ/β, s = y/x, and *w*(x, y) represents either

[a] a non-negative value of WTP/λ when ΔE, x, ΔC and y are all positive, or

[b] a non-negative value of WTA/λ when ΔE, x, ΔC and y are all negative.

Since β and γ are unitless parameters and x and y are both measured here in the same units, it follows that η = γ/β, s = y/x and *w*(x, y) are all unitless quantities.

Note that the willingness rate evaluated at any point (x, y) is generally different from the standardized ICE ratio, s = y/x, corresponding to that point. On the other hand, when η = 1, *w*(x, y) ≡ 1 is a fixed value for all points (x, y) and all directions s = y/x.

For any fixed value of η different from 1, the standardized willingness rate (4) varies only with s. In other words, standardized willingness is then constant everywhere along every straight-line trajectory, s, passing through the origin of the ICE plane …except at the ICE origin itself. After all, neither *w*(0, 0) nor s = y/x are well defined at the ICE origin! Unlike ICE preferences, P(x, y), standardized willingness clearly does not vary with ICE radius, \( {\text{r}} = {\sqrt {{\text{x}}^{2} + {\text{y}}^{2} } }, \) within the 2-parameter maps of Eq. 1.

The maximum and minimum values of *w*(x, y) of (4) when η ≠ 1 are \( [2 + \kappa(\eta^{1/2} - \eta^{-1/2})]/[2 - \kappa(\eta^{1/2} - \eta^{-1/2})] \) for κ = ±1; both limits are non-negative when 1/Ω ≤ η ≤ Ω, which is also the restriction (2) that assures ICE monotonicity. The corresponding standardized directions are \( s = (1 + \kappa\eta^{1/2})/(1 - \kappa\eta^{1/2}), \) which are both positive when η < 1 as in Fig. 2a and both negative when η > 1 as in Fig. 2c–d. The maximum and minimum values for *w*(x, y) are +∞ and 0 only in the limiting case η = Ω ≈ 5.828, and the pair of directions pointing to these limits then have (reciprocal) negative slopes of \( {\text{s}} = 1 - {\sqrt 2 } \) and \( - 1 - {\sqrt 2 } \) (i.e., ICE Angles of θ = ±22.5° in Figs. 2d and 3).
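The extreme values and directions quoted above can be checked numerically. The sketch below assumes that Eq. 4 has the form w = [1 + (η − 1)s + ηs²]/[η + (η − 1)s + s²] with η = γ/β; this form is inferred from the stated properties (w ≡ 1 when η = 1, w tending to s as η grows without bound, reciprocal values on dual rays), and the function names are illustrative.

```python
import math

def willingness(s, eta):
    """Assumed Eq. 4: standardized willingness along the ray of slope s = y/x."""
    num = 1 + (eta - 1) * s + eta * s * s
    den = eta + (eta - 1) * s + s * s
    return num / den

def extreme_directions(eta):
    """Slopes s = (1 + k*sqrt(eta)) / (1 - k*sqrt(eta)) for k = +1, -1 (eta != 1)."""
    t = math.sqrt(eta)
    return [(1 + k * t) / (1 - k * t) for k in (1, -1)]

def extreme_values(eta):
    """Limits [2 + k*d] / [2 - k*d] with d = sqrt(eta) - 1/sqrt(eta), k = +1, -1."""
    d = math.sqrt(eta) - 1 / math.sqrt(eta)
    return [(2 + k * d) / (2 - k * d) for k in (1, -1)]

eta = 4.0  # the gamma/beta ratio quoted for Fig. 2c
for s, w_limit in zip(extreme_directions(eta), extreme_values(eta)):
    print(s, willingness(s, eta), w_limit)
```

For η = 4 the two directions are s = −3 and s = −1/3, and the willingness evaluated there agrees with the closed-form limits.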

## 6 Symmetry, dual ICE rays and (WTP, WTA) pairings

The ICE symmetry axiom (Obenchain 2000, 2001) dictates a number of key relationships. Again, the major implication of P(x, y) = P(−y, −x) is perhaps that these symmetric outcomes have standardized slopes, s = y/x, that are numerical reciprocals. But Eq. 4 then implies that *w*(x, y) and *w*(−y, −x) are also numerical reciprocals for all ICE maps of Eq. 1. The “dual rays” of Fig. 3 then consist of all standardized ICE outcome pairs of the form (*f*x, *f*y) and (−*f*y, −*f*x) with (x, y) fixed and ray slopes of s = y/x and 1/s, respectively, as *f* increases from 0 to +∞. Any such pair of dual rays also contains not only the *same distribution of preference strengths* (as a function of ICE radius) but also the *same direction of preference* (either always new over standard or vice versa).

In other words, the *un-standardized* ΔC/ΔE ratios corresponding to any such pair of dual rays are λs and λ/s, respectively, which are not reciprocals unless λ = 1. Similarly, the un-standardized willingness statistics of WTP = λ*w* or WTA = λ/*w* are the slopes of the iso-preference contours at all points where they cross one of these two rays. These *un-standardized* willingness slope pairings are also not reciprocals unless λ = 1. Note, furthermore, that these relationships *always hold*, for all choices of returns-to-scale, β, in (3) and all power parameter ratios, η = γ/β. Finally, choice of λ determines the orientation of the basic pattern of preference variation across the ICE plane and is clearly at least as important as choice of either β or η.

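The product of each such un-standardized (WTP, WTA) pairing is thus (λ*w*)(λ/*w*) = λ^{2}. In other words, for our signed-power ICE preference maps (or any differentiable ICE preference map satisfying the symmetry axiom), the following relationship will always hold:

\( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} }. \quad (5) \)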

Equation 5 states that the shadow price of health is the geometric mean of all well-matched pairs of strictly positive WTP and WTA values. In other words, Eq. 5 shows that WTP and WTA can both vary simultaneously within a *fixed*, nonlinear ICE preference map corresponding to a single *fixed* value of λ. Specifically, relative to choice of shadow price, λ, choices for the values of β and η (or γ) parameters are *clearly less important*.
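A short numerical illustration of this geometric-mean relationship follows, using the pairings WTP = λ*w* and WTA = λ/*w* on dual rays. The shadow price of 50,000 and the values of *w* are hypothetical.

```python
import math

def wtp_wta(lam, w):
    """Un-standardized pairing on a pair of dual rays: WTP = lam * w, WTA = lam / w."""
    if lam <= 0 or w <= 0:
        raise ValueError("lam and w must be strictly positive")
    return lam * w, lam / w

lam = 50000.0  # hypothetical shadow price, cost units per unit of effectiveness
for w in (0.25, 0.5, 1.0):
    wtp, wta = wtp_wta(lam, w)
    # The geometric mean of each pairing recovers lam, whatever the value of w.
    print(w, wtp, wta, math.sqrt(wtp * wta))
```

As *w* falls below 1, WTP drops below λ while WTA rises above it, yet their geometric mean stays fixed at λ.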

### 6.1 An additional “realism” restriction is still needed

Figure 3 portrays an accurate visualization of the numerical ordering between *w*, s and their reciprocals (i.e., 0 < s < *w* < 1 < 1/*w* < 1/s) only in cases where the η = γ/β ratio is > 1 in Eq. 4, as in the “highly directional” nonlinear map depicted in Fig. 2c. Unfortunately, η < 1 implies that 0 < s < 1/*w* < 1 < *w* < 1/s, which yields WTA < λ < WTP below the x = y diagonal in the rather “round” maps (η < 1) like Fig. 2a. This alternative ordering has apparently never been observed in empirical research on WTP and WTA (O’Brien et al. 2002; Willan et al. 2001).

In summary then, only the *nonlinear* ICE preference maps of form (1) with power parameter ratio, η = γ/β, confined to the finite interval of \( 1 < \eta \le \Upomega = 3 + 2{\sqrt 2 } \) can be fully realistic. And only the limiting maps with η = Ω allow willingness (standardized or un-standardized) to vary all of the way from 0 to +∞.

### 6.2 The Laupacis visualization of ICE preferences corresponds to the β = 0 limit

Figure 1 depicts the limit of our signed-power family of ICE preference maps of Eq. 1 as the returns-to-scale parameter, β, approaches *zero* (while the γ parameter is held fixed at any finite value). In other words, η then approaches +∞ and the standardized willingness of Eq. 4 becomes *w* = (s + s^{2})/(1 + s) = s in this limit. While failing to possess ICE monotonicity and allowing negative values for *w* in Eq. 4 within the SE and NW quadrants, these limiting maps still have iso-preference curves that correspond to pairs of dual rays with reciprocal slopes. They also *order preferences on* *ICE polar angle in the exact same way* that they are ordered on all of our β > 0 maps for outcomes at the *same ICE radius*.
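The limit quoted above can be made explicit, assuming that Eq. 4 has the form \( w = [1 + (\eta - 1)s + \eta s^{2}]/[\eta + (\eta - 1)s + s^{2}] \) with η = γ/β (an inferred form, consistent with *w* ≡ 1 at η = 1). Dividing numerator and denominator by η gives:

```latex
w = \frac{\eta^{-1} + (1 - \eta^{-1})\,s + s^{2}}{1 + (1 - \eta^{-1})\,s + \eta^{-1}s^{2}}
\;\longrightarrow\; \frac{s + s^{2}}{1 + s} = \frac{s\,(1 + s)}{1 + s} = s
\quad \text{as } \eta \to +\infty \quad (s \neq -1).
```

In this limit the iso-preference contours are rays through the origin, so the willingness slope coincides with the ICE ratio itself and becomes negative wherever s is negative.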

On the other hand, these limiting (β = 0, γ > 0) maps are not very realistic precisely because they yield *zero returns-to-scale*. In other words, they ignore ICE radius as a potential, partial determinant of preference …especially within the SE and NW quadrants.

### 6.3 Symmetry and literature on (WTP, WTA) pairings

Preliminary standardization, in which (x, y) represents either (λΔE, ΔC) in cost units or (ΔE, ΔC/λ) in effectiveness units for a fixed numerical value of λ, is essential to be able to express not only standardized directions, s = y/x, but also standardized willingnesses, *w*, as unitless quantities. In turn, unitless quantities become absolutely essential when *reciprocals* are to be compared. After all, if the quantities being compared were not unitless, the statistic and its reciprocal would be expressed in different units, such as $/QALY and QALY/$, and clearly could not be meaningfully compared!

Standardized (WTP, WTA) pairings then have *simple geometric interpretations* consistent with the Laupacis et al. (1992) visualization of preferences. Specifically, the standardized slope (*w* = s) of any WTP ray in Fig. 1 (β = 0, η = +∞) is \( {\text{s}} = {\sqrt {{\text{WTP}}/{\text{WTA}}} } < 1, \) while the standardized slope of the corresponding WTA ray is \( {\text{1/s}} = {\sqrt {{\text{WTA}}/{\text{WTP}}} } > 1. \) Similarly, the angular sizes of the “Favorable (B)” wedges and “Unfavorable (D)” wedges are all equal, again as depicted in Fig. 1. The size of these wedges in degrees is given by multiplying ArcTan(s), measured in radians, by 180 degrees and then dividing by π. Table 2 lists these measures for 10 of the 20 programs or products reported in Table 1 of O’Brien et al. (2002).

| Program or product | WTA/WTP ratio | Standardized WTP slope | Standardized WTA slope | Angular size of the favorable (B) and unfavorable (D) wedges (degrees) | Angular size of the gray area (C) wedges (degrees) |
|---|---|---|---|---|---|
| | | | | | |
| Elk hunting | Min = 3.2 | 0.56 | 1.79 | 29 | 32 |
| Deer hunting | 12.0 | 0.29 | 3.46 | 16 | 58 |
| Woodland | 25.2 | 0.20 | 5.03 | 11 | 68 |
| Trees in park | Max = 89.4 | 0.11 | 9.46 | 6 | 78 |
| | | | | | |
| New drug | 1.9 | 0.73 | 1.37 | 36 | 18 |
| Injury (2 wk hospital) | 6.4 | 0.39 | 2.53 | 22 | 46 |
| | | | | | |
| Job (VSL) | Min = 1.1 | 0.94 | 1.07 | 43 | 4 |
| Car | Max = 3.6 | 0.53 | 1.89 | 28 | 34 |
| | | | | | |
| Candy bar | Min = 1.3 | 0.89 | 1.12 | 42 | 6 |
| Coffee mug | Max = 2.6 | 0.62 | 1.62 | 32 | 26 |

Note in Table 2 that the “Favorable (B)” and “Unfavorable (D)” wedges within the NE and SW quadrants are most narrow in the 7 environmental studies and most wide, approaching the maximum possible size of 45°, in some of the 4 safety and 7 experimental studies. For example, when this angle is only 6°, as in the “Trees in park” study, the corresponding Laupacis et al. (1992) “Gray Area (C)” wedges each measure 78°. At the other extreme of 43° for the “Favorable (B)” and “Unfavorable (D)” wedges in the “Job safety (VSL)” study, the “Gray Area (C)” wedges measure only 4° each.
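The columns of Table 2 can be recomputed from a WTA/WTP ratio alone. The sketch below uses s = √(WTP/WTA) and the ArcTan conversion described in the text; the gray-area formula, 90° minus twice the wedge angle, is a geometric inference from the reciprocal slopes of the two rays and matches the tabulated values. The function name is illustrative.

```python
import math

def wedge_angles(wta_over_wtp):
    """Return (WTP slope s, WTA slope 1/s, B/D wedge in degrees, gray-area C wedge in degrees)."""
    s = math.sqrt(1.0 / wta_over_wtp)      # standardized WTP slope, below 1 when WTA > WTP
    wedge = math.degrees(math.atan(s))     # ArcTan(s) in radians, times 180 / pi
    gray = 90.0 - 2.0 * wedge              # what remains of the 90-degree quadrant
    return s, 1.0 / s, wedge, gray

s, inv_s, wedge, gray = wedge_angles(3.2)  # the "Elk hunting" ratio
print(round(s, 2), round(inv_s, 2), round(wedge), round(gray))
```

A ratio of exactly 1 gives s = 1, wedges of 45° and no gray area, which is the linear NB case.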

Note that Eq. 5 does not actually establish numerical values for WTP, WTA, η, *w* or s but only a relationship between WTP, WTA and the shadow price of health, λ. In other words, WTP = WTA = λ is always one possibility. This is the *only possibility* in the purely linear NB formulation (β = γ = η = 1). In general, the cases where WTP = WTA = λ and *w* ≡ 1 because η = 1, including the linear map shown in Fig. 2b, are the only visualizations in which society is willing to pay the full shadow price of health.

Another interesting and realistic possibility is WTP < λ < WTA as in Figs. 1 and 2c–d as well as in Table 2. These are the cases where a bargain-seeking society or an individual may possibly be willing to pay only somewhat less than an established “fair” full price, possibly forcing providers to accept lower per-item profits while possibly also seeking higher volumes.

## 7 Nonlinear acceptability

The “acceptability curve” graph was originally proposed by Van Hout, Al, Gordon and Rutten (VAGR) (Van Hout et al. 1994) in 1994 to portray ICE uncertainty. Given either (i) a parametric, bivariate distribution (normal, say) with mean (ΔE, ΔC) that has been fitted to some observed patient-level data or else (ii) a bootstrap resampling distribution of ICE uncertainty, the VAGR curve depicts the estimated “confidence level” associated with the region to the right or below a rotating *straight line through the ICE origin* that starts out horizontal (representing WTP = 0) and rotates counter-clockwise by 90°, ending up being vertical (representing WTP = +∞). In 2004, Fenwick, O’Brien and Briggs (FOB) (Fenwick et al. 2004) cataloged as many as 13 “special cases” yielding VAGR curves with quite different shapes, ranging from rather flat, to increasing, to decreasing, to distinctly non-monotone.

An acceptability curve that is always *monotone non-decreasing* results from the unpublished alternative definition of acceptability independently proposed by me in 2001 and by Professor Ken Buckingham of Otago University, New Zealand, in 2003. My freely distributed software (Obenchain 2005, 2007) uses Buckingham’s terminology for *Acceptability Levels In Cost Effectiveness* (ALICE) curves. For any *given and fixed* positive value of λ, the ALICE frontier is defined using a *pair of “kinked” dual ICE rays* (i.e., rays that remain symmetric relative to the x = −y diagonal while rotating so that their absolute ICE polar angle, \(|\theta| \), is constantly increasing). Table 3 compares the VAGR and ALICE definitions within all four quadrants of the ICE plane.

VAGR and ALICE definitions of acceptability

| ICE quadrant | VAGR definition | ALICE definition |
|---|---|---|
| ΔC > 0, ΔE > 0 NE quadrant | Acceptable if ΔC/ΔE < λs | Acceptable if ΔC/ΔE < λs = WTP |
| ΔC < 0, ΔE > 0 SE quadrant | All outcomes are acceptable | All outcomes are acceptable |
| ΔC < 0, ΔE < 0 SW quadrant | Acceptable if ΔC/ΔE > λs (see Note) | Acceptable if ΔC/ΔE > λ/s = WTA |
| ΔC > 0, ΔE < 0 NW quadrant | No outcomes are acceptable | No outcomes are acceptable |

In the notation of Table 3, the standardized ICE slope, s, is a unitless quantity that increases from 0 towards +∞, while λ denotes a *given, fixed* value for the shadow price of health. Within the VAGR column of definitions, the product of (λ times s) thus denotes a *variable* quantity corresponding to different *common values* for shadow price = WTP = WTA defining different *linear* VAGR thresholds for acceptability. Within the ALICE column of definitions, WTP = λs increases with s within the NE quadrant while WTA = λ/s simultaneously decreases with s within the SW quadrant, defining a range of *kinked* ALICE thresholds for acceptability, clearly satisfying Eq. 5. This ALICE definition of acceptability agrees with the sum of double integrals in (Willan et al. 2001), page 3255. Technically, interest could even be restricted to the finite range 0 ≤ s ≤ 1 for ALICE curves because s > 1 corresponds to WTA < λ < WTP, again an ordering that has apparently never been observed empirically.
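The two columns of Table 3 can be turned into a minimal Python sketch that scores any scatter of (ΔE, ΔC) outcomes, such as a bootstrap distribution. The function names and the four-point toy scatter are illustrative assumptions, not part of the software cited above, and outcomes lying exactly on an axis are simply treated as unacceptable.

```python
import numpy as np

def vagr_acceptability(dE, dC, lam, s):
    """Share of outcomes below the straight line dC = (lam * s) * dE (finite s only)."""
    dE = np.asarray(dE, dtype=float)
    dC = np.asarray(dC, dtype=float)
    # One linear frontier covers all four quadrants (WTP = lam * s = WTA).
    return float(np.mean(dC < lam * s * dE))

def alice_acceptability(dE, dC, lam, s):
    """Share of outcomes inside the kinked frontier with WTP = lam * s and WTA = lam / s."""
    dE = np.asarray(dE, dtype=float)
    dC = np.asarray(dC, dtype=float)
    in_ne = (dE > 0) & (dC > 0)
    in_se = (dE > 0) & (dC < 0)
    in_sw = (dE < 0) & (dC < 0)
    if np.isinf(s):
        # WTP = +infinity and WTA = 0: all NE and SW outcomes are acceptable.
        ne_ok, sw_ok = in_ne, in_sw
    else:
        ne_ok = in_ne & (dC < lam * s * dE)   # dC/dE < lam * s
        sw_ok = in_sw & (s * dC < lam * dE)   # dC/dE > lam / s, multiplied through by s
    return float(np.mean(in_se | ne_ok | sw_ok))

# Hypothetical toy scatter: one outcome in each quadrant (NE, SE, SW, NW).
dE = np.array([1.0, 1.0, -1.0, -1.0])
dC = np.array([0.5, -1.0, -2.0, 1.0])
for s in (0.0, 0.25, 1.0, 3.0):
    print(s, vagr_acceptability(dE, dC, 1.0, s), alice_acceptability(dE, dC, 1.0, s))
print("s = inf", alice_acceptability(dE, dC, 1.0, np.inf))
```

In this sketch the two definitions agree at s = 1, where WTP = λ = WTA, and can differ only through outcomes in the SW quadrant.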

Note in Table 3 that the VAGR and ALICE definitions of acceptability *differ only within the SW quadrant*. VAGR and ALICE curves thus contain the same basic information (displayed using different horizontal axes) whenever the ICE uncertainty distribution attributes zero credibility to the SW quadrant. At the other extreme, where 100% credibility is attributed to the SW quadrant, the VAGR and ALICE curves are again equivalent, but the VAGR curve would then be decreasing while the ALICE curve is increasing, as usual. In other cases, we will see that VAGR curves are non-monotone and biased.

Cases where the ICE uncertainty distribution attributes credibility not only to the SE quadrant but also to the most desirable parts of *both* the SW and NE quadrants are particularly important. We will now consider a numerical example of this “high uncertainty” type.

### 7.1 Comparison of VAGR and ALICE curves for a high uncertainty example

Patient self-reported health-care-utilization above and beyond that provided within study protocol was collected using the Resource Utilization Survey (Copley-Merriman et al. 1992) with published 1998 $/unit costs (Schoenbaum et al. 2001) rounded to the nearest multiple of $50.00. $/Week was then calculated by multiplying (total accumulated cost) for a patient by 7 and dividing by the (total days of cost accumulation) for that patient. For patients who discontinued from the study, this is Average-Value-Carried-Forward imputation.
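The $/Week calculation described above amounts to a one-line rescaling. The following Python sketch uses a hypothetical function name and made-up numbers purely to illustrate the arithmetic.

```python
def dollars_per_week(total_cost, days_accumulated):
    """Average weekly cost: accumulated cost times 7, divided by days of cost accumulation."""
    if days_accumulated <= 0:
        raise ValueError("days_accumulated must be positive")
    return 7.0 * total_cost / days_accumulated

# Hypothetical patient: $600 accumulated over 42 days before discontinuing.
weekly = dollars_per_week(600.0, 42)
print(weekly)
```

Because the rate is computed over only the days actually observed, a patient who discontinues early contributes the same weekly average that would result from carrying that average forward.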

Measures of effectiveness in this study were derived from blinded, clinical assessments on the Hamilton Depression Rating Scale (Hamilton 1967). With missing values imputed via MMRM models (Goldstein et al. 2004), the measure of overall effectiveness described here will be “integrated” decrease in HAMD-17 score from baseline to endpoint, which is a (signed) area-under-the-curve measure (with larger values being more favorable to the new treatment over standard).

In any case, Fig. 4 shows that our example illustrates a common situation. Relative to the standard treatment (paroxetine), the new treatment (duloxetine) here could represent what is known (somewhat derisively) as a “me too” treatment for MDD. Specifically, the bootstrap distribution of uncertainty here completely covers the ICE origin and lends considerable credibility to at least 3 of the 4 ICE quadrants, at least when the $/Week difference in medication *acquisition cost* is zero, as assumed here. Our objectives in exploring this particular case-study example are two-fold. We intend to convince you not only (i) that the new treatment is at least somewhat cost-effective relative to the standard treatment in cases like that depicted in Fig. 4 but also that (ii) traditional VAGR acceptability curves are biased towards their average value in these critical “high uncertainty” cases. In particular, the all-important lower values of VAGR acceptability are biased upwards.

Figure 5 displays the non-monotone VAGR acceptability curve for our high uncertainty example that corresponds to a relatively wide range (from 0 to 5) for the unknown common value of WTP = λs = WTA. Only one numerical value within this wide range of alternative values for λs can correspond to the “true” shadow price of health.

It is not clear how VAGR or FOB themselves would interpret the information provided by Fig. 5. Because acceptability is always rather high (>0.80) over the finite range displayed here and the curve is also rather flat (max – min < 0.09), outcomes researchers might conclude that (i) choice of λ is relatively unimportant here and/or that (ii) the odds that the new treatment is more cost-effective than standard are at least 4:1 because 0.80/0.20 = 4.

Figure 6 displays the corresponding, monotone ALICE curve for this high uncertainty example. By covering the *finite range* for absolute ICE angles of 45° ≤ \(|\theta| \) ≤ 135°, the full *infinite range* of 0 ≤ s ≤ +∞ is easily visualized in Fig. 6. Furthermore, the values of the ICE Ratio = WTP displayed in Fig. 5 are now seen to be equally spaced along the horizontal axis of Fig. 6. Finally, Fig. 6 assumes that λ = 0.26 $/Week/IDBAT is the fixed, most relevant value for the true shadow price of health and also allows an overall acceptability level to be determined for all possible budget constraints of the form WTP = λs with s < 1 plus, by symmetry, WTA = λ/s.

Different choices for λ would yield different ALICE curves. However, all such alternative ALICE curves for a given set of data would have the same starting and ending points at \(|\theta| \) = 45° (s = 0) and \(|\theta| \) = 135° (s = +∞). Namely, the smallest ALICE value (0.6441 here) will always be the estimated confidence that the new treatment is both “less costly AND more effective” than standard, while the largest ALICE value (0.9608 here) will always be the estimated confidence that the new treatment is either “less costly OR more effective” than standard. These limits correspond to the two key ICE quadrant confidence levels needed to quantify *statistical dominance* (Obenchain et al. 2005).

Note that WTP = 0.26 $/Week/IDBAT yields the *maximum* VAGR acceptability (of 0.8870) in Fig. 5. Thus, it follows that no ALICE curve for any alternative value of λ (different from 0.26 $/Week/IDBAT) can yield a larger acceptability level than the value (of 0.8870) displayed in Fig. 6 at \(|\theta| \) = 90° (s = 1). We are definitely not recommending that the numerical value of λ used to define ALICE levels be routinely chosen in this way. After all, this particular choice of λ is again, in a weak sense, most favorable to the new treatment!

Rather, our point here is simply that VAGR acceptability curves, by using only alternative *linear frontiers* (WTP = λ = WTA), are badly biased in all high uncertainty cases where the VAGR curve ends up being flat or non-monotone. ALICE curves are then much less biased (upwards or downwards) because they use realistic *kinked frontiers*. Even the ALICE curve that is biased upwards *as much as possible*, as in Fig. 6, still suggests that administrative budget constraints (that reduce WTP and, when fair, also increase WTA) can *drastically decrease the overall acceptability level* of new over standard. This reduction is from 0.8870 at \(|\theta| \) = 90° (s = 1) to 0.6441 at \(|\theta| \) = 45° (s = 0) in Fig. 6, which is a reduction in confidence of 0.243.

After all, for two exactly *equivalent treatments*, the VAGR acceptability is expected to always be 0.50 for all values of WTP. The corresponding ALICE level would then also be expected to equal 0.50 at s = 1 (\(|\theta| \) = 90°), but it would be *expected to drop* to 0.25 at s = 0 (\(|\theta| \) = 45°) as well as *to rise* to 0.75 at s = +∞ (\(|\theta| \) = 135°), at least when cost and effectiveness differences are uncorrelated.
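These expected values are easy to check by simulation. The Python sketch below assumes, purely for illustration, that the standardized differences x = ΔE and y = ΔC/λ are independent standard normal draws centered at the ICE origin; the function names and the sample size are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)  # standardized effectiveness difference
y = rng.standard_normal(n)  # standardized cost difference, uncorrelated with x

def vagr(x, y, s):
    """Linear frontier: acceptable when y < s * x."""
    return float(np.mean(y < s * x))

def alice(x, y, s):
    """Kinked frontier: slope s in the NE quadrant, slope 1/s in the SW quadrant."""
    se = (x > 0) & (y < 0)
    ne = (x > 0) & (y > 0) & (y < s * x)
    sw = (x < 0) & (y < 0) & (s * y < x)
    return float(np.mean(se | ne | sw))

for s in (0.0, 1.0, 1e9):  # 1e9 stands in for s = +infinity
    print(s, round(vagr(x, y, s), 3), round(alice(x, y, s), 3))
```

Under these assumptions the simulated VAGR value stays near 0.50 for every s, while the simulated ALICE value is near 0.25, 0.50 and 0.75 at the three slopes shown.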

In all cases where the ICE bootstrap uncertainty distribution lends credibility to only one quadrant or to at most two adjacent quadrants, the information contained in VAGR and ALICE curves will really be equivalent (even when the VAGR curve is decreasing due to increases in WTA). In these relatively simple (lower uncertainty) cases, VAGR acceptability is not really biased relative to the corresponding ALICE level.

### 7.2 Advantages and disadvantages of VAGR and ALICE curves

ALICE curves always concentrate attention upon only the *uncertainty within the available data* supporting an ICE policy decision rather than upon any uncertainty about λ itself. Whenever a VAGR curve is non-monotone, it is actually also depicting additional uncertainty about λ. When a VAGR curve is monotone, Table 3 shows that it can be reinterpreted as corresponding to a fixed value of λ. For example, when a VAGR curve is non-decreasing, it can be reinterpreted as displaying the uncertainty associated with values of WTP less than any value of ICE Ratio = λ within the plotting range. When a VAGR curve is non-increasing, it can be reinterpreted as displaying the uncertainty associated with values of WTA larger than any value of ICE Ratio = λ within the plotting range.

On the other hand, it is difficult to appreciate how a VAGR acceptability curve or an ALICE curve could be viewed as being a better graphical summary of ICE uncertainty than the bootstrap scatter itself! Given a scatter of bootstrap ICE uncertainty outcomes, (ΔE, ΔC), it will sometimes be fairly easy to visualize the corresponding VAGR or ALICE curve(s). The inverse problem of visualizing a bootstrap uncertainty scatter from its VAGR or ALICE curve is much more difficult. Specifically, all information about ICE radius (and thus returns-to-scale) has been discarded in the VAGR and ALICE formulations.

## 8 Statistical uncertainty and economic preference variation

To this point, we have concentrated upon the more “desirable” features of ICE preference maps. It’s important to note that these positive aspects of preference quantification hold within a context where λ can be held fixed, as in Eq. 5, while both WTP and WTA are allowed to vary. In contexts where λ itself is deliberately varied, the strong implications of our first ICE preference axiom lead immediately to disturbing contradictions about economic preferences that totally dominate or “swamp” any statistical uncertainty observable from patient level outcomes data.

*ray confidence limits* computed via the “count outwards” algorithm (Obenchain 1999). This point is dramatically illustrated in Figs. 7 and 8 for the high-uncertainty Dulx-Parx example introduced in Sect. 7.1.

Unfortunately, Figs. 7 and 8 also illustrate, by coloring bootstrap ICE outcomes within the equivariant confidence wedge with alternative preferences, that deliberately varying λ leads to incoherent ICE evaluations.

Because the “count outwards” confidence wedge (Obenchain 1999) displays equivariance under changes in λ, we contend that it quantifies only the *statistical uncertainty* within the two samples of patient level cost and effectiveness data about where the unknown true (ΔE, ΔC) outcome falls on the ICE plane. We do not wish to imply that sensitivity analyses concerning choice of λ should not be performed. However, we do think that health services researchers need to be much more aware of the extent of the distinct trauma injected into evaluations of economic preference by deliberately varying the numerical shadow price. After all, *economic uncertainty* about λ is quite separate from the quantifiable *statistical uncertainty* derived from patient level measurements of effectiveness and cost.

Because all ICE preference maps depend fundamentally upon choice of λ, and possibly also upon choice of returns-to-scale, β, and preference variation (shape) parameters, γ or η, we maintain that ICE preference maps are much more useful in *interpreting* the meaning of an equivariant ICE confidence region, as in Figs. 7 and 8, than in *defining* any such region.

## 9 Conclusions

There are two senses in which our efforts to consider nonlinear ICE preferences that are more realistic than linear NB have failed to circumvent the basic shortcomings of all ICE maps. First, all ICE maps attempt to reduce the effective dimensionality of the ICE decision space from two dimensions, (effectiveness, cost), to only one dimension …a scalar (possibly ordinal) measure of overall preference. Proponents of linear NB argue that confidence intervals for *x*-*y* differences are easier to construct and interpret than confidence intervals for standardized ICE slopes, s = *y*/*x*, or (unstandardized) ICE Ratios, λ × s. The reality is simply that NB confidence intervals are based upon overly simplistic and clearly unrealistic assumptions (like WTP = λ = WTA) that also make them much easier to *misinterpret*.

Second, due to the first axiom of ICE preferences, all ICE maps (linear or nonlinear) are much too sensitive to choice of λ to allow outcomes researchers to entertain a wide range of mutually exclusive and contradictory alternative values for the shadow price of health. While Figs. 7 and 8 illustrate that this practice can always inject unwanted incoherence, the effect is even more obvious in those situations where the central polar ICE angle of the wedge is less than 90° and the wedge lies mostly within the NE or SW quadrant.

On the other hand, our two-parameter, nonlinear ICE maps do illustrate some profound new relationships between basic ICE concepts.

### 9.1 Practical implications of the “link” function

As illustrated in Table 3, the “link” function, \( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} }, \) justifies and quantifies the “kink,” 0 ≤ WTP < WTA, in empirically observed consumer’s thresholds. Perhaps even more importantly, this link provides an *objective way to determine* λ. One simply elicits pairs of matched WTP and WTA values that may vary from patient to patient within a specified disease state and then looks for across-patient consistency in the geometric means of these WTP and WTA estimates. In particular, there is no need to express effectiveness in QALYs when eliciting WTP and WTA values to determine λ in this way. Being able to use natural, disease specific units simplifies the elicitation process and should improve empirical accuracy. Like traditional views of λ itself, the concept of a QALY is actually quite complex (Johnson 2005). In fact, the link function may well provide the very definition of a “fair” shadow price; all other definitions apparently suffer serious vagaries and shortcomings (Gafni and Birch 2006).

*profoundly different* from the relatively naïve linear NB perspective where WTA = λ = WTP is assumed.

Example: Assume that λ really is the often mentioned value of $50,000/QALY but that local government authorities or local payers in some particular region insist that $10,000/QALY is the maximum additional cost that they can possibly agree to pay. This is a simple budget constraint that does nothing to change the full, fair shadow price of health, yet it does reduce the local WTP to λ/5. It would then be absolutely unfair and arbitrary to assume that this sort of budget (maximum cost) restriction should also reduce WTA to λ/5. Instead, the \( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} } \) link shows that the corresponding “fair” value of WTA would thereby *increase* to 5λ = $250,000/QALY. In other words, only treatments that reduce both cost and effectiveness by at least a net ICE ratio of $250,000/QALY have as high preferences to society as the desirable treatments that increase both cost and effectiveness by less than a net ICE ratio of $10,000/QALY.
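The arithmetic in this example follows from solving the link function for WTA with λ held fixed:

\[ \text{WTA} = \frac{\lambda^{2}}{\text{WTP}} = \frac{(50{,}000)^{2}}{10{,}000} = 250{,}000 \ \text{\$/QALY} = 5\lambda . \]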

Finally, no single nonlinear ICE map, implied by explicit numerical choices for the β and γ power parameters in Eq. 1, really needs to be singled out for preferred use. All *symmetric, differentiable maps* necessarily satisfy the same geometric-mean relationship, \( \lambda = {\sqrt {{\text{WTP}} \times {\text{WTA}}} }, \) at all standardized outcome points (*x*, *y*) and (−*y*, −*x*).

### 9.2 Practical implications for ICE angle confidence and tolerance regions/intervals

In the “intervals or surfaces?” terminology of Briggs and Fenn (1998), ICE inference methods clearly need to be based upon 2-dimensional confidence regions (surfaces) rather than upon an infinite family of at least partially self-contradictory confidence intervals for overall preference that result from deliberately varying λ (Laska et al. 1999; Stinnett 1999; Stinnett and Mullahy 1998). There is no current consensus about how geometrically simple (easy to define in written text) the boundary of an “ideal” ICE confidence region needs to be.

What is clear is that wedge-shaped confidence regions (Briggs and Fenn 1998; Chaudhary and Stearns 1996; Cook and Heyse 2000; Obenchain 1997, 1999; Willan et al. 2001) have the potential to focus attention upon meaningful sub-regions of the ICE plane and to suggest clear preference-based actions. Specifically, the counter-clockwise (upper?) and clockwise (lower?) limiting ICE rays defining a wedge-shaped confidence region also define a confidence interval for the ICE ratio (s = y/x or ΔC/ΔE).

When applying Fieller’s theorem (Chaudhary and Stearns 1996), the computed limits may be imaginary for high levels of confidence, implying that the ICE ratio could then be any positive or negative numerical value. For example, the highest confidence level for which Fieller limits exist in the high-uncertainty Dulx-Parx example is 76%, with a counter-clockwise upper limit of slope zero and a clockwise lower limit also of slope zero. In fact, the equivariant bootstrap ICE confidence region with 76% confidence also has a central polar angle of almost 180° and consists of essentially the entire SE and SW quadrants, i.e., all positive or negative ICE ratios.

When the computed Fieller limits are real, they are the slopes of a pair of straight lines through the ICE origin, and the analyst still must determine which two of the resulting four ICE rays define the confidence wedge of interest. The correct choice becomes clear once the rays are plotted on the ICE plane along with the observed ICE outcome pair, (x, y) or (ΔE, ΔC); the correct pair of ICE rays then consists of the two rays closest to the observed ICE outcome point, counter-clockwise and clockwise, respectively.
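The distinction between real and imaginary Fieller limits can be illustrated with the standard quadratic form of Fieller's theorem. The Python sketch below is a generic textbook version, not necessarily the exact computation used in the cited references. The function name, argument names and numerical inputs are hypothetical, and the variances and covariance are assumed to refer to the estimated mean differences (ΔE, ΔC).

```python
import math

def fieller_limits(xbar, ybar, var_x, var_y, cov_xy, crit):
    """Slopes R solving a*R^2 - 2*b*R + c = 0, or None when the roots are not real.

    xbar, ybar : estimated mean differences in effectiveness and cost
    var_x, var_y, cov_xy : their estimated sampling variances and covariance
    crit : critical value (e.g. 1.96 for a nominal 95% level)
    """
    t2 = crit ** 2
    a = xbar ** 2 - t2 * var_x
    b = xbar * ybar - t2 * cov_xy
    c = ybar ** 2 - t2 * var_y
    disc = b * b - a * c
    if a == 0 or disc < 0:
        return None  # imaginary (or degenerate) limits: no pair of bounding lines
    root = math.sqrt(disc)
    r1, r2 = (b - root) / a, (b + root) / a
    # Each slope defines a straight line through the ICE origin; choosing which
    # two of the four resulting rays bound the wedge is left to the analyst.
    return (min(r1, r2), max(r1, r2))

print(fieller_limits(2.0, 1.0, 0.04, 0.04, 0.0, 1.96))   # well-determined ratio
print(fieller_limits(0.1, -0.1, 1.0, 1.0, 0.0, 1.96))    # high uncertainty: None
```

When the leading coefficient a is negative but the roots are real, the two slopes still exist, which is one reason the plotted rays must be checked against the observed outcome as described above.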

When applying bootstrap methods (Briggs and Fenn 1998; Cook and Heyse 2000; Obenchain 1997, 1999; Willan et al. 2001), wedge-shaped confidence regions for all levels of confidence always exist. On the other hand, if a 95% confidence wedge were to occupy anywhere near 95% of the full ICE plane in terms of polar angular measure (i.e., 0.95 × 360° = 342°), that region would certainly not be very meaningful or interesting. For the high uncertainty Dulx-Parx example displayed in Figs. 7 and 8, the 95% confidence wedge subtends an ICE polar angle of 237°, which is only 65.8% of 360° and thus is at least somewhat restrictive and informative.

Furthermore, the minimum and maximum values of ICE radius observed for bootstrap outcomes falling strictly within a wedge-shaped ICE confidence region are easily computed. These ICE radii are expressed in the same units as both x and y (either effectiveness units or cost units). In high-uncertainty cases, the minimum observed ICE radius will be essentially zero, but the maximum will always be finite. In any case, a (strictly bounded) “wiper-blade” shaped ICE confidence region with a lower nominal confidence percentage than the original wedge-shaped region can be defined by counting “inward” a specified number of ICE *radius order statistics*, thereby decreasing the maximum ICE radius and/or increasing the minimum ICE radius.

Alternatively, a bootstrap ICE confidence wedge can be converted into an interesting ICE tolerance wedge or an ICE ratio tolerance interval (Obenchain 1999) by simply including a few additional ICE angle order statistics within the wedge. For example, suppose that 25,000 bootstrap replications are computed (default value in Obenchain 2005, 2007) and that ICE angle order statistics are sorted around a full circle centered at the ICE origin. Any set of 0.95 × 25 K = 23,750 consecutive such order statistics then constitutes a 95% confidence wedge in the high-uncertainty cases where the ICE origin falls within the convex hull of the bootstrap ICE uncertainty scatter. The equivariant bootstrap confidence wedge for 25 K replications always results from counting outwards (counter-clockwise and clockwise, respectively) 11,875 consecutive ICE angle order statistics from the observed ICE ratio (i.e., the sample which uses the observed outcomes for each patient exactly once). The corresponding equivariant bootstrap tolerance wedge for 25 K replications that includes *at least 95% of the entire ICE uncertainty distribution* with 95% confidence then results from counting outwards 11,904 consecutive ICE angle order statistics from the observed ICE ratio (Obenchain 1999).
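The counting step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published algorithm (Obenchain 1999). The function name is hypothetical, angles are measured with `arctan2` on whatever (x, y) scaling is supplied, and the scatter is assumed to surround the ICE origin so that angles can be sorted around the full circle.

```python
import numpy as np

def count_outwards_wedge(x, y, x_obs, y_obs, k):
    """Count k angle order statistics each way from the observed direction.

    Returns (cw_limit, ccw_limit): the clockwise and counter-clockwise angular
    distances, in degrees, from the observed outcome to the wedge boundaries.
    """
    theta_obs = np.arctan2(y_obs, x_obs)
    # Counter-clockwise offset of every replicate from the observed direction, in [0, 360).
    ccw = np.sort(np.degrees(np.arctan2(y, x) - theta_obs) % 360.0)
    n = ccw.size
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n // 2")
    ccw_limit = ccw[k - 1]          # k-th replicate met going counter-clockwise
    cw_limit = 360.0 - ccw[n - k]   # k-th replicate met going clockwise
    return float(cw_limit), float(ccw_limit)

# Hypothetical toy scatter: 20 replicates evenly spread around the origin.
angles = np.radians(18.0 * np.arange(20) + 9.0)
x, y = np.cos(angles), np.sin(angles)
k = int(round(0.50 * 20 / 2))       # 5 replicates counted on each side for a 50% wedge
cw, ccw = count_outwards_wedge(x, y, 1.0, 0.0, k)
print(cw, ccw, cw + ccw)            # the sum is the polar angle subtended by the wedge
```

With 25,000 replicates, k = 11,875 corresponds to the 95% confidence wedge described in the text, and the larger count quoted for the tolerance wedge could be passed in the same way. The derivation of that larger count is not reproduced here.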

In view of the above non-parametric results, we contend that “count outwards” bootstrap ICE angle confidence/tolerance regions provide robust answers. Furthermore, our experience is that these bootstrap confidence limits are generally in very good agreement with Fieller limits in medium-to-large sample situations when Fieller’s theorem does yield real limits (Chaudhary and Stearns 1996). Unfortunately, simulation studies published before 1998 (Briggs and Fenn 1998) and since (Fan and Zhou 2007) tend to focus on questionable computational algorithms and report somewhat contradictory results. This is clearly an area where much greater consensus is badly needed.

### 9.3 Pragmatic choice of ICE preference map

Suppose the question is: “Are there a few distinctive forms of ICE maps that tend to characterize the full spectrum of fundamentally different preference patterns?” There might then be as few as only three saliently different types of ICE maps. The Laupacis et al. (1992) map of Fig. 1 (with β = 0) and the linear NB map (Stinnett and Mullahy 1998) of Fig. 2b (with β = γ = 1) clearly represent two simple and historic forms. Rather than consider a range of values for the η = γ/β shape parameter of realistic, nonlinear ICE preferences, it makes sense to concentrate upon the extreme maps with shape η = Ω ≈ 5.828 of (2) that satisfy the monotonicity axiom and allow willingness, (4), to be any non-negative value, as in Figs. 2d and 8. This third extreme form of ICE preference map still allows β to be <1, =1 or >1 to determine decreasing, constant or increasing returns-to-scale, (3).

The realistic, nonlinear ICE preference maps proposed here *encompass the entire ICE plane* rather than focus attention upon any particular sub-region. It can be quite confusing and counter-productive to, instead, use different basic terminology (Kent et al. 2004) within the NE and SW quadrants. Similarly, the FOB suggestion (Fenwick et al. 2004) to divide the ICE plane up into many sub-regions is tedious and counter-productive.

Better comparisons of alternative methodologies for assessing uncertainty in ICE estimates and better vocabulary for generally communicating uncertainty are clearly needed. To yield realistic and reliable results, bootstrap computing algorithms actually need to be somewhat sophisticated. While use of polar coordinates in ICE inference can be a great help in arriving at the “right” answer, many outcomes researchers and most policy makers are clearly uninterested in this high level of detail. People expect results expressed in Cartesian coordinates or as ratios. Realistic approaches to ICE treatment comparisons must ultimately address the unprecedented challenge of making a truly bivariate (2-dimensional) inference problem meaningful to non-technical audiences.

## Notes

### Acknowledgements

I wish to thank my Lilly colleagues, particularly Gerhardt Pohl and Joe Johnston, for their invaluable comments on numerous versions of these materials. I also wish to thank Ken Buckingham for personal communications about his give/get (loss/gain) motivation for ICE symmetry communicated here in Sect. 3.4.

## Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

## References

- Black, W.C.: The CE plane: a graphic representation of cost-effectiveness. Med. Decis. Making **10**, 212–214 (1990)
- Briggs, A.H., Fenn, P.: Confidence intervals or surfaces? Uncertainty on the cost-effectiveness plane (Student Corner.). Health Econ. **7**, 723–740 (1998)
- Chaudhary, M.A., Stearns, S.C.: Estimating confidence intervals for cost-effectiveness ratios: an example from a randomized trial. Stat. Med. **15**, 1447–1458 (1996)
- Cook, J.R., Heyse, J.F.: Use of an angular transformation for ratio estimation in cost-effectiveness analysis. Stat. Med. **19**, 2989–3003 (2000)
- Copley-Merriman, C., Egbuonu-Davis, L., Kotsanos, J.G., Conforti, P., Franson, T., Gordon, G.: Clinical economics: a method for prospective health resource data collection. Pharmacoeconomics **1**(5), 370–376 (1992)
- Fan, M.Y., Zhou, X.H.: A simulation study to compare methods for constructing confidence intervals for the incremental cost-effectiveness ratio. Health Serv. Outcomes Res. Method (2007). doi: 10.1007/s10742-006-0017-9
- Fenwick, E., O’Brien, B.J., Briggs, A.H.: Cost-effectiveness acceptability curves – facts, fallacies and frequently asked questions. Health Econ. **13**, 405–415 (2004)
- Gafni, A., Birch, S.: Incremental cost-effectiveness ratios (ICERs): the silence of the lambda. Soc. Sci. Med. **62**, 2091–2100 (2006)
- Goldstein, D.J., Lu, Y., Detke, M.J., Wiltse, C., Mallincrodt, C., Demitrack, M.A.: Duloxetine in the treatment of depression: a double-blind, placebo-controlled comparison with paroxetine. J. Clin. Psychopharmacol. **24**, 389–399 (2004)
- Hamilton, M.: Development of a rating scale for primary depressive illness. Brit. J. Soc. Clin. Psychol. **6**, 278–296 (1967)
- Heitjan, D.F., Moskowitz, A.J., Whang, W.: Problems with interval estimates of the incremental cost-effectiveness ratio. Med. Decis. Making **19**, 9–15 (1999a)
- Heitjan, D.F., Moskowitz, A.J., Whang, W.: Bayesian estimation of cost-effectiveness ratios from clinical trials. Health Econ. **8**, 191–201 (1999b)
- Johnson, F.R.: Einstein on willingness to pay per QALY: is there a better way? [editorial.] Med. Decis. Making **25**, 607–608 (2005)
- Kent, D.M., Fendrick, A.M., Langa, K.M.: New and dis-improved: on the evaluation and use of less effective, less expensive medical interventions. Med. Decis. Making **24**, 281–286 (2004)
- Laska, E.M., Meisner, M., Siegel, C., Stinnett, A.A.: Ratio-based and net benefit-based approaches to health care resource allocation: proofs of optimality and equivalence. Health Econ. **8**, 171–174 (1999)
- Laupacis, A., Feeny, D., Detsky, A.S., Tugwell, P.X.: How attractive does a new technology have to be to warrant adoption and utilization? Tentative guidelines for using clinical and economic evaluations. Can. Med. Assoc. J. **146**(4), 473–481 (1992)
- O’Brien, B.J., Gertsen, K., Willan, A.R., Faulkner, L.A.: Is there a kink in consumers’ threshold value for cost-effectiveness in health care? Health Econ. **11**(2), 175–180 (2002)
- O’Brien, B.J., Sculpher, M.J.: Building uncertainty into cost-effectiveness rankings: portfolio risk-return tradeoffs and implications for decision rules. Med. Care **38**, 460–468 (2000)
- Obenchain, R.L.: Issues and algorithms in cost-effectiveness inference. Biophar. Rep. **5**(2), 1–7, American Statistical Association, Washington (1997)
- Obenchain, R.L.: ICEplane: a Microsoft Windows application for calculation and graphical display of bootstrap ICE confidence and tolerance regions. Copyright © Pharmaceutical Research and Manufacturers of America (PhRMA.) Version 2005.11. http://www.math.iupui.edu/~indyasa (1997–2005)
- Obenchain, R.L.: Resampling and multiplicity in cost-effectiveness inference. J. Biopharm. Stat. **9**(4), 563–582 (1999)
- Obenchain, R.L.: The key role of “symmetry” in cost-effectiveness analyses. In: Proc. Biopharmaceutical Section, pp 15–17. Joint Statistical Meetings. American Statistical Association, Washington (2000)
- Obenchain, R.L.: Incremental cost-effectiveness (ICE) preference maps. In: Proc. Biopharmaceutical Section, 10 pp. Joint Statistical Meetings. American Statistical Association, Washington, CD-ROM only (2001)
- Obenchain, R.L., Robinson, R.L., Swindle, R.W.: Cost-effectiveness inferences from bootstrap quadrant confidence levels: three degrees of dominance. J. Biopharm. Stat. **15**(3), 419–436 (2005)
- Obenchain, R.L.: ICEinfer: an R-package of functions for computation and graphical display of ICE maps, bootstrap confidence regions and acceptability curves. Version 0.1-2. http://www.r-project.org, November 2007
- Schoenbaum, M., Unutzer, J., Sherbourne, C., Duan, N., Rubenstein, L.V., Miranda, J., Meredith, L.S., Carney, M.F., Wells, K.: Cost-effectiveness of practice-initiated quality improvement for depression: results of a randomized controlled trial. JAMA **286**(11), 1325–1330 (2001)
- Stinnett, A.A.: Is it really so bad to be unambiguously inefficient? The role of dominance in stochastic cost-effectiveness analysis [editorial.]. Med. Decis. Making **19**, 102–103 (1999)
- Stinnett, A.A., Mullahy, J.: Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Med. Decis. Making, Special Issue on Pharmacoeconomics **18**, S68–S80 (1998)
- The R Project for Statistical Computing. (Deepayan Sarkar’s Lattice Graphics package uses Paul Murrell’s Grid Graphics engine.) http://www.r-project.org
- Van Hout, B.A., Al, M.J., Gordon, G.S., Rutten, F.F.H.: Costs, effects and C/E ratios alongside a clinical trial. Health Econ. **3**, 309–319 (1994)
- Willan, A.R., O’Brien, B.J., Leyva, R.A.: Cost-effectiveness analysis when the WTA is greater than the WTP. Stat. Med. **20**(21), 3251–3259 (2001)
- Wolfram Research, Inc., Mathematica®, Version 4.1, Champaign, IL (2000)