Correlation neglect and case-based decisions
Abstract
In most theories of choice under uncertainty, decision-makers are assumed to evaluate acts in terms of subjective values attributed to consequences and probabilities assigned to events. Case-based decision theory (CBDT), proposed by Gilboa and Schmeidler, is fundamentally different, and in the tradition of reinforcement learning models. It has no state space and no concept of probability. An agent evaluates each available act in terms of the consequences he has experienced through choosing that act in previous decision problems that he perceives to be similar to his current problem. Gilboa and Schmeidler present CBDT as a complement to expected utility theory (EUT), applicable only when the state space is unknown. Accordingly, most experimental tests of CBDT have used problems for which EUT makes no predictions. In contrast, we test the conjecture that case-based reasoning may also be used when relevant probabilities can be derived by Bayesian inference from observations of random processes, and that such reasoning may induce violations of EUT. Our experiment elicits participants’ valuations of a lottery after observing realisations of the lottery being valued and realisations of another lottery. Depending on the treatment, participants know that the payoffs from the two lotteries are independent, positively correlated, or negatively correlated. We find no evidence of correlation neglect indicative of case-based reasoning. However, in the negative correlation treatment, valuations cannot be explained by Bayesian reasoning, while stated qualitative judgements about chances of winning can.
Keywords
Case-based decision theory · Bayesian learning · Probability judgement · Correlation · Experiment
JEL Classifications
C91 · D03 · D81
1 Introduction
Most theories of choice under uncertainty that have been proposed by economists or decision theorists are closely related to expected utility theory, and often are generalisations of that theory. (For one survey, see Machina and Viscusi 2014, chapters 12–14.) In these theories, uncertainty is represented by a set of states of the world, any one of which might obtain. Alternative acts available to an agent are represented as different assignments of consequences to states. Decision-making is conceptualised as a process of evaluating acts in terms of the subjective values that the agent attributes to their consequences and the probabilities or subjective weights that he or she assigns to the events in which those consequences occur.
However, case-based decision theory (CBDT), proposed by Gilboa and Schmeidler (1995, 2001), is based on a fundamentally different representation of decision problems. In broad terms, CBDT is in the tradition of psychological theories of reinforcement learning (e.g. Bush and Mosteller 1953). In CBDT, there is no state space and no concept of probability. The agent is not assumed to know anything about the outside world except what he has actually experienced as the results of previous decision-making. The agent uses neither forward-looking hypothetical reasoning (“What will happen if I choose X?”) nor backward-looking counterfactual reasoning (“What would have happened if I had chosen X?”). He simply evaluates each currently available act in terms of the consequences he has in fact experienced as a result of choosing that act (or in some variants of the theory, choosing similar acts) in previous decision problems that he perceives to be similar to the problem at hand.
Gilboa and Schmeidler (1995, pp. 606, 622) present CBDT and expected utility theory (EUT) as “complementary theories”. They argue that CBDT is normatively most defensible and descriptively most plausible when “states of the world are neither naturally given, nor can they be simply formulated”. Decision-making in such circumstances is decision under ignorance, as contrasted with decision under risk (i.e., with known probabilities) and decision under uncertainty (i.e., with known states of the world but unknown probabilities). For decision under ignorance, Gilboa and Schmeidler argue, “the very language of expected utility models is inappropriate”. If this complementarity claim were taken at face value, any idea of testing CBDT and EUT against one another would be out of place.
Up to now, most experimental tests of CBDT have been designed on the premise that CBDT and EUT are complementary theories. For example, Ossadnik et al. (2013) set up an experimental environment of “structural ignorance” (p. 212), and compare the explanatory power of CBDT with that of three alternative criteria for decision under ignorance – maximin, maximax, and the pessimism–optimism criterion of Arrow and Hurwicz (1972). Similarly, Grosskopf et al. (2015) start from the explicit premise that CBDT “is not proposed as an alternative to or a generalization of [EUT]” (p. 640), and use an experimental environment in which “EUT is not a reasonable alternative decision-making procedure” (p. 652). They test CBDT against the null hypothesis of random choice and against a very simple heuristic, related to the “Take the Best” algorithm of Gigerenzer and Goldstein (1996). Unlike Ossadnik et al. and Grosskopf et al., who test specific parameterised forms of CBDT, Bleichrodt et al. (2017) report experimental tests of predictions derived from a very general, non-parameterised version of CBDT. But they too presuppose that CBDT is intended to be applied to situations in which states cannot be specified (p. 2), and design their experiment accordingly.^{1}
In contrast, the starting point for our experimental research was the conjecture that CBDT might have predictive power in situations in which states are well-defined and objective prior probabilities are known, but expected-utility decision-making requires the construction of posterior probabilities by Bayesian inference from observations of random processes. In terms of a distinction introduced by Hertwig et al. (2004), these are situations in which decisions are made from experience (i.e., the properties of alternative options must be inferred from previous experience), rather than from description (i.e., those properties are known a priori). In such situations, the “correctness” of Bayesian reasoning about probabilities is uncontroversial. Nevertheless, such reasoning can be cognitively demanding and its implications can be counterintuitive. It is well known that human judgements about probability often contravene Bayesian principles in predictable ways, for example because of the use of availability and representativeness heuristics (Tversky and Kahneman 1973; 1983). Charness and Levin (2005) report evidence of deviations from Bayesian reasoning that are consistent with one of the simplest reinforcement learning rules, the “win-stay-lose-shift” heuristic. Given that case-based reasoning requires much less cognitive sophistication and is well adapted to naturally occurring problems of decision under ignorance, the hypothesis that human beings are predisposed to use it is psychologically plausible and worthy of investigation.
Since our conjecture has not been endorsed by the proposers of CBDT, we cannot structure our enquiry as a test of that theory. Our methodological strategy is to test predictions of EUT in situations in which there are intuitive reasons, derived from the underlying principles of CBDT, for expecting those predictions to fail in specific ways. This general strategy, used in combination with disparate intuitions, has led to many important developments in decision theory. The Allais paradox, common ratio effect, Ellsberg paradox, and preference reversal phenomenon were all first discovered by researchers who recognised potential limitations of EUT but who, at the time of discovery, were not in a position to propose a comprehensive alternative theory. These robust violations of EUT achieved the status of “exhibits” which informed the subsequent development of alternative decision theories.^{2} These exhibits show non-random patterns in deviations of actual behaviour from the predictions of EUT. (For example, the Allais paradox involves comparisons between responses to two binary choice problems that EUT treats as equivalent. Individuals’ choices are systematically more risk-averse in one problem than in the other.) Such an exhibit provides evidence that some non-random mechanism, not encompassed by EUT, is at work, but is not to be interpreted as confirming any fully specified alternative theory. Our research was designed to have the potential to create exhibits of this kind which might inform the development and application of CBDT.
Our experiment tests two related intuitions about how case-based reasoning might lead to systematic deviations from the behaviour predicted by EUT. The first of these intuitions derives from a fundamental property of CBDT – act separability. In CBDT, experiences are encoded in memory as cases; each case consists of a problem, the act that was chosen in that problem, and the result of that choice (measured in utility units). Given a new problem, a decision-maker assesses each available act by recalling the previous cases in which that act was chosen, and weighting the result of each of those choices by a measure of the similarity between the problem in which that choice was made and the new problem. In this algorithm, when any given act is assessed, the only items of memory that are used are those that record results that have actually been experienced as a result of the choice of that act. If memory is used in this way, information that could show positive or negative correlation between the results of different acts is never retrieved. Thus, one might expect case-based reasoners to differ from Bayesian reasoners by neglecting information about correlation.
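The evaluation rule just described can be sketched in a few lines of code. This is an illustrative rendering of the similarity-weighted algorithm, not the authors’ formal statement; the function and variable names are ours.

```python
# Illustrative sketch of CBDT act evaluation. A "case" is a (problem, act,
# utility) triple; similarity() is any non-negative function of two problems.

def cbdt_value(act, new_problem, memory, similarity):
    # Act separability: only cases in which this very act was chosen are used,
    # so information linking the results of different acts is never retrieved.
    return sum(similarity(problem, new_problem) * utility
               for problem, chosen, utility in memory
               if chosen == act)

def cbdt_choose(acts, new_problem, memory, similarity):
    # An act that was never tried scores zero in this sketch; CBDT picks the
    # act with the highest similarity-weighted sum of experienced utilities.
    return max(acts, key=lambda a: cbdt_value(a, new_problem, memory, similarity))
```

Note that nothing in `cbdt_value` ever consults cases in which a different act was chosen; this is the feature that makes correlation information invisible to a case-based reasoner.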
The second intuition derives from the fact that probability judgements have no role in CBDT. Case-based reasoning does not lead to the formation of probability judgements that can be classified as “correct” or “incorrect” according to Bayesian principles. Instead, by moving directly from memory (encoded without reference to states or probabilities) to decisions, it circumvents the whole process of forming probability judgements. When an agent’s case-based reasoning leads to a violation of EUT, an outside observer may be able to conclude that the agent has behaved as if she were trying to maximise expected utility but had made erroneous probability judgements, but the agent herself may have no perception of making or endorsing the judgements that the observer attributes to her. This raises the possibility that agents might in fact endorse probability judgements that are systematically different from those that are revealed in their decisions when those decisions are analysed in the theoretical framework used by EUT. It is of course debatable whether such a difference can properly be called a violation of EUT. (Opinions differ about whether “probability” in EUT refers to an agent’s actual beliefs, or is merely part of a formal representation of her decision-making behaviour.) But a pattern of predictable differences between stated and revealed probabilities would be a surprising phenomenon calling for explanation.
These two lines of investigation have the potential to be mutually corroborating. Suppose that, in some experiment, participants’ decisions are found to be insensitive to variations in relevant information about correlation. CBDT would offer a possible explanation for that observation. However, another possibility might be that the participants were Bayesian reasoners who had misunderstood the information given to them, perhaps because of weaknesses in the experimental design. But suppose it were also found that participants’ stated probabilities showed Bayesian sensitivity to information about correlation. That would suggest that participants understood that information, used it when forming probability judgements, but failed to use it when making decisions. That would give additional credence to CBDT’s explanation of correlation neglect in decisions.
The remainder of the paper is organised as follows. Section 2 describes act separability under CBDT, and Section 3 discusses stated and revealed probabilities under EUT. Section 4 presents the experimental design; Section 5 discusses the hypotheses, and Section 6 presents the results. Section 7 concludes with a further discussion.
2 Act separability and correlation neglect
Now consider how this model deals with a simple case involving assets with returns that are potentially correlated. Suppose there are two lotteries L_{1} and L_{2}. The agent faces a sequence of decision problems; in each problem, he faces one of the two lotteries and chooses whether or not to play it. Each time he plays either of these lotteries, the result is either hit, with constant utility u_{H} > 0, or miss, with constant utility u_{M} < 0. Not betting leads to a utility of zero. If he bets, he immediately learns whether the outcome was hit or miss. Consider an agent who has faced at least 2n such prior problems (where n ≥ 1), and has chosen to play each lottery exactly n times.^{3} He has experienced h_{1} hits from L_{1} and h_{2} hits from L_{2}. As a final problem, he faces some lottery L_{i}, i ∈{1,2}, and has to choose whether or not to play it.
For any distinct i and j, ∂U(a_{i})/∂h_{i} = σ(u_{H} − u_{M}) > 0 and \(\sigma (u_{H}-u_{M})\geq \partial U(a_{i})/\partial h_{j}=\sigma ^{\prime }(u_{H}-u_{M})\geq 0\). Thus, the final choice is potentially determined by the numbers of hits that have been experienced on both lotteries. Hits on the lottery that is faced in the final problem have a strictly positive weight that is strictly greater than the weight for hits on the other lottery; the latter weight may be zero but cannot be negative.
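One functional form consistent with these partial derivatives weights experience on the lottery currently faced by σ and experience on the other lottery by σ′ (our own sketch; in the terms used below, σ′ = 0 corresponds to Model 1 and σ′ > 0 to Model 2):

```python
def cbdt_bet_value(h_i, h_j, n, u_hit, u_miss, sigma, sigma_prime):
    """Similarity-weighted value of betting on lottery i after n plays of each
    lottery, with h_i hits on lottery i and h_j hits on lottery j. Requires
    sigma >= sigma_prime >= 0; the agent bets iff this value exceeds the zero
    utility of not betting. Illustrative form, consistent with the derivatives
    dU/dh_i = sigma*(u_hit - u_miss) and dU/dh_j = sigma_prime*(u_hit - u_miss)."""
    own_experience   = sigma       * (h_i * u_hit + (n - h_i) * u_miss)
    other_experience = sigma_prime * (h_j * u_hit + (n - h_j) * u_miss)
    return own_experience + other_experience
```

The key point is that no parameter of this function carries information about correlation between the lotteries: extra hits on the other lottery can only raise (never lower) the value of betting.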
Notice that, in deriving the conditions under which the agent chooses to bet in the final problem, we have made no assumptions about correlation between the two lotteries. In Model 1, the case-based reasoner’s final choice depends only on his experience of the lottery he is actually facing. This is compatible with Bayesian reasoning only in the case of a Bayesian agent with a prior belief that the two lotteries are independent. In Model 2, the case-based reasoner’s final choice may also be influenced by his experience of the other lottery, but only in the direction that corresponds with a Bayesian belief in positive correlation, and only to the extent that the two lotteries are perceived as subjectively similar. To assume that similarity judgements systematically incorporate prior beliefs about correlation would be inconsistent with one of the fundamental principles of CBDT – that decision-makers use only knowledge that is derived from direct experience. This thought points to a class of situations in which the underlying intuitions of CBDT suggest that there might be systematic violations of EUT. These are situations in which there is a mismatch between salient similarity cues and information about correlation that is not embedded in experiences of decision-making. Our experimental design attempts to create such situations.
It is already known that people have a tendency to neglect asset correlation, but the possible connection between this tendency and CBDT has not been explored. For example, in an asset allocation experiment, Kallir and Sonsino (2009) found that participants focused their attention on individual asset returns and that the resulting portfolio decisions did not take into account return correlations. Eyster and Weizsäcker (2011) found that even when equipped with correlation information, participants regarded assets independently and resorted to the 1/n heuristic (or naïve diversification) when allocating investment funds to individual securities. Similarly, in a hypothetical investment choice experiment, Hedesström et al. (2006) observed that participants focused on individual asset volatility rather than on portfolio volatility. Resulting portfolios were inappropriately diversified and had higher volatility. Correlation neglect has been found to be sensitive to the magnitude of the stakes involved in the decisions. In portfolio experiments with low stakes, Kroll et al. (1988) found that while participants were aware of the correlation in stock returns, correlation information was not reflected in their portfolio choices. However, when the stakes were significantly increased, participants managed to effectively diversify their asset holdings and the resulting portfolio choices were closer to the predictions of mean–variance optimisation. In contrast to the experiments we have just described, our experiment uses a design that allows us to investigate attitudes to similarity while controlling and manipulating similarity cues.
3 Stated probabilities and revealed probabilities
EUT legitimates a simple certainty equivalence procedure for eliciting an agent’s subjective ranking of the probabilities of two events. Fix any two (non-null) events E_{1} and E_{2}. Consider consequences that are measured in money units, and assume that larger consequences are always preferred to smaller ones. Choose any two consequences x and y such that x > y. Let xE_{i}y (i ∈{1,2}) denote the act that gives x if E_{i} obtains and y otherwise. Given a suitable continuity assumption, EUT implies that there exist z_{1}, z_{2} ∈ (y, x) such that the agent has the preferences \(z_{1}\sim xE_{1}y\) and \(z_{2}\sim xE_{2}y\). (We use \(\sim \) to denote indifference.) Then the subjective probability of E_{1} is greater than (equal to, less than) that of E_{2} if and only if z_{1} is greater than (equal to, less than) z_{2}. Variants of this procedure are widely used in experimental economics to elicit probability judgements.
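A minimal sketch of why the procedure works under EUT: indifference z ∼ xEy implies u(z) = p·u(x) + (1 − p)·u(y), so the implied probability is strictly increasing in the certainty equivalent z for any increasing utility function u. The function name and the example utilities are illustrative.

```python
def implied_probability(z, x, y, u):
    # Solve u(z) = p*u(x) + (1-p)*u(y) for p, given x > z > y and u increasing.
    # Because u is increasing, the implied p is strictly increasing in z, so
    # ranking elicited certainty equivalents ranks subjective probabilities.
    return (u(z) - u(y)) / (u(x) - u(y))
```

The same ranking is recovered for any increasing u, which is why the procedure does not require knowledge of the agent’s utility function.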
This procedure is valid for almost all recognised forms of non-expected utility theory.^{5} Apart from continuity, to legitimate this procedure a theory must satisfy the following monotonicity property. Fix any event E_{i} and any consequences x > z > y such that \(z\sim xE_{i}y\). Then for any event E_{j}, xE_{j}y is strictly preferred to (indifferent to, strictly less preferred than) z if and only if E_{j} is more probable than (equally probable as, less probable than) E_{i}. Many theories of choice under uncertainty have this property, even those admitting violations of either the independence or transitivity axioms of EUT.
However, CBDT does not legitimate the certainty equivalence procedure. Since CBDT explains individuals’ decisions without making any reference to events or probabilities, there is no way of using that theory to read off the probabilities of events from observations of decisions. Indeed, the logic of CBDT suggests that the idea of trying to infer attitudes to events from decisions is misguided. Notice that, if EUT holds, we can use any pair of non-indifferent consequences to elicit an agent’s probability ranking of E_{1} and E_{2}, and the resulting ranking will be the same. In this sense, the certainty equivalence procedure elicits attitudes to events that are independent of the decision problems in which those attitudes are elicited. But this need not be true for a CBDT agent. For such an agent, indifference between two acts in a particular decision problem is determined by the agent’s memory of cases that were similar to that problem; if the problem is changed, the relevantly similar cases can change too.
Once one recognises that there are theories of choice under uncertainty that make no use of probabilities, it becomes a significant research question to ask whether individuals’ stated probability judgements are the same as the revealed probabilities that are elicited by the certainty equivalence procedure. Viewed within the conceptual framework of EUT, systematic inconsistencies between stated and revealed probabilities would be surprising, and would raise doubts about the construct validity of the concept of probability used in EUT. There would be particular reason for this kind of doubt if, in a setting in which correct probabilities could be formed by Bayesian reasoning, stated probabilities had Bayesian properties but revealed probabilities did not.
Our experiment is designed to detect instances of correlation neglect that might be induced by case-based reasoning. Such behaviour, were it to occur, would be picked up as non-Bayesian properties of revealed probabilities. As an additional diagnostic tool, we investigate the extent to which revealed probabilities are consistent with stated probabilities.
In experimental economics, it is standard practice to incentivise survey questions by reformulating them as decision problems with material (usually monetary) consequences. In the present case, however, the whole point of the enquiry is to discover whether stated probabilities differ from those that are revealed in decision problems. We believe that this is one of the class of “significant problems in economics that appear to be capable of experimental investigation only in non-incentivised designs” identified by Bardsley et al. (2010, pp. 336–337). The logic of our investigation requires that the elicitation of stated probabilities is not incentivised.
4 Experimental design
Our experiment used a setup similar to that analysed in Section 2. In designing the experimental interface,^{6} we tried to set a level playing field for investigating the prevalence of Bayesian and case-based reasoning. We avoided all explicit references to probability. Information about probability was always conveyed by describing physical randomising devices, but these descriptions were designed to make the translation between physical properties and objective probabilities as simple as possible. Judgements about the “chances” of a winning outcome were elicited on a qualitative scale. Thus, participants were not primed to think about probability, but no obstacles were placed in the way of participants who were predisposed to think in this way. The two lotteries seen by any participant were displayed and described in exactly the same way, except for two differentiating features – colour (blue or yellow) and position (left or right on the participant’s computer screen). Our background assumption was that, for participants who used Bayesian reasoning, it would be obvious that colour and position provided no information about probabilities or payoffs. However, by making these irrelevant features visually salient, we made it more likely that participants who used case-based reasoning would treat the two lotteries as distinct acts when encoding results in memory.
Each participant was informed about two lotteries, described as the blue game board and the yellow game board. Within an experimental session, the boards were the same for all participants. At appropriate times, these boards were displayed on participants’ screens, coloured blue or yellow accordingly. The blue board always appeared on the left side of the participant’s screen and the yellow board always appeared on the right. Each game board had 100 numbered boxes, corresponding with different numbered balls that might be drawn from a bingo cage (the same cage for both boards). Each box on each board had a predetermined value of either GBP 20 (a winning box) or zero (a losing box), but this value was not visible to participants until the box was “opened.” At the start of the session, all boxes were closed.
The experiment had two parts. In Part 1, each participant played ten sample rounds, five using the blue board and five using the yellow board, in random order. These were described as “samples that will give you the opportunity to learn as much as you can about the game boards.” In each sample round, the relevant game board was displayed on participants’ screens. One ball was drawn from the bingo cage, without replacement. The corresponding box on the board was opened to show its value, with a green background if it was a winning box and a red background if it was a losing box. It remained open only until the end of the round. In this way, each participant learned the values of five of the 100 boxes on each board, selected at random subject to the constraint that all ten opened boxes would have different numbers. Because no more than one box was open at any time, participants could access the information revealed in the sample rounds only by attending to each round as it occurred and by memorising its outcome. This design feature ensured that participants accumulated memory through experience over time, as is usually assumed in interpretations of CBDT. Although the sample rounds were not decision problems in the strict sense of CBDT, the framing was designed to encourage participants to think of each round as a demonstration of what they might in fact experience, were they to choose to play the relevant game board. At the end of Part 1, all balls were returned to the bingo cage.^{7}
In Part 2, each participant faced a valuation task relating to one of the two game boards, selected at random, independently for each participant. She was told that she had the opportunity to play this board – that is, to receive the value of one box on that board, determined by one draw from the bingo cage. The mechanism of Becker et al. (1964) was used to elicit the minimum amount of money that each participant was willing to accept in return for giving up this opportunity. Each participant considered thirty-five possible offer prices, ranging from GBP 0.20 to GBP 20, and reported whether she was willing to accept each price. In effect, each participant faced thirty-five binary choice problems, each involving a choice between playing the game board and receiving some amount of money with certainty. No feedback on the outcome of any of these choices was provided until the end of the experiment, when one of the offer prices (selected at random) was revealed as the actual offer price, and participants’ decisions conditional on that price were implemented. Irrespective of whether she had chosen to keep or sell the opportunity to play, each participant saw an independent draw (with replacement) from the bingo cage. This determined the number of one box on her board, which was then opened. If she had chosen to keep the opportunity, she was paid the value of this box; if not, she was paid the offer price. All participants received an additional participation fee of GBP 2.
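The payment rule of this Becker–DeGroot–Marschak mechanism can be sketched as follows. This is a simplified rendering with names of our own choosing; the exact spacing of the thirty-five offer prices between GBP 0.20 and GBP 20 is not specified above, so the price list is passed in as a parameter.

```python
import random

def bdm_payoff(accepts, play_board, prices, rng=random):
    """accepts: maps an offer price to True (willing to sell at that price)
    or False. play_board: realises one draw from the game board (GBP 20 or 0).
    prices: the list of possible offer prices."""
    offer = rng.choice(prices)      # the actual offer price, drawn at random
    if accepts(offer):
        return offer                # sold the opportunity: paid the offer price
    return play_board()             # kept it: paid the value of the opened box
```

Because the drawn price cannot be influenced by the reported acceptances, accepting exactly those prices at or above one’s true minimum selling price is optimal, which is what makes the elicited valuation interpretable as a certainty equivalent.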
Notice that the valuation task is an instance of the certainty equivalence procedure, as described in Section 3. Thus, if participants behaved according to EUT (or, indeed, according to any of a wide class of nonexpected utility theories), the reported valuation of any given participant would be an increasing function of the subjective probability she assigned to winning the final lottery.
At the start of each sample round in Part 1, and also immediately before the elicitation task in Part 2, each participant reported her judgement about “the chance that this game board [i.e., the board relevant for that round or task] will reveal a winning box in this round” on a ten-point Likert scale with endpoints labelled “very low” and “very high”. These judgement tasks were not incentivised.
We used a between-subjects design with three treatments, implementing different properties of correlation between the lotteries. Each session was pre-assigned to one of the three treatments. The differences between treatments can be described in terms of the proportions π_{B} and π_{Y} of winning boxes on the blue and yellow boards respectively. In each treatment, the values of π_{B} and π_{Y} were determined by a random draw from a joint distribution of (π_{B}, π_{Y}). Participants were fully informed about the prior distribution, but were not informed about the actual draw. Thus, they were given sufficient information to construct objective prior probabilities of winning on each board, which could then be updated by Bayesian inference in light of the outcomes of the sample rounds. The method by which this information was communicated to participants is explained in Appendix A.
In all three treatments, and in all realisations of the random mechanism, each board was assigned either ten or thirty winning boxes. We refer to a game board with thirty winning boxes as being a high type board (type H), and a game board with ten winning boxes as a low type board (type L). Ex ante, a given board was equally likely to be of the high or low type, and therefore the ex ante chance of a given box on the board being a winning box was 0.2. Thus, colours and box numbers had no information content in themselves.

– In the independent treatment, the types of the boards were drawn independently.

– In the positive correlation treatment, either both boards were type H, or both boards were type L. Each possibility was equally likely.

– In the negative correlation treatment, one board was always of type H and the other of type L. It was equally likely that the blue board was type H and the yellow board type L, or vice versa.
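The three assignment mechanisms can be sketched as draws from the joint distribution of board types (an illustration of our own; "H" denotes thirty winning boxes and "L" ten, so in every treatment the ex ante chance of any given box winning is 0.5 × 0.30 + 0.5 × 0.10 = 0.20):

```python
import random

def draw_types(treatment, rng=random):
    """Return (blue_type, yellow_type) for one session of the given treatment."""
    if treatment == "independent":
        return rng.choice("HL"), rng.choice("HL")
    if treatment == "positive":
        t = rng.choice("HL")
        return t, t                              # boards always share one type
    if treatment == "negative":
        return rng.choice([("H", "L"), ("L", "H")])  # exactly one board is H
    raise ValueError(treatment)
```

In each branch the marginal distribution of either board's type is uniform on {H, L}; only the joint distribution differs across treatments.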
For a fully rational Bayesian reasoner, all relevant information about correlation is contained in the initial joint distribution of board types. However, the assignment of winning boxes to boards had certain additional features, designed to help participants to understand the correlation properties of each treatment.
In the positive correlation treatment, the winning numbers were the same for both boards. For example, consider a participant who observes a win on box 36 of the blue board in a sample round of the positive treatment. By abstract Bayesian reasoning from the knowledge that there is perfect positive correlation between π_{B} and π_{Y}, she can deduce that the observed win is just as informative about the probability of winning on the yellow board as it is about the probability of winning on the blue board. But our design allows her to make a more direct and more concrete inference, from the knowledge that box 36 on the blue board is a winning box to the conclusion that box 36 on the yellow board is a winning box. In the negative correlation treatment, the winning numbers were different for the two boards. Thus, from the knowledge that box 36 on the blue board was a winning box, a participant could infer that box 36 on the yellow board was a losing box. In the independent treatment, the assignments of winning numbers to the two boards were independent of one another.
As a further aid to understanding, each sample round ended with a screen, described as “summaris[ing] what you learned about the game boards in that round.” On this screen, participants saw the game board they had just played, with the box just opened coloured green or red and showing “GBP 20” or “GBP 0.” At the bottom of the board there was a message reinforcing this information. On the other side of the screen there was a message about the corresponding box on the other game board. For example, consider a round involving the blue game board. Suppose the announced box number was 4 and its value was GBP 20. Then (irrespective of the treatment) the message on the blue game board would be “4 is a winning box on the blue game board.” In the positive correlation treatment, the message on the other side of the screen would be “4 is a winning box on the yellow game board.” In the negative correlation treatment, it would be “4 is a losing box on the yellow game board.” In the independent treatment, it would be “4 may be a winning box or may be a losing box on the yellow game board.” Throughout Part 1, the computer screen also displayed a header that constantly reminded participants of the correlation between the two game boards.
Full instructions for the experiment, including examples of screenshots, are given in an Online Appendix.
5 Hypotheses
Let i index the game board offered to the participant in Part 2, and j denote the other game board. Therefore, i, j ∈{blue,yellow}, with i≠j. Let h_{i} denote the number of winning boxes observed among draws from board i in Part 1, and h_{j} the number of winning boxes observed among draws from board j in Part 1. We refer to the pair (h_{i}, h_{j}) observed by a participant as their memory. The values of h_{i} and h_{j} are sufficient to compute the Bayesian posterior probability that board i is type H, which we denote ρ_{i}(h_{i}, h_{j}). Table 1 presents these posteriors for each treatment, for the memories realised in our data.^{8}
 In the independent treatment, the posterior is determined entirely by the number of winning boxes h_{i} observed on board i. We therefore define \({\succeq ^{E}_{I}}\) by$$ (h_{i},h_{j}){\succeq^{E}_{I}}(h_{i}^{\prime},h_{j}^{\prime}) \Longleftrightarrow h_{i} \geq h_{i}^{\prime}. $$
 In the positive correlation treatment, the posterior is determined entirely by the total number of winning boxes h_{i} + h_{j} observed, irrespective of board. We therefore define \({\succeq ^{E}_{P}}\) by$$ (h_{i},h_{j}){\succeq^{E}_{P}}(h_{i}^{\prime},h_{j}^{\prime}) \Longleftrightarrow h_{i}+h_{j} \geq h_{i}^{\prime}+h_{j}^{\prime}. $$
 In the negative correlation treatment, the posterior is increasing in the number of winning boxes observed on board i, and decreasing in the number of winning boxes observed on board j. The difference of the number of winning boxes between the boards, h_{i} − h_{j}, is not alone sufficient to determine the posterior. However, among memories for which h_{i} − h_{j} = m for some m, the posterior varies by only a small amount, compared to the variability of the posterior between two memories for which \(h_{i}-h_{j}\not = h_{i}^{\prime }-h_{j}^{\prime }\). We therefore define the ordering \({\succeq ^{E}_{N}}\) based on the difference h_{i} − h_{j},$$ (h_{i},h_{j}){\succeq^{E}_{N}}(h_{i}^{\prime},h_{j}^{\prime}) \Longleftrightarrow h_{i}-h_{j} \geq h_{i}^{\prime}-h_{j}^{\prime}. $$
Bayesian posterior probabilities ρ_{i}(h_{i}, h_{j}) as a function of the memory (h_{i}, h_{j}), for each treatment
The ranking of a memory is therefore determined by a treatment-specific summary statistic, which we write M^{I}(h_{i}, h_{j}) = h_{i} for the independent treatment; M^{P}(h_{i}, h_{j}) = h_{i} + h_{j} for the positive correlation treatment; and M^{N}(h_{i}, h_{j}) = h_{i} − h_{j} for the negative correlation treatment. The summary statistic M^{t} (for t ∈ {I, P, N}) therefore represents the ordering \({\succeq ^{E}_{t}}\).
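To illustrate how each treatment’s posterior reduces to its summary statistic, the sketch below computes ρ_{i}(h_{i}, h_{j}) under a deliberately simplified model. All parameters are illustrative assumptions, not the experiment’s: a 50:50 prior over board types H and L, win probabilities of 0.3 and 0.1 per sampled box (so the unconditional win probability is 0.2), and two independent binomial draws per board. Because the experiment opened boxes without replacement, h_{i} − h_{j} is only approximately sufficient in its negative correlation treatment; under the binomial simplification used here it is exactly sufficient.

```python
from math import comb

# Illustrative parameters only -- NOT the experiment's actual ones.
# A board is type H or L with equal prior probability; a sampled box
# wins with probability P_WIN[type].  The unconditional win probability
# is 0.5*0.3 + 0.5*0.1 = 0.2, matching the 0.2 baseline in the text.
P_WIN = {"H": 0.3, "L": 0.1}
N_DRAWS = 2  # assumed number of sample draws per board


def lik(h, board_type):
    """Binomial likelihood of observing h winning boxes in N_DRAWS draws."""
    p = P_WIN[board_type]
    return comb(N_DRAWS, h) * p**h * (1 - p) ** (N_DRAWS - h)


def posterior(h_i, h_j, treatment):
    """Bayesian posterior probability that the offered board i is type H."""
    if treatment == "independent":
        # Board j's type carries no information about board i's type.
        num = lik(h_i, "H")
        den = lik(h_i, "H") + lik(h_i, "L")
    elif treatment == "positive":
        # Both boards share the same type.
        num = lik(h_i, "H") * lik(h_j, "H")
        den = num + lik(h_i, "L") * lik(h_j, "L")
    elif treatment == "negative":
        # The two boards have opposite types.
        num = lik(h_i, "H") * lik(h_j, "L")
        den = num + lik(h_i, "L") * lik(h_j, "H")
    return num / den  # prior odds are 1:1, so they cancel
```

Under these assumptions, the posterior is invariant to h_{j} in the independent treatment, depends only on h_{i} + h_{j} in the positive treatment, and is increasing in h_{i} and decreasing in h_{j} in the negative treatment, mirroring the three orderings above.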
The experimental design assigns each participant at random to a memory (h_{i}, h_{j}), determined by the randomly selected sample box numbers and the game board offered. Each participant k reports a valuation, which we denote v_{k}(h_{i}, h_{j}). Under EUT, and indeed under any decision theory in which valuations are monotonic in the Bayesian posterior, valuations will be increasing in the summary statistic. The following hypothesis is therefore an implication of EUT.
Hypothesis 1
For each treatment t, the reported valuations are increasing in M^{t}, the summary statistic of the memory.
Our interest is in whether actual behaviour deviates from EUT in directions that would be indicative of case-based reasoning. Our design is based on the working assumption that, in all three treatments, participants perceive plays on boards of the same colour as more similar to one another than to plays on boards of different colours. Given this assumption, the underlying principles of CBDT suggest the following case-based weighting conjecture: when participants report valuations of the offered board i, they give greater weight to winning boxes observed on that board than to winning boxes observed on board j. (Wins on board j might be given zero weight, as in Model 1 of Section 2, or positive but lower weight than wins on board i, as in Model 2.) Following the methodological strategy of looking for exhibits (see Section 1), we focus on cases in which this conjecture implies unambiguous biases in behaviour relative to EUT predictions. Such biases are implied only in the positive and negative correlation treatments.
 In the positive correlation treatment, fix some \(m\in \mathbb {Z}_{+}\) and consider the set of memories {(h_{i}, h_{j}) : h_{i} + h_{j} = m}. This set is an indifference class of the ordering \({\succeq ^{E}_{P}}\). The case-based weighting conjecture implies a strict ranking \({\succ ^{C}_{P}}\) of the members of this set,$$ (h_{i},h_{j}){\succ^{C}_{P}}(h_{i}^{\prime},h_{j}^{\prime}) \Longleftrightarrow h_{i}+h_{j}=h_{i}^{\prime}+h_{j}^{\prime}=m \mathrm{\ and\ } h_{i}>h_{i}^{\prime}. $$(1)
 In the negative correlation treatment, fix some \(m\in \mathbb {Z}\) and consider the set of memories {(h_{i}, h_{j}) : h_{i} − h_{j} = m}. This set is an indifference class of the ordering \({\succeq ^{E}_{N}}\). The case-based weighting conjecture implies a strict ranking \({\succ ^{C}_{N}}\) of the members of this set,$$ (h_{i},h_{j}){\succ^{C}_{N}}(h_{i}^{\prime},h_{j}^{\prime}) \Longleftrightarrow h_{i}-h_{j}=h_{i}^{\prime}-h_{j}^{\prime}=m \mathrm{\ and\ } h_{i}>h_{i}^{\prime}. $$(2)
The following hypothesis is an implication of the case-based weighting conjecture:
Hypothesis 2
For a given treatment t ∈ {P, N}, consider any set of memories {(h_{i}, h_{j}) : M^{t}(h_{i}, h_{j}) = m} for some m. Within this set, reported valuations are increasing in h_{i}, the number of winning boxes observed on the offered board.
Recall that each participant reported judgements about the “chance” of seeing a winning number prior to each box being opened, both in Part 1 and Part 2. To allow meaningful comparisons across participants, we normalise each participant’s use of the Likert scale relative to that participant’s judgement of the chance of winning on the first Part 1 game board. At the stage of the experiment at which this first judgement was reported, all participants had the same information, and that information implied that the objective probability of seeing a winning box was 0.2. For each participant, we define the variable expectation difference, which takes the value +1 when the participant reports a higher chance in Part 2 than at the start of Part 1; −1 when the participant reports a lower chance in Part 2 than at the start of Part 1; and 0 when the chances reported in Part 2 and at the start of Part 1 are the same. Expectation difference can be interpreted as a self-reported judgement about whether the probability of seeing a winning box on the relevant board i, given the participant’s memory (h_{i}, h_{j}), is greater than, less than, or equal to 0.2. Thus, if stated probabilities are consistent with Bayesian reasoning, the following hypothesis will hold:
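The construction of expectation difference is a sign comparison between two Likert reports; the minimal sketch below (the function and argument names are ours, not the paper’s) makes the coding explicit.

```python
def expectation_difference(part2_report: int, first_part1_report: int) -> int:
    """Sign of the change in the reported 'chance' of winning: +1 if the
    Part 2 report exceeds the participant's first Part 1 report (made when
    the objective win probability was known to be 0.2), -1 if it is lower,
    and 0 if the two reports coincide."""
    if part2_report > first_part1_report:
        return 1
    if part2_report < first_part1_report:
        return -1
    return 0
```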
Hypothesis 3
For each treatment t, the reported expectation differences are increasing in M^{t}, the summary statistic of the memory.
Because CBDT makes no reference to probability, it does not support any particular conjecture about stated probabilities.
6 Results
We conducted a total of thirty sessions in March 2014; there were ten sessions for each treatment. Each session had six to eight participants and lasted 45 minutes. Average earnings were GBP 8.40, and ranged from GBP 2.00 to GBP 22.00. All 226 participants (119 male, 107 female) were drawn from the standing participant pool of the Centre for Behavioural and Experimental Social Science at the University of East Anglia, managed via ORSEE (Greiner 2015).
The experimental design assigned participants randomly to experimental sessions, and therefore to treatments. Furthermore, because the sample results from the game boards were themselves determined at random, the design randomly assigned participants to realised memories (h_{i}, h_{j}). Therefore, the data for any pair of distinct memories \((h_{i},h_{j})\not = (h_{i}^{\prime },h_{j}^{\prime })\) are independent samples.
Our hypotheses make predictions about how valuations and expectation differences change as a function of the observed memory. We base our statistical analysis on nonparametric tests using rank orders. These tests are well-suited to our hypotheses, which concern only trends in valuations as a function of h_{i} and h_{j}. In addition, because our design uses a discrete and predetermined set of questions to elicit valuations, we obtain only a bracketing interval around each participant’s valuation. These intervals are non-overlapping and can therefore be ranked from high to low, which is sufficient for the rank-order approach.
Our choice architecture did not require participants to give a monotonic response to the questions implementing the Becker et al. (1964) procedure: participants could simultaneously indicate that they accepted some price while rejecting a strictly higher price. Of the 226 participants, 210 provided monotonic responses to the valuation questions. A further 4 participants gave responses that were monotonic except at one isolated price, such that, if the response to that price were inverted, the resulting schedule would be monotonic. We disregard the isolated non-monotonic response for those 4 participants and include them in our analysis. We drop the remaining 12 participants. For each of the 214 participants in the sample so defined, we define their valuation as the lowest accepted price.^{9}
Summary data for valuations by treatment
Effect sizes for comparisons of valuations, pairwise by outcome of sample draws
Because memories are sorted in order of the posterior probability that board i is a high-type board, under EUT we would expect to see effect sizes less than one-half towards the top-right of the matrix. This pattern is broadly observed in the independent and positive correlation treatments. However, no clear pattern emerges in the negative correlation treatment.
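This excerpt does not reproduce the effect-size formula, but a standard rank-based effect size associated with an MWW comparison is the probability of superiority: the probability that a random observation from one group exceeds a random observation from the other, counting ties as one-half. A minimal sketch, under that assumption:

```python
def effect_size(xs, ys):
    """Probability-of-superiority effect size for an MWW comparison of two
    groups: the probability that a random draw from xs exceeds a random
    draw from ys, counting ties as one-half.  Equals U/(n1*n2), where U is
    the Mann-Whitney statistic; 0.5 indicates no difference."""
    gt = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (gt + 0.5 * ties) / (len(xs) * len(ys))
```

A value of one-half indicates no difference between the groups; values below one-half indicate that the first group tends to have the smaller observations.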
The Mann–Whitney–Wilcoxon (MWW) test is suitable for comparing the distributions of valuations in two groups. In some circumstances, our hypotheses require comparisons across three or more groups. For this purpose we adopt the test for trend of Cuzick (1985), which extends the rank-order calculation of the MWW test to three or more groups; for two groups, Cuzick’s test coincides exactly with the MWW test. The test requires an ordering of the groups being compared. The null hypothesis is that there is no trend (increasing or decreasing) in the data across the groups, against the alternative hypothesis of a trend.
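A compact implementation shows how Cuzick’s statistic extends the MWW rank calculation. This is a sketch under simplifying assumptions: group scores are taken as 1, 2, …, k in the hypothesised order, ties receive midranks, the variance omits the tie correction, and the two-sided p-value comes from the normal approximation.

```python
from math import erfc, sqrt

def cuzick_trend_test(groups):
    """Cuzick's (1985) Wilcoxon-type test for trend across ordered groups.

    `groups` is a list of lists of observations, given in the hypothesised
    order (e.g. by the summary statistic M).  Returns (z, two_sided_p).
    Simplifications: group scores 1..k, midranks for ties, no tie
    correction of the variance, normal approximation for the p-value."""
    # Pool all observations, recording each one's group score.
    scores = [g for g, obs in enumerate(groups, start=1) for _ in obs]
    pooled = [x for obs in groups for x in obs]
    n = len(pooled)

    # Assign midranks to the pooled sample.
    order = sorted(range(n), key=lambda k: pooled[k])
    ranks = [0.0] * n
    k = 0
    while k < n:
        m = k
        while m + 1 < n and pooled[order[m + 1]] == pooled[order[k]]:
            m += 1
        mid = (k + m) / 2 + 1  # average of ranks k+1 .. m+1
        for idx in order[k:m + 1]:
            ranks[idx] = mid
        k = m + 1

    # Trend statistic: score-weighted rank sum.
    t_stat = sum(s * r for s, r in zip(scores, ranks))
    counts = [len(obs) for obs in groups]
    l1 = sum(c * (g + 1) for g, c in enumerate(counts))
    l2 = sum(c * (g + 1) ** 2 for g, c in enumerate(counts))
    e_t = (n + 1) / 2 * l1
    var_t = (n + 1) / 12 * (n * l2 - l1 ** 2)
    z = (t_stat - e_t) / sqrt(var_t)
    p = erfc(abs(z) / sqrt(2))  # two-sided p under the normal approximation
    return z, p
```

For example, three groups with clearly increasing values give a large positive z and a small p, while symmetric, trendless data give z = 0.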
Result 1
In the independent and positive correlation treatments, valuations are increasing as a function of the summary statistic M^{t}(h_{i}, h_{j}) of the memory. In the negative correlation treatment, we cannot reject the null hypothesis of no trend.
Support.
Test for increasing valuations as a function of the summary statistic
Comparison  Independent  Positive  Negative 

M = − 3 vs M = − 2  .38  
M = − 2 vs M = − 1  .62*  
M = − 1 vs M = 0  .42  
M = 0 vs M = 1  .47  .40  .50 
M = 1 vs M = 2  .33**  .30**  .39 
M = 2 vs M = 3  .46  .66  
M = 3 vs M = 4  .44  
Cuzick p  .049  < .001  .58 
Result 2
There is no significant evidence of the systematic deviations from EUT implied by the case-based weighting conjecture.
Support.
Test for trends predicted by similarity, conditional on the summary statistic of a memory
Pairwise effect size  

Conditional on  h_{i} = 0 vs. h_{i} = 1  h_{i} = 1 vs. h_{i} = 2  Cuzick p 
(a) Positive treatment  
h_{i} + h_{j} = 1  .46  .75  
h_{i} + h_{j} = 2  .59  .20  .20 
h_{i} + h_{j} = 3  .66  .33  
Combined  .33  
(b) Negative treatment  
h_{i} − h_{j} = − 1  .54  .85  
h_{i} − h_{j} = 0  .42  .61  
h_{i} − h_{j} = 1  .71  .24  
Combined  .41 
Expectation difference, as a function of posterior
ρ_{i}(h_{i}, h_{j})  Decrease  No change  Increase  Total 

(a) Independent  
.216  15  7  2  24 
.518  23  9  2  34 
.818  8  4  7  19 
Total  46  20  11  77 
(b) Positive correlation  
.065  6  1  1  8 
.216  17  8  4  29 
.541  9  5  1  15 
.844  7  4  4  15 
.965  2  4  2  8 
Total  41  22  12  75 
(c) Negative correlation  
.013  2  1  0  3 
.056  7  1  3  11 
.190  2  0  1  3 
.201  5  4  2  11 
.500  8  5  2  15 
.799  7  1  4  12 
.810  0  2  2  4 
.944  3  3  5  11 
.987  1  1  2  4 
Total  35  18  21  74 
Effect sizes for comparisons of judgements, pairwise by outcome of sample draws
Result 3
In all treatments, judgements are more optimistic, as measured by expectation differences, when the Bayesian posterior probability is higher.
Support.
Test for increasing judgements as a function of the summary statistic
Comparison  Independent  Positive  Negative 

M = − 3 vs M = − 2  .44  
M = − 2 vs M = − 1  .46  
M = − 1 vs M = 0  .53  
M = 0 vs M = 1  .53  .43  .40 
M = 1 vs M = 2  .33**  .52  .43 
M = 2 vs M = 3  .40  .48  
M = 3 vs M = 4  .43  
Cuzick p  .084  .048  .030 
Our data therefore show a contrast, in the negative correlation treatment, between the patterns in participants’ valuations and the patterns in their expectation differences. Table 2 suggests that stated valuations remain high in the negative treatment even when the Bayesian posterior assigns a low probability to the offered board being of the high type; Result 1 formalises this statement. However, Result 3 shows that expectation differences are more optimistic precisely in situations in which the Bayesian posterior implies they should be.
7 Discussion and conclusions
Our experiment was motivated by the conjecture that human decision-makers have a tendency to use case-based reasoning even when events are well-defined and when the objective probabilities of those events can be found by Bayesian reasoning from prior information. More specifically, we tested the “case-based weighting conjecture” that systematic violations of EUT occur when information about correlation is not embedded in experienced decision outcomes and when Bayesian reasoning involves inferences between lotteries that are saliently dissimilar. We designed the experiment in the belief that this conjecture was consistent with the psychological intuitions of CBDT. Had the evidence confirmed that conjecture, we would have interpreted our results as providing support for CBDT. In fact, we found no significant evidence of the effects implied by the conjecture. If we are right to claim that our conjecture reflects the assumed psychology of case-based reasoning, we must conclude that our results provide weak evidence against CBDT.
However, we should point out that an experiment with some similarities to ours has found evidence of a type of correlation neglect which, although not the type that we tested for, is also consistent with CBDT. Charness and Levin (2005) report an individual choice experiment in which there were two urns (“left” and “right”) from which balls were drawn with replacement. Balls were either “white” (losing balls) or “black” (winning balls). A prior random event, not revealed to subjects, determined the distribution of balls in the two urns in such a way that a winning draw from one urn increased the posterior probability of a winning draw from the other. (Compare our positive correlation treatment.) In the context of CBDT, the most interesting of Charness and Levin’s treatments were those in which subjects were required to play one lottery on a specified urn and then, after the outcome of that lottery was revealed, played a second lottery on whichever urn they chose. The parameters of the experiment were fixed so that, if the first lottery was on the left urn, the Bayes-rational response was to stay with the left urn after a loss, but to shift to the right urn after a win. In fact, subjects’ responses showed a strong tendency towards the opposite pattern, as implied by the win-stay-lose-shift heuristic (Robbins 1952). This heuristic is consistent with CBDT if the utility of a win is higher than the subject’s aspiration level.^{10} A significant difference between our experiment and Charness and Levin’s is that in ours, subjects did not choose between “staying” and “shifting”; they saw a fixed number of trials of each lottery and then reported a valuation for one of the lotteries, which had been selected at random. Thus, our experiment tests for a possible effect of CBDT in a situation in which the win-stay-lose-shift heuristic is not applicable.
Although our experiment finds no evidence of the specific form of correlation neglect implied by our conjecture, it is clear that our participants found it difficult to perform Bayesian reasoning about decisions that involved negative correlation. More precisely, when reporting valuations of an offered lottery i, participants were generally able to recognise the irrelevance of outcomes of lotteries that were independent of i, and to recognise the relevance of outcomes of lotteries that were positively correlated with i; but they failed to recognise the relevance of negative correlation. Surprisingly, however, their stated probability judgements about lottery i (expressed in qualitative statements about the “chance” of winning) showed Bayesian responses to independence, positive correlation, and negative correlation. Our post hoc conjecture is that the problem was one of cognitive overload. Intuitively, negative correlation is a more difficult concept than independence or positive correlation, and working out one’s willingness to exchange a lottery for money is more difficult than merely judging the chance of winning if one plays it. The combination of these two sources of difficulty may have been too challenging for our participants.
This surprising finding raises doubts about the almost universal practice in experimental economics of treating incentivised decision problems as the gold standard for the elicitation of participants’ beliefs. If linking the elicitation of beliefs with problems of decision under uncertainty can lead to cognitive overload, experimentalists need to consider the possibility that direct, non-incentivised questions about beliefs might produce more accurate data.
Footnotes
 1.
Guilfoos and Pape (2016) report an experimental test of CBDT which uses a very different methodology. They compare actual behaviour in a repeated Prisoner’s Dilemma experiment with simulated data created by a “software agent” programmed to use case-based reasoning.
 2.
This methodological strategy, and its role in the development of experimental economics, is discussed by Bardsley et al. (2010, Chapter 4).
 3.
By assuming that the agent’s memory contains the same number of cases of choosing each lottery, we avoid the need to distinguish between the core version of CBDT and the “average performance” variant in which U(a) is a similarity-weighted average (rather than sum) of experienced utilities (Gilboa and Schmeidler 1995, pp. 619–620).
 4.
Model 2 represents this difference in terms of similarity between problems involving L_{1} and problems involving L_{2}. Because these sets of problems are disjoint, this is equivalent to treating “bet on L_{1}” and “bet on L_{2}” as distinct acts and then, as in the Gilboa and Schmeidler (1995, pp. 635–638) “alternative model”, defining similarity as a relationship between (problem, chosen act) pairs.
 5.
To say that the procedure is “valid” for a specific theory is to say that the theory includes some concept of probability ranking and that, if an agent behaves according to the theory, the procedure will correctly elicit the probability rankings that the theory recognises.
 6.
The experiment was programmed in zTree (Fischbacher 2007).
 7.
When offering “motivating examples” for CBDT, Gilboa and Schmeidler (2001, pp. 29–34) allow an agent’s memory to include cases that were not decision problems for her. In one such example, the agent is an employer choosing between candidates for a job; experiences by previous employers, as reported in reference letters, are treated as cases.
 8.
See Appendix A for notes on calculating these.
 9.
More precisely, the revealed valuation falls within the interval between the highest rejected price and lowest accepted price. We identify each interval by its upper endpoint for conciseness.
 10.
Gilboa and Schmeidler (1995, p. 610) discuss a similar example.
 11.
We thank Peter Moffatt for suggesting this.
Acknowledgments
The authors thank an anonymous referee for constructive comments; Maria Bigoni, Melanie Parravano, Axel Sonntag and Jiwei Zheng for zTree assistance; and Ailko van der Veen, Cameron Belton, James Rossington, Mengjie Wang, and Lian Xue for helping conduct the experiment sessions. Sugden and Turocy acknowledge the support of the Network for Integrated Behavioural Science (Economic and Social Research Council Grant ES/K002201/1). All errors are the responsibility of the authors.
References
 Arrow, K., & Hurwicz, L. (1972). An optimality criterion for decision making under ignorance. In Uncertainty and expectations in economics: Essays in honour of G. L. S. Shackle (pp. 1–11). Oxford: Basil Blackwell.
 Bardsley, N., Cubitt, R., Loomes, G., Moffatt, P., Starmer, C., Sugden, R. (2010). Experimental economics: Rethinking the rules. Princeton: Princeton University Press.
 Becker, G., DeGroot, M., Marschak, J. (1964). Measuring utility by a single-response sequential method. Behavioral Science, 9(3), 226–232.
 Bleichrodt, H., Filko, M., Kothiyal, A., Wakker, P.P. (2017). Making case-based decision theory directly observable. American Economic Journal: Microeconomics, 9, 123–151.
 Bush, R.R., & Mosteller, F. (1953). A stochastic model with applications to learning. The Annals of Mathematical Statistics, 24, 559–585.
 Charness, G., & Levin, D. (2005). When optimal choices feel wrong: A laboratory study of Bayesian updating, complexity, and affect. American Economic Review, 95, 1300–1309.
 Cuzick, J. (1985). A Wilcoxon-type test for trend. Statistics in Medicine, 4, 87–90.
 Eyster, E., & Weizsäcker, G. (2011). Correlation neglect in financial decision-making. DIW Berlin Discussion Paper No. 1104. Available at https://doi.org/10.2139/ssrn.1735339.
 Fischbacher, U. (2007). z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics, 10, 171–178.
 Gigerenzer, G., & Goldstein, D. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669.
 Gilboa, I., & Schmeidler, D. (1995). Case-based decision theory. The Quarterly Journal of Economics, 110(3), 605–639.
 Gilboa, I., & Schmeidler, D. (2001). A theory of case-based decisions. Cambridge: Cambridge University Press.
 Greiner, B. (2015). Subject pool recruitment procedures: Organizing experiments with ORSEE. Journal of the Economic Science Association, 1, 114–125.
 Grosskopf, B., Sarin, R., Watson, E. (2015). An experiment on case-based decision making. Theory and Decision, 79, 639–666.
 Guilfoos, T., & Pape, A.D. (2016). Predicting human cooperation in the Prisoner’s Dilemma using case-based decision theory. Theory and Decision, 80, 1–32.
 Hedesström, T.M., Svedsäter, H., Gärling, T. (2006). Covariation neglect among novice investors. Journal of Experimental Psychology: Applied, 12, 155–165.
 Hertwig, R., Barron, G., Weber, E., Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15(8), 534–539.
 Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
 Kallir, I., & Sonsino, D. (2009). The neglect of correlation in allocation decisions. Southern Economic Journal, 75, 1045–1066.
 Kroll, Y., Levy, H., Rapoport, A. (1988). Experimental tests of the separation theorem and capital asset pricing model. American Economic Review, 78, 500–519.
 Machina, M.J., & Viscusi, W.K. (2014). Handbook of the economics of risk and uncertainty. Amsterdam: Elsevier.
 Ossadnik, W., Wilmsmann, D., Niemann, B. (2013). Experimental evidence on case-based decision theory. Theory and Decision, 75, 211–232.
 Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58, 527–535.
 Seidl, C. (2002). Preference reversal. Journal of Economic Surveys, 16(5), 621–655.
 Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232.
 Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293–315.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.