Keywords: Color Word; Multisensory Integration; Reward Rate; Outcome Uncertainty; Response Deadline
A diverse repertoire of behavioral tasks has been employed to examine the cognitive processes and neural basis underlying decision-making in humans and animals. Some of these have their origins in psychology, others in cognitive neuroscience, yet others in economics. There is also a continual invention of novel or hybrid paradigms, sometimes motivated by deep conceptual questions stimulated by computational modeling of decision-making and sometimes motivated by specific hypotheses related to functions of neuronal systems and brain regions that are suspected of playing an important role in decision-making.
- Decision-making under uncertainty
- Multidimensional decision-making
Decision-Making Under Uncertainty
Many decision-making tasks require the subjects to make choices among alternatives under conditions of uncertainty. Uncertainty can arise from noise related to sensory inputs, ignorance about stimulus or action outcomes, stochasticity or imprecise representation related to (relative) temporal onset of events, or some hybrid combinations of two or more of these factors. The difficulty of the tasks can often be controlled parametrically via the level of uncertainty induced experimentally, and the tasks can therefore be used to probe fundamental limitations and properties of human and animal cognition, as well as characterize individual and group differences.
Decision-making under sensory uncertainty, or perceptual decision-making, involves tasks in which the subject views a noisy stimulus and chooses a response consistent with the perceived stimulus. One such widely used task is the random-dot coherent motion task (Newsome and Paré 1988), in which a field of flickering dots is displayed to the subject, a fraction (usually a minority) of which move coherently in one direction, while the remainder either move in random directions (Newsome and Paré 1988) or have no discernible direction of motion (Shadlen and Newsome 2001). Based on the stimulus, the subject then chooses the appropriate response to indicate the perceived direction of coherent motion in the stimulus. This is usually done by eye movements in monkeys and either eye movements or key presses in humans. Typically, there are two possible directions of motion, out of which the subject must choose, making this a 2-alternative forced choice (2AFC) task; occasionally, it has been done with more than two alternatives, in a multiple-alternative forced choice (mAFC) format (Churchland et al. 2008). When the 2AFC version is done in conjunction with recordings of neurons that are sensitive to the direction of motion of the stimulus, the two possible directions of motion are chosen to be aligned with the preferred and anti-preferred directions of the neuron being recorded (Shadlen and Newsome 2001); in the mAFC version, one of the possible motion directions is aligned with the preferred direction of the neuron (Churchland et al. 2008).
There are two major variants of the random-dot coherent motion task, one in which the stimulus is present for a fixed duration (Newsome and Paré 1988) and one in which the stimulus is present until the subject makes a perceptual response (Roitman and Shadlen 2002). The latter, known as the reaction time (RT) variant, is the behaviorally more interesting case, as it allows the subject to decide not only which stimulus was seen but how much sensory data to accumulate about the stimulus before responding. This has made this task a simple yet effective means for probing the speed-accuracy trade-off in perception (Gold and Shadlen 2002). Another property of the task that makes it ideal for examining the speed-accuracy trade-off is that it corrupts the sensory stimulus with a significant amount of noise that is identical and independent over time, making the informational value of the sensory stimulus constant over time (Bogacz et al. 2006). This property also gives the experimenter an accessible experimental parameter for controlling the information rate (signal-to-noise ratio) of the stimulus, through the coherence of the stimulus, or the percentage of coherently moving dots.
The mathematically optimal decision policy for choosing among two hypotheses based on independent, identically distributed data, where the observer can choose how much data to observe, is known as the sequential probability ratio test (SPRT). It is the optimal policy for minimizing any cost function that is monotonically related to average reaction time and probability of error (Wald 1947; Wald and Wolfowitz 1948). It states that the observer should accumulate evidence, in the form of Bayesian posterior probability using Bayes' rule or, equivalently, by summing up log odds ratios, until the total evidence exceeds a confidence bound that favors one or the other stimulus; at that point, the observer should terminate the observation process and choose the more probable alternative. The height of the decision boundary is determined by the relative importance of speed and accuracy in the objective function, with greater importance of speed relative to accuracy leading to lower decision boundaries, and lesser importance of speed leading to higher ones. In the limit when many data samples are observed before the decision is made, SPRT converges to the drift-diffusion process first studied in connection with statistical mechanics; it is essentially a bounded linear dynamical system with additive Wiener noise and absorbing boundaries (Laming 1968; Ratcliff and Rouder 1998; Bogacz et al. 2006). Many mathematical properties of the drift-diffusion process are well known, such as the average sample size before termination and the probability of reaching the correct boundary, for different problem parameters and task conditions. It therefore provides a convenient mathematical framework for analyzing SPRT and thus experimental behavior in different task settings.
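The accumulate-to-bound rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sample is Gaussian with a known common standard deviation and that the two hypotheses differ only in their means; the function names `sprt` and `simulate`, the symmetric bound, and all parameter values are hypothetical choices rather than anything specified in the literature cited here. Each sample contributes its log-likelihood ratio, and with equal priors the running sum equals the posterior log odds.

```python
import random


def sprt(samples, mu0, mu1, sigma, bound):
    """Accumulate evidence until it crosses +bound (choose 1) or -bound (choose 0).

    Assumes Gaussian samples with known sigma; returns (choice, samples_used).
    If the stream ends first, the currently favored alternative is returned.
    """
    llr = 0.0  # running sum of log-likelihood ratios, log p(x|H1)/p(x|H0)
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio of one Gaussian sample under H1 versus H0.
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2
        if llr >= bound:
            return 1, n
        if llr <= -bound:
            return 0, n
    return (1 if llr > 0 else 0), n


def simulate(bound, n_trials=2000, mu=0.2, sigma=1.0, max_samples=5000, seed=0):
    """Estimate accuracy and mean sample count when H1 (mean +mu) is true."""
    rng = random.Random(seed)
    correct = 0
    total_n = 0
    for _ in range(n_trials):
        # Lazy stream of noisy samples, so only the samples needed are drawn.
        stream = (rng.gauss(mu, sigma) for _ in range(max_samples))
        choice, n = sprt(stream, -mu, mu, sigma, bound)
        correct += choice == 1
        total_n += n
    return correct / n_trials, total_n / n_trials
```

For example, `sprt([1.0] * 10, -1.0, 1.0, 1.0, 3.0)` returns `(1, 2)`, since each sample adds 2 units of evidence and the bound of 3 is crossed on the second sample. Comparing `simulate(1.0)` with `simulate(3.0)` shows the speed-accuracy trade-off: the higher bound yields higher accuracy at the cost of more samples per decision.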
A popular alternative to the 2AFC paradigm is the Go/NoGo (GNG) task, in which one stimulus requires a response (Go) and the other requires the response to be withheld (NoGo). One apparent advantage of the GNG paradigm is that it obviates the need for response selection (Donders 1969), thus helping to minimize any confounding influences of motor planning and execution when the experimental focus is on perceptual and cognitive processing. However, within-subject comparison of reaction time (RT) and error rates (ER) in 2AFC and Go/NoGo variants of the same perceptual decision-making task has found systematic biases between the two (Gomez et al. 2007). Specifically, the go stimulus in the Go/NoGo task elicits shorter RT and more false-alarm responses than in the 2AFC task, when paired with the same opposing stimulus in both paradigms, resulting in a Go bias. This raises the question of whether the two paradigms really probe the same underlying processes.
Existing mechanistic models of these choice tasks, mostly variants of the drift-diffusion model (DDM; Ratcliff and Smith 2004; Gomez et al. 2007) and the related leaky competing accumulator models (Usher and McClelland 2001; Bogacz et al. 2006), capture various aspects of behavioral performance, but do not clarify the provenance of the Go bias in GNG. Recently, it has been shown that the Go bias may arise as a strategic adjustment in response to the implicit asymmetry in the cost structure of the 2AFC and GNG tasks and need not imply any fundamental differences in the sensory and cognitive processes engaged in the two tasks. Specifically, the NoGo response requires waiting until the response deadline, while a Go response immediately terminates the current trial (Shenoy and Yu 2012). A Bayes-risk minimizing decision policy, which minimizes not only error rate but also average decision delay, naturally exhibits the experimentally observed Go bias. The optimal decision policy is formally equivalent to a DDM with a time-varying threshold that initially rises after stimulus onset and collapses again just before the response deadline. The initial rise in the threshold is due to the diminishing temporal advantage of choosing the fast Go response compared to the fixed-delay NoGo response.
This class of decision-making tasks is designed such that subjects have no uncertainty about the stimulus identity, but rather about the consequences of choosing one alternative over another. To induce such uncertainty, the subjects are not explicitly told about the reinforcement consequences of the different alternatives, but instead they have to learn them over time. The reinforcements are typically in the form of a reward, such as money for human subjects, juice for monkeys, or seeds for birds; but occasionally they can also be in the form of a penalty, such as money taken away for human subjects, a foot shock for rats, or an air puff for rabbits. This class of task originated in the study of associative learning, which showed that subjects’ asymptotic choice performance after substantial learning exhibited interesting features. For example, Herrnstein (1961) found that pigeons, when faced with a choice between pecking two different buttons in a Skinner box, did not always choose the one with the higher reward rate, which is mathematically optimal, but rather alternated among the choices stochastically so as to “match” their underlying reward rates. Based on these results, Herrnstein formulated the “matching law” and the related “melioration theory” (Herrnstein 1970), which gave a procedural account of how matching-like behavior can arise from limited working memory. A more recent account, using Bayesian learning theory, shows that matching-like behavior is, at a finer timescale, the consequence of a maximizing choice strategy (always choosing the best option), coupled with a learning procedure that assumes the world to be nonstationary and thus results in internal beliefs that fluctuate with empirical experiences, even those driven by noise in a truly stationary environment (Yu and Cohen 2009; Yu and Huang 2014).
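The gap between matching and maximizing can be made concrete with a small calculation. The sketch below makes a simplifying assumption that is not taken from Herrnstein's experiments: each choice of option i pays off independently with a fixed probability p_i on every trial. The function names and the example probabilities are hypothetical.

```python
def expected_reward_matching(p1, p2):
    """Expected reward per trial when each option is chosen in proportion
    to its reward probability (assumes independent Bernoulli payoffs)."""
    total = p1 + p2
    # Choose option i with probability p_i / total, then collect p_i on average.
    return (p1 / total) * p1 + (p2 / total) * p2


def expected_reward_maximizing(p1, p2):
    """Expected reward per trial when the better option is always chosen."""
    return max(p1, p2)
```

With reward probabilities of 0.6 and 0.2, matching allocates three quarters of the choices to the better option and earns about 0.5 per trial, whereas always choosing the better option earns 0.6. Under this simplified payoff assumption, matching is therefore strictly worse than maximizing whenever the two probabilities differ.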
In order to induce continual outcome uncertainty, many implementations of this task incorporate unpredictable, unannounced changes in the stimulus-outcome contingencies during the experimental session. A classical example of such a manipulation is reversal learning, in which the choice with the better outcome suddenly becomes the worse choice, while the previously bad choice now becomes the better choice. This type of task has been used to study, among other things, how perseverative tendencies might change as a function of experimental condition or manipulations of the neural circuitry in the brain. Such tasks are rich for theoretical modeling of the neural representation, computation, and utilization of different forms of uncertainty that arise during learning and decision-making (Yu and Dayan 2005; Nassar et al. 2010). For example, there is thought to be a differentiation of at least two kinds of uncertainty: “expected uncertainty,” dealing with known uncertainties and variabilities in the environment, and “unexpected uncertainty,” arising from dramatic, unexpected changes in the statistical contingencies in the environment. Experimentally, human subjects have been shown to be more ready to alter choice strategy when changes in choice-outcome contingencies are more frequent than when these changes are less frequent (Behrens et al. 2007); the upregulation of the neuromodulator norepinephrine has been shown to increase the rate of learning such changes (Devauges and Sara 1990); pupil diameter in human subjects, under the control of noradrenergic and cholinergic neuromodulatory systems, has also been shown to correspond to specific model-derived uncertainty signals (Nassar et al. 2012).
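The role of learning rate in tracking a reversal can be illustrated with a simple delta-rule learner. This is a deliberately simplified stand-in, not the Bayesian models cited above: the learner keeps one running estimate of an option's reward probability and moves it a fixed fraction `alpha` toward each new outcome. The function name, the starting estimate, and the outcome sequence are hypothetical.

```python
def delta_rule(outcomes, alpha, v0=0.5):
    """Track a reward estimate with a fixed learning rate alpha.

    Returns the list of estimates after each outcome (1 = reward, 0 = none).
    """
    v = v0
    trace = []
    for r in outcomes:
        v += alpha * (r - v)  # move the estimate toward the latest outcome
        trace.append(v)
    return trace


# An option that is rewarded on 20 trials and then, after an unannounced
# reversal, unrewarded on the next 5 trials.
outcomes = [1] * 20 + [0] * 5
fast = delta_rule(outcomes, alpha=0.5)
slow = delta_rule(outcomes, alpha=0.1)
```

Five trials after the reversal, the learner with the higher learning rate has revised its estimate much further toward zero than the slower learner. In a stable environment the same high learning rate would make the estimate more sensitive to noise, which is one way to see why adapting the learning rate to the frequency of change can be advantageous.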
There are two main types of multidimensional decision-making: (1) target-distractor differentiation, in which subjects must exclude the interfering influence of a distractor stimulus or attribute, in order to focus on the target stimulus or attribute, and (2) multisensory integration, in which subjects must integrate multiple stimuli, often differing in sensory modalities, to reach a combined choice response.
One classical task of this type is the Stroop task (Stroop 1935), in which subjects must ignore the meaning of a color word and report the physical color in which the text is written. Subjects consistently respond faster and more accurately when the meaning and physical attribute of the color word are congruent than when they are not. A related task is the Simon task, in which a stimulus present on one side (e.g., visual stimulus on the left side of the screen or auditory stimulus to the left ear) may instruct the subjects to respond with the other, less direct response (e.g., right button press), and again, performance is better when the two dimensions are congruent than when they are not (Simon 1967). Yet another similar task is the Eriksen task (Eriksen and Eriksen 1974), which requires the subject to identify a central stimulus (e.g., the letter “H” or “S”) when flanked on both sides by letters that are either congruent or incongruent with the central stimulus; again, subjects are significantly better in the congruent condition. These tasks have been used to discern how the brain identifies and filters out (or fails to do so) irrelevant distractors. Careful behavioral analysis of choice accuracy as a function of reaction time (Cho et al. 2002) has given clues as to the role of cognitive processes such as attentional control and stimulated theoretical modeling of the neural computations giving rise to specific features of the behavioral dynamics in such tasks (Yu et al. 2009).
A related but distinct class of tasks involves a prepotent “go” signal present on each trial and an occasional “stop” signal on a small fraction of trials, whereby the subject must execute the “go” response only on “go” trials and not on “stop” trials. In the stop-signal task (Logan and Cowan 1984), the “stop” signal appears, if at all, at an unpredictable time after the “go” stimulus, and the greater the onset asynchrony, the less likely the subject is to be able to withhold the “go” response. A related task is what has been called a “compelled-response task,” in which the subject is instructed to go before being shown the cue that indicates which of the two responses is correct (Salinas et al. 2010). These tasks have given rise to competing theoretical models that postulate either the existence (Logan and Cowan 1984) or absence (Shenoy and Yu 2011; Salinas and Stanford 2013) of a specific stopping process or pathway and, in the latter class, the precise capacity to stop depending on either a normative, sequential consideration of the pros and cons of responding (Shenoy and Yu 2011) or the efficiency of the perceptual process itself (Salinas and Stanford 2013). However, recent experimental results indicate systematic and rational changes in subjects’ stopping capacity as a function of the reward structure of the task (Leotti and Wager 2009), suggesting not only that the “go” response is context sensitive and strategically malleable but also that it takes into account perceptual uncertainty and decision-theoretical factors such as reward contingencies (Shenoy and Yu 2011).
Instead of investigating how the brain selectively processes certain aspects of the environment and excludes others, simultaneously present stimuli have also been employed in experimental tasks to examine how the brain combines multiple sources of sensory information, in particular across sensory modalities. For instance, several studies in recent years have shown that human subjects combine differentially reliable sensory inputs from different modalities in a computationally optimal (Bayesian) way, such that greater weight is assigned to sensory inputs with less noise (and greater reliability) in the combined percept (Jacobs 1999; Ernst and Banks 2002; Battaglia et al. 2003; Dayan et al. 2000; Shams et al. 2005). This explains, for instance, why vision typically dominates over auditory modality in spatial localization in a phenomenon known as “visual capture” – this follows directly from the Bayesian formulation, since vision has greater spatial acuity and reliability than audition (Battaglia et al. 2003). Conversely, when the localization task is in the temporal domain instead of spatial, where audition has greater temporal acuity and reliability, auditory stimuli can induce an illusory visual percept. For example, when a single visual flash is accompanied by several auditory beeps, the visual percept is that of several flashes (Shams et al. 2000). Again this phenomenon can be explained by a statistically optimal ideal observer model (Shams et al. 2005; Kording et al. 2007). Recent neurophysiological studies have also begun to elucidate the neural basis of multisensory integration [see, e.g., Driver and Noesselt (2008) for a review].
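The reliability-weighted combination described above has a simple closed form when each cue is modeled as an independent Gaussian estimate of the same quantity and the prior is flat. The sketch below assumes exactly that; the function name and the example numbers are hypothetical.

```python
def combine_cues(estimates, variances):
    """Combine independent Gaussian cues by inverse-variance weighting.

    Returns (combined_estimate, combined_variance). Assumes a flat prior
    and that all cues measure the same underlying quantity.
    """
    precisions = [1.0 / v for v in variances]  # reliability of each cue
    total = sum(precisions)
    combined = sum(p * x for p, x in zip(precisions, estimates)) / total
    return combined, 1.0 / total


# A visual location cue at 0 (variance 1) and an auditory cue at 10 (variance 4).
location, variance = combine_cues([0.0, 10.0], [1.0, 4.0])
```

Here the combined location is about 2, much closer to the more reliable visual cue, and the combined variance of about 0.8 is smaller than that of either cue alone. Swapping the variances would pull the estimate toward the auditory cue instead, which mirrors the reversal of dominance described for temporal judgments.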
One area of decision-making is devoted to the study of how humans make their choices based on their own internal preferences, such as in consumer decision-making, instead of based on sensory features or behavioral outcomes associated with the options. When choosing among options that differ along multiple attribute dimensions, humans consistently exhibit certain puzzling preference shifts, or even reversals, depending on the context. Three broad categories of contextual effects are studied in the psychology literature. In the attraction effect, given two similarly preferred options, A and B, the introduction of a third option Z that is similar to B, but also clearly less attractive than B, results in an increase in preference for B over A (Huber and Payne 1982; Heath and Chatterjee 1995). In the compromise effect, when B > A in one attribute and B < A in another attribute and Z has the same trade-off but is even more extreme than B, then B becomes the “compromise” option and becomes preferred relative to A (Simonson 1989). In the similarity effect, the introduction of a third option Z that is very similar to B in both attribute dimensions shifts the relative preference away from B to A (Tversky n.d.).
Traditionally, there have been two lines of explanations for such effects, the first attributing them to biases or suboptimalities in human decision-making (Kahneman and Tversky 1979; Kahneman et al. 1982) and the other suggesting that they are by-products of specific architectural or dynamical constraints on neural processing (Busemeyer and Townsend 1993; Usher and McClelland 2004; Trueblood 2012). A more recent, normative account (Shenoy and Yu 2013) uses a Bayesian model to demonstrate that these contextual effects can arise as rational consequences of three basic assumptions: (1) humans make preferential choices based on relative values anchored with respect to “fair market value,” which is inferred from both prior experience and the current set of available options; (2) different attributes are imperfect substitutes for one another, so that one unit of a scarce attribute is more valuable than one unit of an abundant one; and (3) uncertainty in beliefs about “market conditions” induces stochasticity in relative preference on repeated encounters with the same set of options. This model not only provides a principled explanation for why specific types of contextual modulation of preference choice exist, but a means to infer individual and group preferences in novel contexts given observed choices.
- Huber J, Payne J (1982) Adding asymmetrically dominated alternatives: violations of regularity and the similarity hypothesis. J Consum Res 9(1):90–98
- Kahneman D, Slovic P, Tversky A (eds) (1982) Judgement under uncertainty: heuristics and biases. Cambridge University Press, Cambridge, UK
- Laming DRJ (1968) Information theory of choice-reaction times. Academic, London
- Nassar MR, Rumsey KM, Wilson RC, Parikh K, Heasly B, Gold JI (2012) Rational regulation of learning dynamics by pupil-linked arousal systems. Nature Neurosci 15(7):1040–1046
- Salinas E, Stanford TR (2013) Waiting is the hardest part: comparison of two computational strategies for performing a compelled-response task. Front Comput Neurosci 33(13):5668–5685
- Shenoy P, Yu AJ (2012) Strategic impatience in Go/NoGo versus forced-choice decision-making. Adv Neural Inf Process Syst 25
- Shenoy P, Yu AJ (2013) A rational account of contextual effects in preference choice: what makes for a bargain? In: Proceedings of the thirty-fifth annual conference of the cognitive science society. Berlin, Germany
- Tversky A (n.d.) Elimination by aspects: a theory of choice. Psychol Rev 79:288–299
- Usher M, McClelland J (2004) Loss aversion and inhibition in dynamical models of multialternative choice. Psychol Rev 111(3):757–769
- Wald A (1947) Sequential analysis. Wiley, New York
- Yu AJ, Cohen JD (2009) Sequential effects: superstition or rational behavior? Adv Neural Inf Process Syst 21:1873–1880
- Yu AJ, Huang H (2014) Maximizing masquerading as matching in human visual search choice behavior. Decision (To appear)