Encyclopedia of Computational Neuroscience

Living Edition
| Editors: Dieter Jaeger, Ranu Jung

Decision Making, Models

  • Paul Miller
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7320-6_312-3

Keywords

Firing Rate; Drift Rate; Race Model; Biophysical Model; Response Time Distribution

Definition

Models of decision making attempt to describe, using stochastic differential equations which represent either neural activity or more abstract psychological variables, the dynamical process that produces a commitment to a single action/outcome as a result of incoming evidence that can be ambiguous as to the action it supports.

Detailed Description

Background

Decision making can be separated into four processes (Doya 2008):
  1. Acquisition of sensory information to determine the state of the environment and the organism within it
  2. Evaluation of potential actions (options) in terms of the cost and benefit to the organism given its belief about the current state
  3. Selection of an action based on, ideally, an optimal trade-off between the costs and benefits
  4. Use of the outcome of the action to update the costs and benefits associated with it

Models of the dynamics of decision making have focused on perceptual decisions with only two possible responses available. The term two-alternative forced choice (TAFC) applies to such tasks when two stimuli are provided, but the term is now generally used for any binary choice discrimination task.

In a perceptual decision, the response, or action, is directly determined by the current percept. Thus, the decision in these tasks is essentially one of perceptual categorization, namely, process (1) above, though the same models can be used for action selection given ambiguous information of the current state (process 3).

Evaluation of the possible responses in terms of their value or the resulting state's utility (process 2) (Sugrue et al. 2005), given both uncertainty in the current state and uncertainty in the outcomes of an action given that state, is the subject of expected utility theory and prospect theory.

The necessary learning and updating of the values of different actions given the actual outcomes they produce (process 4) are the subject of instrumental conditioning and reinforcement learning, for example, via temporal difference learning (Seymour et al. 2004) and actor-critic models (Joel et al. 2002).

This entry is primarily concerned with the dynamics of the production of either a single percept given unreliable sensory evidence (1) or a single action given uncertainty in the outcomes (3).

General Features of Discrimination Tasks or TAFC Tasks

In a TAFC task, a single decision variable can be defined representing the likelihood ratio – the relative likelihood of the evidence received to date under each of the two alternatives. While TAFC tasks (Fig. 1) have provided the dominant paradigm for the analysis of choice behavior, the restriction to only two choices is lifted in many of the more recent models of decision making based on multiple variables, allowing for the fitting of a wider range of data sets.
Fig. 1

Scheme of the two-alternative forced-choice (TAFC) task. Two streams of sensory input, each containing stimulus information, or a signal (S1 and S2) combined with noise (σ1η1(t) and σ2η2(t)), are compared in a decision-making circuit. The circuit must produce one of two responses (A or B) indicating which of the two signals is stronger. The optimal method for achieving this discrimination is via the sequential probability ratio test (SPRT) which requires the decision-making circuit to integrate inputs over time

The tasks can be based on either a free-response paradigm, in which a subject responds after as much or as little time as desired, or an interrogation (forced-response) paradigm, in which the stimulus duration is limited and the subject must respond within a given time interval. The free-response paradigm is perhaps more powerful, since each trial produces two types of information: accuracy (correct or incorrect) and response time. However, by variation of the time allowed when responses are forced, both paradigms are valuable for constraining models, since they can provide a distribution of response times for both correct and incorrect trials, as well as the proportion of trials that are correct or incorrect for a given stimulus. These behavioral data can be modulated by task difficulty, task instructions (such as "respond rapidly" versus "respond accurately"), reward schedules, and intertrial intervals.

Most models of the dynamics of decision making focus on tasks where the time from stimulus onset to response is no more than one to two seconds, a timescale over which neural spiking can be maintained. Choices requiring much more time than this are likely to depend upon multiple memory stores, neural circuits, and strategies, which become difficult to identify, extract, and model in a dynamical systems framework (a state-based framework is more appropriate).

In the standard setup of the models, two parallel streams of noisy sensory input are available, with each stream supplying evidence in support of one of the two allowed actions (see Fig. 1). The sensory inputs can be of either discrete or continuous quantities and can arrive discretely or continuously in time. The majority of models focus on continuous update in continuous time, so they can be formulated as stochastic differential equations (Gillespie 1992; Lawler 2006). The momentary sensory evidence is used to update a decision variable, which indicates the likelihood of choosing one of the two alternatives given the current and all prior evidence. The primary difference between models is in how the sensory evidence determines the decision variable. While most models incorporate a form of temporal integration of evidence (Cain and Shea-Brown 2012) and include a negative interaction between the two sources of evidence, differences arise in the stability of the initial state, which determines whether integration is perfect, and in the nature of the interaction: feedforward between the inputs, feedforward between outputs, or feedback from outputs to decision variables (Bogacz et al. 2006). Models can also differ in their choice of decision threshold – the value of the decision variable at which a response is produced – in the free-response paradigm (Simen et al. 2009; Deneve 2012; Drugowitsch et al. 2012) and, in particular, in whether this parameter or other model parameters that also affect the response time distribution, such as input gain, are static or dynamic across a trial (Shea-Brown et al. 2008; Thura et al. 2012).

As the time available for acquisition of sensory information increases, so does the accuracy of responses in a perceptual discrimination task. Accuracy is measured as the probability of choosing the response leading to more reward, which in these tasks is equivalent to obtaining a veridical percept. All of the models discussed below can produce such a speed-accuracy trade-off by parameter adjustment: if parameters are adjusted so as to increase the mean response time, then accuracy increases. Such a trade-off is observed in behavioral tasks under two regimes: when instructions or the schedule of reward and punishment encourages participants to respond as quickly as possible with less concern for errors, and when subjects respond as accurately as possible with less concern for the time taken to decide. The simplest way to effect such a trade-off is to adjust the intertrial interval; if this interval is long compared to the decision time, the accuracy of responses impacts the reward rate much more than the time taken for the decision itself. Models can replicate such behavior when optimal performance is defined as maximizing reward rate. Typical parameter adjustments to increase accuracy while slowing responses would be a multiplicative scaling down of the inputs (and the concurrent input noise) or a scaling up of the range across which the decision variable can vary by raising a decision threshold (Figs. 2 and 3) (Ratcliff 2002; Simen et al. 2009; Balci et al. 2011). A similar effect can be achieved in alternative, attractor-based models through the level of a global applied current, which affects the stability of the initial "undecided" state (Figs. 6, 7, 8, and 9) (Miller and Katz 2013).
Fig. 2

The drift-diffusion model (DDM). The DDM is a one-dimensional model, so the two competing inputs and their noise terms are first combined: in this case \( S = S_1 - S_2 \) and \( \sigma^2 = \sigma_1^2 + \sigma_2^2 \)

Fig. 3

The drift-diffusion model is a Wiener process (one-dimensional Brownian motion) with absorbing boundaries. In the absence of the boundaries, the probability distribution is Gaussian, centered at a distance \( St \) from its starting point with variance increasing as \( \sigma^2 t \)

From a neuroscience perspective, the decision variable is typically interpreted as either the mean firing rate of a group of neurons or a linear combination of rates of many neurons (Beck et al. 2008), the difference between two groups being the simplest such combination. There has been remarkable progress in matching the observed firing patterns of neurons (Newsome et al. 1989; Shadlen and Newsome 2001; Huk and Shadlen 2005) with the dynamics of a decision variable in more mathematical models of decision making (Glimcher 2001, 2003; Gold and Shadlen 2001, 2007; Smith and Ratcliff 2004; Ratcliff et al. 2007). This has led to the introduction of biophysically based models of neural circuits (Wang 2008), which have accounted for much of the concordance between simple mathematical models, neural activity, and behavior.

Optimal Decision Making

An optimal decision-making strategy either maximizes expected reward over a given time or minimizes risk. In TAFC perceptual tasks, a response is either correct or an error. In the interrogation paradigm, with fixed time per decision, the optimal strategy is the one leading to the greatest accuracy, that is, the lowest expected error rate. In the free-response paradigm, the optimal strategy either delivers the greatest accuracy for a given mean response time or produces the fastest mean response time for a given accuracy. In these tasks, the sequential probability ratio test (SPRT), introduced by Wald and Wolfowitz (Wald 1947; Wald and Wolfowitz 1948), and, in its continuous form, the drift-diffusion model (DDM) (Ratcliff and Smith 2004; Ratcliff and McKoon 2008) lead to optimal choice behavior by any of these measures of optimality (see Bogacz et al. 2006 for a thorough review).

Using the SPRT in the interrogation paradigm, one simply accumulates over time the log-likelihood ratio of the probabilities of each alternative given the stream of evidence, where the observed sensory input per unit time has one probability given alternative A and another probability given alternative B. Integrating the log-likelihood ratio over time, after setting the initial condition to the log-likelihood ratio of the prior probabilities, log[P(A)/P(B)], leads to a quantity log[P(A|S)/P(B|S)], which is greater than zero if A is more likely than B given the stimulus and less than zero otherwise. Thus, from standard Bayesian theory, the optimal procedure is to choose A or B depending on the sign of the summed or, in the continuous limit, integrated log-likelihood ratio.

In the free-response paradigm, a stopping criterion must be included. This is achieved by setting two thresholds for the integrated log-likelihood ratio, a positive one (+a) for choice A and a negative one (−b) for choice B. The further the thresholds are from the origin, the lower the chance of error, but the longer the integration time before reaching a decision. Thus, the thresholds reflect the fraction of errors that can be tolerated, with \( a = \log \frac{1-\beta }{\alpha } \) and \( b = \log \frac{1-\alpha }{\beta } \), where α is the probability of choosing A when B is correct and β is the probability of choosing B when A is correct.
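
The procedure can be illustrated with a minimal Python sketch of the free-response SPRT for Gaussian evidence samples with mean +μ under alternative A and −μ under alternative B, so that each sample contributes 2μx/σ² to the log-likelihood ratio; all parameter values here are illustrative assumptions, not taken from the entry:

import numpy as np

rng = np.random.default_rng(0)

def sprt_trial(mu=0.1, sigma=1.0, alpha=0.05, beta=0.05, max_steps=10_000):
    # Wald's approximate thresholds for the log-likelihood ratio (LLR)
    a = np.log((1 - beta) / alpha)    # upper threshold: choose A
    b = np.log((1 - alpha) / beta)    # magnitude of lower threshold: choose B
    llr = 0.0                         # flat prior: log[P(A)/P(B)] = 0
    for n in range(1, max_steps + 1):
        x = rng.normal(mu, sigma)     # evidence sample; A is the true alternative
        llr += 2 * mu * x / sigma**2  # LLR increment for N(+mu) vs N(-mu)
        if llr >= a:
            return "A", n
        if llr <= -b:
            return "B", n
    # if forced to stop (interrogation-style), read out the sign of the LLR
    return ("A" if llr > 0 else "B"), max_steps

choices, times = zip(*(sprt_trial() for _ in range(2000)))
print("accuracy:", np.mean(np.array(choices) == "A"))
print("mean samples to decision:", np.mean(times))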

The Models

Accumulator Models

The first models of decision making in humans or animals were accumulator models, sometimes called counter models or race models. In these models, evidence accumulates separately for each possible outcome. This has the advantage that if many outcomes are possible, the models are simply extended by the addition of one more variable for each additional alternative, with evidence for each alternative accumulating within its allotted variable. In the interrogation paradigm, one simply reads out the highest variable, so in the TAFC paradigm the choice depends on the sign of the difference of the two variables. Thus, if the difference in accumulated quantities matched the difference in integrated log probabilities of the two types of evidence, such readout from an accumulator model would be equivalent to the SPRT and would therefore be optimal.

In the free-response paradigm, accumulator models produce a choice when any one of the accumulated variables reaches a threshold, so these models can be called "race-to-threshold models" or simply "race models." The original accumulator models included neither interaction between accumulators nor any ability for the variables to decrease. However, for decisions in nature or in laboratory protocols, evidence in favor of one alternative is typically evidence against the other alternative. The lack of interaction is particularly problematic in the free-response paradigm, because the time at which one variable reaches threshold and produces the corresponding choice is then independent of the evidence accumulated for other choices. Thus, the behavior of simple accumulator models is not optimal. Comparisons of the response time distributions of these models with behavioral responses showed the models to be inaccurate in this regard: observed response time distributions are skewed with a long tail, whereas the response times of accumulator models are much more symmetric about the mean. These discrepancies led to the ascendance of Ratcliff's drift-diffusion model (Ratcliff 1978).
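
This near symmetry is easy to reproduce in simulation. A minimal Python sketch of a Poisson counter (race) model with a high count threshold (all parameter values are arbitrary illustrative choices) yields response-time distributions with only mild skew, in contrast to the pronounced right tail of behavioral data:

import numpy as np

rng = np.random.default_rng(1)

def race_trial(r1=12.0, r2=10.0, threshold=40, dt=0.001, t_max=30.0):
    # Two independent, non-leaky, non-interacting accumulators count
    # evidence events (Bernoulli approximation to Poisson processes);
    # the first to reach the count threshold determines the choice.
    c1 = c2 = 0
    t = 0.0
    while t < t_max:
        t += dt
        c1 += rng.random() < r1 * dt
        c2 += rng.random() < r2 * dt
        if c1 >= threshold:
            return 1, t
        if c2 >= threshold:
            return 2, t
    return 0, t_max

results = [race_trial() for _ in range(1000)]
rts = np.array([t for _, t in results])
skew = np.mean(((rts - rts.mean()) / rts.std()) ** 3)
# Skewness is modest (roughly 0.3 here, shrinking as the threshold grows),
# unlike the strongly right-skewed response times observed behaviorally.
print("mean RT:", rts.mean(), "skewness:", skew)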

The Drift-Diffusion Model

The drift-diffusion model (DDM) is an integrator with thresholds (Fig. 2), or more precisely, the decision variable, x, follows a Wiener process with two absorbing boundaries (Fig. 3). It includes a deterministic (drift) term, S, proportional to the rate of incoming evidence and a diffusive noise term of variance σ 2, which produces variability in response times and can lead to errors:
$$ \frac{dx}{dt} = S + \sigma\,\eta(t) $$
where η(t) is a white noise term defined by 〈η(t)η(t′)〉 = δ(t − t′).

If the model is scaled to a given level of noise, then its three independent parameters are drift rate (S) and positions of each of the two thresholds (a, −b) with respect to the starting point. When the model was introduced, these parameters were assumed fixed for a given subject in a specific task. The threshold spacing determines where one operates in the speed-accuracy trade-off, so it can be optimized as a function of the relative cost for making an incorrect response and the time between trials. Any starting point away from the midpoint represents bias or prior information. The drift rate is proportional to stimulus strength.

With fixed parameters, which could be fitted to any subject’s responses, the DDM reproduces key features of the behavioral data: notably the skewed shape of response time distributions and the covariation of mean response times and response accuracy with task difficulty. Skewed response time distributions arise because the variance of a Wiener process increases linearly with time – responses much earlier than the mean response time, when the variance in the decision variable is low, are less likely than responses much later, when the variance in the decision variable is high. A more difficult perceptual choice is represented by a drift rate closer to zero, which increases response times and increases the probability of error. Such covariation of response times with accuracy matches behavioral data well, so long as an additional processing time is added to the model – the additional time representing a variable sensory transduction delay on the input side and a motor delay on the output side, both of which contribute to response times in addition to the processing within any decision circuit.
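
A direct Euler-Maruyama simulation of the model equation above illustrates these properties, including the speed-accuracy trade-off obtained by raising the threshold. This is a minimal Python sketch; the drift, noise, and threshold values are arbitrary illustrations:

import numpy as np

rng = np.random.default_rng(2)

def ddm_trial(S=0.5, sigma=1.0, a=1.0, b=1.0, x0=0.0, dt=0.001, t_max=10.0):
    # Euler-Maruyama integration of dx/dt = S + sigma*eta(t),
    # with absorbing boundaries at +a (choice A) and -b (choice B).
    x, t = x0, 0.0
    while t < t_max:
        x += S * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
        if x >= a:
            return "A", t
        if x <= -b:
            return "B", t
    return "undecided", t_max

trials = [ddm_trial() for _ in range(2000)]
rts = np.array([t for c, t in trials if c == "A"])
skew = np.mean(((rts - rts.mean()) / rts.std()) ** 3)
print("accuracy:", np.mean([c == "A" for c, _ in trials]))
print("mean RT:", rts.mean(), "RT skewness (positive, long right tail):", skew)

# Speed-accuracy trade-off: widening the thresholds slows responses
# and raises accuracy.
for th in (0.5, 1.0, 1.5):
    out = [ddm_trial(a=th, b=th) for _ in range(1000)]
    print(f"threshold {th}: accuracy {np.mean([c == 'A' for c, _ in out]):.3f}, "
          f"mean RT {np.mean([t for _, t in out]):.3f} s")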

While the original DDM included trial-to-trial variability through the diffusive noise term, additional trial-to-trial variability in the parameters was needed to account for differences between the response time distributions of correct trials and those of incorrect trials. With fixed parameters and no initial bias, the DDM produces identical distributions (though with different magnitudes) for the timing of correct responses and errors. However, response times of human subjects are typically slower when they produce errors, unless they are instructed to respond as quickly as possible, in which case the reverse is true. These behaviors are accounted for in the DDM by including trial-to-trial variability in the drift rate and/or the starting point. Trial-to-trial variability in the drift rate leads to slower errors, because errors become more likely on those trials in which the drift rate is shifted from its mean toward zero, and on such trials mean response times are longer. Trial-to-trial variability in the starting point leads to faster errors, because errors become more likely on those trials in which the starting point is closer to the error boundary, in which case the response is faster. Such variability reduces error response times more than correct response times, since correct responses include more of the trials that start at the midpoint, where mean times are longer.
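
Both effects can be demonstrated by redrawing a parameter afresh on each simulated trial. The following sketch uses the same simulation scheme as above; the magnitudes of variability are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(3)

def ddm_trial(S, x0, sigma=1.0, a=1.0, dt=0.001, t_max=10.0):
    # One drift-diffusion trial; +a is the correct boundary, -a the error.
    x, t = x0, 0.0
    while t < t_max:
        x += S * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
        if abs(x) >= a:
            return x > 0, t
    return None, t_max

def mean_rts(drift_sd=0.0, start_sd=0.0, n=2000):
    correct, errors = [], []
    for _ in range(n):
        S = 0.5 + drift_sd * rng.normal()                 # drift varies across trials
        x0 = np.clip(start_sd * rng.normal(), -0.9, 0.9)  # start-point jitter
        outcome, t = ddm_trial(S, x0)
        if outcome is True:
            correct.append(t)
        elif outcome is False:
            errors.append(t)
    return np.mean(correct), np.mean(errors)

print("fixed parameters (correct, error):     ", mean_rts())
print("drift variability -> slow errors:      ", mean_rts(drift_sd=0.4))
print("start-point variability -> fast errors:", mean_rts(start_sd=0.4))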

Prior information or bias can be incorporated in the drift-diffusion model either through a shift in the starting point of integration or through an additional bias current added to the inputs being integrated. A shift in starting point is closer to optimal and has led to two-stage models, in which a first stage sets the starting point from the values of the two choices before a second stage of integration toward threshold commences. Such a model is supported by electrophysiological data (Rorie et al. 2010).

The Leaky Competing Accumulator Model

The leaky competing accumulator (LCA) model, introduced by Usher and McClelland (2001), was suggested to be in better accord with neural data and to fit some behavioral data better than the DDM. The LCA is a two-variable model, with each variable integrating evidence in support of one of the two alternatives in a TAFC task. The model includes a "leak" term, a decay of each variable back to baseline in the absence of incoming evidence. The "competition" in the LCA is a cross-inhibition between the two variables, improving upon the original accumulator models by allowing evidence for one alternative to reduce the variable representing the other (Fig. 4).
$$ \tau \frac{dX_1}{dt} = S_1 - kX_1 - \beta X_2 + \sigma_1\sqrt{\tau}\,\eta_1(t) $$
$$ \tau \frac{dX_2}{dt} = S_2 - kX_2 - \beta X_1 + \sigma_2\sqrt{\tau}\,\eta_2(t) $$
Fig. 4

The leaky competing accumulator (LCA) model. Two separate integration processes for the two separate stimuli produce two decision variables, \( X_1 \) and \( X_2 \). A "leak" term proportional to k causes a decay of the decision variable through self-inhibition, while a cross-inhibition term proportional to β produces competition. In the interrogation paradigm, a decision is made according to the greater of \( X_1 \) or \( X_2 \) at the response time, while in the free-response paradigm, a choice is made when one decision variable first reaches its threshold (given as +a and +b, respectively)

The difference of the two LCA variables, \( X_d = X_1 - X_2 \), follows an Ornstein-Uhlenbeck process:
$$ \tau \frac{dX_d}{dt} = S_d - (k-\beta)X_d + \sigma_d\sqrt{\tau}\,\eta(t) $$
where \( S_d = S_1 - S_2 \) and \( \sigma_d = \sqrt{\sigma_1^2 + \sigma_2^2} \). It should be noted that the DDM is retrieved as a special case of the LCA if the coefficient of the term linear in \( X_d \), that is, β − k, is set to zero.
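
A discrete-time sketch of the LCA equations above (Euler-Maruyama integration; all parameter values are illustrative assumptions) shows the free-response readout; setting k = β recovers pure drift-diffusion dynamics for the difference variable:

import numpy as np

rng = np.random.default_rng(4)

def lca_trial(S1=1.2, S2=1.0, k=0.5, beta=0.4, sigma=0.3,
              tau=0.1, threshold=1.0, dt=0.001, t_max=10.0):
    # Euler-Maruyama integration of the two LCA equations, with the
    # variables clipped at zero as in Usher and McClelland (2001).
    X1 = X2 = 0.0
    t = 0.0
    while t < t_max:
        dX1 = ((S1 - k * X1 - beta * X2) * dt
               + sigma * np.sqrt(tau * dt) * rng.normal()) / tau
        dX2 = ((S2 - k * X2 - beta * X1) * dt
               + sigma * np.sqrt(tau * dt) * rng.normal()) / tau
        X1, X2 = max(X1 + dX1, 0.0), max(X2 + dX2, 0.0)
        t += dt
        # free-response readout: first variable to reach threshold wins
        if X1 >= threshold:
            return 1, t
        if X2 >= threshold:
            return 2, t
    return 0, t_max

outcomes = [lca_trial() for _ in range(1000)]
print("P(choose 1):", np.mean([c == 1 for c, _ in outcomes]))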

If the only criterion for choosing the best model were its match to behavioral data, one could use the Akaike information criterion or the Bayesian information criterion to assess whether a nonzero linear term (β − k ≠ 0) in the LCA is justified by the better fit to data it produces. Beyond the ability to fit human response times, a model's ability to match neural activity and to be generated robustly in a neural circuit should be considered when assessing its value and relevance. One achievement of the LCA is to reproduce an initial increase in both variables before the competition between variables causes one variable to be suppressed while the other accelerates its rise. Such behavior matches electrophysiological data recorded in monkeys during perceptual choices – in particular, firing rates of neurons representing the response not chosen increase upon stimulus onset before they are suppressed (cf. trajectories in Fig. 6).

Neural Network Models and Attractor States

Some of the first models of perceptual categorization to have had significant impact on neuroscience were Hopfield networks, Cohen-Grossberg neural networks, and Willshaw networks. While these models are primarily aimed at the formation and storage of memories, the retrieval of a memory from corrupted information is identical to a perceptual decision.

A correspondence between memory retrieval and decision making should not be surprising, since Ratcliff’s introduction of the DDM – the archetype of decision-making models – was within a paper entitled “A theory of memory retrieval.” Ratcliff was focused on fitting response times and thus produced an inherently dynamical model; memory retrieval was treated as a set of parallel DDMs, each representing an individual item with a “match” and “non-match” threshold to indicate its recognition or not. The rate of evidence accumulation in each individual DDM was set in terms of the similarity between a current item and one in memory.

Models such as those of Hopfield respond to the match between a current item and memories of previously encoded items, which are contained within the network as attractor states. The network’s activity reaches a stable attractor state more rapidly if the match is close. The temporal dynamics of memory retrieval or pattern completion, which comprises the decision process, is not addressed carefully in neural network models, in which neural units can be binary and time can be discretized, since this is not their goal. However, the use of attractor states to represent the outcome of a decision or a perceptual categorization has achieved success in biophysical models based on spiking neurons (Wang 2008), albeit with far simpler attractor states than those of neural networks.

Biophysical Models

The LCA model (Usher and McClelland 2001), being motivated by neurophysiology, is similar in spirit to the more detailed biophysical models that followed it. In particular, the first model of decision making based on spiking neurons assumes two competing integrators, where integration is produced through tuned recurrent excitation and competition is the result of cross-inhibition between the pools of neurons (see Fig. 5). However, when even the most elementary properties of spiking neurons are taken into account, a few additional complications arise (Wong and Wang 2006).
Fig. 5

A neural circuit model of decision making. Integration of stimuli (S1, S2) by groups of neurons with rates r1 and r2 can be achieved through strong excitatory recurrent connections within groups (looped connections with arrows). The mean firing rate of cells in the inhibitory pool (“INHIB”) increases with both r1 and r2, producing competition through inhibitory input to both cell groups (solid circles) (Wang 2002)

First, neurons emit spikes as a point process in time, so that even at a constant firing rate they supply a variable current to other cells. The variability in the current can be reduced with an increase in the number of cells in a group, so long as the spike times are uncorrelated – that is, cells are firing asynchronously. Asynchrony is most easily achieved when neurons spike irregularly, a feature produced by adding additional noise to each neuron in the decision-making circuit. So, in accord with most likely realizations of a decision-making circuit in vivo, biophysical models introduce additional noise into the decision-making model itself, which inevitably adds to any already-present stimulus noise (Wang 2002).
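
The benefit of pooling can be quantified directly: for independent Poisson-like spiking, the relative fluctuation of the population-averaged input falls roughly as 1/√N. A toy Python sketch, with an arbitrary firing rate and integration window:

import numpy as np

rng = np.random.default_rng(5)

rate, window = 20.0, 0.02   # 20 Hz cells, 20 ms integration window
for N in (1, 10, 100, 1000):
    # spike counts of N independent (asynchronous) Poisson neurons
    counts = rng.poisson(rate * window, size=(10_000, N))
    pop_rate = counts.sum(axis=1) / (N * window)
    cv = pop_rate.std() / pop_rate.mean()
    print(f"N = {N:4d}: coefficient of variation of pooled input = {cv:.3f}")
# The CV falls as 1/sqrt(N): a large asynchronous pool delivers a nearly
# constant current even though each neuron fires irregularly.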

Second, neurons are either excitatory or inhibitory, so for two groups of cells with self-excitation to inhibit each other, at a minimum a third group of inhibitory cells must be added. The need for an extra cell group to mediate the inhibition adds a small delay to the cross-inhibition compared to the self-excitation, though this delay can be counteracted by fast synaptic responses in the connections to and from inhibitory cells versus slower synaptic responses in the excitatory connections. The second effect of an intermediate cell group is to insert the nonlinearity of the inhibitory interneurons' response function into the cross-inhibition. Thus, the inhibitory input to one excitatory cell group is not a linear function of the firing rate of the other cell group. The consequences of this and other nonlinearities are discussed further below.

Third, biophysical models of neurons, just like neurons in vivo, respond nonlinearly to increased inputs. Similarly, synaptic inputs to a cell saturate, so are a nonlinear function of presynaptic firing rates. In the LCA model (Usher and McClelland 2001), both the neural response and the feedback are linear functions passing through the origin, so via the tuning of one variable, the two curves can match each other and produce an integrator. Integrators require such a matching of synaptic feedback to neural response so they can retain a stable firing rate in the absence of input – the firing rate produced by a given synaptic input must be exactly that needed to generate the same synaptic input – and this must be true for a wide range of firing rates. Such matching of synaptic feedback to neural response produces a state of marginal stability, typically called a line attractor or continuous attractor.

However, in the absence of symmetry or a remarkable similarity in the shape of neural response curves and synaptic feedback curves, the nonlinear curves of biophysical neurons intersect at no more than three points (Fig. 6), leading to the possibility of two discrete stable attractor states for a group of cells (with an unstable fixed point in between). When two such groups are coupled, such as by cross-inhibition, the network can have at most four stable states, given by the combinations of low and high firing rates for the two groups. In the winner-takes-all model of decision making, three of these states can be stable in the absence of input: the state when both cell groups have low or spontaneous activity and the two states with just one of the cell groups possessing high activity. The fixed point with both groups possessing high activity is unstable. In the presence of input, the symmetric low-activity state becomes unstable, and only two stable states remain: the decision states with one group active (“the winner”) and the other group inactive (“the loser”). Importantly, with a combination of slow synaptic time constants (NMDA receptors with a time constant of 50–100 ms are an essential ingredient of the model) and sufficient fine-tuning of parameters of the network, the time course for the network’s activity to shift from the unstable initial state to one of the two remaining stable attractor states is slow enough to match neural and behavioral response times.
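
The winner-takes-all behavior described here can be sketched with a two-unit firing-rate model with self-excitation and effective cross-inhibition. The following Python sketch is only qualitatively in the spirit of the circuit of Figs. 6 and 7 – it is not the xppaut or Matlab code of Tables 1 and 2, and its sigmoidal response function and all parameter values are arbitrary assumptions:

import numpy as np

rng = np.random.default_rng(6)

def f(I):
    # sigmoidal firing-rate response function (arbitrary illustrative shape)
    return 100.0 / (1.0 + np.exp(-(I - 12.0) / 4.0))

def wta_trial(I1=10.0, I2=10.0, w_self=0.35, w_cross=0.3,
              sigma=2.0, tau=0.05, dt=0.001, t_max=5.0):
    # Two rate units with self-excitation (w_self) and effective
    # cross-inhibition (w_cross); noise breaks the symmetry so activity
    # settles into a state with one high rate and one low rate.
    r1 = r2 = 0.1
    for _ in range(int(t_max / dt)):
        r1 += dt / tau * (-r1 + f(I1 + w_self * r1 - w_cross * r2)) \
              + sigma * np.sqrt(dt) * rng.normal()
        r2 += dt / tau * (-r2 + f(I2 + w_self * r2 - w_cross * r1)) \
              + sigma * np.sqrt(dt) * rng.normal()
        r1, r2 = max(r1, 0.0), max(r2, 0.0)
    return r1, r2

ends = np.array([wta_trial() for _ in range(20)])
print(ends.round(1))   # trials end near (high, low) or (low, high)
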
Fig. 6

Nonlinearity of neural response functions produces stable fixed points, toward which neural activity evolves. Nullclines are represented by the solid thick red/green lines, the green line where dr2/dt = 0 at fixed r1 and the red line where dr1/dt = 0 at fixed r2. Crossings of the lines are fixed points of the system, with stable fixed points denoted by solid black circles. Decision states are stable fixed points with r2 >> r1 or r1 >> r2. As a symmetric input current is added to the network, the spontaneous "undecided" state, that is, the fixed point of low rates with r1 = r2, becomes unstable, so that by I1 = I2 = 10, the only stable fixed points correspond to one choice or the other and a decision is forced. With high enough applied input current, I1 = I2 = 15, a new symmetric stable state appears at high firing rates, but this plays no role in the decision-making process. Deterministic trajectories from a range of starting points are indicated by the thin colored lines that terminate at a fixed point. See Table 1 for the xppaut code that produces the nullclines and Table 2 for the Matlab code that produces the trajectories

Table 1

Code, based on xppaut, to produce nullclines and fixed points for two interacting groups of neurons, connected in a decision-making circuit, as used to produce Figs. 6, 7, 8, and 9

Table 2

The Matlab function used to simulate multiple decision-making trials as plotted in each panel of Figs. 6, 7, 8, and 9

Fig. 7

Stochastic noise in the simulation produces trial-to-trial variability in the neural responses. With the same neural circuit of Fig. 6, addition of noise can cause neural activity to end up in different states, even with the same starting points. Small colored dots indicate trajectories, with black solid circles denoting end points after 5 s. All simulations begin with \( r_1 = r_2 = 0.1 \) Hz. Far left: with no applied current, neural activity is maintained near the low-rate spontaneous state. Center left: with moderate applied current, noise can induce transitions to one of the two "decision states." Center and far right: with increased applied current, trajectories always end up at a decision state – even with I1 = I2 = 15, the two decision states are more stable than the symmetric high-rate state (far right). See Table 1 for the xppaut code that produces the nullclines and Table 2 for the Matlab code that produces the trajectories

Fig. 8

A bias in the inputs causes neural activity to favor one decision state over the other, though noise means that "errors" can arise. Left: in the deterministic system, more trajectories terminate at the attractor point of high r1, because group 1 receives higher input current. In particular, a symmetric initial condition (\( r_1 = r_2 \)) results in termination with high \( r_1 \) and low \( r_2 \). That is, the basin of attraction for the state with \( r_1 > r_2 \) includes the line \( r_1 = r_2 \). Right: with added noise, some trials with a symmetric starting point terminate in the state with high \( r_2 \) – these correspond to "errors" in the standard terminology of decision making. See Table 1 for the xppaut code that produces the nullclines and Table 2 for the Matlab code that produces the trajectories

Fig. 9

In a noisy system, the initial state can remain deterministically stable, yet responses terminate in one of the decision states, with the bias favoring one final state over the other. Left: in the absence of noise but with a bias in the inputs, more trajectories evolve to the fixed point favored by the input, but many terminate at the "undecided" state. Right: with added noise and symmetric initial conditions, most otherwise "undecided" responses switch to the decision state favored by the input bias (corresponding to "correct" responses), while a few terminate in the other decision state ("incorrect" responses) and one remains in the symmetric low-rate state (an "undecided" response). See Table 1 for the xppaut code that produces the nullclines and Table 2 for the Matlab code that produces the trajectories

Extension of Models to Multiple Choices

Many decision-making models for the TAFC can be straightforwardly extended to the case of multiple alternatives (Bogacz et al. 2007b; Furman and Wang 2008; Niwa and Ditterich 2008; Ditterich 2010). Electrophysiological data suggest that neural firing rates reach a threshold that is independent of the number of alternatives, but that neurons receive greater inhibition, as revealed by reduced firing rates in their initial, spontaneous activity state (Churchland et al. 2008; Churchland and Ditterich 2012). This is akin to a reduction in the prior probability of each individual choice alternative before stimulus onset, if prior probability impacts the starting point of integration. Models in which the total amount of inhibition depends on the total number of alternative choices (see Fig. 10) reproduce such behavior. One consequence of the increased inhibition is a slowing of decision times as the number of alternatives increases.
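
Following the scheme of Fig. 10 (left), a sketch of the LCA extended to N alternatives, with each variable inhibited in proportion to the summed activity of the others, reproduces the slowing of decisions as alternatives are added. All parameter values here are arbitrary illustrations:

import numpy as np

rng = np.random.default_rng(7)

def multi_lca_trial(signals, k=0.4, beta=0.3, sigma=0.3, tau=0.1,
                    threshold=1.0, dt=0.001, t_max=10.0):
    # LCA with N alternatives: each variable is inhibited in proportion
    # to the summed activity of all the others (cf. Fig. 10, left).
    S = np.asarray(signals, dtype=float)
    X = np.zeros(S.size)
    t = 0.0
    while t < t_max:
        inhibition = beta * (X.sum() - X)
        dX = ((S - k * X - inhibition) * dt
              + sigma * np.sqrt(tau * dt) * rng.normal(size=X.size)) / tau
        X = np.maximum(X + dX, 0.0)
        t += dt
        if X.max() >= threshold:
            return int(X.argmax()), t
    return None, t_max

for n_alt in (2, 4, 8):
    sig = [1.1] + [1.0] * (n_alt - 1)   # alternative 0 is slightly favored
    results = [multi_lca_trial(sig) for _ in range(300)]
    rts = [t for c, t in results if c is not None]
    acc = np.mean([c == 0 for c, _ in results])
    print(f"{n_alt} alternatives: accuracy {acc:.2f}, mean RT {np.mean(rts):.2f} s")
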
Fig. 10

Models for decision making with multiple alternatives. Left: extension of the leaky competing accumulator model to multiple alternatives, such that the negative feedback is proportional to the sum of multiple decision variables. Right: extension of the biophysical model, in which cells providing inhibitory feedback are activated by the summed activity of multiple groups of excitatory cells

Sequential Discrimination, Context-Dependent Decisions, and Prior Information

The most developed dynamical models of decision making pertain to the identification of an incoming sensory stimulus or the comparison of two or more concurrent stimuli. However, many decisions require a comparison of successive stimuli, and even when two stimuli are concurrent, our attention typically switches from one to the other when making a perceptual choice. Thus, models of decision making have been developed in which the stimuli are separated in time, so a form of short-term memory is required, with the contents of short-term memory and a later sensory input both affecting the choice of action (Romo and Salinas 2003; Machens et al. 2005; Miller and Wang 2006). The process of making a decision by combining short-term memory, which can represent the current “context” (Salinas 2004), with sensory input provides the essential ingredient for working memory tasks and for model-based strategies of action selection (Deco and Rolls 2005).

Prior information can make one response either a more likely alternative or a more rewarding alternative given ambiguous sensory information and can lead to across-trial dependencies in decision-making behavior. Consideration of how much weight to give prior information compared to the current stimulus requires a separate choice (Hanks et al. 2011), which establishes how long one should continue to acquire sensory input. Such a choice is akin to setting a decision-making threshold. Factors affecting the optimal period for obtaining sensory input include the relative clarity of incoming information compared to the strength of the prior (Deneve 2012), as well as the intrinsic cost of a reduced reward rate when taking more time to decide (Drugowitsch et al. 2012). At some point in time, awaiting further sensory evidence does not improve one's probability of a correct choice sufficiently to warrant any extra delay of a potential reward. Solution by dynamic programming (Bellman 1957) of a model that takes these factors into account suggests that monkeys respond according to an optimal, time-varying cost function (Drugowitsch et al. 2012).

Testing Decision-Making Models

Decision-making models can be tested more stringently using tasks in which the stimulus is not held constant across the time allotted for producing a response (Zhou et al. 2009; Stanford et al. 2010; Shankar et al. 2011; Rüter et al. 2012). For example, in models based on perfect integration, such as the DDM, a brief alteration or even reversal of the stimulus has the same effect on response time and choice probability whether it occurs early or late, so long as it falls within the period of integration. However, in models in which the initial state is unstable, a late stimulus alteration has weaker impact on the decision-making dynamics than an early one. Conversely, in models in which the initial state is stable, such as the LCA with a positive leak term, only stimuli presented shortly before the final response contribute to it; the time constant of the drift term that draws the decision variables back to the initial state corresponds to a time constant for the forgetting of earlier evidence.
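
This logic can be made concrete by injecting a brief evidence pulse at different times into a linear model dx/dt = λx + S(t) + ση(t) and reading out the choice at a fixed interrogation time: a perfect integrator (λ = 0) is indifferent to pulse timing, an unstable integrator (λ > 0) weights early evidence more, and a leaky integrator (λ < 0) weights late evidence more. A toy Python sketch, with all parameters chosen arbitrarily for illustration:

import numpy as np

rng = np.random.default_rng(8)

def trial(lam, pulse_t, S=0.2, pulse=1.5, pulse_dur=0.1,
          sigma=1.0, dt=0.002, T=1.0):
    # Interrogation-paradigm trial of dx/dt = lam*x + S(t) + sigma*eta(t),
    # with a brief extra-evidence pulse at pulse_t; choice = sign of x(T).
    x = 0.0
    for step in range(int(T / dt)):
        t = step * dt
        drive = S + (pulse if pulse_t <= t < pulse_t + pulse_dur else 0.0)
        x += (lam * x + drive) * dt + sigma * np.sqrt(dt) * rng.normal()
    return x > 0

for lam, label in ((0.0, "perfect integrator"),
                   (3.0, "unstable initial state"),
                   (-3.0, "leaky / stable initial state")):
    p_early = np.mean([trial(lam, 0.1) for _ in range(2000)])
    p_late = np.mean([trial(lam, 0.8) for _ in range(2000)])
    print(f"{label:28s} P(A | early pulse) = {p_early:.2f}, "
          f"P(A | late pulse) = {p_late:.2f}")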

Alternatively, if one sets up a task in which noise in the stimulus, controlled by the experimenter, is the dominant noise source in the decision-making process, one can separately analyze correct trials and error trials and align them by either the time of stimulus onset or the time of response to assess the impact of noise fluctuations on choice probability or response times. Care must be taken when aligning by response times, since threshold crossings are inevitably produced by noise fluctuations in the direction of the threshold crossed. However, in all cases, model predictions can be tested with experimental measurements. Current results appear to be task dependent: some data sets suggest all evidence is equally impactful (supporting a perfect integrator) (Huk and Shadlen 2005; Brunton et al. 2013), while others suggest the weighting of sensory evidence is higher early (Ludwig and Davies 2011), higher late (Cisek et al. 2009; Thura et al. 2012), or oscillatory (Wyart et al. 2012) across the stimulus duration.

Beyond Discrimination: Action Selection

In the models considered heretofore, the decision of what action to take has been equivalent to the question of what is perceived. This is because in the relevant tasks, the difficulty is in unraveling the cause of a sensory input, which has been degraded either at the source, through sensory processing, or as a result of imperfections of memory encoding and recall. The requisite action given a sensory percept is conveyed either via a straightforward instruction for human subjects or through weeks to months of training in nonhuman animal subjects. Thus, in the post-training stage used to acquire data, the step from percept to action can be considered very fast and independent of the parameters varied by the experimenter to modify task difficulty.

However, most decisions require us to select a course of action given a percept or a combination of percepts. Two general strategies, termed model-based and model-free (Dayan and Daw 2008; Dayan and Niv 2008), are possible for action selection. Model-based strategies require an evaluation of all possible consequences, with their likelihoods – similar, in a chess game, to calculating all the combinations of moves in response to one move – or, conversely, of all the possible causes of an observation. A model-free system simply learns the value of a given state and selects an action based on the immediately reachable state with the highest value. For example, one could move a chess piece to produce the pattern of pieces that has led to the most games won in the past.

A wide literature in the field of reinforcement learning (Barto 1994; Redish et al. 2007) and its possible neural underpinnings (Daw and Doya 2006; Johnson et al. 2007; Lee and Seo 2007) addresses how one can learn the value of states over multiple trials. This literature has influenced model-free biophysical models of decision making (Soltani and Wang 2008, 2010), with the principal requirement being enhanced Hebbian learning when the resulting action leads to positive reinforcement, but not otherwise. The neural activity underlying the decision precedes the reinforcement signal, so for the appropriate synapses to be potentiated when positive reinforcement arrives, authors have suggested either that the activity itself remains in a persistent state of high firing rate (Soltani and Wang 2006) or that a molecular "eligibility trace" of earlier activity persists through the time of the reinforcement signal (Bogacz et al. 2007a; Izhikevich 2007).
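
The eligibility-trace idea can be written schematically in a few lines: coincident pre- and postsynaptic activity marks a synapse with a decaying trace, and the weight update is gated by a delayed reward signal. The following is a toy Python sketch, not any specific published model; the task statistics and all parameter values are invented for illustration:

import numpy as np

rng = np.random.default_rng(9)

n_syn = 5
w = np.full(n_syn, 0.5)   # synaptic weights onto the decision circuit
tau_e = 1.0               # eligibility trace decays over ~1 s
delay = 0.5               # reward arrives 0.5 s after the choice
eta = 0.1                 # learning rate

for _ in range(200):
    pre = rng.random(n_syn) < 0.5      # which inputs were active this trial
    post = 1.0                         # postsynaptic activity during the choice
    e = pre * post                     # Hebbian coincidence marks the synapses...
    e = e * np.exp(-delay / tau_e)     # ...and the trace decays until reward time
    # reward is likely only when the first two inputs drove the choice
    reward = 1.0 if (pre[0] and pre[1] and rng.random() < 0.8) else 0.0
    w += eta * reward * e              # three-factor rule: pre x post x reward
    w = np.clip(w, 0.0, 5.0)

print(w.round(2))   # synapses correlated with rewarded choices grow the most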

A model-based method for making a decision is more flexible and can be more accurate, but in all but the simplest cases, the combinatorial explosion of alternatives becomes unwieldy and time-consuming to calculate. Hierarchical reinforcement learning renders such models more manageable (Barto and Mahadevan 2003; Ito and Doya 2011; Botvinick 2012).

References

  1. Balci F, Simen P, Niyogi R, Saxe A, Hughes JA, Holmes P, Cohen JD (2011) Acquisition of decision making criteria: reward rate ultimately beats accuracy. Atten Percept Psychophys 73:640–657
  2. Barto AG (1994) Reinforcement learning control. Curr Opin Neurobiol 4:888–893
  3. Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discrete Event Dyn Syst Theory Appl 13:343–379
  4. Beck JM, Ma WJ, Kiani R, Hanks T, Churchland AK, Roitman J, Shadlen MN, Latham PE, Pouget A (2008) Probabilistic population codes for Bayesian decision making. Neuron 60:1142–1152
  5. Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
  6. Bogacz R, Brown E, Moehlis J, Holmes P, Cohen JD (2006) The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev 113:700–765
  7. Bogacz R, McClure SM, Li J, Cohen JD, Montague PR (2007a) Short-term memory traces for action bias in human reinforcement learning. Brain Res 1153:111–121
  8. Bogacz R, Usher M, Zhang J, McClelland JL (2007b) Extending a biologically inspired model of choice: multi-alternatives, nonlinearity and value-based multidimensional choice. Philos Trans R Soc Lond B Biol Sci 362:1655–1670
  9. Botvinick MM (2012) Hierarchical reinforcement learning and decision making. Curr Opin Neurobiol 22:956–962
  10. Brunton BW, Botvinick MM, Brody CD (2013) Rats and humans can optimally accumulate evidence for decision-making. Science 340:95–98
  11. Cain N, Shea-Brown E (2012) Computational models of decision making: integration, stability, and noise. Curr Opin Neurobiol 22:1047–1053
  12. Churchland AK, Ditterich J (2012) New advances in understanding decisions among multiple alternatives. Curr Opin Neurobiol 22:920–926
  13. Churchland AK, Kiani R, Shadlen MN (2008) Decision-making with multiple alternatives. Nat Neurosci 11:693–702
  14. Cisek P, Puskas GA, El-Murr S (2009) Decisions in changing conditions: the urgency-gating model. J Neurosci 29:11560–11571
  15. Daw ND, Doya K (2006) The computational neurobiology of learning and reward. Curr Opin Neurobiol 16:199–204
  16. Dayan P, Daw ND (2008) Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci 8:429–453
  17. Dayan P, Niv Y (2008) Reinforcement learning: the good, the bad and the ugly. Curr Opin Neurobiol 18:185–196
  18. Deco G, Rolls ET (2005) Attention, short-term memory, and action selection: a unifying theory. Prog Neurobiol 76:236–256
  19. Deneve S (2012) Making decisions with unknown sensory reliability. Front Neurosci 6:75
  20. Ditterich J (2010) A comparison between mechanisms of multi-alternative perceptual decision making: ability to explain human behavior, predictions for neurophysiology, and relationship with decision theory. Front Neurosci 4:184
  21. Doya K (2008) Modulators of decision making. Nat Neurosci 11:410–416
  22. Drugowitsch J, Moreno-Bote R, Churchland AK, Shadlen MN, Pouget A (2012) The cost of accumulating evidence in perceptual decision making. J Neurosci 32:3612–3628
  23. Furman M, Wang XJ (2008) Similarity effect and optimal control of multiple-choice decision making. Neuron 60:1153–1168
  24. Gillespie DT (1992) Markov processes: an introduction for physical scientists. Academic, San Diego
  25. Glimcher PW (2001) Making choices: the neurophysiology of visual-saccadic decision making. Trends Neurosci 24:654–659
  26. Glimcher PW (2003) The neurobiology of visual-saccadic decision making. Annu Rev Neurosci 26:133–179
  27. Gold JI, Shadlen MN (2001) Neural computations that underlie decisions about sensory stimuli. Trends Cogn Sci 5:10–16
  28. Gold JI, Shadlen MN (2007) The neural basis of decision making. Annu Rev Neurosci 30:535–574
  29. Hanks TD, Mazurek ME, Kiani R, Hopp E, Shadlen MN (2011) Elapsed decision time affects the weighting of prior probability in a perceptual decision task. J Neurosci 31:6339–6352
  30. Huk AC, Shadlen MN (2005) Neural activity in macaque parietal cortex reflects temporal integration of visual motion signals during perceptual decision making. J Neurosci 25:10420–10436
  31. Ito M, Doya K (2011) Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr Opin Neurobiol 21:368–373
  32. Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:2443–2452
  33. Joel D, Niv Y, Ruppin E (2002) Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw 15:535–547
  34. Johnson A, van der Meer MA, Redish AD (2007) Integrating hippocampus and striatum in decision-making. Curr Opin Neurobiol 17:692–697
  35. Lawler GF (2006) Introduction to stochastic processes. Chapman & Hall/CRC, Boca Raton
  36. Lee D, Seo H (2007) Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex. Ann N Y Acad Sci 1104:108–122
  37. Ludwig CJ, Davies JR (2011) Estimating the growth of internal evidence guiding perceptual decisions. Cogn Psychol 63:61–92
  38. Machens CK, Romo R, Brody CD (2005) Flexible control of mutual inhibition: a neural model of two-interval discrimination. Science 307:1121–1124
  39. Miller P, Katz DB (2013) Accuracy and response-time distributions for decision-making: linear perfect integrators versus nonlinear attractor-based neural circuits. J Comput Neurosci 35:261–294
  40. Miller P, Wang XJ (2006) Discrimination of temporally separated stimuli by integral feedback control. Proc Natl Acad Sci U S A 103:201–206
  41. Newsome WT, Britten KH, Movshon JA (1989) Neuronal correlates of a perceptual decision. Nature 341:52–54
  42. Niwa M, Ditterich J (2008) Perceptual decisions between multiple directions of visual motion. J Neurosci 28:4435–4445
  43. Ratcliff R (1978) A theory of memory retrieval. Psychol Rev 85:59–108
  44. Ratcliff R (2002) A diffusion model account of response time and accuracy in a brightness discrimination task: fitting real data and failing to fit fake but plausible data. Psychon Bull Rev 9:278–291
  45. Ratcliff R, Hasegawa YT, Hasegawa RP, Smith PL, Segraves MA (2007) Dual diffusion model for single-cell recording data from the superior colliculus in a brightness-discrimination task. J Neurophysiol 97:1756–1774
  46. Ratcliff R, McKoon G (2008) The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput 20:873–922
  47. Ratcliff R, Smith PL (2004) A comparison of sequential sampling models for two-choice reaction time. Psychol Rev 111:333–367
  48. Redish AD, Jensen S, Johnson A, Kurth-Nelson Z (2007) Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychol Rev 114:784–805
  49. Romo R, Salinas E (2003) Flutter discrimination: neural codes, perception, memory and decision making. Nat Rev Neurosci 4:203–218
  50. Rorie AE, Gao J, McClelland JL, Newsome WT (2010) Integration of sensory and reward information during perceptual decision-making in lateral intraparietal cortex (LIP) of the macaque monkey. PLoS ONE 5:e9308
  51. Rüter J, Marcille N, Sprekeler H, Gerstner W, Herzog MH (2012) Paradoxical evidence integration in rapid decision processes. PLoS Comput Biol 8:e1002382
  52. Salinas E (2004) Fast remapping of sensory stimuli onto motor actions on the basis of contextual modulation. J Neurosci 24:1113–1118
  53. Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS (2004) Temporal difference models describe higher-order learning in humans. Nature 429:664–667
  54. Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol 86:1916–1936
  55. Shankar S, Massoglia DP, Zhu D, Costello MG, Stanford TR, Salinas E (2011) Tracking the temporal evolution of a perceptual judgment using a compelled-response task. J Neurosci 31:8406–8421
  56. Shea-Brown E, Gilzenrat MS, Cohen JD (2008) Optimization of decision making in multilayer networks: the role of locus coeruleus. Neural Comput 20:2863–2894
  57. Simen P, Contreras D, Buck C, Hu P, Holmes P, Cohen JD (2009) Reward rate optimization in two-alternative decision making: empirical tests of theoretical predictions. J Exp Psychol Hum Percept Perform 35:1865–1897
  58. Smith PL, Ratcliff R (2004) Psychology and neurobiology of simple decisions. Trends Neurosci 27:161–168
  59. Soltani A, Wang XJ (2006) A biophysically-based neural model of matching law behavior: melioration by stochastic synapses. J Neurosci 26:3731–3744
  60. Soltani A, Wang XJ (2008) From biophysics to cognition: reward-dependent adaptive choice behavior. Curr Opin Neurobiol 18:209–216
  61. Soltani A, Wang XJ (2010) Synaptic computation underlying probabilistic inference. Nat Neurosci 13:112–119
  62. Stanford TR, Shankar S, Massoglia DP, Costello MG, Salinas E (2010) Perceptual decision making in less than 30 milliseconds. Nat Neurosci 13:379–385
  63. Sugrue LP, Corrado GS, Newsome WT (2005) Choosing the greater of two goods: neural currencies for valuation and decision making. Nat Rev Neurosci 6:363–375
  64. Thura D, Beauregard-Racine J, Fradet CW, Cisek P (2012) Decision making by urgency gating: theory and experimental support. J Neurophysiol 108:2912–2930
  65. Usher M, McClelland JL (2001) The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev 108:550–592
  66. Wald A (1947) Sequential analysis. Wiley, New York
  67. Wald A, Wolfowitz J (1948) Optimum character of the sequential probability ratio test. Ann Math Stat 19:326–339
  68. Wang XJ (2002) Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36:955–968
  69. Wang XJ (2008) Decision making in recurrent neuronal circuits. Neuron 60:215–234
  70. Wong KF, Wang XJ (2006) A recurrent network mechanism of time integration in perceptual decisions. J Neurosci 26:1314–1328
  71. Wyart V, de Gardelle V, Scholl J, Summerfield C (2012) Rhythmic fluctuations in evidence accumulation during decision making in the human brain. Neuron 76:847–858
  72. Zhou X, Wong-Lin K, Holmes P (2009) Time-varying perturbations can distinguish among integrate-to-threshold models for perceptual decision making in reaction time tasks. Neural Comput 21:2336–2362

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Volen Center for Complex Systems and Department of Biology, Brandeis University, Waltham, USA