# Decision Making, Models

**DOI:** https://doi.org/10.1007/978-1-4614-7320-6_312-3

## Keywords

Firing Rate · Drift Rate · Race Model · Biophysical Model · Response Time Distribution

## Definition

Models of decision making attempt to describe, using stochastic differential equations which represent either neural activity or more abstract psychological variables, the dynamical process that produces a commitment to a single action/outcome as a result of incoming evidence that can be ambiguous as to the action it supports.

## Detailed Description

### Background

Decision making can be divided into four processes:

1. Acquisition of sensory information to determine the **state** of the environment and the organism within it
2. Evaluation of potential actions (options) in terms of the cost and benefit to the organism given its belief about the current state
3. Selection of an **action** based on, ideally, an optimal trade-off between the costs and benefits
4. Use of the outcome of the action to update the costs and benefits associated with it
Models of the dynamics of decision making have focused on perceptual decisions with only two possible responses available. The term two-alternative forced choice (TAFC) applies to such tasks when two stimuli are provided, but the term is now generally used for any binary choice discrimination task.

In a perceptual decision, the response, or action, is directly determined by the current percept. Thus, the decision in these tasks is essentially one of perceptual categorization, namely, process (1) above, though the same models can be used for action selection given ambiguous information about the current state (process 3).

Evaluation of the possible responses in terms of their value or the resulting state's utility (process 2) (Sugrue et al. 2005), given both uncertainty in the current state and uncertainty in the outcomes of an action given that state, is the subject of expected utility theory and prospect theory.

The necessary learning and updating of the values of different actions given the actual outcomes they produce (process 4) are the subject of instrumental conditioning and reinforcement learning, for example, via temporal difference learning (Seymour et al. 2004) and actor-critic models (Joel et al. 2002).

This entry is primarily concerned with the dynamics of the production of either a single percept given unreliable sensory evidence (1) or a single action given uncertainty in the outcomes (3).

#### General Features of Discrimination Tasks or TAFC Tasks

The tasks can be based on either a free-response paradigm, in which a subject responds after as much or as little time as she wants, or an interrogation (forced-response) paradigm, in which the stimulus duration is limited and the subject must make a response within a given time interval. The free-response paradigm is perhaps more powerful, since each trial produces two types of information: accuracy (correct or incorrect) and response time. However, when the allowed response time is varied in the forced-response paradigm, both paradigms are valuable for constraining models, since each can provide a distribution of response times for correct and incorrect trials, as well as the proportion of trials that are correct or incorrect for a given stimulus. These behavioral data can be altered by task difficulty, task instructions (such as "respond rapidly" versus "respond accurately"), or reward schedules and intertrial intervals.

Most models of the dynamics of decision making focus on tasks where the time from stimulus onset to response is no more than one to two seconds, a timescale over which neural spiking can be maintained. Choices requiring much more time than this are likely to depend upon multiple memory stores, neural circuits, and strategies, which become difficult to identify, extract, and model in a dynamical systems framework (a state-based framework is more appropriate).

In the standard setup of the models, two parallel streams of noisy sensory input are available, each stream supplying evidence in support of one of the two allowed actions (see Fig. 1). The sensory inputs can be discrete or continuous quantities and can arrive discretely or continuously in time. The majority of models focus on continuous update in continuous time, so they can be formulated as stochastic differential equations (Gillespie 1992; Lawler 2006). The momentary sensory evidence produces a decision variable, which indicates the likelihood of choosing one of the two alternatives given current and all prior evidence. The primary difference between models lies in how sensory evidence determines the decision variable. While most models incorporate a form of temporal integration of evidence (Cain and Shea-Brown 2012) and include a negative interaction between the two sources of evidence, they differ in the stability of the initial state, which determines whether integration is perfect, and in the nature of the interaction: feedforward between the inputs, feedforward between outputs, or feedback from outputs to decision variables (Bogacz et al. 2006). Models can also differ in their choice of *decision threshold* – the value of the decision variable at which a response is produced – in the free-response paradigm (Simen et al. 2009; Deneve 2012; Drugowitsch et al. 2012), and in particular in whether this parameter, or other model parameters such as input gain that also affect the response time distribution, are static or dynamic across a trial (Shea-Brown et al. 2008; Thura et al. 2012).

From a neuroscience perspective, the decision variable is typically interpreted as either the mean firing rate of a group of neurons or a linear combination of rates of many neurons (Beck et al. 2008), the difference between two groups being the simplest such combination. There has been remarkable progress in matching the observed firing patterns of neurons (Newsome et al. 1989; Shadlen and Newsome 2001; Huk and Shadlen 2005) with the dynamics of a decision variable in more mathematical models of decision making (Glimcher 2001, 2003; Gold and Shadlen 2001, 2007; Smith and Ratcliff 2004; Ratcliff et al. 2007). This has led to the introduction of biophysically based models of neural circuits (Wang 2008), which have accounted for much of the concordance between simple mathematical models, neural activity, and behavior.

#### Optimal Decision Making

An optimal decision-making strategy either maximizes expected reward over a given time or minimizes risk. In TAFC perceptual tasks, a response is either correct or an error. In the interrogation paradigm, with fixed time per decision, the optimal strategy is the one leading to greatest accuracy, that is, the lowest expected error rate. In the free-response paradigm, the optimal strategy either delivers the greatest accuracy for a given mean response time or produces the fastest mean response time for a given accuracy. In these tasks, the sequential probability ratio test (SPRT), introduced by Wald and Wolfowitz (Wald 1947; Wald and Wolfowitz 1948), and, in its continuous form, the drift-diffusion model (DDM) (Ratcliff and Smith 2004; Ratcliff and McKoon 2008) lead to optimal choice behavior by any of these measures of optimality (see Bogacz et al. 2006 for a thorough review).

Using SPRT in the interrogation paradigm, one simply accumulates over time the log-likelihood ratio of the probabilities of each alternative given the stream of evidence, where the observed sensory input per unit time has a certain probability given alternative A and another probability given alternative B. Integrating the log-likelihood over time, after setting the initial condition as the log-likelihood ratio of the prior probabilities, log[P(A)/P(B)], leads to a quantity log[P(A|S)/P(B|S)] which is greater than zero if A is more likely than B given the stimulus and less than zero otherwise. Thus, from standard Bayesian theory, the optimal procedure is to choose A or B depending on the sign of the summed, or in the continuous limit, integrated, log-likelihood ratio.
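In discrete time, this accumulation is just a running sum of per-sample log-likelihood ratios. A minimal sketch for the interrogation paradigm, assuming (purely for illustration) Gaussian evidence streams with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Under alternative A each evidence sample is drawn from N(+0.5, 1);
# under alternative B, from N(-0.5, 1). Here the true alternative is A.
mu_a, mu_b, sigma = 0.5, -0.5, 1.0
samples = rng.normal(mu_a, sigma, size=1000)

def log_likelihood_ratio(x):
    # log[p(x | A) / p(x | B)] for the Gaussian likelihoods above
    return (x*(mu_a - mu_b) - 0.5*(mu_a**2 - mu_b**2)) / sigma**2

# Initial condition: the log prior ratio log[P(A)/P(B)] (flat prior here),
# followed by a running sum of the per-sample log-likelihood ratios.
llr = np.log(0.5/0.5) + np.cumsum(log_likelihood_ratio(samples))

choice = "A" if llr[-1] > 0 else "B"   # read out the sign at the deadline
print(choice)
```

Reading out the sign of the final accumulated log-likelihood ratio implements the Bayesian choice rule described above.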

In the free-response paradigm, a stopping criterion must be included. This is achieved by setting two thresholds for the integrated log-likelihood ratio, a positive one (+*a*) for choice A and a negative one (−*b*) for choice B. The further the thresholds are from the origin, the lower the chance of error, but the longer the integration time before reaching a decision. Thus, the thresholds reflect the fraction of errors that can be tolerated, with \( a= \log \frac{1-\beta }{\alpha } \) and \( b= \log \frac{1-\alpha }{\beta } \), where *α* is the probability of choosing A when B is correct and *β* is the probability of choosing B when A is correct.
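A minimal simulation of the free-response SPRT, with thresholds set from the tolerated error probabilities via Wald's approximations \( a=\log[(1-\beta)/\alpha] \) and \( b=\log[(1-\alpha)/\beta] \) (the Gaussian evidence streams and all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def wald_thresholds(alpha, beta):
    # alpha = P(choose A | B correct); beta = P(choose B | A correct)
    a = np.log((1 - beta) / alpha)   # upper threshold, for choosing A
    b = np.log((1 - alpha) / beta)   # magnitude of lower threshold, for B
    return a, b

def sprt_trial(true_mu, a, b, mu_a=0.2, mu_b=-0.2, sigma=1.0):
    """Accumulate the log-likelihood ratio until it crosses +a or -b."""
    llr = 0.0
    while -b < llr < a:
        x = rng.normal(true_mu, sigma)   # one more evidence sample
        llr += (x*(mu_a - mu_b) - 0.5*(mu_a**2 - mu_b**2)) / sigma**2
    return "A" if llr >= a else "B"

a, b = wald_thresholds(alpha=0.05, beta=0.05)
choices = [sprt_trial(0.2, a, b) for _ in range(500)]
accuracy = np.mean([c == "A" for c in choices])
print(f"accuracy {accuracy:.2f} with 5% tolerated error")
```

Because the thresholds are set by the tolerated error rates, accuracy stays near (in practice slightly above) the 95% target, at the cost of a variable number of evidence samples per trial.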

### The Models

#### Accumulator Models

The first models of decision making in humans or animals were accumulator models, sometimes called counter models or race models. In these models, evidence accumulates separately for each possible outcome. This has the advantage that if many outcomes are possible, the models extend simply, by addition of one variable per additional alternative, with evidence for each alternative accumulating in its own variable. In the interrogation paradigm, one simply reads out the highest variable, so in the TAFC paradigm the choice depends on the sign of the difference of the two variables. Thus, if the difference in accumulated quantities matched the difference in integrated log probabilities of the two types of evidence, such readout from an accumulator model would be equivalent to the SPRT, and so would be optimal.

In the free-response paradigm, accumulator models produce a choice when any one of the accumulated variables reaches a threshold, so these models can be called "race to threshold models" or simply "race models." The original accumulator models included neither interactions between accumulators nor any means for variables to decrease. However, for decisions in nature or in laboratory protocols, evidence in favor of one alternative is typically evidence against the other. This independence is particularly problematic in the free-response paradigm, because the time at which one variable reaches threshold and produces the corresponding choice is unaffected by evidence accumulated for other choices. Thus, the behavior of simple accumulator models is not optimal. Comparisons of response time distributions of these models with behavioral responses also showed the models to be inaccurate: observed response time distributions are skewed with a long tail, whereas the response times of accumulator models were much more symmetric about the mean. These discrepancies led to the ascendance of Ratcliff's drift-diffusion model (Ratcliff 1978).
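The independence of the accumulators can be seen in a minimal race-model sketch (all parameter values are illustrative; variables are clipped at zero, as in the original counter models, so they cannot decrease):

```python
import numpy as np

rng = np.random.default_rng(2)

def race_trial(s1=1.5, s2=0.5, sigma=0.5, thresh=1.0, dt=0.01, t_max=10.0):
    """Two independent accumulators race to a common threshold.

    No interaction and no decay: each variable is driven only by its
    own noisy evidence stream (rate s1 or s2) and is clipped at zero."""
    x1 = x2 = 0.0
    t = 0.0
    while t < t_max:
        x1 = max(x1 + s1*dt + sigma*np.sqrt(dt)*rng.normal(), 0.0)
        x2 = max(x2 + s2*dt + sigma*np.sqrt(dt)*rng.normal(), 0.0)
        t += dt
        if x1 >= thresh or x2 >= thresh:
            return (1 if x1 >= thresh else 2), t   # first to threshold wins
    return 0, t_max

trials = [race_trial() for _ in range(300)]
p_correct = np.mean([w == 1 for w, _ in trials])
print(f"P(choose the stronger stream) = {p_correct:.2f}")
```

Note that when accumulator 1 wins, the state of accumulator 2 is simply discarded, which is exactly the suboptimality described above: evidence for the losing alternative never counts against the winner.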

#### The Drift-Diffusion Model

In the drift-diffusion model, a single decision variable, *x*, follows a Wiener process with two absorbing boundaries (Fig. 3). It includes a deterministic (drift) term, *S*, proportional to the rate of incoming evidence, and a diffusive noise term of variance *σ*^{2}, which produces variability in response times and can lead to errors:

\( \frac{dx}{dt}=S+\sigma \eta (t), \)

where *η*(*t*) is a white-noise term defined by 〈*η*(*t*)*η*(*t*′)〉 = *δ*(*t* − *t*′).
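The DDM can be simulated directly with the Euler-Maruyama method; a sketch with illustrative drift, noise, and threshold values:

```python
import numpy as np

rng = np.random.default_rng(3)

def ddm_trial(S=0.5, sigma=1.0, a=1.0, b=1.0, dt=0.005, t_max=10.0):
    """Euler-Maruyama simulation of dx/dt = S + sigma*eta(t), with
    absorbing boundaries at +a (correct choice) and -b (error)."""
    x, t = 0.0, 0.0
    while -b < x < a and t < t_max:
        x += S*dt + sigma*np.sqrt(dt)*rng.normal()
        t += dt
    return x >= a, t

trials = [ddm_trial() for _ in range(300)]
accuracy = np.mean([c for c, _ in trials])
correct_rts = np.array([t for c, t in trials if c])
print(f"accuracy {accuracy:.2f}, mean correct RT {correct_rts.mean():.2f} s")
```

Even this short run reproduces the qualitative features discussed below: a long right tail in the response time distribution and an error rate set jointly by the drift rate and the threshold spacing.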

If the model is scaled to a given level of noise, then its three independent parameters are the drift rate (*S*) and the positions of the two *thresholds* (*a*, −*b*) with respect to the starting point. When the model was introduced, these parameters were assumed fixed for a given subject in a specific task. The threshold spacing determines where one operates in the speed-accuracy trade-off, so it can be optimized as a function of the relative cost of an incorrect response and the time between trials. Any starting point away from the midpoint represents bias or prior information. The drift rate is proportional to stimulus strength.

With fixed parameters, which could be fitted to any subject’s responses, the DDM reproduces key features of the behavioral data: notably the skewed shape of response time distributions and the covariation of mean response times and response accuracy with task difficulty. Skewed response time distributions arise because the variance of a Wiener process increases linearly with time – responses much earlier than the mean response time, when the variance in the decision variable is low, are less likely than responses much later, when the variance in the decision variable is high. A more difficult perceptual choice is represented by a drift rate closer to zero, which increases response times and increases the probability of error. Such covariation of response times with accuracy matches behavioral data well, so long as an additional processing time is added to the model – the additional time representing a variable sensory transduction delay on the input side and a motor delay on the output side, both of which contribute to response times in addition to the processing within any decision circuit.

While the original DDM produced trial-to-trial variability through the diffusive noise term alone, additional variability in the parameters across trials was needed to account for differences between the response time distributions of correct trials and those of incorrect trials. With fixed parameters and no initial bias, the DDM produces identical distributions (though with different magnitudes) for the timing of correct responses and errors. However, response times of human subjects are typically slower when they produce errors, unless they are instructed to respond as quickly as possible, in which case the reverse is true. These behaviors are accounted for in the DDM by including trial-to-trial variability in the drift rate and/or the starting point. Trial-to-trial variability in drift rate leads to slower errors: errors become more likely on those trials in which the drift rate is altered from its mean toward zero, and on such trials mean response times are longer. Trial-to-trial variability in the starting point leads to faster errors: errors become more likely on those trials in which the starting point is closer to the error boundary, in which case the response is faster. Error response times are reduced more by such variability than are correct response times, since the correct responses include more of the trials started at the midpoint, where mean times are longer.
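The slower errors produced by trial-to-trial drift variability can be checked numerically; a sketch with symmetric bounds and an illustrative drift distribution:

```python
import numpy as np

rng = np.random.default_rng(4)

def ddm_trial(S, sigma=1.0, a=1.0, dt=0.005, t_max=20.0):
    # symmetric absorbing bounds at +a (correct) and -a (error)
    x, t = 0.0, 0.0
    while abs(x) < a and t < t_max:
        x += S*dt + sigma*np.sqrt(dt)*rng.normal()
        t += dt
    return x >= a, t

# Drift varies across trials around a positive mean: errors cluster on
# trials whose drift happens to lie near zero, where responses are slow.
correct_rts, error_rts = [], []
for _ in range(4000):
    S = rng.normal(0.8, 0.6)          # drift drawn anew on each trial
    ok, t = ddm_trial(S)
    (correct_rts if ok else error_rts).append(t)

print(f"mean correct RT {np.mean(correct_rts):.2f} s, "
      f"mean error RT {np.mean(error_rts):.2f} s")
```

For any single fixed drift with an unbiased start, correct and error response time distributions are identical; the slow-error effect appears only once errors are disproportionately drawn from the low-drift trials.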

Prior information or bias can be incorporated in the drift-diffusion model, either through a shift in the starting point of integration or an additional bias current to the inputs to be integrated. A shift in starting point is more optimal and has led to two-stage models, where the first stage sets the starting point from the values of the two choices, before a second stage of integration toward threshold commences. Such a model is supported by electrophysiological data (Rorie et al. 2010).

#### The Leaky Competing Accumulator Model

In the leaky competing accumulator (LCA) model (Usher and McClelland 2001), each of two accumulators integrates its evidence stream while leaking its accumulated activity (at rate *k*) and inhibiting the other accumulator (with strength *β*). The difference between the two variables, *X*_{d} = *X*_{1} − *X*_{2}, follows an Ornstein-Uhlenbeck process:

\( \frac{d{X}_d}{dt}=\left(\beta -k\right){X}_d+{S}_d+{\sigma}_d\eta (t), \)

where *S*_{d} = *S*_{1} − *S*_{2} and \( {\sigma}_d=\sqrt{\sigma_1^2+{\sigma}_2^2} \). It should be noted that the DDM is retrieved as a special case of the LCA if the coefficient of the term linear in *X*_{d}, that is, *β* − *k*, is set to zero.
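A short simulation of the difference variable illustrates the two regimes (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def lca_difference(beta, k, S_d=0.3, sigma_d=1.0, dt=0.005, T=2.0):
    """Euler-Maruyama simulation of the Ornstein-Uhlenbeck process
       dX_d/dt = (beta - k)*X_d + S_d + sigma_d*eta(t)."""
    x = 0.0
    for _ in range(int(T/dt)):
        x += ((beta - k)*x + S_d)*dt + sigma_d*np.sqrt(dt)*rng.normal()
    return x

# beta == k: the linear term vanishes and the DDM (perfect integration)
# is recovered; k > beta: a stable, leaky process that forgets old input.
perfect = np.mean([lca_difference(beta=2.0, k=2.0) for _ in range(400)])
leaky = np.mean([lca_difference(beta=2.0, k=4.0) for _ in range(400)])
print(f"mean X_d at T = 2 s: perfect {perfect:.2f}, leaky {leaky:.2f}")
```

With balanced leak and inhibition the mean of *X*_{d} grows linearly with time (≈ *S*_{d}*T*), whereas the leaky case relaxes to the much smaller equilibrium ≈ *S*_{d}/(*k* − *β*).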

If the only criterion for choosing the best model were its match to behavioral data, one could use the Akaike or Bayesian information criterion to assess whether inclusion of a nonzero linear term in the LCA is justified by the better fit it produces. Beyond the ability to fit human response times, a model's ability to match neural activity and to be generated robustly in a neural circuit should be considered when assessing its value and relevance. One achievement of the LCA is to reproduce an initial increase in both variables, before the competition between variables causes one to be suppressed while the other accelerates its rise. Such behavior matches electrophysiological data recorded in monkeys during perceptual choices – in particular, firing rates of neurons representing the response not chosen increase upon stimulus onset before they are suppressed (cf. trajectories in Fig. 6).

#### Neural Network Models and Attractor States

Some of the first models of perceptual categorization, which have had significant impact on neuroscience, were Hopfield networks, Cohen-Grossberg neural networks, and Willshaw networks. While these models are primarily aimed at formation and storage of memories, the retrieval of a memory via corrupted information is identical to a perceptual decision.

A correspondence between memory retrieval and decision making should not be surprising, since Ratcliff’s introduction of the DDM – the archetype of decision-making models – was within a paper entitled “A theory of memory retrieval.” Ratcliff was focused on fitting response times and thus produced an inherently dynamical model; memory retrieval was treated as a set of parallel DDMs, each representing an individual item with a “match” and “non-match” threshold to indicate its recognition or not. The rate of evidence accumulation in each individual DDM was set in terms of the similarity between a current item and one in memory.

Models such as those of Hopfield respond to the match between a current item and memories of previously encoded items, which are contained within the network as attractor states. The network’s activity reaches a stable attractor state more rapidly if the match is close. The temporal dynamics of memory retrieval or pattern completion, which comprises the decision process, is not addressed carefully in neural network models, in which neural units can be binary and time can be discretized, since this is not their goal. However, the use of attractor states to represent the outcome of a decision or a perceptual categorization has achieved success in biophysical models based on spiking neurons (Wang 2008), albeit with far simpler attractor states than those of neural networks.

#### Biophysical Models

Biophysically based models add to the abstract models several constraints of real neural circuits. First, neurons emit spikes as a point process in time, so even at a constant firing rate they supply a variable current to other cells. The variability in the current can be reduced by increasing the number of cells in a group, so long as the spike times are uncorrelated – that is, the cells fire asynchronously. Asynchrony is most easily achieved when neurons spike irregularly, a feature produced by adding noise to each neuron in the decision-making circuit. So, in accord with the most likely realizations of a decision-making circuit in vivo, biophysical models introduce additional noise into the decision-making circuit itself, which inevitably adds to any already-present stimulus noise (Wang 2002).
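The variance reduction from pooling uncorrelated spike trains can be illustrated with Poisson spike counts (the firing rate and counting window below are arbitrary choices); the coefficient of variation of the summed input falls as \( 1/\sqrt{N} \):

```python
import numpy as np

rng = np.random.default_rng(6)

rate, window = 20.0, 0.1   # Hz and seconds: counts in a 100 ms window

def population_count_cv(n_cells, n_windows=5000):
    """CV of the summed spike count of n uncorrelated Poisson cells."""
    counts = rng.poisson(rate*window, size=(n_windows, n_cells)).sum(axis=1)
    return counts.std() / counts.mean()

cv1, cv100 = population_count_cv(1), population_count_cv(100)
print(f"CV of input: 1 cell {cv1:.2f}, 100 cells {cv100:.3f}")
```

Pooling 100 asynchronous cells cuts the relative fluctuation of the summed current by roughly a factor of ten; correlated spiking would break this scaling, which is why asynchrony matters.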

Second, neurons are either excitatory or inhibitory, so in order for two groups of cells with self-excitation to inhibit each other, at a minimum, a third group of inhibitory cells must be added. The need for an extra cell group to mediate the inhibition adds a small delay to the cross-inhibition compared to self-excitation, though this effect can be counteracted with fast responses in the synaptic connections to and from inhibitory cells versus slower synaptic responses in excitatory connections. The second effect of an intermediate cell group is to add the nonlinearity of the inhibitory interneuron’s response function into the cross-inhibition. Thus, the inhibitory input to one excitatory cell group is not a linear function of the firing rate of the other cell group. The consequences of this and other nonlinearities are discussed further below.

Third, biophysical models of neurons, just like neurons in vivo, respond nonlinearly to increased inputs. Similarly, synaptic inputs to a cell saturate, so are a nonlinear function of presynaptic firing rates. In the LCA model (Usher and McClelland 2001), both the neural response and the feedback are linear functions passing through the origin, so via the tuning of one variable, the two curves can match each other and produce an integrator. Integrators require such a matching of synaptic feedback to neural response so they can retain a stable firing rate in the absence of input – the firing rate produced by a given synaptic input must be exactly that needed to generate the same synaptic input – and this must be true for a wide range of firing rates. Such matching of synaptic feedback to neural response produces a state of marginal stability, typically called a line attractor or continuous attractor.
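The knife-edge tuning required for such a line attractor can be seen in a one-unit linear rate model (a simplified sketch; the time constant, feedback weights, and input pulse are illustrative):

```python
def rate_unit(w, stim, tau=0.1, dt=0.001, T=1.0):
    """Linear rate unit with self-feedback w:
       tau * dr/dt = -r + w*r + input(t).
    With w == 1 the feedback exactly cancels the leak (line attractor),
    so the unit integrates its input and then holds its rate."""
    r, trace = 0.0, []
    for i in range(int(T/dt)):
        r += dt/tau * (-r + w*r + stim(i*dt))
        trace.append(r)
    return trace

def pulse(t):
    return 1.0 if t < 0.1 else 0.0   # brief input pulse

tuned = rate_unit(1.0, pulse)    # w = 1: holds its rate after the pulse
leaky = rate_unit(0.8, pulse)    # w < 1: decays back toward baseline
print(f"rate at 1 s: tuned {tuned[-1]:.2f}, leaky {leaky[-1]:.2f}")
```

The tuned unit ends the pulse near 1.0 and stays there, while the mistuned unit relaxes back toward zero with time constant τ/(1 − *w*); any mismatch between feedback and response function therefore converts the integrator into a leaky (or unstable) element.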

#### Extension of Models to Multiple Choices

#### Sequential Discrimination, Context-Dependent Decisions, and Prior Information

The most developed dynamical models of decision making pertain to the identification of an incoming sensory stimulus or the comparison of two or more concurrent stimuli. However, many decisions require a comparison of successive stimuli, and even when two stimuli are concurrent, our attention typically switches from one to the other when making a perceptual choice. Thus, models of decision making have been developed in which the stimuli are separated in time, so a form of short-term memory is required, with the contents of short-term memory and a later sensory input both affecting the choice of action (Romo and Salinas 2003; Machens et al. 2005; Miller and Wang 2006). The process of making a decision by combining short-term memory, which can represent the current “context” (Salinas 2004), with sensory input provides the essential ingredient for working memory tasks and for model-based strategies of action selection (Deco and Rolls 2005).

Prior information can make one response either a more likely alternative or a more rewarding alternative given ambiguous sensory information and can lead to across-trial dependencies in decision-making behavior. Consideration of how much weight to give prior information compared to the current stimulus requires a separate choice (Hanks et al. 2011), which establishes how long one should continue to acquire sensory input. Such a choice is akin to setting a decision-making threshold. Factors affecting the optimal period for obtaining sensory input include the relative clarity of incoming information compared to the strength of the prior (Deneve 2012), as well as the intrinsic cost of a reduced reward rate when taking more time to decide (Drugowitsch et al. 2012). At some point in time, awaiting further sensory evidence does not improve one's probability of a correct choice sufficiently to warrant any extra delay of a potential reward. Solution by dynamic programming (Bellman 1957) of a model that takes these factors into account suggests that monkeys respond according to an optimal, time-varying cost function (Drugowitsch et al. 2012).

### Testing Decision-Making Models

Decision-making models can be tested more stringently using tasks in which the stimulus is not held constant across the time allotted for producing a response (Zhou et al. 2009; Stanford et al. 2010; Shankar et al. 2011; Rüter et al. 2012). For example, in models such as the DDM based on perfect integration, if a stimulus is altered or even reversed for a short amount of time, so long as the stimulus alteration is in the period of its integration, it has the same effect on response time and choice probability whether it is early or late. However, in models where the initial state is unstable, a late altered stimulus has weaker impact on the decision-making dynamics than an early one. Conversely, in models where the initial state is stable, such as the LCA with a positive leak term, only stimuli presented shortly before the final response contribute to it; the time constant of the drift term that draws the decision variables back to the initial state corresponds to a time constant for the forgetting of earlier evidence.

Alternatively, if one sets up a task in which noise in the stimulus, controlled by the experimenter, is the dominant noise source in the decision-making process, one can analyze correct trials and error trials separately and align them by either time of stimulus onset or time of response to assess the impact of noise fluctuations on choice probability or response times. Care must be taken when aligning by response times, since threshold crossings are inevitably produced by noise fluctuations in the direction of the threshold crossed. However, in all cases, model predictions can be tested with experimental measurements. Current results appear to be task dependent: some data sets suggest all evidence is equally impactful, supporting a perfect integrator (Huk and Shadlen 2005; Brunton et al. 2013), while others suggest the weighting of sensory evidence is higher early (Ludwig and Davies 2011), higher late (Cisek et al. 2009; Thura et al. 2012), or oscillatory (Wyart et al. 2012) across the stimulus duration.

### Beyond Discrimination: Action Selection

In the models considered heretofore, the decision of what action to take has been equivalent to the question of what is perceived. This is because in the relevant tasks, the difficulty is in unraveling the cause of a sensory input, which has been degraded either at source, or through sensory processing, or as a result of imperfections of memory encoding and recall. The requisite action given a sensory percept is either via a straightforward instruction for human subjects or produced by weeks to months of training in nonhuman animal subjects. Thus, in the post-training stage used to acquire data, the step from percept to action can be considered very fast and independent of the parameters varied by the experimenter to modify task difficulty.

However, most decisions require us to select a course of action given a percept or a combination of percepts. Two general strategies, termed model-based and model-free (Dayan and Daw 2008; Dayan and Niv 2008), are possible for action selection. Model-based strategies require an evaluation of all possible consequences, with their likelihoods (similar, in a chess game, to calculating all the combinations of moves in response to one move), or, conversely, of all the possible causes of an observation. A model-free system simply learns the value of a given state and selects an action based on the immediately reachable state with the highest value. For example, one could move a chess piece to produce the pattern of pieces that has led to the most games won in the past.

A wide literature in the field of reinforcement learning (Barto 1994; Redish et al. 2007) and its possible neural underpinnings (Daw and Doya 2006; Johnson et al. 2007; Lee and Seo 2007) addresses how one can learn the value of states over multiple trials. This literature has influenced model-free biophysical models of decision making (Soltani and Wang 2008, 2010), with the principal requirement being a need for enhanced Hebbian learning when the resulting action leads to positive reinforcement, but not otherwise. The neural activity underlying the decision precedes the reinforcement signal, so in order for the appropriate synapses to be potentiated when positive reinforcement arrives, authors have suggested the activity itself remains in a persistent state of high firing rate (Soltani and Wang 2006) or a molecular “eligibility trace” of earlier activity persists through the time of the reinforcement signal (Bogacz et al. 2007a; Izhikevich 2007).
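The eligibility-trace idea can be sketched as follows (a deliberately minimal, hypothetical implementation: a single synaptic tag decays until a delayed reward converts it into a weight change):

```python
def run_trial(w, chose_correct, dt=0.01, tau_e=1.0, lr=0.1, reward_delay=0.5):
    """Reward-modulated Hebbian sketch: coincident pre/post activity at
    the time of the choice tags the synapse with an eligibility trace e,
    which decays until the delayed reward signal arrives and converts
    whatever remains of the trace into a weight change."""
    e = 1.0                        # tag set by activity at choice time
    for _ in range(int(reward_delay / dt)):
        e -= dt/tau_e * e          # trace decays while awaiting the outcome
    reward = 1.0 if chose_correct else 0.0
    return w + lr * reward * e     # potentiation only on rewarded trials

w = 0.5
w_rewarded = run_trial(w, chose_correct=True)
w_unrewarded = run_trial(w, chose_correct=False)
print(w_rewarded, w_unrewarded)
```

The trace bridges the temporal gap between the decision-related activity and the reinforcement signal: a rewarded trial strengthens the tagged synapse by an amount that shrinks with the reward delay, while an unrewarded trial leaves the weight unchanged.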

A model-based method for making a decision is more flexible and can be more accurate, but in all but the simplest cases, the combinatorial explosion of alternatives becomes unwieldy and time-consuming to calculate. Hierarchical reinforcement learning renders such models more manageable (Barto and Mahadevan 2003; Ito and Doya 2011; Botvinick 2012).

## References

- Balci F, Simen P, Niyogi R, Saxe A, Hughes JA, Holmes P, Cohen JD (2011) Acquisition of decision making criteria: reward rate ultimately beats accuracy. Atten Percept Psychophys 73:640–657
- Barto AG (1994) Reinforcement learning control. Curr Opin Neurobiol 4:888–893
- Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discrete Event Dyn Syst Theory Appl 13:343–379
- Beck JM, Ma WJ, Kiani R, Hanks T, Churchland AK, Roitman J, Shadlen MN, Latham PE, Pouget A (2008) Probabilistic population codes for Bayesian decision making. Neuron 60:1142–1152
- Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
- Bogacz R, Brown E, Moehlis J, Holmes P, Cohen JD (2006) The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev 113:700–765
- Bogacz R, McClure SM, Li J, Cohen JD, Montague PR (2007a) Short-term memory traces for action bias in human reinforcement learning. Brain Res 1153:111–121
- Bogacz R, Usher M, Zhang J, McClelland JL (2007b) Extending a biologically inspired model of choice: multi-alternatives, nonlinearity and value-based multidimensional choice. Philos Trans R Soc Lond B Biol Sci 362:1655–1670
- Botvinick MM (2012) Hierarchical reinforcement learning and decision making. Curr Opin Neurobiol 22:956–962
- Brunton BW, Botvinick MM, Brody CD (2013) Rats and humans can optimally accumulate evidence for decision-making. Science 340:95–98
- Cain N, Shea-Brown E (2012) Computational models of decision making: integration, stability, and noise. Curr Opin Neurobiol 22:1047–1053
- Churchland AK, Ditterich J (2012) New advances in understanding decisions among multiple alternatives. Curr Opin Neurobiol 22:920–926
- Churchland AK, Kiani R, Shadlen MN (2008) Decision-making with multiple alternatives. Nat Neurosci 11:693–702
- Cisek P, Puskas GA, El-Murr S (2009) Decisions in changing conditions: the urgency-gating model. J Neurosci 29:11560–11571
- Daw ND, Doya K (2006) The computational neurobiology of learning and reward. Curr Opin Neurobiol 16:199–204
- Dayan P, Daw ND (2008) Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci 8:429–453
- Dayan P, Niv Y (2008) Reinforcement learning: the good, the bad and the ugly. Curr Opin Neurobiol 18:185–196
- Deco G, Rolls ET (2005) Attention, short-term memory, and action selection: a unifying theory. Prog Neurobiol 76:236–256
- Deneve S (2012) Making decisions with unknown sensory reliability. Front Neurosci 6:75
- Ditterich J (2010) A comparison between mechanisms of multi-alternative perceptual decision making: ability to explain human behavior, predictions for neurophysiology, and relationship with decision theory. Front Neurosci 4:184
- Doya K (2008) Modulators of decision making. Nat Neurosci 11:410–416
- Drugowitsch J, Moreno-Bote R, Churchland AK, Shadlen MN, Pouget A (2012) The cost of accumulating evidence in perceptual decision making. J Neurosci 32:3612–3628
- Furman M, Wang XJ (2008) Similarity effect and optimal control of multiple-choice decision making. Neuron 60:1153–1168
- Gillespie DT (1992) Markov processes: an introduction for physical scientists. Academic, San Diego
- Glimcher PW (2001) Making choices: the neurophysiology of visual-saccadic decision making. Trends Neurosci 24:654–659
- Glimcher PW (2003) The neurobiology of visual-saccadic decision making. Annu Rev Neurosci 26:133–179
- Gold JI, Shadlen MN (2001) Neural computations that underlie decisions about sensory stimuli. Trends Cogn Sci 5:10–16
- Gold JI, Shadlen MN (2007) The neural basis of decision making. Annu Rev Neurosci 30:535–574
- Hanks TD, Mazurek ME, Kiani R, Hopp E, Shadlen MN (2011) Elapsed decision time affects the weighting of prior probability in a perceptual decision task. J Neurosci 31:6339–6352
- Huk AC, Shadlen MN (2005) Neural activity in macaque parietal cortex reflects temporal integration of visual motion signals during perceptual decision making. J Neurosci 25:10420–10436
- Ito M, Doya K (2011) Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr Opin Neurobiol 21:368–373
- Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:2443–2452
- Joel D, Niv Y, Ruppin E (2002) Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw 15:535–547
- Johnson A, van der Meer MA, Redish AD (2007) Integrating hippocampus and striatum in decision-making. Curr Opin Neurobiol 17:692–697
- Lawler GF (2006) Introduction to stochastic processes. Chapman & Hall/CRC, Boca Raton
- Lee D, Seo H (2007) Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex. Ann N Y Acad Sci 1104:108–122
- Ludwig CJ, Davies JR (2011) Estimating the growth of internal evidence guiding perceptual decisions. Cogn Psychol 63:61–92
- Machens CK, Romo R, Brody CD (2005) Flexible control of mutual inhibition: a neural model of two-interval discrimination. Science 307:1121–1124
- Miller P, Katz DB (2013) Accuracy and response-time distributions for decision-making: linear perfect integrators versus nonlinear attractor-based neural circuits. J Comput Neurosci 35:261–294
- Miller P, Wang XJ (2006) Discrimination of temporally separated stimuli by integral feedback control. Proc Natl Acad Sci U S A 103:201–206PubMedCentralPubMedCrossRefGoogle Scholar
- Newsome WT, Britten KH, Movshon JA (1989) Neuronal correlates of a perceptual decision. Nature 341:52–54PubMedCrossRefGoogle Scholar
- Niwa M, Ditterich J (2008) Perceptual decisions between multiple directions of visual motion. J Neurosci 28:4435–4445
- Ratcliff R (1978) A theory of memory retrieval. Psychol Rev 85:59–108
- Ratcliff R (2002) A diffusion model account of response time and accuracy in a brightness discrimination task: fitting real data and failing to fit fake but plausible data. Psychon Bull Rev 9:278–291
- Ratcliff R, Hasegawa YT, Hasegawa RP, Smith PL, Segraves MA (2007) Dual diffusion model for single-cell recording data from the superior colliculus in a brightness-discrimination task. J Neurophysiol 97:1756–1774
- Ratcliff R, McKoon G (2008) The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput 20:873–922
- Ratcliff R, Smith PL (2004) A comparison of sequential sampling models for two-choice reaction time. Psychol Rev 111:333–367
- Redish AD, Jensen S, Johnson A, Kurth-Nelson Z (2007) Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychol Rev 114:784–805
- Romo R, Salinas E (2003) Flutter discrimination: neural codes, perception, memory and decision making. Nat Rev Neurosci 4:203–218
- Rorie AE, Gao J, McClelland JL, Newsome WT (2010) Integration of sensory and reward information during perceptual decision-making in lateral intraparietal cortex (LIP) of the macaque monkey. PLoS ONE 5:e9308
- Rüter J, Marcille N, Sprekeler H, Gerstner W, Herzog MH (2012) Paradoxical evidence integration in rapid decision processes. PLoS Comput Biol 8:e1002382
- Salinas E (2004) Fast remapping of sensory stimuli onto motor actions on the basis of contextual modulation. J Neurosci 24:1113–1118
- Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS (2004) Temporal difference models describe higher-order learning in humans. Nature 429:664–667
- Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol 86:1916–1936
- Shankar S, Massoglia DP, Zhu D, Costello MG, Stanford TR, Salinas E (2011) Tracking the temporal evolution of a perceptual judgment using a compelled-response task. J Neurosci 31:8406–8421
- Shea-Brown E, Gilzenrat MS, Cohen JD (2008) Optimization of decision making in multilayer networks: the role of locus coeruleus. Neural Comput 20:2863–2894
- Simen P, Contreras D, Buck C, Hu P, Holmes P, Cohen JD (2009) Reward rate optimization in two-alternative decision making: empirical tests of theoretical predictions. J Exp Psychol Hum Percept Perform 35:1865–1897
- Smith PL, Ratcliff R (2004) Psychology and neurobiology of simple decisions. Trends Neurosci 27:161–168
- Soltani A, Wang XJ (2006) A biophysically-based neural model of matching law behavior: melioration by stochastic synapses. J Neurosci 26:3731–3744
- Soltani A, Wang XJ (2008) From biophysics to cognition: reward-dependent adaptive choice behavior. Curr Opin Neurobiol 18:209–216
- Soltani A, Wang XJ (2010) Synaptic computation underlying probabilistic inference. Nat Neurosci 13:112–119
- Stanford TR, Shankar S, Massoglia DP, Costello MG, Salinas E (2010) Perceptual decision making in less than 30 milliseconds. Nat Neurosci 13:379–385
- Sugrue LP, Corrado GS, Newsome WT (2005) Choosing the greater of two goods: neural currencies for valuation and decision making. Nat Rev Neurosci 6:363–375
- Thura D, Beauregard-Racine J, Fradet CW, Cisek P (2012) Decision making by urgency gating: theory and experimental support. J Neurophysiol 108:2912–2930
- Usher M, McClelland JL (2001) The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev 108:550–592
- Wald A (1947) Sequential analysis. Wiley, New York
- Wald A, Wolfowitz J (1948) Optimum character of the sequential probability ratio test. Ann Math Stat 19:326–339
- Wang XJ (2002) Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36:955–968
- Wang XJ (2008) Decision making in recurrent neuronal circuits. Neuron 60:215–234
- Wong KF, Wang XJ (2006) A recurrent network mechanism of time integration in perceptual decisions. J Neurosci 26:1314–1328
- Wyart V, de Gardelle V, Scholl J, Summerfield C (2012) Rhythmic fluctuations in evidence accumulation during decision making in the human brain. Neuron 76:847–858
- Zhou X, Wong-Lin K, Holmes P (2009) Time-varying perturbations can distinguish among integrate-to-threshold models for perceptual decision making in reaction time tasks. Neural Comput 21:2336–2362