# Improved information pooling for hierarchical cognitive models through multiple and covaried regression

## Abstract

Cognitive process models are fit to observed data to infer how experimental manipulations modify the assumed underlying cognitive process. They are alternatives to descriptive models, which only capture differences at the level of the observed data and make no assumptions about the underlying cognitive process. However, process models may require more observations than descriptive models, and as a consequence, fewer conditions can usually be modeled with them simultaneously. Unfortunately, the predictive validity of a model is known to be compromised when fewer experimental conditions are jointly accounted for (e.g., through overestimation of predictor effects or their incorrect assignment). We develop a hierarchical and covaried multiple regression approach to address this problem. Specifically, we show how to map the recurrences of all conditions, participants, items, and/or traits across experimental design cells to the process model parameters. This systematic pooling of information can facilitate parameter estimation. The proposed approach is particularly relevant for multi-factor experimental designs and for mixture models that parameterize per cell to assess predictor effects. This hierarchical framework makes it possible to model more conditions jointly, to improve parameter recovery at low observation numbers (e.g., recovering parameters as well as standard hierarchical Bayesian methods while using only 1/6 of the trials), and to model predictor and covariate effects on the process parameters directly, without the need for post hoc analyses (e.g., ANOVA). An example application to real data is also provided.

## Keywords

Hierarchical cognitive modeling, Multiple regression, Psychometrics, Data analysis, Parameter recovery, Bayesian inference

## Introduction

Descriptive models for data analysis (e.g., stochastic distributions, regression, principal components analysis) are efficient for measurement purposes. They have relatively few parameters and can model predictor effects using low numbers of observations (Baayen, Davidson, & Bates, 2008; Lazarsfeld, 1959; Cohen, 1968; Howell, 2012; Jolliffe, 2002; Wilcox, 2012). For example, traditional regression models account for predictor effects with single parameters (e.g., the *β* coefficients) that are based on the information pooled across an experiment’s data cells. This approach is efficient for investigating many predictors jointly (i.e., simultaneously). Jointly modeling the effects of predictors is preferred because it usually improves the predictive validity of a model, as compared to approaches with independent effects (see Baayen, 2004; Baayen et al., 2008; Barr, Levy, Scheepers, & Tily, 2013). By modeling several predictors simultaneously, descriptive models can be used to determine *which* predictors are informative. Predictors can be considered informative to the extent that they account for variance in the response data (e.g., smaller/larger magnitudes of choice proportions, accuracy rates, response times). However, these modeling approaches also have their limitations. In particular, they serve as descriptive analytical tools rather than as explanatory process models. That is, they do not provide any model of *how* the predictors may affect the underlying cognitive process(es) involved in the generation of the observed behaviors (Busemeyer and Diederich 2010).

In contrast, *process models* (e.g., Anderson, 1996; Busemeyer & Townsend, 1993; Pike, 1973; Van der Linden & Hambleton, 1997) aim to model cognitive dynamics by focusing on one or more cognitive mechanisms that can account for the observed performance differences in an experiment. In particular, data-driven process models have parameters that map onto cognitive processes. These parameters can be estimated from the observed data and derived for each experimental condition of interest. This allows us to talk about predictors in terms of *how* they affect an underlying cognitive process. However, fitting complex process models requires more data than fitting their descriptive counterparts. Moreover, these models tend to pool information less efficiently, meaning that they are usually fit by experimental cell (e.g., several parameters are added for each cell). As a consequence, such cognitive models can include fewer predictors jointly, which can reduce the validity of their predictions. As noted previously, separate analyses of predictors (e.g., conditions, participants, or items) can cause misattribution errors, such as overestimated effects or Type I errors. For example, when certain predictors that significantly account for performance differences are not modeled simultaneously, a model may mistakenly attribute these performance differences to other predictors (Baayen 2004; Baayen et al. 2008; Barr et al. 2013).

We offer a solution to this problem based on maximizing the information pooled to the process model. With this framework, parameter recovery is improved for the process model at lower numbers of observations; consequently, more predictors can be simultaneously modeled. This is achieved through a hierarchical and covaried multiple regression approach in which information is mapped across experimental cells from the recurrences of all conditions, participants, and items (i.e., cases of repeated measures) to all of the process model’s cognitive parameters. In addition, the covariation between the cognitive parameters themselves is modeled. The framework is particularly relevant for multi-factor experiments and for mixture models that parameterize per cell to model predictor effects. It can be an effective approach for advancing empirical modeling methods used for measurement and cognitive-behavioral inferences, known in some domains as *cognitive psychometric* models (see Batchelder, 1998; Batchelder & Riefer, 1999; Riefer, Knapp, Batchelder, Bamber, & Manifold, 2002).

The proposed approach builds upon previous developments in a growing movement known as *hierarchical* cognitive modeling, which has been shown to improve the analytical potential of such process models. Hierarchical cognitive modeling (Lee, 2011; Kruschke, 2011; Rouder, Morey, & Pratte, 2013; Scheibehenne & Pachur, 2015) typically consists of embedding (or nesting) a statistical model at a layer above the process or cognitive model. Major hierarchical approaches have included the following implementations: stochastic population distributions over observed participants/items to constrain estimation error (Rouder & Lu, 2005; Rouder, Lu, Speckman, Sun, & Jiang, 2005; Rouder et al., 2007); a regression model on a parameter to analyze cofactors or trial-by-trial covariates of its value (Cavanagh et al., 2011; Frank et al., 2015; Oravecz, Anders, & Batchelder, 2015; Vandekerckhove et al., 2011); and *latent* predictor modeling as a method for clustering (e.g., of participants/latent signals, Anders & Batchelder, 2012; or of relevant cognitive abilities across tasks, Vandekerckhove, 2014).

The current framework builds most closely upon the previously cited works that emphasize hierarchy through embedded regression models. Specifically, these works demonstrate the advantages of nesting a simple regression on one or two cognitive parameters in order to jointly model a predictor or a trial-by-trial neural activity that covaries with that parameter. The present paper extends this framework to a full-fledged regression structure that maps the entire experimental design (conditions and each of their levels, participants, and items). To our knowledge, it is the first work to demonstrate that such a framework can markedly improve parameter recovery at low observation numbers, and hence permit a cognitive model to simultaneously fit more experimental conditions than usual. The present work thus focuses on optimal hierarchical methods for experimental designs, and it complements the previous works: economizing observation numbers for modeling the core experimental design may help to model additional cofactors or trial-by-trial covariates jointly.

The proposed framework is a hierarchical model composed of (1) a process model, (2) a multiple and covaried regression structure, and (3) by-group (e.g., by predictor, participant, and item intercepts) population distributions. In the following sections, we demonstrate how this three-tiered approach can markedly improve the information pooled into a process model. Because the resulting likelihood is more complex, however, a more advanced estimation approach is typically needed to implement this modeling framework. We therefore use Bayesian estimation (Gelman, Carlin, Stern, & Rubin, 2004). A number of advantages have been identified with the Bayesian approach (Lee 2011; Kruschke 2011; Rouder et al. 2013; Scheibehenne and Pachur 2015), including the simultaneous, rather than sequential, estimation of model parameters, and the ability to constrain error in estimation, which can improve parameter recovery performance and a model’s capacity to make predictions from data.

The paper is organized as follows. The first section, “Process parameters as a function of a hierarchical multiple regression structure”, develops the framework. Next, “Data-driven process models for performance data: Sequential sampling” introduces a popular class of cognitive models that is frequently used for performance data (e.g., response times and accuracy), known as sequential sampling models (Busemeyer and Townsend 1992; Townsend and Ashby 1983). We demonstrate the framework using a standard sequential sampling model, which can be generalized to a number of experiments involving response time analyses. Then, in “Bayesian estimation”, we develop an estimation approach using hierarchical Bayesian methods. We also remark on how the framework can easily be adapted to a variety of other process models. Next, “Fitting approach” discusses important fitting topics for the proposed approach, and “Bayesian sampler settings” gives the recommended specifications. “Application to simulated data” demonstrates the aforementioned advantages of the approach through several large simulation studies, “Application to experimental data” provides an empirical application, and finally “Discussion” presents the general discussion.

## Process parameters as a function of a hierarchical multiple regression structure

Consider an experiment that is designed with *F* factors, each having \(L_f\) levels (for example, a 2 × 2 design or a 3 × 3 × 2 design), which will be tested with multiple participants, *P*, and/or items, *I*, of interest. This kind of experimental design gives rise to a number of unique experimental design cells, *C*, each having a unique combination of factor levels per participant and/or item. The experimenter(s) will collect *N* observations in each cell, as \(y_{j_{c}}\), in which observation \(j \in 1, \dots, N\) and design cell \(c \in 1, \dots, C\). This set of *N* observations across *C* experimental cells constitutes the response data.

Suppose that a cognitive process model, *f*(⋅), can be used to fit the data (\(y_{j_{c}}\)) as a function of a set of cognitive parameters, Ω. A maximally data-driven model will estimate these parameters for every design cell *c*, as \(\Omega_c\); this is known as a finite mixture-model implementation (Everitt 1981).^{1} The model parameters for a given cell are typically used to model the distribution of observed data in that cell (e.g., central tendency and variance across trials). A general stochastic expression of the data *per design cell*, \(y_{j_{c}}\), as related to the mixture application of the cognitive process model, can then be written as

$$
y_{j_{c}} \sim f(\Omega_{c}),
\tag{1}
$$

where \(\Omega_c\) contains the *K* cognitive parameters \(\omega_k\) that model design cell *c*. From this modeling, predictor effects (i.e., how experimental conditions, participants, and items modify the cognitive parameters) can then be retrieved by a post hoc analysis of the parameters across cells, for example by an analysis of variance (ANOVA; Iversen & Norpoth, 1987; Cohen & Cohen, 1988).

However, in this by-cell modeling approach, extra parameters are specified per cell, and much of the information shared between cells is lost. As a result, more data are required per cell, and fewer predictors can be modeled. Hierarchical modeling has made progress on this issue, particularly by addressing the recurrence of the same participants or items in different cells, which are cases of repeated measures. A current standard in the field is to nest a population distribution at a layer above the participants or items in the process model. This approach uses the between-subject variance, together with the recurrence of subjects across conditions, to improve the within-subject parameter estimates. Furthermore, one can use the subjects’ group parameters to make generalizations about the population itself (Rouder & Lu, 2005; Rouder, Lu, Speckman, et al., 2005; Rouder, Lu, Sun, et al., 2007). As an alternative to simple distribution nesting, this approach can also be implemented by nesting a regression structure at a layer above the model (e.g., a population intercept and error term; see Vandekerckhove et al., 2011). Nesting a regression structure also has the advantage of allowing a covariate or between-trial effects to be modeled jointly (Cavanagh et al. 2011; Frank et al. 2015). However, for a given process model, it is good practice to first verify through simulation analyses that there are enough observations to appropriately fit the data at such granularity.

While the benefits of sharing information across cells of repeated participants and/or items have been largely recognized, it has not yet been quantified to what extent one benefits from sharing information across cells of repeated condition levels (in tandem with the participants and items). Furthermore, it has not been studied how this information can be effectively mapped to *all* of the process model parameters. In this work, we pursue such a study and develop an approach to implement the methodology. This is accomplished through a hierarchically embedded multiple and covaried regression structure, which informs all of the cognitive parameters and also models their correlation. An illustration of how such an approach can improve the information pooled into the process model is provided in the following paragraph. Then, through several implementations and large recovery analyses, the section “Application to simulated data” demonstrates the advantages of the approach.

In a repeated-measures design, the factor levels, \(L_f\), along a number of factors, *F*, as well as the participants, *P*, and items, *I*, that make up a given cell are also found in other cells. Information about their effects across unique cells (i.e., in the context of other predictor/participant/item combinations) can be pooled by a hierarchical multiple regression structure. Note that regression maps information through indicator values, *x*, that pool information from all recurrences of a condition, regardless of its cell membership. For example, consider a cognitive model with *K* = 3 process model parameters (\(\omega_1, \omega_2, \omega_3\)), for which at least *N* = 60 observations in a cell *c* are needed to appropriately fit the parameters. Suppose further that the experimental data are derived from a 2 × 2 design, which provides *L* = 2 levels for each of *F* = 2 factors, and involves *P* = 10 participants and *I* = 6 items. Letting lowercase script denote the index for each condition (factor level, participant, item), the parameters in \(\Omega_c\) are quadruply indexed as \(\lbrace \omega _{1pi\, l_{f_{1}} l_{f_{2}}}, \omega _{2pi \, l_{f_{1}} l_{f_{2}} }, \omega _{3pi \, l_{f_{1}} l_{f_{2}}} \rbrace \), which expands the total number of experimental cells to 240 (i.e., \( L_{f_{1}} \times L_{f_{2}} \times P \times I = 2 \times 2 \times 10 \times 6 = 240\)). Since each participant contributes 240/10 = 24 of these cells, each participant would need to complete 24 × 60 = 1440 trials to reliably estimate these predictors in a joint context. While such trial numbers are attainable for this simple experimental design, they remain costly to obtain. However, as we will later show in Table 1, by implementing the proposed hierarchical multiple regression approach, one could perform the same process modeling with only 1/6 of the observations (240 per participant instead of 1440). Such an approach therefore allows smaller experiments to be modeled, more conditions to be modeled jointly (e.g., covariates, between-trial effects, or experiments with additional predictors), and parameters to be recovered better at low observation numbers. These developments can benefit a researcher’s capacity to make inferences from data with cognitive process models.
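The cell and trial counts in this example follow from simple arithmetic. As a small sketch, the hypothetical helper below (`trials_per_participant`, not part of any package) reproduces them under the by-cell assumption that every cell needs *N* observations. The 1/6 figure itself is an empirical recovery result from the simulations, so the last printed value only restates it rather than deriving it.

```python
from math import prod

def trials_per_participant(levels_per_factor, n_participants, n_items, n_per_cell):
    """Hypothetical helper: cells and trials needed when every cell is fit separately."""
    total_cells = prod(levels_per_factor) * n_participants * n_items
    cells_per_participant = total_cells // n_participants
    return total_cells, cells_per_participant, cells_per_participant * n_per_cell

# 2 x 2 design, 10 participants, 6 items, N = 60 observations per cell.
total, per_participant, trials = trials_per_participant((2, 2), 10, 6, 60)
print(total, per_participant, trials, trials // 6)
```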

**Table 1.** Process model parameter recovery, average Pearson correlations. Three-factor design (3 × 3 × 2 levels), with ten participants.

| Observations | Multi-Reg γ | Multi-Reg α | Multi-Reg 𝜃 | Non-Reg γ | Non-Reg α | Non-Reg 𝜃 | MLE γ | MLE α | MLE 𝜃 |
|---|---|---|---|---|---|---|---|---|---|
| | 0.99 | 0.99 | 1.00 | 0.98 | 0.98 | 0.99 | 0.93 | 0.88 | 0.99 |
| | 0.99 | 0.98 | 1.00 | 0.97 | 0.94 | 0.99 | 0.88 | 0.77 | 0.99 |
| | 0.97 | 0.94 | 1.00 | 0.90 | 0.80 | 0.99 | 0.79 | 0.63 | 0.98 |
| | 0.95 | 0.90 | 0.99 | 0.85 | 0.72 | 0.96 | 0.70 | 0.52 | 0.96 |
| | 0.94 | 0.86 | 0.99 | 0.81 | 0.56 | 0.92 | 0.61 | 0.42 | 0.94 |
| | 0.91 | 0.79 | 0.98 | 0.71 | 0.31 | 0.86 | 0.50 | 0.32 | 0.92 |
| | 0.86 | 0.67 | 0.98 | 0.66 | 0.19 | 0.78 | 0.40 | 0.15 | 0.88 |

Multi-Reg: cross-cell hierarchical Bayesian estimation (HBE) with the multiple regression structure; Non-Reg: by-cell HBE without regression; MLE: by-cell maximum likelihood estimation. For each method, the three columns give the recovery of the base-level SSM parameters.

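Suppose now that each parameter in \(\Omega_{c} = \lbrace \omega_{1c}, \omega_{2c}, \dots, \omega_{Kc} \rbrace\) is derived from a full regression structure of coefficients, \(b_{kw}\), with \(w \in 1, \dots, W\), for the levels among the *F* specified predictors, as well as participant intercepts, \(b_{kp}\), and item intercepts, \(b_{ki}\). The model is then specified by

$$
\begin{aligned}
\omega_{kc} &= \sum_{w=1}^{W} b_{kw}\, x_{wc} + b_{kp} + b_{ki} + \varepsilon_{k}, \qquad k \in 1, \dots, K, \\
(\varepsilon_{1}, \dots, \varepsilon_{K}) &\sim \text{MVN}(\mathbf{0}, \Sigma),
\end{aligned}
\tag{2}
$$

where the error \(\varepsilon_k\) of parameter \(\omega_{kc}\) from the regression, as influenced by deviation from the model and by the covariance with the other \(\omega_{kc}\) parameters, is modeled by the multivariate normal with mean 0 and the *K* × *K* covariance matrix Σ. The \(x_{wc}\) are the indicators that link the *W* regressed conditions (and covariate effects, if desired) to the corresponding experimental design cell *c*, and \(b_{kp}\) and \(b_{ki}\) are the intercepts of the participant *p* and item *i* to which cell *c* belongs.

The intercepts can also be absorbed into the \(x_c\) terms, which, by taking values of 0 or 1, appropriately index these intercepts. Consequently, the notation for each parameter simplifies to a single vector of weights (which includes the intercepts), \(\beta_{\omega_k}\), and a vector of indicator values for the cell, \(X_c\), resulting in the generalized case notation

$$
\begin{aligned}
\omega_{1c} &= \beta_{\omega_{1}} X_{c} + \varepsilon_{1}, \\
\omega_{2c} &= \beta_{\omega_{2}} X_{c} + \varepsilon_{2}, \\
&\;\;\vdots \\
\omega_{Kc} &= \beta_{\omega_{K}} X_{c} + \varepsilon_{K}.
\end{aligned}
\tag{3}
$$

Letting \(k \in 1, \dots, K\) be the index of the appropriate parameter in \(\Omega_c\), the notation is further summarized as

$$
\omega_{kc} = \beta_{\omega_{k}} X_{c} + \varepsilon_{k}, \qquad (\varepsilon_{1}, \dots, \varepsilon_{K}) \sim \text{MVN}(\mathbf{0}, \Sigma).
\tag{4}
$$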
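To make the indicator coding concrete, the following Python sketch builds an indicator vector \(X_c\) for one cell of the 2 × 2 example (10 participants, 6 items) and forms a regression sum \(\beta_{\omega_k} X_c\). The helper name, the dummy coding with the first level of each factor as a 0 baseline, and the weight values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def indicator_vector(levels, participant, item, n_levels, n_participants, n_items):
    """Hypothetical helper building X_c for one design cell.

    levels: 0-based level of the cell on each factor; level 0 is the baseline
    and gets no column (categorical/dummy coding).
    participant, item: 0-based indices, one-hot coded as intercepts.
    """
    columns = []
    for level, n in zip(levels, n_levels):
        dummy = np.zeros(n - 1)
        if level > 0:
            dummy[level - 1] = 1.0   # a non-baseline level switches its weight on
        columns.append(dummy)
    p_onehot = np.zeros(n_participants)
    p_onehot[participant] = 1.0      # selects this participant's intercept
    i_onehot = np.zeros(n_items)
    i_onehot[item] = 1.0             # selects this item's intercept
    columns += [p_onehot, i_onehot]
    return np.concatenate(columns)

# 2 x 2 design, 10 participants, 6 items: (1 + 1) + 10 + 6 = 18 columns.
x_c = indicator_vector(levels=(1, 0), participant=3, item=5,
                       n_levels=(2, 2), n_participants=10, n_items=6)
beta_k = 0.1 * np.arange(x_c.size)  # illustrative weights for one parameter k
m_kc = beta_k @ x_c                 # regression sum beta_{omega_k} X_c
print(x_c.size, np.flatnonzero(x_c).tolist(), round(m_kc, 2))
```

Every cell that shares a factor level, participant, or item switches on the same column, which is how the regression pools information across cells.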

## Data-driven process models for performance data: Sequential sampling

Among cognitive process models for handling performance data (such as response times and responses), sequential sampling models are currently very popular in several domains (Busemeyer and Townsend 1992; Townsend and Ashby 1983). Sequential sampling can be conceived of as a time-based extension of the predominant framework for modeling response data, known as signal detection theory (SDT; Green & Swets 1966; Pike, 1973). Sequential sampling posits that performance differences, in the context of time, may be modeled by a noisy accumulation of information toward a threshold, whose crossing triggers the response. Furthermore, these models involve a parameter that distinguishes the time elapsed in this decision process from time elapsed in external processes, such as the motor movement that ensues after the threshold is crossed. This framework has been effective in accounting for performance differences through such a mechanism, and these models can closely fit response time (RT) distributions. The approach has experienced continued support since its beginnings in the 1960s (Stone 1960; Laming 1968; Gerstein and Mandelbrot 1964; Ratcliff 1978) in both theoretical work (e.g., simulation exploration) and real data applications in experimental psychology (Ratcliff, Van Zandt, & McKoon, 1999; Ratcliff, Gomez, & McKoon, 2004; Ratcliff, Thompson, & McKoon, 2015; Ratcliff & McKoon, 2008; Anders, Riès, van Maanen, & Alario, 2015) and neuroscience (Dehaene, 2008; Kelly & O’Connell, 2013; O’Connell, Dockree, & Kelly, 2012).

A canonical sequential sampling process can be described with three parameters: an accumulation rate *γ* of quantity *X* (the behavioral activation level that accumulates), an absorbing threshold of value *α*, and an external time *𝜃*. Using these three parameters (*γ*, *α*, *𝜃*), a standard sequential sampling process is illustrated in the left plot of Fig. 1. This process models a single trial. The fluctuating black line represents the activity (*X*) for the modeled behavior, which accumulates positively over time (with noise). Specifically, the model accumulates *X* noisily at every time step, *t* = 1 millisecond (ms), by sequential independent samples from a Gaussian distribution with mean *γ* and standard deviation 1 (hence the term sequential sampling model). Note that in this simulation, *X* begins at a neutral value of 0 and increases (with noise) over time at an average rate of 0.08 units/ms (*γ*), until it hits the necessary threshold value of 40 units (*α*). Upon reaching the threshold, the response is initiated. Parameter *𝜃* includes motor time for response execution (here abbreviated as TEA, Time External to the Accumulation process), and may also include time for low-level perceptual processing or encoding.

In the right plot of Fig. 1, many trials (e.g., those of a subject within an experimental design cell) are modeled with the same three parameters that simulated the single trial in the left plot. The finishing times, at which the evidence reaches the threshold, plus the TEA (*𝜃*),^{2} model the RTs. These model-predicted RTs form a positively (right-) skewed distribution.
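The generative process in Fig. 1 can be simulated directly. The sketch below assumes the values quoted for the left plot (*γ* = 0.08 units/ms, *α* = 40 units) and an illustrative *𝜃* = 300 ms that the text does not specify; the function name is hypothetical. It advances all trials in parallel, one 1-ms step at a time, which approximates the continuous accumulation process.

```python
import numpy as np

def simulate_ssm_rts(gamma, alpha, theta, n_trials, rng, max_steps=100_000):
    """Simulate RTs (ms) from the sequential sampling process described above.

    Every 1-ms step adds an independent Gaussian sample (mean gamma, sd 1) to X,
    which starts at 0; the RT is the first step at which X >= alpha, plus theta.
    """
    x = np.zeros(n_trials)
    finish = np.full(n_trials, np.nan)        # NaN until the threshold is hit
    active = np.ones(n_trials, dtype=bool)    # trials still accumulating
    for step in range(1, max_steps + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        x[active] += rng.normal(gamma, 1.0, size=n_active)
        crossed = active & (x >= alpha)
        finish[crossed] = step                 # decision time in ms
        active &= ~crossed
    return finish + theta                      # add the TEA

rng = np.random.default_rng(2015)
rts = simulate_ssm_rts(gamma=0.08, alpha=40.0, theta=300.0, n_trials=2000, rng=rng)
print(rts.mean(), np.median(rts))
```

Because the threshold can be overshot within a step, the simulated mean tends to slightly exceed *α*/*γ* + *𝜃* = 800 ms; a smaller step size would reduce this discretization bias. The median falling below the mean reflects the right skew.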

In this canonical sequential sampling model, the resultant RT distribution is given exactly by the probability density function (pdf) of the shifted Wald (SW) distribution, also known as the three-parameter inverse Gaussian distribution. The three sequential sampling parameters {*γ*,*α*,*𝜃*} respectively quantify RT distribution *tail thickness*, *variance around the mode*, and *location* (onset). Luce (1986) discusses the importance of these RT distribution aspects for psychometric studies. Furthermore, the likelihood of this model has a closed-form expression, given in Eq. 5. We will henceforth refer to this SW model as a canonical sequential sampling model (SSM), and it will be used to test a baseline implementation of the proposed framework.^{3}

### Adapting the multiple regression approach to a cognitive model

The generalized formulas for the hierarchical multiple regression approach, provided in “Process parameters as a function of a hierarchical multiple regression structure”, are easily adapted to various data-driven process models. This is mainly achieved by specifying the likelihood *f*(⋅) in Eq. 1 and the number of parameters *K* in Eq. 4 to match the proposed process model for the data. One may also consider simpler models than sequential sampling, such as signal detection models, binomial rate models, and item response theory models (see Lee & Wagenmakers, 2014, for such models and other possibilities). These kinds of models are relevant to our approach when a researcher seeks to account for response differences across several experimental conditions, participants, and items. However, complex models that are generally used for purposes other than measurement, such as neural networks, are likely too complex to adapt to the hierarchical multiple regression approach.

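The canonical SSM has *K* = 3 parameters that are estimated per cell: *γ*, *α*, and *𝜃*. We apply this model to analyze performance data consisting of the response times of correct responses. The likelihood function for the RT data, *f*(⋅), is simply the shifted Wald probability density function. Formally, the RT data likelihood function *f*(⋅) in Eq. 1 is the SW pdf, in which the \(y_{j_{c}}\) are the RTs, written \(\text{RT}_{j_{c}}\), and \(\Omega_{c} = \lbrace \gamma_{c}, \alpha_{c}, \theta_{c} \rbrace\):

$$
f(\text{RT}_{j_{c}} \mid \gamma_{c}, \alpha_{c}, \theta_{c}) = \frac{\alpha_{c}}{\sqrt{2\pi (\text{RT}_{j_{c}} - \theta_{c})^{3}}} \exp\left( -\frac{\left[\alpha_{c} - \gamma_{c} (\text{RT}_{j_{c}} - \theta_{c})\right]^{2}}{2 (\text{RT}_{j_{c}} - \theta_{c})} \right),
\tag{5}
$$

with mean \(\alpha_{c} / \gamma_{c} + \theta_{c}\) and variance \(\alpha_{c} / {\gamma_{c}^{3}}\), for \(\text{RT}_{j_{c}} \in (\theta_{c}, \infty)\) and \(\gamma_{c}, \alpha_{c}, \theta_{c} > 0\).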
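As a check on Eq. 5, the sketch below implements the shifted Wald log density in NumPy and verifies numerically, on a 1-ms grid, that it integrates to about 1 with mean *α*/*γ* + *𝜃* and variance *α*/*γ*³. The function names and parameter values are illustrative assumptions; working on the log scale is a common choice for numerical stability when summing over many trials.

```python
import numpy as np

def shifted_wald_logpdf(rt, gamma, alpha, theta):
    """Log of the shifted Wald density (Eq. 5); -inf where rt <= theta."""
    rt = np.atleast_1d(np.asarray(rt, dtype=float))
    out = np.full(rt.shape, -np.inf)
    t = rt - theta                       # decision time, must be positive
    ok = t > 0
    tt = t[ok]
    out[ok] = (np.log(alpha) - 0.5 * np.log(2.0 * np.pi) - 1.5 * np.log(tt)
               - (alpha - gamma * tt) ** 2 / (2.0 * tt))
    return out

def cell_loglik(rts, gamma, alpha, theta):
    """Hypothetical helper: log-likelihood of one design cell's RTs."""
    return float(np.sum(shifted_wald_logpdf(rts, gamma, alpha, theta)))

# Numerical check with illustrative values (gamma, alpha, theta assumed).
gamma, alpha, theta = 0.08, 40.0, 300.0
grid = theta + np.arange(0.5, 20_000.0, 1.0)   # midpoints of 1-ms bins
dens = np.exp(shifted_wald_logpdf(grid, gamma, alpha, theta))
mass = dens.sum()
mean = (grid * dens).sum()
var = ((grid - mean) ** 2 * dens).sum()
print(mass, mean, var)   # theory: 1, alpha/gamma + theta = 800, alpha/gamma**3 = 78125
```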

## Bayesian estimation

In this section, a Bayesian estimation approach is developed for the multiple and covaried regression framework. We apply it to the canonical SSM, and we refer to the augmented model as the Multi-Reg\(_{\text{SSM}}\). The Multi-Reg\(_{\text{SSM}}\) is a hierarchical cognitive model in which the model’s process parameters are hierarchically derived by a multiple and covaried regression structure. Furthermore, population distributions are applied at a layer above the hierarchical regressions, by group (predictors, participants, items). The advantages of hierarchical population distributions have been discussed previously (see Rouder & Lu, 2005; Rouder, Lu, Speckman, et al., 2005; Rouder, Lu, Sun, et al., 2007). Readers more interested in the implementation results than in the technical Bayesian details may proceed to “Application to simulated data”.

### First level: Multiple (and Covaried) regression that derives all cognitive parameters

When considering potential approaches for estimating the Multi-Reg\(_{\text{SSM}}\) in the Bayesian framework, two essential mathematical properties of the model must be taken into account: (i) the sequential sampling parameters exist on the positive half-line, \(\lbrace \gamma_{c}, \alpha_{c}, \theta_{c} \rbrace \in (0, \infty)\), and (ii) the regressions that hierarchically derive these parameters share an error covariance structure (e.g., Σ as in Eq. 4), which models the correlations between process parameters. To satisfy (i), one can either implement an estimation algorithm that constrains the regression sums in Eq. 4 to always be above 0, or, alternatively, calculate these regression sums on the logarithmic scale, which is unconstrained. To satisfy the covariance error modeling of (ii), the three parameters can be modeled by a three-dimensional multivariate distribution in which the regression sums, \(M_{kc}\), are the hierarchical means of the parameters, and Σ, a 3 × 3 covariance matrix, handles the errors. In our simulation analyses, we found the logarithmic-scale method to be practical and even advantageous for parameter comparisons (explained in more detail later), though subsequent work could develop optimal techniques for the alternative approach, i.e., the \(M_{kc} > 0\) regression-sum-constraint approach.

Let \(M_{kc} \in M_{c} = [M_{1c}, M_{2c}, M_{3c}]\) be the regression sums that hierarchically derive the three process model parameters {*γ*, *α*, *𝜃*} for a given experimental cell. These regression sums serve as the population means (i.e., the process parameter values before error), as \(M_{kc} = \beta_{k} X_{c}\), where, through the indicators \(X_c\), each \(\beta_{k} X_{c}\) also includes potential person or item intercepts. Then let Σ be the 3 × 3 error covariance matrix that defines the noise around these sums on the logarithmic scale (i.e., for \(\varepsilon_{1:3}\) in Eq. 3). Properties (i) and (ii) may then be modeled by the following Bayesian priors,

$$
\left[ \log \gamma_{c}, \log \alpha_{c}, \log \theta_{c} \right] \sim \text{MVN}\left( \left[ \beta_{\gamma} X_{c}, \beta_{\alpha} X_{c}, \beta_{\theta} X_{c} \right], \Sigma \right),
\tag{6}
$$

where the logarithm of {\(\gamma_{c}, \alpha_{c}, \theta_{c}\)} is modeled by the multivariate normal, and these parameter values on their natural scale are easily obtained by taking the exponential. A notable advantage of this logarithmic-scale approach is that the logarithmic locations of {\(\gamma_{c}, \alpha_{c}, \theta_{c}\)}, and modifications thereof, correspond proportionally to their naturally scaled values, even though these parameters exist in different magnitudes (e.g., tenths, tens, and hundreds on the natural scale). This feature facilitates the interpretation and comparison of the *β* weights in Eq. 6. For instance, although the regression weights {\(\beta_{\gamma}, \beta_{\alpha}, \beta_{\theta}\)} are modeled to exist in comparable ranges (with respect to the multivariate normal), they result in appropriately scaled effect sizes of {\(\gamma_{c}, \alpha_{c}, \theta_{c}\)} on their natural scale (see the examples after Eq. 9).
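Eq. 6 can be illustrated by drawing cell-level parameters from it. In this sketch, the weights, the indicator vector, and Σ are made-up values (a participant intercept at the log-scale magnitudes −2.0, 3.0, and 5.5 used in the worked example after Eq. 9, plus one active factor weight of 0.1), the helper name is hypothetical, and NumPy's `Generator.multivariate_normal` stands in for whatever sampler an estimation program would actually use.

```python
import numpy as np

def sample_cell_parameters(beta, x_c, sigma, rng):
    """Hypothetical helper: draw {gamma_c, alpha_c, theta_c} for one cell (Eq. 6).

    beta: (3, W) array, one row of regression weights per parameter.
    x_c: (W,) indicator vector of the cell.
    sigma: (3, 3) error covariance matrix on the log scale.
    """
    m_c = beta @ x_c                                   # regression sums M_c
    log_params = rng.multivariate_normal(m_c, sigma)   # correlated log-scale errors
    return np.exp(log_params)                          # back to the positive reals

rng = np.random.default_rng(7)
# Assumed values: W = 2 columns = [participant intercept, one factor dummy].
beta = np.array([[-2.0, 0.10],    # log gamma
                 [ 3.0, 0.10],    # log alpha
                 [ 5.5, 0.10]])   # log theta
sigma = 0.01 * np.array([[1.0, 0.3, 0.0],
                         [0.3, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])
x_c = np.array([1.0, 1.0])        # this cell: participant present, factor level on
draws = np.array([sample_cell_parameters(beta, x_c, sigma, rng) for _ in range(4000)])
print(draws.min() > 0, np.exp(np.log(draws).mean(axis=0)))
```

Exponentiating guarantees positive parameters, and the shared weight of 0.1 shifts each geometric mean by the same factor, exp(0.1) ≈ 1.105, despite their different magnitudes.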

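For the *β* values in Eq. 6, a natural prior distribution choice is the normal distribution. However, keeping in mind that these *β* values include both a set of factor weights, \(\beta_{(f)}\), and potential person or item intercepts, \(\beta_{(i)}\), it is useful to specify for each set an appropriate prior mean *μ* and s.d. *σ*, as follows:

$$
\beta_{k w_{(f)}} \sim \text{N}(\mu_{fk}, \sigma_{fk}),
\tag{7}
$$

$$
\beta_{k w_{(i)}} \sim \text{N}(\mu_{ik}, \sigma_{ik}),
\tag{8}
$$

for \(k \in \lbrace \gamma, \alpha, \theta \rbrace\) and \(w \in \lbrace 1, \dots, W \rbrace\), *W* being the number of regressed factors (weights + intercepts).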

### Second level: Stochastic population distributions

A second level is formulated on top of the model, by group (e.g., factor weights, intercepts), to pool information and constrain error in the regression weights themselves. This is done using the population distribution approach mentioned previously. Specifically, for each parameter *k* ∈ {*γ*, *α*, *𝜃*}, the \(\beta _{k w_{(f)}}\) factor weights are modeled by a hierarchical normal distribution with mean \(\mu_{fk}\) and standard deviation \(\sigma_{fk}\). Since, in our categorical regression coding, the first level serves as the baseline (coded 0), this hierarchical modeling allows the factor effects to be predominantly positive or negative relative to the baseline. Otherwise, if one used \(\mu_{fk} = 0\), the various \(\beta _{k w_{(f)}}\) factor weights would be pushed by the prior to sum to 0. A similar hierarchical modeling approach is also used for the \(\beta _{k w_{(i)}}\) values that serve as the participant or item intercepts. In contrast, however, because the intercepts locate the regression, greater prior mass is allocated to logarithmic ranges that correspond to the natural magnitudes of {*γ*, *α*, *𝜃*} on the positive reals, as in Eq. 9.

Here, \(\mu_{ik}\) and \(\sigma_{ik}\) quantify the population mean and standard deviation pertaining to the group of participants or items involved in the experiment. Based on our simulation analyses, reasonable priors for these hierarchical parameters *μ* and *σ* are those of Eq. 9, which gives the settings for the factor weights in its left column and those for the intercepts in its right column.

For the factor weights (left column), the same settings can be used for each of {*γ*, *α*, *𝜃*}, since adjustments from a location on the log scale are similarly proportional across magnitudes on the positive real scale. For the intercepts (right column), which generally serve to locate the regression (as a regression mean does), it is useful to use priors that provide probability mass for reasonable hierarchical mean values of {*γ*, *α*, *𝜃*} in the positive reals.

For example, to get a sense of the various magnitudes in the second-level settings in Eq. 9, suppose the intercepts for *γ*, *α*, and *𝜃* are respectively −2.0, 3.0, and 5.5, which on the natural scale are 0.135, 20, and 244. Then an observed *β* weight of 0.1 (for *x* = 1) results in a shift to 0.149, 22, and 270, and an observed *β* weight of 0.05 results in 0.142, 21, and 257. As for the intercept population mean priors, note that greater prior probability is placed around population mean intercepts on the natural scale of *γ* = 0.135, *α* = 20, and *𝜃* = 244. A shift of 1 standard deviation (here 0.5) above increases these population means to *γ* = 0.223, *α* = 33, and *𝜃* = 403 on the natural scale. Thus, the prior provides enough flexibility to accommodate various data ranges.
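The arithmetic in this example can be reproduced in a few lines. The sketch below uses only the standard library; the helper name `natural_scale` is hypothetical. It exponentiates each log-scale intercept plus a weight times an indicator value.

```python
import math

log_intercepts = {"gamma": -2.0, "alpha": 3.0, "theta": 5.5}

def natural_scale(log_value, beta=0.0, x=1.0):
    """Natural-scale parameter after adding beta * x on the log scale."""
    return math.exp(log_value + beta * x)

for name, b0 in log_intercepts.items():
    print(name,
          round(natural_scale(b0), 3),        # intercept alone
          round(natural_scale(b0, 0.1), 3),   # beta = 0.1
          round(natural_scale(b0, 0.05), 3),  # beta = 0.05
          round(natural_scale(b0, 0.5), 3))   # +1 prior s.d. (0.5)
```

Because a weight *β* multiplies every natural-scale value by exp(*β*) (about 1 + *β* for small *β*), the same weight implies the same relative change for *γ*, *α*, and *𝜃*, which is what makes the weights comparable.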

### On modifying the priors

The suggested priors will allow the Multi-Reg\(_{\text{SSM}}\) to handle a variety of RT data from different experiments. The settings described above have been shown to provide appropriate model stability and Bayesian mixing performance in our simulations and empirical applications (e.g., with respect to Eq. 9, using categorical experimental factors as weights in the left column, and persons and/or items as intercepts in the right column). However, when researchers attempt to fit non-RT data with this model (e.g., accumulation over months or years; for examples, see Chhikara, 1988; Folks & Chhikara, 1978), much longer RTs, or notably different regression forms than those discussed herein, they are encouraged to calibrate these prior settings to achieve optimal Bayesian mixing. In particular, since the proposed Multi-Reg\(_{\text{SSM}}\) is quite complex to fit, informative prior settings are recommended.

### Covariance structure

**R**, the lower-triangular Cholesky factor of the correlation matrix underlying Σ, and **S**, a diagonal matrix containing the scales (standard deviations), such that \(\mathbf{C} = \mathbf{S}\mathbf{R}\) is the Cholesky factor of the covariance matrix Σ. Because **C** is the Cholesky factor of Σ, \(\Sigma = \mathbf{C}\mathbf{C}^{T}\). This practice is recommended by the developers of the Bayesian inference software Stan (Stan Development Team 2015b), in which the suggested priors (see p. 72) are the following distributions:

Here, *s* _{kk} are the diagonal values of **S**, and LKJ Cholesky is a prior distribution for the Cholesky factors of correlation matrices, as developed by Lewandowski et al. (2009). Since the values of *s* _{kk} are constrained to be greater than zero, the prior in Eq. 10 for *s* _{kk} acts as a half-Cauchy prior. Note that this approach of estimating the decomposed elements of Σ is a development from previous approaches, which estimated the full covariance matrix using the Wishart distribution (see Gelman & Hill 2007). Finally, note that we lower the scale of the Cauchy prior to 0.025 (from, e.g., 2.50), since a number of real data analysis fits with the Multi-Reg _{SSM} have shown that the hierarchical regression weights (or rather the respective regression residuals) that derive {*γ* _{c}, *α* _{c}, *𝜃* _{c}} tend to occupy a markedly smaller range than, for example, regressed RT values and their residuals.
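A minimal NumPy sketch of this decomposition, using a hypothetical 3 × 3 correlation matrix and hypothetical scales: with **R** the lower-triangular Cholesky factor of the correlation matrix and **S** = diag(*s*), the product diag(*s*) **R** is itself lower triangular and reproduces Σ = **S** (correlation matrix) **S**. In Stan, this product is commonly written `diag_pre_multiply(s, R)`. The priors of Eq. 10 are not sampled here; the values are fixed for illustration.

```python
import numpy as np


def covariance_cholesky(corr_chol: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Cholesky factor C of Sigma from the correlation Cholesky factor R and the
    scale vector s (the diagonal of S): C = diag(s) @ R, so Sigma = C @ C.T."""
    return np.diag(scales) @ corr_chol


# Hypothetical correlation matrix for {gamma, alpha, theta} and hypothetical scales.
corr = np.array([[1.0, 0.3, -0.2],
                 [0.3, 1.0, 0.1],
                 [-0.2, 0.1, 1.0]])
s = np.array([0.02, 0.01, 0.03])

R = np.linalg.cholesky(corr)        # lower triangular, corr = R @ R.T
C = covariance_cholesky(R, s)
sigma = C @ C.T                     # covariance matrix Sigma
print(np.round(sigma, 6))
```

Note that the order of the product matters: **R** diag(*s*) would instead yield **R** diag(*s*²) **R**ᵀ, which in general is not the intended covariance matrix.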

### The process model and data likelihood

_{j} values being modeled on their natural scale by the SSM likelihood function in Eq. 5. Thus the Multi-Reg _{SSM} parameters on their natural scale may simply be accessed by

### Advantages with a Bayesian implementation

There are notable advantages to implementing the methodology in the Bayesian framework. Firstly, the model parameters (e.g., the regression coefficients) are estimated simultaneously, which can improve fit performance. This contrasts with some maximum likelihood or deviance minimization techniques, in which estimates of one or more parameters are used serially to derive the other parameters (e.g., from method-of-moments equations). Secondly, a distribution of estimates is provided for each parameter, which readily quantifies posterior uncertainty in the results; such a measure is not readily available in frequentist approaches that provide point estimates. Thirdly, error in estimation can be constrained by the appropriate use of priors. Furthermore, the cognitive parameters are modeled simultaneously within a covariance structure (a multivariate Gaussian distribution with Σ_{ K×K }), which handles parameter intercorrelation. Finally, the estimation technique combines the advantages of hierarchical modeling (Lee 2011) through nested population distributions (Rouder & Lu, 2005; Rouder, Lu, Speckman, et al., 2005; Rouder, Lu, Sun, et al., 2007) and a regression structure (Vandekerckhove et al. 2011).

## Fitting approach

The fitting approach we have developed can be summarized as follows. Firstly, as a maximally data-driven mixture model application, the approach estimates a drift rate, *γ* _{ c }, threshold, *α* _{ c }, and non-accumulation time, *𝜃* _{ c }, simultaneously for every unique design cell *c* of an experiment. The corresponding population means for each of these design cells are also estimated, as is the covariation between cognitive parameters. These parameters, and particularly the population means, are hierarchically derived by the respective regression models in Eq. 6, which are calculated independently of one another, except for the shared error covariance structure in Eq. 7 that models the process parameter correlation.
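The generative side of this structure can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's exact Eqs. 6 and 7: we assume a log link (consistent with the log-scale priors above), one linear predictor per process parameter sharing the design matrix **X**, and one correlated residual vector per design cell drawn from a multivariate normal with covariance Σ. The function name and all numeric values are hypothetical.

```python
import numpy as np


def simulate_cell_parameters(X, betas, sigma, rng):
    """Draw natural-scale {gamma_c, alpha_c, theta_c} for every design cell.

    X     : (N, W) design matrix, one row per unique design cell c.
    betas : (W, 3) regression weights; columns are gamma, alpha, theta.
    sigma : (3, 3) covariance of the correlated log-scale residuals.
    Returns an (N, 3) array of natural-scale parameters.
    """
    linear_predictor = X @ betas  # (N, 3) log-scale cell means
    residuals = rng.multivariate_normal(np.zeros(3), sigma, size=X.shape[0])
    return np.exp(linear_predictor + residuals)  # log link keeps values positive


# Hypothetical example: 2 participants x 2 conditions, first level as baseline.
# Columns of X: participant 1 intercept, participant 2 intercept, condition 2.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [0, 1, 0],
              [0, 1, 1]])
betas = np.array([[-2.0, 3.0, 5.5],    # participant 1 intercepts
                  [-1.9, 3.1, 5.4],    # participant 2 intercepts
                  [0.1, -0.05, 0.0]])  # condition 2 effect
sigma = np.diag([0.01, 0.01, 0.01])
params = simulate_cell_parameters(X, betas, sigma, np.random.default_rng(1))
print(params.shape)  # (4, 3)
```

Fitting reverses this direction: the sampler infers `betas` and `sigma` from the per-cell RT likelihoods.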

These regression coefficients directly quantify the experimental predictor main effects (also covariates and interactions, if specified, although not considered in the current development) *β* _{(f)}, as well as participant, item, or trait effects *β* _{(i)}. Notably, this modeling of experimental main effects and parameter correlations, that pools information across the experimental cells, economizes observation numbers (as in Table 1) and these quantities are retrieved in one step rather than through post-hoc analyses (e.g., ANOVA). Furthermore, the Bayesian fitting approach estimates these predictors simultaneously (improving reliability), provides informative measures of uncertainty (e.g., parameter posterior distributions), and integrates the parameters over the uncertainty of all other parameters: e.g., individual level estimates are propagated to the group level estimates and vice-versa.

### Defining an indicator matrix based on the regressed coefficients

*N* unique design cells and *W* factors (including potential person or item intercepts). Then, to identify the model, the standard coding of the factor matrix **X** _{N×W} used in a typical linear regression design is recommended. In this paper, we demonstrate and apply categorical coding. For example, when one codes four experimental categorical conditions, **X** for participant 1 may resemble the following, with the first level as baseline (left) or the last level as baseline (right):

**X**, populated by 1’s for each unique experimental design cell of that participant, and 0’s where there are other participants (thus also adding, e.g., four rows to **X** of Eq. 13). Furthermore, although not shown in the current example, item intercepts may also be introduced into the regression. As for categorical variables (e.g., experimental conditions), these are coded to have *L* − 1 levels, so as not to form additional intercepts.

With the categorical coding approach, an experiment with one factor of three levels, one factor of two levels, ten participants, and no item intercepts has *N* = 3 × 2 × 10 = 60 unique experimental cells and *W* = 2 + 1 + 10 = 13 regression coefficients. Hence, **X** is 60 × 13 and populated by 1’s and 0’s; each *β* in Eq. 4 has length 13; and 60 sets of {*γ* _{ c },*α* _{ c },*𝜃* _{ c }} values are estimated jointly. Finally, note that other kinds of covariates (continuous, ordered), such as participant age, may be included. Alternatively, one may use effects coding rather than categorical coding for **X**.
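The counting above can be checked with a small sketch that builds a dummy-coded design matrix: each factor with *L* levels contributes *L* − 1 indicator columns, and each participant gets an intercept column, with no global intercept. The `"first"` and `"last"` options correspond to the left and right baseline codings described above. The function name and column order (factors first, then participants) are our own choices and are not necessarily those of Eq. 13.

```python
import itertools

import numpy as np


def design_matrix(levels, n_participants, baseline="first"):
    """Dummy-coded design matrix with one row per unique design cell.

    levels         : number of levels of each categorical factor, e.g. [3, 2].
    n_participants : one intercept column per participant (no global intercept).
    baseline       : "first" or "last" level of each factor is coded as all zeros.
    """
    if baseline not in ("first", "last"):
        raise ValueError("baseline must be 'first' or 'last'")
    rows = []
    for cell in itertools.product(*(range(L) for L in levels), range(n_participants)):
        *factor_levels, person = cell
        row = []
        for L, level in zip(levels, factor_levels):
            # L - 1 indicator columns per factor; the baseline level gets none.
            kept = range(1, L) if baseline == "first" else range(L - 1)
            row.extend(1 if level == k else 0 for k in kept)
        # Participant intercepts: 1 in this participant's column, 0 elsewhere.
        row.extend(1 if person == p else 0 for p in range(n_participants))
        rows.append(row)
    return np.array(rows)


X = design_matrix([3, 2], n_participants=10)
print(X.shape)  # (60, 13)
```

For instance, `design_matrix([4], 1)` gives the four rows of a single participant with four conditions and the first level as baseline.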

## Bayesian sampler settings

Through regular testing of the model in the hierarchical Bayesian estimation (HBE) framework, we have found that six chains of 1000 samples each, 500 of which are warm-up (the burn-in/adaptation phase in Stan), with a thinning^{4} of 5, resulting in 100 × 6 = 600 final samples for analysis, are typically reasonable settings for appropriate mixing of the model (for a review of sampling terms, see Gelman et al., 2004). They also provide a good compromise between model fit performance and excessively long run times. With these settings, our fits to real data have generally taken between 6 and 48 h with the Stan and RStan software (Stan Development Team 2015a, 2015b), depending on data size. Due to the high complexity of this three-tiered model, we have found it very important to inspect the parameter traceplots (chain mixing plots) from the fit, since occasionally a few chains may have difficulty converging appropriately. Another way to reduce the probability of this occurrence is to further optimize the prior settings, such as those in Eq. 8, according to the data and model chosen. Typically, when six chains are estimated and one or a few of them have difficulty converging, this is also reflected in many of the convergence diagnostic values, \(\hat {R}\)’s, being greater than 1.10. Alternatively, one may address convergence issues by increasing the number of samples and burn-in iterations.
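The retained-sample arithmetic and the \(\hat {R}\) criterion can be illustrated with a short sketch. The `rhat` function below computes the classic (non-split) potential scale reduction factor described by Gelman et al. (2004) for a single parameter; Stan itself reports a split-chain variant (rank-normalized in recent versions), so its values will differ somewhat. Function names are illustrative.

```python
import numpy as np


def retained_draws(chains, iterations, warmup, thin):
    """Number of post-warm-up, thinned draws kept across all chains."""
    return chains * ((iterations - warmup) // thin)


def rhat(draws):
    """Classic (non-split) potential scale reduction factor for one parameter.

    draws: array of shape (n_chains, n_draws) of post-warm-up samples.
    """
    draws = np.asarray(draws, dtype=float)
    _, n = draws.shape
    within = draws.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * draws.mean(axis=1).var(ddof=1)     # B: between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled posterior variance
    return float(np.sqrt(pooled / within))


print(retained_draws(chains=6, iterations=1000, warmup=500, thin=5))  # 600
```

A chain stuck in a different region inflates the between-chain term *B*, pushing \(\hat {R}\) above thresholds such as 1.10.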

## Application to simulated data

In this section, we demonstrate the advantages of the approach using the Multi-Reg _{SSM} as an example. These tests involve parameter recovery across different experimental designs using simulated data analyses. The subsequent section follows with an example application to experimental data.

The results we present are from several large simulated data analyses that vary the complexity of the experiment and the number of available observations per unique experimental design cell (observation sizes: 250, 125, 60, 30, 20, 10, 5). The simulation involves the analysis of 30 data sets per observation size. Each of these 30 simulated data sets had hierarchical data-generating parameters that were randomly drawn from distributions similar to those in Eq. 9. That is, random sets of *β* _{ k w } values were generated (e.g., for each of the experimental factors and levels, participants, etc.) that hierarchically derive each of the parameters in Ω_{ c } = {*γ* _{ c }, *α* _{ c }, *𝜃* _{ c }}, as well as the random parameter covariance matrix Σ_{ K×K }.

In these simulated analyses, the Multi-Reg _{SSM} is fit, which estimates these regression weights, {*β* _{ γ w }, *β* _{ α w }, *β* _{ 𝜃 w }}, the three SSM parameters {*γ* _{ c }, *α* _{ c }, *𝜃* _{ c }} for each experimental design cell *c*, and their covariation, **Σ**_{ K×K }. Then, the recovery of the parameters and the fit of the observed data’s quantiles are calculated, both of which have been previously used to assess appropriate model fit. This large analysis is performed twice in two different contexts: firstly, for a three-factor (3 × 3 × 2 levels) experimental design that has ten participants, in which one can expect notable benefits in pooling cross-cell information by the Multi-Reg _{SSM} framework, and, secondly, for a single factor (two levels) experimental design with ten participants, in which one can expect similar performance to a regular hierarchical Bayesian implementation, since only participants can be pooled in this case.
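The recovery statistic reported in the tables below is an average Pearson correlation between generating and estimated parameter values. A minimal sketch, assuming *r* is computed per simulated data set over the design cells and then averaged arithmetically (averaging Fisher-*z* transformed values would be an alternative); the function name and the simulated "estimates" are hypothetical.

```python
import numpy as np


def average_recovery(true_sets, est_sets):
    """Mean Pearson r between generating and estimated values of one parameter.

    true_sets, est_sets: sequences of 1-D arrays, one pair per simulated data
    set, each array holding the parameter's value in every design cell.
    """
    rs = [np.corrcoef(t, e)[0, 1] for t, e in zip(true_sets, est_sets)]
    return float(np.mean(rs))


# Hypothetical usage: 30 data sets of a 3 x 3 x 2 design with 10 participants
# (180 cells); the "estimates" are the true values plus noise.
rng = np.random.default_rng(2)
true_sets = [rng.normal(3.0, 0.3, size=180) for _ in range(30)]
est_sets = [t + rng.normal(0.0, 0.1, size=t.size) for t in true_sets]
print(round(average_recovery(true_sets, est_sets), 2))
```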

Furthermore, to examine the advantages of the approach relative to other fitting methods, we compare its parameter recovery with that of two other principal methods: standard hierarchical Bayesian estimation (HBE) and maximum likelihood estimation (MLE). In summary, the Multi-Reg _{SSM} method is *(i)* a ‘Cross-cell HBE Multi-Reg’ approach, in which the recurrence of all experimental effects (participants, items, conditions) across cells is used in estimation, thus economizing the number of observations needed; it also models parameter covariation. We compare the results to *(ii)* a ‘By-cell HBE Non-Reg’ approach, a standard HBE implementation defined in Appendix AB, in which only the recurrence of participants across cells is used, offering only partial observation economization. We also compare the results to *(iii)* a ‘By-cell MLE’ approach recently developed by Anders et al. (2016), which uses MLE / quantile-minimization (QM) to fit a non-hierarchical version that models no recurrence across cells (and hence does not economize observation numbers), but has been shown to fit data adequately (also at low numbers of observations, e.g., *N* = 20), and much more rapidly than the other two methods (within a few minutes).

Table 1, from *N* = 250 to *N* = 5 observations, provides the average parameter recovery trend of {*γ*, *α*, *𝜃*} for the three methods *(i)*, *(ii)*, and *(iii)*, across 30 data set simulations using a three-factor (3 × 3 × 2) experimental design with 10 participants. Table 1 shows that the Multi-Reg _{SSM}’s ‘Cross-cell HBE Multi-Reg’ approach provides markedly improved parameter recovery over the partial/full ‘by-cell’ approaches, even with as few as *N* = 5 observations per unique experimental design cell. The Multi-Reg _{SSM} performed better at all observation sizes, and only at *N* = 125 observations do the ‘By-cell’ approaches begin to provide similar results. The remarkable result is that the Multi-Reg _{SSM} approach achieves performance comparable to the traditional hierarchical Bayesian implementation with only 1/6 of the observations (*N* = 5 versus *N* = 30), and comparable to a maximum likelihood implementation with only 1/12 of the observations (*N* = 5 versus *N* = 60). That is, an experiment that uses 3 × 3 × 2 × 30 = 540 observations per participant could have been performed with only 3 × 3 × 2 × 5 = 90 observations per participant. Aside from improved parameter recovery, this suggests that more predictors, conditions, or covariates could be included in cognitive model analyses when using this approach, or that more data-demanding versions of SSMs could be fit to the data.

Figure 2 contains a visual plot of the parameter recovery results for the *N* = 20, *N* = 10, and *N* = 5 cases of Table 1 for the Multi-Reg _{SSM}. These plots can reveal systematic trends that may not be captured by the simple Pearson *r* correlation statistic. The model recovers the generating parameter values consistently well, with almost no strong outliers or biases. Finally, the right column of Fig. 2 provides a residual distribution diagnostic check. In cases of appropriate model fit, Anders et al. (2016) found that the distribution of standardized residuals (divided by \(\sigma = \sqrt {\alpha _{c}/{\gamma _{c}^{3}}}\) from Eq. 5) of predicted versus observed RT deciles tends to follow an ordered trend in magnitude. Here, the decile residual distribution modes and variances show this ordered tendency, and generally occupy values between 0.05 and 0.25.
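The decile residual check can be sketched for a single cell as follows. This sketch rests on several assumptions: the SSM is treated as a shifted Wald with mean *α*/*γ*, shape *α*², and shift *𝜃* (which matches the σ quoted above), the residuals are taken as absolute differences, and predicted deciles are approximated by Monte Carlo with NumPy's `Generator.wald` rather than computed exactly. The function names and parameter values are hypothetical.

```python
import numpy as np


def ssm_sd(gamma, alpha):
    """Standard deviation of the (shifted) Wald RT distribution, sqrt(alpha / gamma**3)."""
    return np.sqrt(alpha / gamma**3)


def standardized_decile_residuals(rts, gamma, alpha, theta, n_sim=200_000, seed=0):
    """Absolute observed-minus-predicted RT deciles, each divided by sigma.

    Predicted deciles are approximated by Monte Carlo from a shifted Wald with
    mean alpha / gamma, shape alpha**2, and shift theta.
    """
    rng = np.random.default_rng(seed)
    probs = np.linspace(0.1, 0.9, 9)
    simulated = theta + rng.wald(alpha / gamma, alpha**2, size=n_sim)
    predicted = np.quantile(simulated, probs)
    observed = np.quantile(np.asarray(rts, dtype=float), probs)
    return np.abs(observed - predicted) / ssm_sd(gamma, alpha)


# Hypothetical cell with the natural-scale values used in the prior example.
gamma, alpha, theta = 0.135, 20.0, 244.0
rts = theta + np.random.default_rng(3).wald(alpha / gamma, alpha**2, size=500)
print(np.round(standardized_decile_residuals(rts, gamma, alpha, theta), 2))
```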

*(i)*, *(ii)*, and *(iii)*, in which we simulate a simpler experimental design: a single factor (two levels) with ten participants. The recovery in Table 2 is satisfactory overall. However, compared with the more overlapping experimental design (three factors) of Table 1, there is a notable reduction in recovery strength, particularly for low numbers of observations per cell, e.g., *N* = 5 to *N* = 30. Moreover, when the recovery results of the Multi-Reg _{SSM} are compared with those of the other two methods, the advantage of its ‘Cross-cell’ approach is notably diminished. In this case, recovery of the most difficult parameter, *α*, improves by at most about 0.10 in the Pearson correlations when using the ‘Cross-cell’ HBE approach rather than the ‘By-cell’ HBE approach. Since the two specifications are generally equivalent (both can pool only participant information across cells), we speculate that the small advantages of the Multi-Reg _{SSM} over the standard HBE approach might be due to its modeling of the parameter covariation.

Process Model parameter recovery, average Pearson correlations

One-factor design (two levels), with ten participants. Columns give the base-level SSM parameters {*γ*, *α*, *𝜃*} for *(i)* Cross-cell HBE Multi-Reg, *(ii)* By-cell HBE Non-Reg, and *(iii)* By-cell MLE.

| Observations | *(i)* *γ* | *(i)* *α* | *(i)* *𝜃* | *(ii)* *γ* | *(ii)* *α* | *(ii)* *𝜃* | *(iii)* *γ* | *(iii)* *α* | *(iii)* *𝜃* |
|---|---|---|---|---|---|---|---|---|---|
| 250 | 0.99 | 0.97 | 1.00 | 0.98 | 0.98 | 0.99 | 0.93 | 0.88 | 0.99 |
| 125 | 0.91 | 0.79 | 0.99 | 0.87 | 0.74 | 0.98 | 0.88 | 0.77 | 0.99 |
| 60 | 0.86 | 0.72 | 0.98 | 0.78 | 0.64 | 0.96 | 0.79 | 0.60 | 0.97 |
| 30 | 0.81 | 0.64 | 0.97 | 0.74 | 0.55 | 0.93 | 0.66 | 0.48 | 0.96 |
| 20 | 0.75 | 0.56 | 0.95 | 0.67 | 0.46 | 0.92 | 0.62 | 0.40 | 0.95 |
| 10 | 0.63 | 0.52 | 0.95 | 0.59 | 0.42 | 0.91 | 0.48 | 0.28 | 0.92 |
| 5 | 0.51 | 0.32 | 0.92 | 0.47 | 0.24 | 0.87 | 0.35 | 0.18 | 0.86 |

So far, we have only examined the recovery of the base-level SSM parameters at various experimental design complexities. We have not yet examined the recovery of the regression weights, {*β* _{ γ }, *β* _{ α }, *β* _{ 𝜃 }}, which hierarchically derive these cognitive parameters. These regression weights are worth examining because they measure experimental effects, notably through a modeling that aims to disentangle experimental effect magnitudes from between-parameter correlations/covariation, e.g., as captured by Σ_{3×3} from Eq. 7.

*β* _{γ}, *β* _{α}, *β* _{𝜃}}, in which the *β* _{f} subscript denotes regression weights for factors, and the *β* _{i} subscript denotes the participant intercepts. The recovery performance for the three-factor design related to Table 1 is provided on the left, and that for the one-factor design related to Table 2 on the right. The results indicate a strong recovery of the weights in the three-factor design and an appropriate recovery in the single-factor design; in both, recovery markedly improves with increasing observation sizes. Figure 3 provides visual plots of the regression weight (upper row) and intercept (bottom row) parameter recovery for the two lowest observation sizes, *N* = 10 and *N* = 5, for the three-factor experimental design. Recovery is most tightly packed for weights related to the *𝜃* parameter, and secondly for the *γ* parameter (see also Fig. 2). The recovery of weights for the *α* parameter shows satisfactory trends, even at low numbers of observations in this three-factor design.

Hierarchical parameter recovery (regression coefficients), average Pearson correlations

3F = three-factor design, 1F = one-factor design; weights = factor weights *β* _{f}, intercepts = participant intercepts *β* _{i}; each group is ordered {*γ*, *α*, *𝜃*}.

| Observations | 3F weights *γ* | 3F weights *α* | 3F weights *𝜃* | 3F intercepts *γ* | 3F intercepts *α* | 3F intercepts *𝜃* | 1F weights *γ* | 1F weights *α* | 1F weights *𝜃* | 1F intercepts *γ* | 1F intercepts *α* | 1F intercepts *𝜃* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 0.94 | 0.91 | 0.99 | 0.95 | 0.91 | 0.98 |
| 125 | 0.99 | 0.95 | 1.00 | 0.99 | 0.99 | 1.00 | 0.91 | 0.87 | 0.98 | 0.91 | 0.81 | 0.99 |
| 60 | 0.97 | 0.87 | 0.99 | 0.98 | 0.95 | 0.99 | 0.91 | 0.81 | 0.98 | 0.86 | 0.72 | 0.98 |
| 30 | 0.91 | 0.84 | 0.99 | 0.97 | 0.93 | 0.99 | 0.88 | 0.72 | 0.96 | 0.84 | 0.63 | 0.97 |
| 20 | 0.82 | 0.85 | 0.99 | 0.96 | 0.84 | 0.99 | 0.85 | 0.69 | 0.95 | 0.75 | 0.57 | 0.95 |
| 10 | 0.92 | 0.79 | 0.99 | 0.92 | 0.78 | 0.98 | 0.59 | 0.55 | 0.94 | 0.68 | 0.58 | 0.96 |
| 5 | 0.79 | 0.59 | 0.98 | 0.87 | 0.69 | 0.97 | 0.41 | 0.44 | 0.94 | 0.57 | 0.30 | 0.92 |

Σ_{3×3} from Eq. 7 may capture. There are a number of ways in which recovery of such inter-correlations may be measured. In Table 4, we provide the average absolute differences between the estimated and simulated Pearson *r* inter-parameter correlations, e.g., \( \lvert r_{\gamma _{est} \alpha _{est}} - r_{\gamma _{sim} \alpha _{sim}} \rvert = {\Delta }_{r_{\gamma \alpha }}\), for each observation size. The results in Table 4 show a satisfactory recovery of the parameter inter-correlations that improves as observation numbers increase, particularly for multi-factor experimental designs (left columns). At low observation numbers, the precision of each recovered correlation mirrors how well its constituent parameters are recovered: correlations involving the parameter most difficult to recover (e.g., *α*, see Table 1) are recovered least precisely, and those involving the easiest (e.g., *𝜃*) most precisely.

Process model parameter correlation recovery, \({\Delta }_{r_{\gamma \alpha }} = \lvert r_{\gamma _{est} \alpha _{est}} - r_{\gamma _{sim} \alpha _{sim}} \rvert \)

| Observations | Three-factor \(\overline {{\Delta }_{r_{\gamma \alpha }}}\) | Three-factor \(\overline {{\Delta }_{r_{\gamma \theta }}}\) | Three-factor \(\overline {{\Delta }_{r_{\alpha \theta }}}\) | One-factor \(\overline {{\Delta }_{r_{\gamma \alpha }}}\) | One-factor \(\overline {{\Delta }_{r_{\gamma \theta }}}\) | One-factor \(\overline {{\Delta }_{r_{\alpha \theta }}}\) |
|---|---|---|---|---|---|---|
| 250 | 0.06 | 0.04 | 0.05 | 0.18 | 0.13 | 0.21 |
| 125 | 0.07 | 0.04 | 0.05 | 0.27 | 0.09 | 0.20 |
| 60 | 0.09 | 0.06 | 0.08 | 0.32 | 0.15 | 0.19 |
| 30 | 0.10 | 0.08 | 0.09 | 0.40 | 0.19 | 0.29 |
| 20 | 0.18 | 0.07 | 0.14 | 0.58 | 0.19 | 0.37 |
| 10 | 0.29 | 0.08 | 0.18 | 0.50 | 0.22 | 0.41 |
| 5 | 0.30 | 0.12 | 0.20 | 0.66 | 0.24 | 0.68 |

For example, the correlation between *γ* and *𝜃* is recovered most precisely (smallest \({\Delta }_{r_{\gamma \theta }}\)), followed by that between *α* and *𝜃* (\({\Delta }_{r_{\alpha \theta }}\)), and then that between *γ* and *α* (\({\Delta }_{r_{\gamma \alpha }}\)). For the single-factor design with ten participants, parameter inter-correlations are much more difficult to recover precisely at lower observation numbers (*N* = 5 to *N* = 20), but recovery becomes more acceptable from about *N* = 30 observations and above. Increasing the number of participants may also improve correlation recovery in such single-factor designs.
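The Δ statistic of Table 4 can be sketched for one simulated data set as follows; the table then averages these values over the 30 data sets per observation size. The function name and the simulated inputs are hypothetical, and the inputs are assumed to be cell-level parameter values with columns ordered *γ*, *α*, *𝜃*.

```python
import numpy as np

PARAMS = ("gamma", "alpha", "theta")


def correlation_recovery_error(est, sim):
    """|r(est_a, est_b) - r(sim_a, sim_b)| for each pair of process parameters.

    est, sim: (n_cells, 3) arrays of estimated and data-generating values,
    with columns ordered gamma, alpha, theta.
    """
    r_est = np.corrcoef(est, rowvar=False)
    r_sim = np.corrcoef(sim, rowvar=False)
    errors = {}
    for i in range(3):
        for j in range(i + 1, 3):
            errors[f"{PARAMS[i]}-{PARAMS[j]}"] = float(abs(r_est[i, j] - r_sim[i, j]))
    return errors


# Hypothetical data set: correlated generating values, and estimates with more
# noise on alpha (the hardest parameter) than on theta.
rng = np.random.default_rng(4)
cov = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
sim = rng.multivariate_normal(np.zeros(3), cov, size=180)
est = sim + rng.normal(0.0, [0.2, 0.6, 0.1], size=sim.shape)
print(correlation_recovery_error(est, sim))
```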

## Application to experimental data

In this section, the approach is demonstrated on a large data set from a manual-gesture response task, in which 27 baboons (*Papio papio*) performed a visual search with contextual cues (Goujon & Fagot, 2013). The task consisted of searching for a visual target (the letter “T”) embedded within configurations of distractors (letters “L”). The letters were arranged either predictively, so that their configuration indicated the target location (hence a contextual cue), or non-predictively (shuffled, without a cue). The baboons responded by touching the target on the display screen. The experimenters explored an animal model of statistical learning mechanisms in humans, specifically the ability to implicitly extract and use statistical redundancies in the environment for goal-directed behavior.

This data set was previously analyzed with the MLE/QM approach for the SSM by Anders et al. (2016; i.e., the method compared in the right column of Tables 1 and 2), and those results can be compared here with the Multi-Reg _{SSM} approach. As organized in the original publication, there are three meaningful partitions of this data set: the *C* = 2 non-predictive (control) vs. predictive contextual cue conditions; the *E* = 40 time points (epochs) for observing training effects, in which every unit step in *E* consists of five blocks (each block contains 12 trials, so each epoch contains 60 trials); and the *P* = 27 individual baboons. The experiment hence consists of two factors (2 × 40) and 27 participants, leading to *N* = 2 × 40 × 27 = 2160 experimental design cells. However, only 2158 cells were available, since one baboon was absent from the experiment during the 36th epoch. The average RT distribution length (number of observations) per design cell is \(\bar {L} = 30\), with standard deviation SD(*L*) = 1.10.

### Model fit checks

_{SSM} fit, we include the two learning conditions and 40 time points as regression factors, and use the last level of each factor as baseline (e.g., the right matrix in Eq. 13). Figure 4 provides the fit results. Beginning with the model goodness-of-fit checks, the right column of plots provides standard diagnostics (see Anders et al., 2016, for more detail). The top plot contains the deciles of all *N* = 2158 distributions fit with the Multi-Reg _{SSM}. There is no systematic curvature in the plot, and the SSM performs consistently well on the data set. The plot also captures the range of the data, and shows that in about 4 to 6 of the 2158 cells the 9th decile (upper right of the plot) is notably underestimated by the Multi-Reg _{SSM}. The middle plot provides the distribution of standardized residuals for each of the nine deciles (model-predicted versus observed RTs) across the 2158 cells. Here, the residual distributions display the expected ordering of modes and variances. Finally, the bottom plot provides the summed standardized decile residuals, Δ, by cell, and their mean value, \(\overline {\Delta } = 1.35\); it also shows which cells are more poorly fit. Overall, the correlation between Δ and the cell RT standard deviation, *ρ* _{Δσ}, is small at −0.07, which supports Δ as a standardized residual statistic that is generally unbiased across cells with different observed RT variances.

### Main Results

The left column of Fig. 4 provides the parameter main-effect results of the analysis. These include the posterior mean regression coefficients *β* and their 95% Bayesian credible intervals for the two experimental factors: the contextual cue learning condition (left) and training time points (epochs, right). Each row corresponds to one of {*β* _{ γ }, *β* _{ α }, *β* _{ 𝜃 }}, which hierarchically derive the SSM parameters and provide direct inferences about the experimental factor main effects on the cognitive process, without a need for post-hoc analyses (e.g., ANOVA). The dotted line indicates the baseline (the last level of each factor), relative to which these regression parameter values should be interpreted.

Beginning with the effect of the contextual cue condition on visual search latency (left column of Fig. 4), RTs are considerably faster when the cues are arranged in predictive patterns (baseline), owing to a credible difference in the signal accumulation rate parameter, *β* _{ γ }. Secondly, there is a small, suggestive effect of reduced threshold (which could be interpreted as reduced response caution) in the control (no predictive cue) condition, as its Bayesian interval narrowly overlaps the baseline. Finally, no effect was observed on the time external to accumulation, *𝜃*.

Next, regarding training effects on visual search latencies, all parameters changed in ways that support faster RTs with more training, yet in different patterns. Over the training interval, the signal accumulation rate (*β* _{ γ }) increases rapidly between epochs 1 and 6 and then gradually levels off between epochs 24 and 30. The response-triggering threshold (*β* _{ α }) decreases steadily across training levels, a trend suggesting that it may continue to improve with training beyond 40 epochs. Finally, non-accumulation time (*β* _{ 𝜃 }) appears to increase slightly between epochs 1 and 6 before beginning a steady decrease up to epoch 40.

### Predicting missing data

Regarding these main effects through the *β* _{ k w } regressors, note that they may also be used to predict missing data cells in experiments. For example, in this experiment, data are missing for one baboon in the 36th epoch (in both experimental conditions). Since we have estimated the baboon’s participant intercept for each of the parameters, the *β* _{ k w } for the 36th epoch, and the *β* _{ k w } for the control condition, these may be combined to predict what its response times would have been in the missing epoch for each condition.
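A minimal sketch of such a prediction, assuming the log link used above and dummy coding with the predictive cue condition and epoch 40 as baselines (so baseline levels contribute zero). All numeric values below are hypothetical placeholders, not estimates from this data set, and the cell-level residual is ignored. A fuller treatment would apply the same sum to every posterior draw rather than to posterior means, so that the prediction carries its uncertainty. The predicted mean RT uses the shifted Wald mean *𝜃* + *α*/*γ*.

```python
import math


def predict_cell(intercept, betas):
    """Natural-scale parameter for one design cell: exp(intercept + sum of the
    weights of the cell's non-baseline factor levels). Baseline levels add 0."""
    return math.exp(intercept + sum(betas))


# Hypothetical log-scale posterior means for the absent baboon (NOT from the paper).
intercepts = {"gamma": -2.0, "alpha": 3.0, "theta": 5.5}
beta_epoch36 = {"gamma": 0.05, "alpha": -0.02, "theta": -0.01}
beta_control = {"gamma": -0.20, "alpha": -0.03, "theta": 0.0}

for condition in ("predictive (baseline)", "control"):
    cell = {}
    for p in intercepts:
        weights = [beta_epoch36[p]]
        if condition == "control":
            weights.append(beta_control[p])
        cell[p] = predict_cell(intercepts[p], weights)
    mean_rt = cell["theta"] + cell["alpha"] / cell["gamma"]  # shifted Wald mean
    print(condition, {k: round(v, 3) for k, v in cell.items()}, round(mean_rt, 1))
```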

### Examining the cognitive parameters

_{SSM} parameters, and how these Multi-Reg _{SSM} parameters relate to the earlier results of the SSM fit with the MLE/QM method discussed above. These two topics are illustrated in Figs. 5 and 6, respectively. In each figure, the grey bars are the main-effect mean process parameter values, calculated as the mean of the within-subject posterior means for a given experimental level (as also in Anders, Riès, et al., 2015; Anders, Alario, et al., 2016). The interval bars for the Multi-Reg _{SSM} represent the 95% Bayesian credible intervals for the pairwise differences between adjacent experimental levels. The interval bars for the MLE/QM SSM represent the standard error of the mean, corrected by within-subject differences.

Firstly, in comparing the hierarchical predictors of the Multi-Reg _{SSM}, {*β* _{ γ }, *β* _{ α }, *β* _{ 𝜃 }}, in Fig. 4 to the base-level parameters {*γ*, *α*, *𝜃*} in Fig. 5, there is a generally strong correspondence between the results.^{5} Note that in our simulation analyses, the base-level (cognitive) parameters exhibited slightly better recovery than the hierarchical parameters (i.e., the *β* coefficients), which can be a characteristic of many hierarchical models. Here, the only notable, but small, difference between the results concerns the threshold *α* for the contextual cue condition. While the credible interval of *β* _{ α } narrowly overlaps 0, the pairwise credible interval of *α* does not fully overlap the two condition levels. In both cases, however, a potential effect on *α* is suggestive, though with low statistical power.

### Interpretation

Based on analysis of the cognitive parameters (*γ*, *α*, *𝜃*) in Fig. 5, the presence of contextual cues clearly allows for much faster accumulation of information *γ* from the stimulus about where the target is. The potential increase in *α* when contextual cues are present suggests that the baboons may be more cautious in accumulating information from the predictive patterns (the cues) to locate the target, compared with the control condition, in which the cues carry no information about the target location. However, this slight increase in caution is outweighed by the much faster accumulation *γ*, so RTs are still consistently faster in the contextual cue condition. Next, while the presence of contextual cues does not decrease motor response time (modeled by *𝜃*), training over epochs clearly did. Training also improved the other parameters, which could be interpreted as the baboons becoming better, over time, at processing the statistical redundancies in the environment; this leads to faster accumulation of information *γ* from the stimulus, and less total information *α* needed from the stimulus to infer the location of the target.

#### Comparing the results to the MLE fitting method

Secondly, it may be interesting to compare these results with those of the previously developed MLE/QM fitting method for the SSM by Anders et al. (2016; i.e., the method in the right column of Tables 1 and 2). The fit results using this method are contained in Fig. 6. The main differences are as follows. For the contextual cue condition, the MLE/QM fit suggests no difference in threshold between levels. For epoch (training effects), the Multi-Reg _{SSM} suggested a logistic increasing trend of *γ* over time, and curved decreasing trends in *α* and *𝜃* that begin later (near epoch 6). In contrast, the ‘by-cell’ MLE SSM approach suggests a linear improvement in *γ* over epochs, and curved decreasing trends in *α* and *𝜃* that begin immediately (near epoch 1).

The model fit diagnostics of the two methods (the right columns of Figs. 4 and 6) provide interesting results as well. All three plots indicate that the MLE/QM method of fitting the SSM (which minimizes observed versus predicted quantiles) results in notably smaller quantile residuals than the Multi-Reg _{SSM}, which seeks to optimize the likelihood function. However, the Multi-Reg _{SSM} achieves a markedly better log likelihood. Thus, each method won according to the criterion it aimed to optimize. For example, regarding the quantile residuals of the MLE/QM _{SSM} versus the Multi-Reg _{SSM}, the mean standardized residuals are \(\overline {\Delta } = 0.87\) versus 1.35, and the second plot displays the smaller standardized residual decile distribution modes achieved by the former (at 0.05 or below). In contrast, the MLE/QM _{SSM} has a notably lower log likelihood: −430,724 versus −420,536.
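To make the likelihood criterion concrete, the sketch below evaluates the log-likelihood of one cell's RTs. It assumes that the SSM likelihood of Eq. 5 is the standard shifted Wald density, i.e., the first-passage time of a unit-variance diffusion with drift *γ* to threshold *α*, shifted by *𝜃*; this is consistent with the mean *α*/*γ* and σ = √(*α*/*γ*³) used above. The summed log-likelihood over all cells is the quantity on which the two methods are compared (higher, i.e., less negative, is better). Function names and data are hypothetical.

```python
import numpy as np


def ssm_logpdf(rts, gamma, alpha, theta):
    """Log density of a shifted Wald: first passage of a unit-variance diffusion
    with drift gamma to threshold alpha, shifted by theta; -inf where rt <= theta."""
    t = np.atleast_1d(np.asarray(rts, dtype=float)) - theta
    out = np.full(t.shape, -np.inf)
    ok = t > 0
    tt = t[ok]
    out[ok] = (np.log(alpha) - 0.5 * np.log(2.0 * np.pi * tt**3)
               - (alpha - gamma * tt) ** 2 / (2.0 * tt))
    return out


def ssm_loglik(rts, gamma, alpha, theta):
    """Summed log-likelihood of one cell's RTs; higher (less negative) is better."""
    return float(ssm_logpdf(rts, gamma, alpha, theta).sum())


# Hypothetical cell: RTs simulated from known parameters, scored under the
# generating threshold and under a misspecified one.
rng = np.random.default_rng(6)
gamma, alpha, theta = 0.135, 20.0, 244.0
rts = theta + rng.wald(alpha / gamma, alpha**2, size=2000)
print(ssm_loglik(rts, gamma, alpha, theta), ssm_loglik(rts, gamma, 25.0, theta))
```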

This motivates future work on which model diagnostic criteria best assess appropriate model fit, since quantile matching might not always yield the best parameter recovery. For example, see Tables 1 and 2, where the Multi-Reg _{SSM} and the standard by-cell Bayesian SSM recovered parameters better than the by-cell MLE/QM _{SSM}. It is also worth considering that quantiles as data points are more resilient to contaminant RT effects (Brown & Heathcote, 2003), which may affect results in real data applications. However, when we performed simulation analyses like those in “Application to simulated data” using quantiles as data points for the Bayesian models, using the raw RTs as data points still achieved markedly better recovery performance.

### Considering an interaction between factors

Lastly, one might be interested in fitting these data with a Multi-Reg _{SSM} that allows an interaction between the contextual cue condition and the learning time points (epochs). We fit such a model as well, but found minimal differences in the regression weights from Fig. 4, and the interaction *β* regression terms did not show notably strong trends. Furthermore, the model fit checks were not very different from the right column of Fig. 4, and the log likelihood was only minimally improved, at −420,529 (interaction) versus −420,536 (no interaction). Therefore, we retained the simpler model for the demonstration.

## Discussion

We have demonstrated the advantages gained from nesting a multiple regression structure in a data-driven process model. This approach is useful for analyzing experiments with multiple conditions, participants, and/or items of interest, and it is relevant for models that fit parameters per experimental cell. Specifically, we developed a framework for mapping a full experimental design into a multiple and covaried regression model that maximally pools information from all recurrences of conditions (factors and their levels, participants, and items) across cells. This information is used to hierarchically derive the process model parameters. The methodology improves parameter recovery at low numbers of observations and, consequently, allows more experimental predictors to be modeled simultaneously. Simultaneous (joint) modeling of predictors may improve the predictive validity of a model, in contrast to separate analyses of predictors (e.g., conditions, participants, or items), which can cause misattribution errors (e.g., overestimation of effects, type I errors). For example, when predictors that significantly account for performance differences are not modeled simultaneously, a model may mistakenly attribute these differences to other predictors (Baayen 2004; Baayen et al. 2008; Barr et al. 2013). Therefore, the proposed methodology may improve the cognitive-behavioral inferences drawn from experiments with data-driven process models.

The large simulation analyses in “Application to simulated data” demonstrate that this methodology could set a new standard relative to current practices in hierarchical modeling. The approach builds upon simpler nested regression structures proposed previously (Vandekerckhove et al. 2011), and also incorporates hierarchical population distributions (as discussed by Rouder & Lu, 2005; Rouder, Lu, Speckman, et al., 2005; Rouder, Lu, Sun, Speckman, et al., 2007). Specifically, our analyses showed that for experiments with more than one factor (each with at least 2 levels), the methodology can match the performance of traditional hierarchical Bayesian modeling while using only a *fraction* of the observations. For example, performance comparable to standard hierarchical Bayesian modeling was achieved with only 1/6 of the observations, and performance comparable to standard maximum likelihood methods with only 1/12. This is possible when information can be pooled from the recurrences of conditions across the experimental cells (i.e., repeated measures). These major advantages occurred in the simulation study for three-factor experimental designs (e.g., 3 × 3 × 2 levels), whereas performance was similar to that of standard hierarchical methods for single-factor designs (2 levels). Note that the designs were compared with equal numbers of observations per experimental cell.
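
A rough sketch of the bookkeeping behind this pooling follows. It assumes treatment coding with the last level of each factor as baseline (as for Fig. 4), a log link, and three process parameters per cell (as for the canonical SSM); participant and item effects, which the framework also includes, are left out for brevity, and all function names are hypothetical. For the 3 × 3 × 2 design, free per-cell estimation needs 54 values, while the main-effects regression needs 18 weights, and each non-baseline level weight is informed by every cell in which that level recurs.

```python
import itertools
import math


def n_cell_parameters(levels, n_process_params=3):
    """Free parameters if every design cell gets its own process parameters."""
    return math.prod(levels) * n_process_params


def n_regression_weights(levels, n_process_params=3):
    """Intercept plus (L - 1) treatment-coded weights per factor, for each process parameter."""
    return (1 + sum(n - 1 for n in levels)) * n_process_params


def cells_informing(levels, factor, level):
    """All design cells in which `level` of `factor` recurs; each one informs that weight."""
    grid = itertools.product(*(range(n) for n in levels))
    return [cell for cell in grid if cell[factor] == level]


def cell_value(intercept, weights, cell):
    """Natural-scale parameter for one cell under a log link.

    weights[f] lists the log-scale weight of each level of factor f;
    the last entry is the baseline level and is fixed at 0.
    """
    return math.exp(intercept + sum(w[level] for w, level in zip(weights, cell)))


levels = (3, 3, 2)  # the three-factor design size used in the simulations
print(n_cell_parameters(levels), n_regression_weights(levels))  # 54 18
print(len(cells_informing(levels, factor=0, level=0)))          # 6 of the 18 cells
```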

In summary, the proposed multiple (and covaried) regression framework, used hierarchically with data-driven process models, offers the following qualities: (i) an advantage in mixture modeling of experimental data, by utilizing all recurring information across cells; (ii) markedly improved parameter recovery at low numbers of observations, allowing more predictors to be modeled jointly; (iii) pooling of information within groups (conditions, participants, and items) by modeling them with population distributions, which has been shown to improve performance; and (iv) pooling of information between experimental cells through the multiple regression design. Note that these nested regressions can also incorporate covariates (see Cavanagh et al., 2011; Frank et al., 2015). In addition, the framework offers (v) direct modeling of experimental predictor effects on cognitive parameters without the need for post hoc analyses (e.g., ANOVA); (vi) the ability to easily predict missing data, since the predictors for each condition, participant, and item are directly available; and finally (vii) the potential to fit more complicated cognitive process models that require more data, since the approach economizes observation numbers.
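
Items (v) and (vi) follow from the additive log-scale structure: a cell's parameter is a sum of weights estimated wherever each level recurs, so a participant × cue × item cell with no trials can still be predicted, and a covariate enters as one more slope term. The helper `predict_parameter`, the dictionary layout, the single-covariate form, and all level names and values below are hypothetical illustrations, not the interface of the code in Appendix AA.

```python
import math


def predict_parameter(intercept, effects, cell, covariate_slope=0.0, covariate=0.0):
    """Natural-scale process parameter for a design cell under a log link.

    effects: predictor -> {level: log-scale weight}; baseline levels are absent (weight 0).
    cell: predictor -> the level of that predictor in the cell to predict.
    A single covariate enters linearly on the log scale.
    """
    eta = intercept + covariate_slope * covariate
    for predictor, level in cell.items():
        eta += effects[predictor].get(level, 0.0)
    return math.exp(eta)


# Hypothetical log-scale estimates, e.g. for a threshold parameter
effects = {
    "cue": {"repeated": -0.10},                  # "novel" is the baseline level
    "participant": {"p07": 0.25, "p12": -0.05},
    "item": {"i03": 0.04},
}
# Suppose p07 never saw item i03 under the repeated cue: the cell is still predictable
missing_cell = {"cue": "repeated", "participant": "p07", "item": "i03"}
print(round(predict_parameter(math.log(1.2), effects, missing_cell), 4))
```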

As measurement and inference tools for experimental data, data-driven process models have been termed, in some domains, *cognitive psychometric* models (see Batchelder, 1998; Batchelder & Riefer, 1999; Riefer et al., 2002). We demonstrated the advantages of our approach using a canonical sequential sampling model (SSM); SSMs are a family of models widely used to account for performance differences in the time domain (e.g., reaction times). These models are not as detailed as multi-system or neural network models, but they may be important empirical research tools.

Using simpler process models for empirical research is consistent with previous literature suggesting that “less is more” when selecting a psychometric model to accurately estimate predictor and participant effects from experimental data. For example, van Ravenzwaaij, Donkin, and Vandekerckhove (2016) showed that simpler SSMs with fewer parameters (e.g., the EZ-diffusion model; Wagenmakers, Van Der Maas, & Grasman, 2007) recovered significant predictor effects in experiments better than their more complex counterpart, the Diffusion Decision Model (DDM; Ratcliff, 1978; Ratcliff & Smith, 2004; Ratcliff & McKoon, 2008). Such findings also highlight the growing distinction between simple SSMs as apt measurement (quantitative, data-driven, psychometric) models and others that are better suited for theoretical exploration (simulation testing, data-producing) of specific neural dynamics. More discussion of this topic is provided in Appendix AC.
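
For reference, the closed-form EZ-diffusion estimates mentioned above fit in a few lines, which illustrates how few parameters such a model carries. The sketch follows the equations of Wagenmakers, Van Der Maas, and Grasman (2007) with the conventional scaling `s = 0.1`; it assumes accuracy strictly between 0 and 1 and different from .5 (the edge corrections those cases require are left out), and it is not part of the framework proposed here.

```python
import math


def ez_diffusion(pc, vrt, mrt, s=0.1):
    """Closed-form EZ-diffusion estimates.

    pc: proportion correct; vrt: variance of correct RTs (s^2);
    mrt: mean of correct RTs (s); s: scaling parameter.
    Returns (drift rate v, boundary separation a, non-decision time ter).
    """
    if pc in (0.0, 0.5, 1.0):
        raise ValueError("pc of 0, .5 or 1 needs an edge correction first")
    L = math.log(pc / (1.0 - pc))  # logit of accuracy
    x = L * (L * pc ** 2 - L * pc + pc - 0.5) / vrt
    v = math.copysign(1.0, pc - 0.5) * s * x ** 0.25
    a = s ** 2 * L / v
    y = -v * a / s ** 2
    mdt = (a / (2.0 * v)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))  # mean decision time
    return v, a, mrt - mdt


v, a, ter = ez_diffusion(0.802, 0.112, 0.723)
print(round(v, 4), round(a, 4), round(ter, 4))
```

With these inputs the estimates come out at approximately v = 0.0999, a = 0.140, and Ter = 0.300 s.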

As discussed in “Process parameters as a function of a hierarchical multiple regression structure”, our proposed framework can be easily applied to other SSMs or to other classes of models. Among other SSMs, the EZ-diffusion model, the DDM, the LBA (Linear Ballistic Accumulator; Brown & Heathcote, 2008), and the Q- and D-diffusion models (van der Maas, Molenaar, Maris, Kievit, & Borsboom, 2011) are worthwhile candidates for further exploring this framework. It is also worth noting that a Python software package, HDDM (Wiecki, Sofer, & Frank, 2013), currently exists for fitting predictors or covariates for the DDM. Several works using the package, as well as its tutorial, have generally emphasized simple regressions on one or two cognitive parameters, in order to jointly model a cofactor or trial-by-trial neural activity that could covary with that parameter. The present paper builds on this and confirms, with several large analyses, how a multiple and covaried regression structure on all cognitive parameters (mapping all experimental conditions) can improve information pooling, significantly improve parameter recovery at low numbers of observations, and allow more conditions to be modeled jointly. This is complementary to such covariate analyses, which may require additional observations. These performance advantages, and the capacity to model trial-by-trial covariates, may lead nested regression structures to become a new standard in hierarchical modeling. However, it is not clear whether the methods in current packages are appropriately calibrated to achieve model convergence with such advanced regression designs. Furthermore, parameter trade-offs in the DDM should be examined.
Hence, current packages could test the Bayesian mixing performance (and parameter recovery) of full-fledged regression structures in these more complicated models, and could also implement the canonical SSM, which has various important empirical applications, such as visual search tasks, go/no-go tasks (driving simulations, cognitive load), lexical selection (picture naming/interference), saccades, general signal detection, and others. The code to implement the framework with the canonical SSM is provided in Appendix AA.
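
One way to read “covaried” is that participant-level deviations of the log-scale process parameters share a multivariate Gaussian population distribution, so that, for example, deviations in drift and threshold can be correlated rather than independent. The sketch below draws such deviations through a Cholesky factor using only the standard library. The standard deviations and correlation matrix are hypothetical; in a Bayesian fit the correlation matrix would receive a prior (for instance, one built on the construction of Lewandowski et al., 2009) rather than being fixed.

```python
import math
import random


def covariance(sds, corr):
    """Covariance matrix from standard deviations and a correlation matrix."""
    n = len(sds)
    return [[sds[i] * sds[j] * corr[i][j] for j in range(n)] for i in range(n)]


def cholesky(m):
    """Lower-triangular L with L times its transpose equal to m (m positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def draw_covaried_effects(n_participants, sds, corr, rng):
    """Participant deviations for several log-scale parameters, drawn jointly."""
    L = cholesky(covariance(sds, corr))
    draws = []
    for _ in range(n_participants):
        z = [rng.gauss(0.0, 1.0) for _ in sds]  # independent standard normals
        draws.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(sds))])
    return draws


# Hypothetical population values for (log drift, log threshold, log shift)
sds = [0.20, 0.30, 0.10]
corr = [[1.0, 0.6, -0.3],
        [0.6, 1.0, 0.0],
        [-0.3, 0.0, 1.0]]
for row in draw_covaried_effects(5, sds, corr, random.Random(0)):
    print([round(x, 3) for x in row])
```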

While the proposed framework has already been shown to be promising for data-driven process modeling, several potential improvements remain for future research. For example, although the three-tiered hierarchical approach increases the potential for modeling shorter and/or more complex experiments, it also considerably lengthens model fitting time. Currently, model fits take nearly 24 h, even for a simple process model. This makes it challenging to sufficiently explore the space of possible regression structure specifications, as, for example, Barr et al. (2013) might suggest. For instance, many different combinations of predictors, covariates, interactions, or even non-linear (e.g., quadratic) regression equations could be explored to find the best model fit.
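
To give a sense of scale, the sketch below counts regression specifications when each of three process parameters may include any subset of a few terms. The term names, the three-parameter setting, and the optional rule that an interaction may appear only together with both of its main effects are illustrative assumptions. With three terms there are 512 unrestricted structures, or 125 under the hierarchy rule; at roughly 24 h per fit run one after another, that is about 512 or 125 days of computation.

```python
import itertools


def is_hierarchical(subset):
    """True if every interaction 'a:b' in subset comes with both of its main effects."""
    present = set(subset)
    return all(set(term.split(":")) <= present for term in subset if ":" in term)


def candidate_structures(terms, process_params, hierarchical=True):
    """Yield one tuple of term subsets (one subset per process parameter) per structure."""
    subsets = [s for r in range(len(terms) + 1)
               for s in itertools.combinations(terms, r)
               if not hierarchical or is_hierarchical(s)]
    return itertools.product(subsets, repeat=len(process_params))


terms = ("cue", "epoch", "cue:epoch")   # hypothetical term names
params = ("gamma", "alpha", "theta")    # hypothetical parameter labels
for rule in (False, True):
    n = sum(1 for _ in candidate_structures(terms, params, hierarchical=rule))
    print(f"hierarchical={rule}: {n} structures, about {n * 24} h at 24 h per fit")
```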

Although regression models pool information from all recurrences of condition levels across experimental cells, they do so under the assumption that there are few interactions between conditions. Ideally, therefore, several versions of the hierarchical regression structure should be tested, and the model achieving the best quality of fit selected. Beyond this important point, future work could (i) further refine the estimation techniques of the framework (e.g., estimation algorithms, or Bayesian priors such as those for Eq. 9), (ii) examine how the approach functions with extensions such as two- or three-way interactions and non-linear regressions, (iii) implement and test extensions of the approach to other classes of cognitive models, or (iv) develop the multiple regression estimation on parameter scales other than the logarithm (see “Bayesian estimation”), such as the natural scale (e.g., using multivariate distributions other than the Gaussian).
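
The additivity assumption can be seen in a small 2 × 2 example. For a balanced table of cell-wise log-scale parameter values, the least-squares main-effects fit is row mean + column mean − grand mean, and what remains in each cell is exactly the interaction that an additive regression cannot represent. The cell values and the row/column labels below are hypothetical.

```python
def additive_fit(table):
    """Least-squares main-effects fit to a balanced two-way table (row + column - grand mean)."""
    rows, cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]
    return [[row_means[i] + col_means[j] - grand for j in range(cols)] for i in range(rows)]


def interaction_residuals(table):
    """What an additive (no-interaction) regression leaves unexplained in each cell."""
    fit = additive_fit(table)
    return [[table[i][j] - fit[i][j] for j in range(len(table[0]))] for i in range(len(table))]


# Hypothetical log-scale drift per cell: rows = cue condition, columns = epoch
log_drift = [[0.0, 0.2],
             [0.3, 0.9]]
print(interaction_residuals(log_drift))  # about [[0.1, -0.1], [-0.1, 0.1]]
```

Here the interaction contrast (0.9 − 0.3) − (0.2 − 0.0) = 0.4 is spread as ±0.1 over the four cells; a table with a zero contrast leaves no residual at all.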

## Footnotes

1. A less data-driven model may economize observation numbers by estimating select parameters from data *pooled* across cells (fewer parameters estimated), rather than allowing the data to show whether the parameters instead vary across individual cells.
2. For illustrative simplicity, *θ* (TEA) is placed here before evidence accumulation begins (at *θ* = 200 ms). However, whether *θ* is placed before, after, or split around the actual accumulation process (e.g., to account for both concept/visual recognition and response execution time), all of these options are mathematically equivalent.
3. The choice is also made so that the estimation advantages of the approach can be quantified without bias. For example, some other popular sequential sampling variants (two-boundary) do not have a closed-form solution, and may use approximations to estimate the model (Navarro and Fuss 2009). The results of interest could thus depend on the specific approximation used.
4. Note that thinning is only recommended to avoid memory issues, and is not necessary.
5. Note that the dotted line of Fig. 4 is the baseline, and thus represents the second level of the contextual cue condition or the last level of epoch, respectively.

## References

- Anders, R., Alario, F.-X., & Van Maanen, L. (2016). The shifted Wald distribution for response time data analysis. *Psychological Methods*, *21*.
- Anders, R., & Batchelder, W. H. (2012). Cultural consensus theory for multiple consensus truths. *Journal of Mathematical Psychology*, *56*, 452–469.
- Anders, R., & Batchelder, W. H. (2013). Cultural consensus theory for the ordinal data case. *Psychometrika*, *80*, 151–181.
- Anders, R., Riès, S., van Maanen, L., & Alario, F.-X. (2015). Evidence accumulation as a model for lexical selection. *Cognitive Psychology*, *82*, 57–73.
- Anderson, J. R. (1996). ACT: A simple theory of complex cognition. *American Psychologist*, *51*, 355.
- Baayen, R. H. (2004). Statistics in psycholinguistics: A critique of some current gold standards. *Mental Lexicon Working Papers*, *1*, 1–47.
- Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. *Journal of Memory and Language*, *59*, 390–412.
- Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. *Journal of Memory and Language*, *68*, 255–278.
- Batchelder, W. H. (1998). Multinomial processing tree models and psychological assessment. *Psychological Assessment*, *10*, 331.
- Batchelder, W. H., & Riefer, D. M. (1999). Theoretical and empirical review of multinomial process tree modeling. *Psychonomic Bulletin & Review*, *6*, 57–86.
- Brown, S., & Heathcote, A. (2003). QMLE: Fast, robust, and efficient estimation of distribution functions based on quantiles. *Behavior Research Methods, Instruments, & Computers*, *35*, 485–492.
- Brown, S., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. *Cognitive Psychology*, *57*, 153–178.
- Busemeyer, J. R., & Diederich, A. (2010). *Cognitive modeling*. Sage.
- Busemeyer, J. R., & Townsend, J. T. (1992). Fundamental derivations from decision field theory. *Mathematical Social Sciences*, *23*, 255–282.
- Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. *Psychological Review*, *100*, 432.
- Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. *Nature Neuroscience*, *14*, 1462–1467.
- Chhikara, R. (1988). *The inverse Gaussian distribution: Theory, methodology, and applications* (Vol. 95). CRC Press.
- Cohen, J. (1968). Multiple regression as a general data-analytic system. *Psychological Bulletin*, *70*, 426.
- Cohen, Y., & Cohen, J. Y. (1988). Analysis of variance. In *Statistics and data with R: An applied approach through examples* (pp. 463–509).
- Dehaene, S. (2008). Conscious and nonconscious processes: Distinct forms of evidence accumulation. In *Better than conscious* (pp. 22–49).
- Diederich, A., & Busemeyer, J. R. (2006). Modeling the effects of payoff on response bias in a perceptual discrimination task: Bound-change, drift-rate-change, or two-stage-processing hypothesis. *Perception & Psychophysics*, *68*, 194–207.
- Diederich, A., & Busemeyer, J. R. (in review). Multi-stage sequential sampling model of multi-attribute decision making.
- Everitt, B. S. (1981). *Finite mixture distributions*. Wiley Online Library.
- Folks, J., & Chhikara, R. (1978). The inverse Gaussian distribution and its statistical application–a review. *Journal of the Royal Statistical Society. Series B (Methodological)*, pp. 263–289.
- Frank, M. J., Gagne, C., Nyhus, E., Masters, S., Wiecki, T. V., Cavanagh, J. F., & Badre, D. (2015). fMRI and EEG predictors of dynamic decision parameters during human reinforcement learning. *The Journal of Neuroscience*, *35*, 485–494.
- Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). *Bayesian data analysis* (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC Press.
- Gelman, A., & Hill, J. (2007). *Data analysis using regression and multilevel/hierarchical models*. Cambridge, UK: Cambridge University Press.
- Gerstein, G. L., & Mandelbrot, B. (1964). Random walk models for the spike activity of a single neuron. *Biophysical Journal*, *4*, 41–68.
- Goujon, A., & Fagot, J. (2013). Learning of spatial statistics in nonhuman primates: Contextual cueing in baboons (*Papio papio*). *Behavioural Brain Research*, *247*, 101–109.
- Green, D. M., & Swets, J. A. (1966). *Signal detection theory and psychophysics*. New York: Wiley.
- Hawkins, G. E., Forstmann, B. U., Wagenmakers, E.-J., Ratcliff, R., & Brown, S. D. (2015). Revisiting the evidence for collapsing boundaries and urgency signals in perceptual decision-making. *The Journal of Neuroscience*, *35*, 2476–2484.
- Holmes, W. R., Trueblood, J. S., & Heathcote, A. (2016). A new framework for modeling decisions about changing information: The piecewise linear ballistic accumulator model. *Cognitive Psychology*, *85*, 1–29.
- Howell, D. C. (2012). *Statistical methods for psychology*. Cengage Learning.
- Iversen, G. R., & Norpoth, H. (1987). *Analysis of variance*. 1. Sage.
- Jolliffe, I. T. (2002). *Principal component analysis* (2nd ed.). New York: Springer-Verlag.
- Kelly, S. P., & O’Connell, R. G. (2013). Internal and external influences on the rate of sensory evidence accumulation in the human brain. *The Journal of Neuroscience*, *33*, 19434–19441.
- Kruschke, J. K. (2011). *Doing Bayesian data analysis: A tutorial with R and BUGS*. New York: Academic Press.
- LaBerge, D. (1962). A recruitment theory of simple behavior. *Psychometrika*, *27*, 375–396.
- Laming, D. R. J. (1968). *Information theory of choice-reaction times*. Academic Press.
- Lazarsfeld, P. F. (1959). *Latent structure analysis* (Vol. 3). NY: McGraw-Hill.
- Lee, M. D. (2011). How cognitive modeling can benefit from hierarchical Bayesian models. *Journal of Mathematical Psychology*, *55*, 1–7.
- Lee, M. D., & Wagenmakers, E.-J. (2014). *Bayesian cognitive modeling: A practical course*. Cambridge University Press.
- Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. *Journal of Multivariate Analysis*, *100*, 1989–2001.
- Van der Linden, W. J., & Hambleton, R. K. (1997). *Handbook of modern item response theory*. Springer.
- Luce, R. D. (1986). *Response times: Their role in inferring elementary mental organization*. Oxford University Press.
- van der Maas, H. L., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. *Psychological Review*, *118*, 339.
- Miletić, S., Turner, B. M., Forstmann, B. U., & van Maanen, L. (2017). Parameter recovery for the leaky competing accumulator model. *Journal of Mathematical Psychology*, *76*, 25–50.
- Navarro, D. J., & Fuss, I. G. (2009). Fast and accurate calculations for first-passage times in Wiener diffusion models. *Journal of Mathematical Psychology*, *53*, 222–230.
- O’Connell, R. G., Dockree, P. M., & Kelly, S. P. (2012). A supramodal accumulation-to-bound signal that determines perceptual decisions in humans. *Nature Neuroscience*, *15*, 1729–1735.
- Oravecz, Z., Anders, R., & Batchelder, W. H. (2015). Hierarchical Bayesian modeling for test theory without an answer key. *Psychometrika*, *80*, 341–364.
- Pike, R. (1973). Response latency models for signal detection. *Psychological Review*, *80*, 53.
- Ratcliff, R. (1978). A theory of memory retrieval. *Psychological Review*, *85*, 59.
- Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task. *Psychological Review*, *111*, 159.
- Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. *Neural Computation*, *20*, 873–922.
- Ratcliff, R., & Smith, P. L. (2004). A comparison of sequential sampling models for two-choice reaction time. *Psychological Review*, *111*, 333.
- Ratcliff, R., Thompson, C. A., & McKoon, G. (2015). Modeling individual differences in response time and accuracy in numeracy. *Cognition*, *137*, 115–136.
- Ratcliff, R., Van Zandt, T., & McKoon, G. (1999). Connectionist and diffusion models of reaction time. *Psychological Review*, *106*, 261–300.
- van Ravenzwaaij, D., Donkin, C., & Vandekerckhove, J. (2016). The EZ diffusion model provides a powerful test of simple empirical effects. *Psychonomic Bulletin & Review*, pp. 1–10.
- Riefer, D. M., Knapp, B. R., Batchelder, W. H., Bamber, D., & Manifold, V. (2002). Cognitive psychometrics: Assessing storage and retrieval deficits in special populations with multinomial processing tree models. *Psychological Assessment*, *14*, 184.
- Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. *Psychonomic Bulletin & Review*, *12*, 573–604.
- Rouder, J. N., Lu, J., Speckman, P., Sun, D., & Jiang, Y. (2005). A hierarchical model for estimating response time distributions. *Psychonomic Bulletin & Review*, *12*, 195–223.
- Rouder, J. N., Lu, J., Sun, D., Speckman, P., Morey, R., & Naveh-Benjamin, M. (2007). Signal detection models with random participant and item effects. *Psychometrika*, *72*, 621–642.
- Rouder, J. N., Morey, R. D., & Pratte, M. S. (2013). Hierarchical Bayesian models. *Practice*, *1*, 10.
- Scheibehenne, B., & Pachur, T. (2015). Using Bayesian hierarchical parameter estimation to assess the generalizability of cognitive models of choice. *Psychonomic Bulletin & Review*, *22*, 391–407.
- Smith, P. (2016). Diffusion theory of decision making in continuous report. *Psychological Review*.
- Stan Development Team (2015a). RStan: The R interface to Stan, version 2.8.0.
- Stan Development Team (2015b). *Stan modeling language users guide and reference manual, version 2.8.0*.
- Stone, M. (1960). Models for choice-reaction time. *Psychometrika*, *25*, 251–260.
- Stroock, D. W., & Varadhan, S. S. (1979). *Multidimensional diffusion processes* (Vol. 233 of Grundlehren der mathematischen Wissenschaften [Fundamental principles of mathematical sciences]).
- Townsend, J. T., & Ashby, F. G. (1983). *Stochastic modeling of elementary psychological processes*. CUP Archive.
- Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. *Psychological Review*, *108*, 550.
- Usher, M., Olami, Z., & McClelland, J. L. (2002). Hick’s law in a stochastic race model with speed–accuracy tradeoff. *Journal of Mathematical Psychology*, *46*, 704–715.
- Vandekerckhove, J. (2014). A cognitive latent variable model for the simultaneous analysis of behavioral and personality data. *Journal of Mathematical Psychology*, *60*, 58–71.
- Vandekerckhove, J., Tuerlinckx, F., & Lee, M. D. (2011). Hierarchical diffusion models for two-choice response times. *Psychological Methods*, *16*, 44.
- Wagenmakers, E.-J., Van Der Maas, H. L., & Grasman, R. P. (2007). An EZ-diffusion model for response time and accuracy. *Psychonomic Bulletin & Review*, *14*, 3–22.
- Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. *Frontiers in Neuroinformatics*, *7*, 14.
- Wilcox, R. R. (2012). *Introduction to robust estimation and hypothesis testing*. Academic Press.