
Item response and response time model for personality assessment via linear ballistic accumulation

  • Kyosuke Bunji
  • Kensuke Okada
Original Paper

Abstract

On the basis of a combination of linear ballistic accumulation (LBA) and item response theory (IRT), this paper proposes a new class of item response models, namely LBA IRT, which incorporates the observed response time (RT) by means of LBA. Our main objective is to develop a simple yet effective alternative to the diffusion IRT model, which is one of the best-known RT-incorporating IRT models that explicitly model the underlying psychological process of the elicited item response. Through a simulation study, we show that the proposed model yields parameter estimates that correspond to those of the diffusion IRT model while converging much faster. Furthermore, the application of the proposed model to real personality measurement data indicates that it fits the data better than the diffusion IRT model in terms of its predictive performance. Thus, the proposed model exhibits good performance and promising modeling capabilities in terms of capturing the cognitive and psychometric processes underlying the observed data.

Keywords

Item response theory; Response time; Diffusion model; Linear ballistic accumulation

1 Introduction

In recent years, personality assessments have been widely administered via computers and tablets. Through such computerized tests, researchers can obtain not only the item responses but also the response times (RTs) of respondents. Considering the fact that every testing situation typically involves at least some time pressure, the time required to respond to items can reveal useful information about respondents’ traits (Tuerlinckx et al. 2016).

Tuerlinckx et al. (2016) conceptually classified previously proposed models for dealing with both the item responses and RTs into three types. First, there exist models that regard the RT as collateral information. These models are primarily concerned with the item response model and use the RT to improve the efficiency of parameter estimation or to provide additional information such as the detection of cheating and guessing. For example, Roskam (1987, 1997) added the logarithm of the RT as an independent variable to the logistic item response model. This class of models essentially represents the probability of a correct item response and uses the RT merely as a source of additional information. The second type of model includes normative models, which typically employ some scoring rules. For example, van der Maas and Wagenmakers (2005) introduced the correct item summed residual time (CISRT) scoring rule, which assigns the sum of all residual times for correct responses as the score of the respondent for the measurement of chess expertise. Although scoring rules are popular in games and sports, they are not commonly adopted in psychometrics because it is difficult to establish reasonable and widely acceptable scoring rules.

The third type of model includes process models, which make specific assumptions about the underlying psychological process that results in the item responses and RTs. Such models enable us to apply accumulated scientific knowledge in psychology for the direct modeling of the relationships between the observed test performance and the inner psychological process that drives the item-answering behavior. In contrast, the first and second types of models mentioned above have no obvious connection to the psychological models for representing the underlying mechanisms of such information processing; hence, it is difficult for them to consider the underlying cognitive process of the item response and RT. Nevertheless, most previously proposed RT-incorporated item response models are of the first or second type (van der Linden 2016).

By explicitly taking into account the cognitive process underlying the observed responses, we can obtain psychologically meaningful parameter estimates such as the speed of information uptake and the amount of information used to make a decision. This helps us to quantify and empirically test psychological theories in an applied context. In fact, process models have been successfully applied to understand and explain the processes underlying a variety of topics in human sciences, such as memory, familiarity effects, perceptual judgments, and decision making (see van Ravenzwaaij and Oberauer 2009). For example, Ratcliff et al. (2007) applied a process model to examine why older adults take more time than younger adults in lexical decision tasks. Their finding, based on the obtained parameter estimates, is that this is not due to a decreased rate of information processing but because older respondents set their response boundaries more conservatively. Palada et al. (2016) also applied process models to response data with complex stimuli. In their study, respondents tended to perform a task faster when the stimulus was more complex. On the basis of the parameter estimates of process models, they found that this is not because the respondents’ capacity increases with the complexity of stimuli (which is a phenomenon called “super-capacity”) but because respondents set a lower threshold for a response when the task is complex. As Donkin et al. (2011, p. 61) state, “this kind of conclusion would have been very difficult to draw without a cognitive model of choice RT.”

On the basis of these considerations, we focus on process models in this study. First, we review the relevant literature.

In the field of cognitive and mathematical psychology, which has a rich tradition of RT modeling (Luce 1986; Voss et al. 2013), psychological models have been developed to explain the course of information processing that leads to the observed human response and the associated RT. Two of the best-known models are the diffusion model and the linear ballistic accumulation (LBA) model. By introducing several cognitive parameters, as elaborated in the next section, the diffusion and LBA models address the tradeoff between the response accuracy and the speed, thereby facilitating the quantification of individual performance. The diffusion model, originally formalized by Ratcliff (1978) on the basis of preceding studies such as Stone (1960) and Laming (1968), is a representative model that considers the underlying response generation process. In contrast, the LBA model was proposed by Brown and Heathcote (2008), who aimed to propose a simple yet effective alternative to the diffusion model. They demonstrated that the LBA model accounts for many important empirical phenomena from choice tasks, such as the speed difference between correct and incorrect responses and the shape of the speed–accuracy tradeoff function.

The original diffusion and LBA models do not distinguish between respondent and item parameters, a distinction that is fundamental to item response theory (IRT) models. However, considering the fact that RT-incorporating IRT models are meant to be applied to item response data obtained as a result of internal human information processing, it would be reasonable to believe that the combination of traditional IRT models for the item response and psychological models for the RTs would lead to the emergence of a novel, important, and practical class of models. On the basis of this idea, the D-diffusion IRT model (Tuerlinckx and De Boeck 2005; van der Maas et al. 2011) has been proposed as a novel category of RT-incorporating IRT models and is currently the most representative process model for item-answering behavior for personality assessments. It is an elegant combination of the diffusion and IRT models with both respondent and item parameters. Its respondent parameters provide psychological insights into the underlying cognitive properties of human information processing.

Nevertheless, the diffusion model is a complex nonlinear model. Its density is expressed as an infinite series, and the parameters vary across trials. Wagenmakers et al. (2007) suggested that mathematical psychologists use such a complicated model because of the substantial payoff involved; the estimated parameter values of the diffusion model can provide psychological insights that cannot be provided by standard superficial methods of analysis. However, when combined with IRT in the form of the diffusion IRT model, the diffusion model becomes even more complex. The increased complexity of the model might prevent its application to real datasets. This is a critical issue in practice, especially when the data have a large sample size or when the hierarchical structure of the data is to be taken into account.

In addition, in the diffusion IRT model, the discriminability and expected RT have a linear relationship, as we elaborate and illustrate in Sect. 3.2. The discriminability is equivalent to the slope of the logistic curve, i.e., how well an item can distinguish high-scoring people from low-scoring ones. However, this linear relationship may be a strong and unrealistic assumption. In the case of a reaction time task in cognitive psychology, which is the original context modeled with the diffusion model, the expected RT is typically shorter than a few seconds; in that case, the linearity assumption might be acceptable as a first approximation. However, when the expected RT is longer than a few seconds, which is actually the case in typical personality assessments, the estimates of the discriminability could be considerably inflated owing to this linear relationship. One possible way to deal with this problem is to add another new parameter to moderate the relationship. However, as noted above, the diffusion IRT model is already a complex model. In the interest of estimation efficiency, it would be better not to increase the number of parameters further. Therefore, this study instead focuses on a simpler alternative to the diffusion IRT model.

Donkin et al. (2011) investigated whether and to what extent conclusions regarding psychological processes depend on the choice between the diffusion and LBA models. They found a largely straightforward correspondence between the parameters of the two models. In fact, for the diffusion and LBA models, they concluded that “inferences about psychological processes made from real data are unlikely to depend on the model that is used” (p. 61).

On the basis of the abovementioned observations, the central idea of this study is that the combination of the LBA and IRT models would be beneficial as a novel model for modern test data. In particular, it would provide us with insights into the underlying psychological process, distinguish the item and respondent characteristics, and facilitate faster and more stable estimation than the diffusion IRT models, even when new parameters are included. Thus, the objectives of the present study are to propose a new LBA IRT model and to comparatively evaluate it against existing diffusion IRT models using simulated and real data.

The remainder of the paper is organized as follows. Section 2 reviews the existing diffusion and diffusion IRT models. Section 3 introduces the proposed LBA IRT model. Section 4 describes a simulation study conducted to compare the performance of the proposed LBA IRT model with that of existing diffusion IRT models. Section 5 provides an empirical illustration of the proposed and existing methods using a real personality dataset. Finally, Sect. 6 concludes the paper with a brief discussion of the directions for future research.

2 Existing models

2.1 Diffusion model

The diffusion model provides us with detailed information about the cognitive process underlying the respondent’s answers to items based on the observed data. Figure 1 shows a schematic of the diffusion model. When an item is presented to the respondent, the cognitive information required to answer the item (or conviction towards the item response) accumulates over time from the starting point (z). Once it reaches the upper (\(\alpha\)) or lower (0) boundary, the respondent answers the item. The upper boundary corresponds to the correct response, and the lower boundary corresponds to the incorrect response. The increase in the amount of information in unit time follows a normal distribution with a mean v and within-trial variance \(s^{2}\).
Fig. 1

Schematic of the diffusion model. The accumulator starts from the starting point (z) and moves upwards and downwards. The direction is randomly drawn from \(N(v,s^{2})\) at each unit of time. The accumulation continues until it reaches the upper or lower bound (\(\alpha\) or 0), each of which corresponds to different response options

The diffusion model has four main parameters. \(\alpha\) represents the distance between the upper and lower boundaries. A larger value of \(\alpha\) means that a longer time is required to answer the item, which suggests that the respondent’s choice is more deliberate. z denotes the starting point. If the respondent has a positive response bias to the item from the beginning, z becomes larger; thus, the starting point approaches the upper bound (\(\alpha\)), which leads to a higher probability of giving the correct answer. When there is no response bias, i.e., when the respondent does not favor or avoid one of the choice alternatives at the beginning (Leite and Ratcliff 2011), z equals \(\alpha /2\). v represents the average slope of the information accumulation process. The process approaches the upper (resp. lower) bound when v is positive (resp. negative). \(\tau\) represents the nondecision time, i.e., the duration of nondecisional processes, which may comprise basic encoding processes.
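The roles of the four parameters can be illustrated with a small simulation. The following is only a minimal sketch, assuming an Euler discretization with a hypothetical step size `dt`; the function name `simulate_diffusion_trial` and all numerical values are illustrative assumptions and are not part of the original model specification.

```python
import random


def simulate_diffusion_trial(alpha, z, v, tau, s=1.0, dt=0.002, rng=random):
    """Simulate one trial; return (x, rt).

    x is 1 if the upper boundary (alpha) is reached and 0 if the lower
    boundary (0) is reached. dt is an assumed discretization step.
    """
    pos, t = z, 0.0
    step_sd = s * dt ** 0.5          # within-trial noise per time step
    while 0.0 < pos < alpha:
        pos += rng.gauss(v * dt, step_sd)
        t += dt
    return (1 if pos >= alpha else 0), t + tau  # add nondecision time


# Illustrative values (assumptions): unbiased start, positive drift.
rng = random.Random(0)
trials = [simulate_diffusion_trial(2.0, 1.0, 0.5, 0.3, rng=rng)
          for _ in range(500)]
share_upper = sum(x for x, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print("share of upper-boundary responses:", share_upper)
print("mean RT:", mean_rt)
```

Raising `alpha` in this sketch lengthens the simulated RTs, and moving `z` toward `alpha` raises the share of upper-boundary responses, in line with the verbal description above.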

Let X be a binary observed variable that takes the value 1 when the correct answer is observed (or the accumulation reaches the upper bound) and 0 when the wrong answer is observed (or the accumulation reaches the lower bound). To derive the formulation of the diffusion model, first consider a discrete random walk process. In the process, the accumulator goes up one unit with a probability p and goes down one unit with a probability \(q=1-p\). Let us consider the probability that the accumulator reaches the lower boundary when it starts from z, which we denote by \(P(X=0|z)\). From the above assumption, \(P(X=0|z)\) must satisfy the recursive equation \(P(X=0|z)=pP(X=0|z+1)+qP(X=0|z-1).\) By solving this equation, \(P(X=0|z)\) becomes
$$\begin{aligned} P(X=0|z)= {\left\{ \begin{array}{ll} \frac{\left( \frac{q}{p}\right) ^\alpha -\left( \frac{q}{p}\right) ^z}{\left( \frac{q}{p}\right) ^\alpha -1} &{} \text{when} \ \ q\ne p\\ 1-\frac{z}{\alpha } &{} \text{when} \ \ q=p. \end{array}\right. } \end{aligned}$$
(1)
In a similar manner, let us consider \(p(X=0,n|z)\), which is the probability that the accumulator reaches the lower boundary at the n-th step when it starts from z. \(p(X=0,n|z)\) must satisfy the recursive equation \(p(X=0,n+1|z)=p\,p(X=0,n|z+1)+q\,p(X=0,n|z-1).\) Because reaching the lower boundary from z in n steps requires \((n-z)/2\) upward and \((n+z)/2\) downward steps, \(p(X=0,n|z)\) becomes
$$\begin{aligned} p(X=0,n|z)=\frac{2^{n+1}}{\alpha }p^{\frac{n-z}{2}}q^{\frac{n+z}{2}} \sum _{k<\alpha /2}\cos ^{n-1}\frac{\pi k}{\alpha }\sin \frac{\pi k}{\alpha }\sin \frac{\pi zk}{\alpha }. \end{aligned}$$
(2)
The diffusion process is expressed as the continuous version of this random walk process. The limiting form of the joint distribution of the item response (X) and RT (T) in the diffusion model is represented as
$$\begin{aligned} \begin{aligned} f(x,t)&=\frac{\pi s^{2}}{\alpha ^{2}}\exp \left( \frac{(\alpha x-z)v}{s^{2}}-\frac{v^2}{2s^{2}}(t-\tau )\right) \\&\quad \times \sum _{m=1}^{\infty }m\sin \left( \frac{\pi m(\alpha x-2zx+z)}{\alpha }\right) \exp \left( -\frac{1}{2}\frac{\pi ^{2} s^{2}m^{2}}{\alpha ^{2}}(t-\tau )\right) . \end{aligned} \end{aligned}$$
(3)
For more details, refer to Ratcliff (1978) and Feller (1968, chap. 14).
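As a quick numerical check of the random walk result, the sketch below evaluates the closed form in Eq. (1), verifies that it satisfies the recursive equation stated before it, and compares it with a Monte Carlo estimate. The function names and the values of z, \(\alpha\), and p are illustrative assumptions.

```python
import random


def ruin_probability(z, alpha, p):
    """Closed form of Eq. (1): P(X=0|z) for a walk absorbed at 0 or alpha."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return 1.0 - z / alpha
    r = q / p
    return (r ** alpha - r ** z) / (r ** alpha - 1.0)


def simulate_ruin(z, alpha, p, n_walks=20000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits_lower = 0
    for _ in range(n_walks):
        pos = z
        while 0 < pos < alpha:
            pos += 1 if rng.random() < p else -1
        hits_lower += (pos == 0)
    return hits_lower / n_walks


z, alpha, p = 3, 8, 0.55  # illustrative values (assumptions)
closed_form = ruin_probability(z, alpha, p)
# Recursion: P(X=0|z) = p P(X=0|z+1) + q P(X=0|z-1)
recursion = (p * ruin_probability(z + 1, alpha, p)
             + (1 - p) * ruin_probability(z - 1, alpha, p))
print("closed form:", closed_form)
print("recursion  :", recursion)
print("simulation :", simulate_ruin(z, alpha, p))
```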

2.2 Diffusion IRT model

Tuerlinckx and De Boeck (2005) extended the diffusion model to the IRT framework. Eq. (1) indicates the probability that the accumulator reaches the lower bound in the random walk process. Again, by considering the continuous version of the random walk process, Eq. (1) becomes
$$\begin{aligned} P(X=0|z)=\frac{\exp (-2\alpha v)-\exp (-2zv)}{\exp (-2\alpha v)-1}. \end{aligned}$$
(4)
In typical IRT applications (e.g., educational testing and personality questionnaires), it would be reasonable to assume that there is no response bias at the starting point. Under the assumption that z is set to \(\alpha /2\) (no response bias), the probability that respondent i answers the j-th item correctly is given as
$$\begin{aligned}&P(X_{ij}=1|z=\alpha /2)=1-P(X_{ij}=0|z=\alpha /2)=\frac{\exp (-2z_{ij} v_{ij})-1}{\exp (-2\alpha _{ij} v_{ij})-1} \nonumber \\ &\quad =\frac{\exp (\alpha _{ij} v_{ij})}{1+\exp (\alpha _{ij} v_{ij})}. \end{aligned}$$
(5)
This equation corresponds to the standard two-parameter logistic IRT model when \(\alpha _{ij}\) corresponds to the discriminability and \(v_{ij}\) corresponds to the difference between the respondent’s trait and the item difficulty parameters.
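The reduction from Eq. (4) to the logistic form in Eq. (5) can also be checked numerically. The sketch below is a hedged illustration with hypothetical function names; it assumes \(s=1\), as in Eq. (4), and handles \(v=0\) through the limiting value \(1-z/\alpha\).

```python
import math


def p_lower(z, alpha, v):
    """Eq. (4): probability of absorption at the lower boundary (s = 1)."""
    if v == 0.0:
        return 1.0 - z / alpha  # limit of Eq. (4) as v -> 0
    num = math.exp(-2.0 * alpha * v) - math.exp(-2.0 * z * v)
    den = math.exp(-2.0 * alpha * v) - 1.0
    return num / den


def p_upper_unbiased(alpha, v):
    """Left-hand side of Eq. (5): 1 - P(X=0|z) with z = alpha / 2."""
    return 1.0 - p_lower(alpha / 2.0, alpha, v)


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Largest discrepancy between both sides of Eq. (5) on a small grid.
grid = [(a, v) for a in (0.5, 1.0, 2.0, 3.0) for v in (-1.5, -0.2, 0.0, 0.4, 1.5)]
max_gap = max(abs(p_upper_unbiased(a, v) - logistic(a * v)) for a, v in grid)
print("max |Eq.(4)-based value - logistic|:", max_gap)
```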
Note that in this paper, following Tuerlinckx and De Boeck (2005) and van der Maas et al. (2011), we use the subscripts i and j for the parameters of the IRT-based models but suppress them for the original diffusion and LBA model parameters. This is because, whereas the separation between item and respondent parameters is the key characteristic of IRT models, the original diffusion and LBA models essentially do not distinguish them. In the diffusion IRT model, this separation can be represented as
$$\begin{aligned} \alpha _{ij} = w(\gamma _{i},a_{j}), \quad v_{ij}=u(\theta _{i},b_{j}), \end{aligned}$$
(6)
where \(\gamma _{i}\) and \(a_{j}\) represent the respondent factor (e.g., deliberateness and potential speed to answer) and the item factor (e.g., item complexity and item length) involved in the boundary parameter, respectively. \(\theta _{i}\) and \(b_{j}\) can be interpreted in almost the same manner as those in the original IRT model. That is, \(\theta _{i}\) represents the respondent’s latent trait to be measured, and \(b_j\) represents the item threshold (difficulty or severity level).

IRT models are typically used for two different types of psychological measurement: personality measurement such as the Big Five scale and ability measurement such as college entrance examinations. Corresponding to these two scenarios, two types of diffusion IRT models with different functional forms have been developed.

In the first type, namely the D-diffusion IRT model, v is expressed as the difference between the respondent’s latent trait and the item threshold. In this model, the specific form of Eq. (6) is given as
$$\begin{aligned} \left\{ \begin{array}{ll} \alpha _{ij}=\frac{\gamma _{i}}{a_{j}} &{}\text{with }\gamma _{i}, a_{j} \in {\mathbb {R}}_{>0} \\ v_{ij}=\theta _{i}-b_{j} &{}\text{with }\theta _{i}, b_{j} \in {\mathbb {R}}, \end{array}\right. \end{aligned}$$
(7)
where \(\alpha\) is the ratio of the respondent properties (e.g., attentiveness and deliberateness) and the item properties (e.g., complexity and length). In the original diffusion model, the expected RT is the largest when \(v=0\). This means that the expected RT is large when the respondent’s latent trait is close to the item difficulty. This is consistent with Ferrando and Lorenzo-Seva’s (2007) model, which extends Thissen’s (1983) model on the basis of the nearness hypothesis (Kuncel 1973) and the distance–difficulty hypothesis (Ferrando and Lorenzo-Seva 2007) of personality measurement. Because the D-diffusion IRT model generates many important predictions for personality measurement as described above, it is typically used for personality tests.
The second type of diffusion IRT model is the Q-diffusion IRT model, which expresses v as the quotient of the respondent’s ability and the item difficulty. In this model, the specific form of Eq. (6) is given as
$$\begin{aligned} \left\{ \begin{array}{ll} \alpha _{ij}=\frac{\gamma _{i}}{a_{j}}&{}\text{with }\gamma _{i}, a_{j} \in {\mathbb {R}}_{>0}\\ v_{ij}=\frac{\theta _{i}}{b_{j}} &{}\text{with }b_{j} \in {\mathbb {R}}_{>0},\ \theta _{i} \in {\mathbb {R}}_{\ge 0}. \end{array}\right. \end{aligned}$$
(8)
This model corresponds to typical applications in ability measurement for which it has a number of attractive properties. For example, the expected RT is the largest when \(\theta =0\) because \(\theta\) and b are restricted to be nonnegative and positive, respectively. This corresponds to the assumption that a more competent respondent tends to answer faster than a less competent one. In addition, when a respondent has the lowest competence and answers a two-alternative forced choice item at random, the probability of obtaining a correct response should equal 50%. In Eq. (8), the lower bound of \(v_{ij}\) is 0 (when \(\theta _i =0\)), and the probability of reaching the upper bound (choosing the correct response) is exactly 50% in this case. In this manner, the Q-diffusion model takes into account the effect of guessing. The reader may refer to van der Maas et al. (2011) and Tuerlinckx et al. (2016) for further details regarding the differences in the D- and Q-diffusion models.
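The two parameterizations in Eqs. (7) and (8) can be contrasted with a short sketch. The function names and the example values below are hypothetical; the response probability uses the logistic form of Eq. (5).

```python
import math


def d_diffusion(theta, b, gamma, a):
    """Eq. (7): boundary as a ratio, drift as a difference."""
    if gamma <= 0 or a <= 0:
        raise ValueError("gamma and a must be positive")
    return gamma / a, theta - b


def q_diffusion(theta, b, gamma, a):
    """Eq. (8): boundary as a ratio, drift as a quotient."""
    if gamma <= 0 or a <= 0 or b <= 0 or theta < 0:
        raise ValueError("need gamma, a, b > 0 and theta >= 0")
    return gamma / a, theta / b


def p_upper(alpha, v):
    """Eq. (5): probability of reaching the upper boundary."""
    return 1.0 / (1.0 + math.exp(-alpha * v))


# D-diffusion: the drift is zero when the trait equals the item threshold.
print(p_upper(*d_diffusion(theta=1.0, b=1.0, gamma=2.0, a=1.0)))
# Q-diffusion: the lowest ability (theta = 0) gives a pure guess.
print(p_upper(*q_diffusion(theta=0.0, b=1.5, gamma=2.0, a=1.0)))
# Q-diffusion: a higher ability raises the probability above one half.
print(p_upper(*q_diffusion(theta=2.0, b=1.5, gamma=2.0, a=1.0)))
```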

2.3 Linear ballistic accumulation model

Brown and Heathcote (2008) proposed a simple cognitive model called the LBA model, which is schematically illustrated in Fig. 2. In the LBA model, information regarding a certain choice of an item is linearly accumulated with time, whereas the amount of accumulation is normally distributed in the diffusion model. Once the item is presented to the respondent, the evidence toward each choice of the item accumulates independently (the accumulation of one choice is irrelevant to that of any other choice) and linearly (the amount of accumulation does not change with time). This means that the model assumes no within-trial variance (\(s^2=0\)). Instead, the LBA model introduces the between-trial variance \(\eta ^2\), which indicates that the amount of information accumulation over time varies between each trial, even if a respondent answers the same item repeatedly. When the information for any one choice reaches the boundary (\(\beta\)), a corresponding response is provided by the respondent. The starting point of evidence accumulation for each choice is a random realization between 0 and A, and the amount of evidence accumulated in unit time is a realization from a normal distribution with a mean v and between-trial variance \(\eta ^2\). All choices have A and \(\beta\) in common, whereas v differs among the choices.

In terms of the relationship with the diffusion model, v, \(\tau\), and \(\beta - \frac{A}{2}\) in the LBA model can be interpreted in a similar manner as v, \(\tau\), and \(\alpha\) in the diffusion model, respectively. However, the LBA model differs from the diffusion model in two major aspects. First, each choice has its own v, while the standard deviation of the drift rate, denoted by \(\eta\), is common to all choices. In contrast, in the diffusion model, the information for each choice accumulates dependently; for example, approaching one choice (e.g., the upper bound) indicates drifting away from the other choice (i.e., the lower bound), which is shown in Fig. 1. Hence, the diffusion model has only one drift rate parameter (v). In addition, because of the problem of scaling, the standard deviation of the drift rate is typically fixed. Second, A indicates the upper bound of the starting point, whereas z in the diffusion model corresponds to the starting point of information processing.
Fig. 2

Schematic of the LBA model. Contrary to the diffusion model, each response option has a different corresponding accumulator. The starting points are i.i.d. random variables with U[0, A]. The slopes are randomly drawn from \(N(v^{(k)},\eta ^2)\) and do not fluctuate within a trial. t is the time when any accumulator reaches the upper bound (\(\beta\)). The resulting RT is the sum of t and \(\tau\)
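The generative process described for the LBA model can be sketched as a short simulation. This is a hedged illustration only: the function name and parameter values are assumptions, and redrawing a trial in which no accumulator has a positive slope is one possible convention adopted here for simplicity, not something specified in the text.

```python
import random


def simulate_lba_trial(v, beta, A, eta, tau, rng):
    """Simulate one LBA trial; return (choice_index, rt).

    v is a list with one mean drift rate per response option.
    """
    while True:
        finish_times = []
        for k, v_k in enumerate(v):
            start = rng.uniform(0.0, A)   # starting point ~ U[0, A]
            slope = rng.gauss(v_k, eta)   # slope ~ N(v^(k), eta^2)
            if slope > 0.0:               # only rising accumulators can finish
                finish_times.append(((beta - start) / slope, k))
        if finish_times:
            t, k = min(finish_times)      # first accumulator to reach beta
            return k, t + tau
        # Assumption: if every slope is non-positive, redraw the trial.


rng = random.Random(42)
# Illustrative values (assumptions); drift rates sum to one.
trials = [simulate_lba_trial([0.8, 0.2], beta=1.0, A=0.5, eta=0.3, tau=0.2, rng=rng)
          for _ in range(2000)]
share_first = sum(1 for k, _ in trials if k == 0) / len(trials)
print("share choosing option 1:", share_first)
```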

Brown and Heathcote (2008) have pointed out several differences between the LBA model and other decision-making models. First, the EZ-diffusion model (Wagenmakers et al. 2007) is simpler than the LBA model, but Wagenmakers et al. note that the EZ-diffusion model was developed for expressing data in a simple form rather than fully modeling the cognitive process. Therefore, it may not reflect all of the important features of the RT. For example, the EZ-diffusion model assumes that the RT distributions of the correct and incorrect responses are identical, which can be an unrealistically strong assumption in practice. On the other hand, the LBA model is considered the simplest yet complete decision-making model of the RT because the model successfully accounts for important empirical phenomena of the choice RT, such as the speed–accuracy tradeoff and the relative speed of correct vs. incorrect responses (Brown and Heathcote 2008). Similarly, other choice RT models such as the latency model (Grice 1968) and the LATER model (Reddi and Carpenter 2000) can be seen as simplifications of the diffusion model that assume no within-trial variability in the evidence accumulation process. In these models, however, the model-predicted RT distribution of the incorrect choice is negatively skewed, and this would never occur in the observed data. This inadequate negative skew is caused by projecting the tail of a normal distribution that generates negative slopes (Brown and Heathcote 2008). On the other hand, the LBA model represents accumulation processes for every choice; hence, the RT distributions of all choices can be fitted well. Finally, the LBA model has simple analytic solutions for choices among any number of different alternatives.

2.4 Relation between the diffusion and LBA models

Donkin et al. (2011) conducted a parameter recovery simulation study to examine the relation between the diffusion and LBA models. They considered the relationship between the two models with the following settings. (1) In the LBA model, the sum of v should be one. When the LBA model is compared with the diffusion model, \(v^{(2)}\) becomes \(1-v^{(1)}\). Here, \(v^{(1)}\) is the mean of the slope for the first response category (\(k=1\); e.g., “I agree”), and \(v^{(2)}\) is that of the second response category (\(k=2\); e.g., “I disagree”). (2) In the diffusion model, the distance from the starting point to the boundary is \(\frac{\alpha }{2}\). In the LBA model, the starting point is uniformly distributed from 0 to A. Therefore, the expected distance from the starting point to the boundary becomes \(\beta -\frac{A}{2}\). Accordingly, they compared \(\frac{\alpha }{2}\) with \(\beta -\frac{A}{2}\) rather than comparing \(\alpha\) with \(\beta\) directly. In their simulation study, simulated data were generated and estimated with both models. Their results indicated the existence of a nearly one-to-one relationship with regard to the drift rate or nondecision time parameters, while the boundary parameters did not exhibit simple mapping (even though they have a fairly high correlation).

On the other hand, there are also studies that discuss the differences between the diffusion and LBA models. For example, Heathcote and Hayes (2012) pointed out that the parameters of the two models would result in equivalent inferences under some conditions and different inferences under other conditions. This is not surprising because the precise functional forms of the diffusion and LBA models are different. Thus, caution may be needed when qualitatively translating a parameter estimate of one model to the other. In general, however, most core parameters of the diffusion and LBA models are comparable and have similar empirical meanings (Heathcote et al. 2015).

3 Proposed model

We apply the LBA framework of modeling the RT data, which has proven useful in the field of cognitive and mathematical psychology, to IRT models that are popular in psychometrics. For this purpose, we first present the original formulation of the LBA model (Brown and Heathcote 2008) in this section and then reparameterize it to yield the proposed LBA IRT model. This reparameterization allows us to combine the strengths of the LBA and IRT models, which are both popular in different fields. Note that in this study, we only focus on two-alternative (\(k=1, 2\)) forced choice tasks, although the original LBA model applies to K-alternative (\(k = 1,\ldots , K\)) tasks.

To derive the LBA model, let c be the random value derived from \(U[\beta -A,\beta ]\). This c represents the distance from the start point, which is randomly derived from U[0, A], to the threshold \(\beta\). Moreover, let d be the random value derived from \(N(v^{(k)},\eta ^2)\). Then, the cumulative distribution function of RT (T) for the k-th choice is given as
$$\begin{aligned} F_k(t)= \,& {} \text{prob}\left( \frac{c}{d}<t\right) \nonumber \\=\, & {} \text{prob}(c<dt) \nonumber \\=\, & {} \int _{-\infty }^{\infty } U(m|\beta -A,\beta )\phi (m|tv^{(k)},t\eta )~\mathrm{d}m \nonumber \\=\, & {} \int _{\beta -A}^{\beta } \frac{m-\beta +A}{A}\phi (m|tv^{(k)},t\eta )~\mathrm{d}m+1-\varPhi (\beta |tv^{(k)},t\eta ), \end{aligned}$$
(9)
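where \(U(m|\beta -A,\beta )\) is the cumulative distribution function of the uniform distribution on \([\beta -A,\beta ]\), and \(\phi (\cdot |\mu ,\sigma )\) and \(\varPhi (\cdot |\mu ,\sigma )\) represent the density and cumulative distribution functions of the normal distribution with mean \(\mu\) and standard deviation \(\sigma\), respectively; when written with a single argument, as in Eqs. (10) and (11), \(\phi\) and \(\varPhi\) refer to the standard normal distribution. By transforming Eq. (9), the cumulative distribution function of RT (T) for the k-th response category (\(k=1, 2\)) is re-expressed as (for the detailed derivation, see Appendix A in Brown and Heathcote 2008)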
$$\begin{aligned} \begin{aligned} F_{k}(t)=&1+\frac{\beta -A-tv^{(k)}}{A}\varPhi \left( \frac{\beta -A-tv^{(k)}}{t\eta }\right) -\frac{\beta -tv^{(k)}}{A}\varPhi \left( \frac{\beta -tv^{(k)}}{t\eta }\right) \\&+\frac{t\eta }{A}\phi \left( \frac{\beta -A-tv^{(k)}}{t\eta }\right) - \frac{t\eta }{A}\phi \left( \frac{\beta -tv^{(k)}}{t\eta }\right) . \end{aligned} \end{aligned}$$
(10)
The corresponding probability density function is derived by the differentiation of Eq. (10) with respect to t:
$$\begin{aligned} f_{k}(t)= & {} \frac{1}{A}\left[ -v^{(k)}\varPhi \left( \frac{\beta -A-tv^{(k)}}{t\eta }\right) +\eta \phi \left( \frac{\beta -A-tv^{(k)}}{t\eta }\right) \right. \nonumber \\&\left. \quad +\, v^{(k)}\varPhi \left( \frac{\beta -tv^{(k)}}{t\eta }\right) -\eta \phi \left( \frac{\beta -tv^{(k)}}{t\eta }\right) \right] . \end{aligned}$$
(11)
These are the functions concerning the time when a certain accumulator of the response k reaches the boundary, regardless of any other response. However, in typical applications of the LBA model, we can only observe the time for a single (chosen) choice to reach the threshold; the times for the other choices are not known. Therefore, we need to derive the defective (meaning not summing to 1) distribution, which is the distribution of a certain choice reaching the threshold before any other choice at time t. From Eqs. (10) and (11), we obtain the defective density function of choice k with RT t as
$$\begin{aligned} \text{PDF}_{k}(t)=f_{k}(t-\tau )\prod _{l\ne k}(1-F_{l}(t-\tau )). \end{aligned}$$
(12)
The corresponding cumulative distribution function can be obtained by the numerical integration of Eq. (12). The probability of the respondent’s choice k can be derived by integrating Eq. (12) over all t (\(0\le t<\infty\)) or, equivalently, by evaluating the corresponding cumulative distribution function at \(t \rightarrow \infty\). Evidently, this probability cannot be written as a logistic (or normal ogive) function in the form of Eq. (5). Therefore, no simple relationship between the functional forms of the LBA and IRT equations can be obtained from Eq. (12). Nevertheless, we can adopt a numerical approach, namely the Markov chain Monte Carlo (MCMC) estimation method, for the proposed model.
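Eqs. (10) to (12) translate directly into code. The sketch below uses only the Python standard library; the function names, the trapezoidal rule, the truncation point `t_max`, and the parameter values are assumptions made for illustration rather than part of the estimation procedure used in this paper.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lba_cdf(t, v, beta, A, eta):
    """Eq. (10): CDF of the time for one accumulator to reach beta."""
    if t <= 0.0:
        return 0.0
    z1 = (beta - A - t * v) / (t * eta)
    z2 = (beta - t * v) / (t * eta)
    return (1.0
            + (beta - A - t * v) / A * norm_cdf(z1)
            - (beta - t * v) / A * norm_cdf(z2)
            + t * eta / A * (norm_pdf(z1) - norm_pdf(z2)))


def lba_pdf(t, v, beta, A, eta):
    """Eq. (11): density obtained by differentiating Eq. (10)."""
    if t <= 0.0:
        return 0.0
    z1 = (beta - A - t * v) / (t * eta)
    z2 = (beta - t * v) / (t * eta)
    return (-v * norm_cdf(z1) + eta * norm_pdf(z1)
            + v * norm_cdf(z2) - eta * norm_pdf(z2)) / A


def defective_pdf(t, k, v, beta, A, eta, tau):
    """Eq. (12): option k finishes at t before every other option."""
    s = t - tau
    dens = lba_pdf(s, v[k], beta, A, eta)
    for l, v_l in enumerate(v):
        if l != k:
            dens *= 1.0 - lba_cdf(s, v_l, beta, A, eta)
    return dens


def choice_probability(k, v, beta, A, eta, tau, t_max=60.0, n=20000):
    """Trapezoidal integration of Eq. (12) over [tau, t_max] (truncated)."""
    h = (t_max - tau) / n
    total = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        total += w * defective_pdf(tau + i * h, k, v, beta, A, eta, tau)
    return total * h


# Illustrative values (assumptions).
v, beta, A, eta, tau = [0.7, 0.3], 1.0, 0.5, 0.3, 0.2
probs = [choice_probability(k, v, beta, A, eta, tau) for k in range(2)]
print("choice probabilities:", probs, "sum:", sum(probs))
```

The two choice probabilities in this sketch sum to slightly less than one, which reflects both the truncation at `t_max` and the small probability that neither accumulator has a positive slope.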

3.1 Parameter settings

As stated in Sect. 2.2, there exist two classes of diffusion IRT models. In this study, we chose to extend the D-diffusion model for the following reasons. First, \(\theta\) and b of the D-diffusion model can be regarded as nearly the same as those of traditional IRT, whereas those of Q-diffusion are restricted in that they cannot take negative values. Second, v in D-diffusion is simply the difference between \(\theta\) and b; therefore, it is easier to estimate than in the case of Q-diffusion, where v is a quotient. Third, the responses of personality measurement tend to be faster than those of ability measurement. The LBA and diffusion models were typically applied to cognitive tasks, the RT of which is typically less than a few seconds (although there are recent exceptions; see Palada et al. 2016). In general, ability measurements require a much longer RT than personality measurements, and the model properties under such conditions are less well-known. Therefore, we adopt D-diffusion. Accordingly, the proposed model is called the D-LBA IRT model (we may simply call it the D-LBA model hereafter).

Boundary Donkin et al. (2011) showed that \(\alpha\) in the diffusion model can be interpreted as nearly the same as \(\beta -\frac{A}{2}\) of the LBA model. To retain the same number of parameters in the D-LBA model as that in the D-diffusion model, we need to introduce a parameter constraint. For this purpose, we consider A as fixed in this study; specifically, we set
$$\begin{aligned} A_{ij} = \frac{\beta _{ij}}{2}, \end{aligned}$$
(13)
where
$$\begin{aligned} \beta _{ij}=\frac{\gamma _{i}}{a_{j}}\quad \text{with }\gamma _{i}, a_{j} \in {\mathbb {R}}_{>0}. \end{aligned}$$
(14)
Other forms of constraint may also be possible, but we think this constraint is natural because the upper bound of the starting point in this setting is simply half of the distance between 0 and \(\beta\).
Drift rate We can treat the drift rate in the same manner as in the D-diffusion model, except that it needs to satisfy an identification constraint, which is required for latent variable models. In the original LBA model, a common way of incorporating this constraint is to set the sum of the drift rates across the alternative choices to one. We follow this approach by defining v as the difference between \(\theta\) and b transformed by a logistic function:
$$\begin{aligned} \left\{ \begin{array}{l} v_{ij}^1=[1+\exp (-\theta _{i}+b_{j})]^{-1} \\ v_{ij}^2=[1+\exp (\theta _{i}-b_{j})]^{-1}. \end{array}\right. \end{aligned}$$
(15)
Note that to study the comparable performance of the proposed D-LBA model with the D-diffusion model, we consider the case of binary choice data (i.e., \(K=2\)) in this paper, although the D-LBA model can be extended to polytomous choices.
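The constraints in Eqs. (13) to (15) can be sketched in a few lines of Python. The function name and the numerical values below are hypothetical and serve only to show that the two drift rates sum to one and that the start-point bound is half of the boundary.

```python
import math


def dlba_accumulator_parameters(theta, b, gamma, a):
    """Map respondent (theta, gamma) and item (b, a) parameters to LBA quantities."""
    beta = gamma / a                            # boundary, Eq. (14)
    A = beta / 2.0                              # upper bound of the start point, Eq. (13)
    v1 = 1.0 / (1.0 + math.exp(-theta + b))     # drift rate of the first choice, Eq. (15)
    v2 = 1.0 / (1.0 + math.exp(theta - b))      # drift rate of the second choice, Eq. (15)
    return beta, A, v1, v2


# Illustrative values only.
beta, A, v1, v2 = dlba_accumulator_parameters(theta=0.5, b=-0.5, gamma=2.0, a=1.0)
print(beta, A, v1, v2, v1 + v2)
```

Because the two logistic terms are complements of each other, the sum-to-one identification constraint holds for any \(\theta _i\) and \(b_j\).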

Nondecision time The nondecision time is nearly the same as that in the case of D-diffusion. We set the nondecision time as the item parameter \(\tau _{j}\).

Between-trial variability of the slope As with the LBA model, the diffusion model involves an identification issue. To deal with this in the diffusion model, the within-trial variance of the slope, s, has to be fixed at a certain value such as 0.1 or 1. This is due to the indeterminacy of the scale among \(s, \alpha\), and v in the diffusion model. On the other hand, the LBA model has no within-trial variance s; instead, it incorporates the between-trial variance \(\eta\). By decomposing this variance into item and respondent parameters, the proposed model can solve the problem of the linear relationship between the discriminability and the expected RT, which was, as we noted before, one of the major problems of the diffusion IRT model. Specifically, the proposed model decomposes the between-trial variance \(\eta\) into person and item factors:
$$\begin{aligned} \eta _{ij}=\frac{\sigma _i}{\psi _j}. \end{aligned}$$
(16)
As a result, the total number of parameters in the D-LBA model becomes \(4J+3I\), while the D-diffusion model used in this study has \(3J+2I\) parameters. The joint cumulative distribution function of the item response and RT is given as
$$\begin{aligned} \begin{aligned} F_{1}(t)&=1+\frac{\frac{\gamma _{i}}{2a_{j}}-t\left[ 1+ \exp (\theta _{i}-b_{j})\right] ^{-1}}{\frac{\gamma _{i}}{2a_{j}}} \varPhi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t\left[ 1+ \exp (\theta _{i}-b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad -\frac{\frac{\gamma _{i}}{a_{j}}-t\left[ 1+\exp (\theta _{i}-b_{j}) \right] ^{-1}}{\frac{\gamma _{i}}{2a_{j}}}\varPhi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t\left[ 1+\exp (\theta _{i}- b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad +\frac{t\frac{\sigma _i}{\psi _j}}{\frac{\gamma _{i}}{2a_{j}}} \phi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t\left[ 1+\exp (\theta _{i}- b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) - \frac{t\frac{\sigma _i}{\psi _j}}{\frac{\gamma _{i}}{2a_{j}}} \phi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t\left[ 1+\exp (\theta _{i}- b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) , \\ F_{2}(t)&=1+\frac{\frac{\gamma _{i}}{2a_{j}}-t\left[ 1+\exp (-\theta _{i} +b_{j})\right] ^{-1}}{\frac{\gamma _{i}}{2a_{j}}}\varPhi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t\left[ 1+\exp (-\theta _{i}+ b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad -\frac{\frac{\gamma _{i}}{a_{j}}-t\left[ 1+\exp (-\theta _{i}+ b_{j})\right] ^{-1}}{\frac{\gamma _{i}}{2a_{j}}}\varPhi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t\left[ 1+\exp (-\theta _{i}+ b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad +\frac{t\frac{\sigma _i}{\psi _j}}{\frac{\gamma _{i}}{2a_{j}}}\phi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t\left[ 1+\exp (-\theta _{i}+ b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad - \frac{t\frac{\sigma _i}{\psi _j}}{\frac{\gamma _{i}}{2a_{j}}} \phi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t\left[ 1+ \exp (-\theta _{i}+b_{j})\right] ^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) , \end{aligned} \end{aligned}$$
(17)
and the corresponding joint probability density function is given as
$$\begin{aligned} \begin{aligned} f_{1}(t)&=\frac{2a_{j}}{\gamma _{i}} \left[ -[1+\exp (\theta _{i}-b_{j})]^{-1} \varPhi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t [1+\exp (\theta _{i}-b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \right. \\&\quad +\frac{\sigma _i}{\psi _j}\phi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t[1+\exp (\theta _{i}-b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad +[1+\exp (\theta _{i}-b_{j})]^{-1}\varPhi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t[1+\exp (\theta _{i}-b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad \left. -\frac{\sigma _i}{\psi _j}\phi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t[1+\exp (\theta _{i}-b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \right] ,\\ f_{2}(t)&=\frac{2a_{j}}{\gamma _{i}} \left[ -[1+\exp (-\theta _{i}+b_{j})]^{-1}\varPhi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t [1+\exp (-\theta _{i}+b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \right. \\&\quad +\frac{\sigma _i}{\psi _j}\phi \left( \frac{\frac{\gamma _{i}}{2a_{j}}-t[1+\exp (-\theta _{i}+b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad +[1+\exp (-\theta _{i}+b_{j})]^{-1}\varPhi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t[1+\exp (-\theta _{i}+b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \\&\quad \left. -\frac{\sigma _i}{\psi _j}\phi \left( \frac{\frac{\gamma _{i}}{a_{j}}-t[1+\exp (-\theta _{i}+b_{j})]^{-1}}{t\frac{\sigma _i}{\psi _j}}\right) \right] . \end{aligned} \end{aligned}$$
(18)
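As a numerical sanity check, the following Python sketch codes one line of Eq. (17) and one line of Eq. (18) in the D-LBA parameterization and confirms that the density is the derivative of the distribution function. The drift rate v is passed in directly (it stands for either of the two logistic terms in Eq. (15)); the function names, the step size, and the parameter values are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf() is phi, .cdf() is Phi


def dlba_cdf(t, v, gamma, a, sigma, psi):
    """One accumulator's distribution function, following Eq. (17)."""
    half = gamma / (2.0 * a)   # boundary minus start-point bound
    full = gamma / a           # boundary
    eta = sigma / psi          # between-trial variability, Eq. (16)
    z_half = (half - t * v) / (t * eta)
    z_full = (full - t * v) / (t * eta)
    return (1.0
            + (half - t * v) / half * STD_NORMAL.cdf(z_half)
            - (full - t * v) / half * STD_NORMAL.cdf(z_full)
            + t * eta / half * STD_NORMAL.pdf(z_half)
            - t * eta / half * STD_NORMAL.pdf(z_full))


def dlba_pdf(t, v, gamma, a, sigma, psi):
    """One accumulator's density, following Eq. (18)."""
    half = gamma / (2.0 * a)
    full = gamma / a
    eta = sigma / psi
    z_half = (half - t * v) / (t * eta)
    z_full = (full - t * v) / (t * eta)
    return (2.0 * a / gamma) * (-v * STD_NORMAL.cdf(z_half)
                                + eta * STD_NORMAL.pdf(z_half)
                                + v * STD_NORMAL.cdf(z_full)
                                - eta * STD_NORMAL.pdf(z_full))


# Illustrative values only; compare a central difference of F with f.
args = dict(v=0.7, gamma=2.0, a=1.5, sigma=1.0, psi=2.0)
t, h = 1.5, 1e-5
numeric_slope = (dlba_cdf(t + h, **args) - dlba_cdf(t - h, **args)) / (2.0 * h)
density = dlba_pdf(t, **args)
print(numeric_slope, density)
```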

3.2 Meanings of the parameters in the D-diffusion and D-LBA models

To facilitate the understanding of parameters, in this section, we briefly show the relationships between the parameter values and the observed quantities, which are the RTs and item responses.

Expected RT Figure 3 shows the relationships between the expected RTs and \((\theta _i-b_j)\). This \((\theta _i-b_j)\) is equal to the drift rate in the D-diffusion model (Eq. 7) and is in one-to-one correspondence with the drift rate in the D-LBA model (Eq. 15); thus, we collectively call it the drift rate component here. Each of the three lines corresponds to a different boundary parameter value. We can observe two major characteristics common to both models. First, the expected RT is longer when the absolute difference \(|\theta _i-b_j|\) is close to zero. Second, the expected RT is longer when the boundary (\(\gamma _{i}/a_{j}\)) is larger. In the D-diffusion model, the expected RT is given by (Tuerlinckx et al. 2016)
$$\begin{aligned} E(t_{ij})\simeq \left\{ \begin{array}{ll} \frac{1}{2|\theta _i-b_j|}\left( \frac{\gamma _i}{a_j}\right) &{} \quad \text{if}\ |\theta _i-b_j|\ne 0 \\ \frac{1}{4}\left( \frac{\gamma _i}{a_j}\right) ^2 &{} \quad \text{if} \ |\theta _i-b_j|= 0. \end{array}\right. \end{aligned}$$
(19)
From Eqs. (5), (7), and (19), we can see that both the expected RT and the discriminability are functions of the boundary parameters. More specifically, the expected RT is approximately a quadratic function of the boundary parameter when \(|\theta _i-b_j|\) is zero but otherwise an approximately linear function of it. For example, if the expected RT is 5 s when \(|\theta _i-b_j|\) is equal to zero, the discriminability becomes \(\sqrt{20} \simeq 4.47\). Figure 4 plots the relationship between the expected RT and the boundary parameter in the D-diffusion model.
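The arithmetic of this example can be reproduced with a short Python sketch of Eq. (19). The function names are hypothetical, and the formula is only the approximation stated in Eq. (19), so it should not be read as an exact expected RT.

```python
import math


def expected_rt_ddiffusion(theta, b, gamma, a):
    """Approximate expected RT in the D-diffusion model, Eq. (19)."""
    boundary = gamma / a
    drift = abs(theta - b)
    if drift == 0.0:
        return boundary ** 2 / 4.0      # quadratic in the boundary
    return boundary / (2.0 * drift)     # linear in the boundary


def boundary_for_zero_drift_rt(expected_rt):
    """Invert the zero-drift branch of Eq. (19) for the boundary gamma / a."""
    return math.sqrt(4.0 * expected_rt)


# An expected RT of 5 s at zero drift implies a boundary of sqrt(20), about 4.47.
boundary = boundary_for_zero_drift_rt(5.0)
print(boundary, expected_rt_ddiffusion(0.0, 0.0, gamma=boundary, a=1.0))
```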
Fig. 3 Relationship between the drift rate component (\(\theta _i - b_j\)) and the expected RT. Left panel: D-diffusion model; right panel: D-LBA model; solid line: \(\gamma _i / a_j=1\); dashed line: \(\gamma _i / a_j=3\); dotted line: \(\gamma _i / a_j=5\)

Fig. 4 Relationship between the boundary (\(\gamma _i / a_j\)) and the expected RT. Left panel: D-diffusion model; right panel: D-LBA model; solid line: \(|\theta _i - b_j|=0\); dashed line: \(|\theta _i - b_j|=1\); dotted line: \(|\theta _i - b_j|=2\)

Item response Figure 5 shows the relationship between the probability of choosing the first response category and the drift rate component \((\theta _i-b_j)\). Each of the three lines corresponds to different boundary parameter values. In the D-diffusion model, the discriminability differs in conjunction with the boundary parameters. In the D-LBA model, on the other hand, the discriminability does not differ even when the boundary parameters differ. This suggests that, in the D-LBA model, the boundary parameters only affect the expected RT. Here, Fig. 6 indicates how the between-trial variability in the slope, \(\eta _{ij}\), affects the discriminability and expected RT. The results suggest that \(\eta _{ij}\) affects both the discriminability and expected RT.
Fig. 5 Relationship between the drift rate (\(\theta _i - b_j\)) and the probability of choosing the first category. Left panel: D-diffusion model; right panel: D-LBA model; solid line: \(\gamma _i / a_j=1\); dashed line: \(\gamma _i / a_j=3\); dotted line: \(\gamma _i / a_j=5\)

Fig. 6 Relationships between the parameters and the observed quantities in the D-LBA model with different intertrial variabilities \(\eta _{ij}\). Solid line: \(\eta _{ij}=0.1\); dashed line: \(\eta _{ij}=0.3\); dotted line: \(\eta _{ij}=0.5\). Left panel: relationship between the drift rate component (\(\theta _i - b_j\)) and the probability of choosing the correct response. Right panel: relationship between the expected RT and the drift rate component (\(\theta _i - b_j\))

The relationship shown in Fig. 6 illustrates the advantage of the proposed D-LBA model. By incorporating two different parameters, \(\alpha _{ij}\) and \(\eta _{ij}\), the D-LBA model is free from the strong and unrealistic assumption of linearity between the discriminability and the expected RT. In contrast, the D-diffusion model imposes this linearity. This may limit the applicability of the D-diffusion model to empirical data with longer RTs than are typical in applications of the diffusion model.

3.3 Prior distribution

As described above, the proposed D-LBA model can be numerically estimated using the MCMC estimation method. For this, we set the prior distributions for each parameter. We use the following priors:
$$\begin{aligned} \begin{array}{ll} a_{j} \sim \text{HalfCauchy}(0,5), &{} \quad \gamma _{i} \sim LN(0,1), \\ b_{j} \sim N(0,2.5), &{} \quad \theta _{i} \sim N(0,1), \\ \psi _{j} \sim \text{HalfCauchy}(0,5), &{} \quad \sigma _{i} \sim LN(0,1), \\ \tau _{j} \sim \text{HalfCauchy}(0,5), &{} \quad \end{array} \end{aligned}$$
(20)
where \(N(\cdot )\), \(LN(\cdot )\), and \(\text{HalfCauchy}(\cdot )\) denote normal, lognormal, and half-Cauchy distributions, respectively. Here, \(\theta _{i}\) follows the standard normal distribution for identification purposes; this is a popular assumption in IRT models. Other respondent parameters, \(\gamma _{i}\) and \(\sigma _{i}\), follow the standard lognormal distribution for the same reason. On the other hand, we set weakly informative priors (Gelman et al. 2008) for all item parameters. To determine the hyperparameters, we conducted a sensitivity analysis, and the results showed that when we set \(\text{HalfCauchy}(0,5)\) for \(a_{j}\), \(\psi _{j}\), and \(\tau _{j}\), each posterior mean differs negligibly from that obtained under a very weakly informative prior (\(\text{HalfCauchy}(0,100)\)). Thus, we used the abovementioned priors for computationally efficient estimation.
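To give a feel for the scale implied by Eq. (20), the following Python sketch draws one set of parameters from these priors. It is only an illustration: the function names are hypothetical, the half-Cauchy draw uses the inverse-CDF of a Cauchy variable folded at zero, and the second argument of \(N(0,2.5)\) is read as a standard deviation, which is an assumption based on the convention used by Stan.

```python
import math
import random


def half_cauchy(rng, scale):
    """Draw from HalfCauchy(0, scale) by folding a Cauchy(0, scale) draw at zero."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def draw_dlba_priors(n_respondents, n_items, seed=0):
    """Draw respondent and item parameters from the priors in Eq. (20)."""
    rng = random.Random(seed)
    items = [
        {
            "a": half_cauchy(rng, 5.0),
            "b": rng.gauss(0.0, 2.5),        # 2.5 assumed to be a standard deviation
            "psi": half_cauchy(rng, 5.0),
            "tau": half_cauchy(rng, 5.0),
        }
        for _ in range(n_items)
    ]
    persons = [
        {
            "gamma": rng.lognormvariate(0.0, 1.0),
            "theta": rng.gauss(0.0, 1.0),
            "sigma": rng.lognormvariate(0.0, 1.0),
        }
        for _ in range(n_respondents)
    ]
    return persons, items


persons, items = draw_dlba_priors(n_respondents=100, n_items=10)
print(items[0])
print(persons[0])
```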

In the original LBA model, some parameters (viz., \(\beta\), A, and \(\eta\)) are permitted to differ between different response categories. As pointed out by Heathcote and Love (2012), allowing this flexibility can make the model fit better. However, in this study, we set these parameters to be equal between response categories for the following reasons. First, the proposed D-LBA model is meant to be a simpler alternative to the D-diffusion model; thus, we would not want to increase the number of parameters unless they are of substantial importance. Second, this constraint facilitates faster and more stable estimation.

4 Simulation study

In this simulation study, we investigated several issues. First, we assessed parameter recovery. If the proposed model cannot properly recover parameters with the simulation data, it is pointless to examine other issues. Second, we investigated the similarity between the D-diffusion and D-LBA models. Because the parameters of these two models do not analytically map to each other, the empirical correspondence between the parameters needs to be investigated. Although the D-LBA and D-diffusion models may have similar parameter interpretations, they have different parameter scales. Hence, we evaluated the similarity by correlations rather than absolute differences. Third, we compared both models in terms of the number of iterations to convergence. Previous studies have shown that the LBA model is simpler than the diffusion model. If so, it may be natural to expect the D-LBA model to converge faster than D-diffusion, even if new parameters are added to the D-LBA model. Finally, we examined the model performance according to the information criteria.

Conditions In this simulation, we generated simulation data from both the D-diffusion and D-LBA models, and we estimated the parameters using both models. In addition, we set \(3\times 3=9\) conditions by combining the following factors: (1) the number of respondents \(I=(100,300,500)\) and (2) the number of items \(J=(10,20,30)\). Overall, we simulated \(3\times 3\times 2\times 2=36\) conditions with 20 replications for each condition.
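The design can be enumerated explicitly. The short Python sketch below lists the 36 conditions; the dictionary keys are hypothetical labels, and the total of 720 assumes that each of the 20 replications is counted once per condition.

```python
from itertools import product

respondents = (100, 300, 500)          # number of respondents I
items = (10, 20, 30)                   # number of items J
models = ("D-diffusion", "D-LBA")      # used for both generation and estimation

conditions = [
    {"I": n_resp, "J": n_items, "generation": gen, "estimation": est}
    for n_resp, n_items, gen, est in product(respondents, items, models, models)
]
n_replications = 20

print(len(conditions))                   # 3 x 3 x 2 x 2 conditions
print(len(conditions) * n_replications)  # replications summed over all conditions
```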

Data generation To generate simulation data from the D-LBA IRT model, the following distributions were used:
$$\begin{aligned} \begin{array}{ll} a_{j} \sim U(0.5,3), &{} \qquad \gamma _{i} \sim LN(0,1), \\ b_{j} \sim U(-3,3), &{} \qquad \theta _{i} \sim N(0,1), \\ \psi _{j} \sim U(1,4), &{} \qquad \sigma _{i} \sim LN(0,1), \\ \tau _{j} \sim U(0.1,0.5). &{} \end{array} \end{aligned}$$
(21)
When generating data from D-diffusion, we used the same distributions except \(a_{j} \sim U(0.3,0.7)\) owing to the difference in the parameter scale. We set the range of parameters in Eq. (21) following Donkin et al. (2011). Furthermore, we fixed s to 1 in the diffusion model.

Using the true parameter values generated from Eq. (21), we generated simulation data with the rdiffusion function in the rtdists R package. In this process, we found that a very small proportion of the generated RTs were greater than 120 s. However, such data make estimation (calculation of the log-probability) difficult and unstable for technical reasons, especially in the case of diffusion IRT. Moreover, in practice, the observation of such large RT data is unlikely; if they exist, they are usually excluded from the analysis as “lazy” responses. For these reasons, we used only data for which the RT was less than 120 s.
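For readers who wish to see the generating process spelled out, the following Python sketch simulates responses and RTs from the D-LBA model with parameters drawn as in Eq. (21) and then applies the 120 s exclusion. It is a from-scratch illustration, not the rtdists routine used in this study. The function names are hypothetical, and redrawing a trial when no accumulator has a positive slope is an assumed convention.

```python
import math
import random


def simulate_dlba_trial(theta, b, gamma, a, sigma, psi, tau, rng):
    """Simulate one binary response and its RT from the D-LBA model."""
    beta = gamma / a              # boundary, Eq. (14)
    A = beta / 2.0                # start-point upper bound, Eq. (13)
    eta = sigma / psi             # between-trial variability, Eq. (16)
    mean_drifts = (1.0 / (1.0 + math.exp(-theta + b)),
                   1.0 / (1.0 + math.exp(theta - b)))   # Eq. (15)
    while True:
        finish_times = []
        for v in mean_drifts:
            drift = rng.gauss(v, eta)        # slope of this accumulator on this trial
            start = rng.uniform(0.0, A)      # uniformly distributed start point
            finish_times.append((beta - start) / drift if drift > 0.0 else math.inf)
        decision_time = min(finish_times)
        if math.isfinite(decision_time):     # assumed: redraw if no accumulator finishes
            break
    choice = finish_times.index(decision_time) + 1
    return choice, tau + decision_time


def simulate_dataset(persons, items, seed=0, max_rt=120.0):
    """Simulate all person-by-item trials and drop RTs of max_rt or longer."""
    rng = random.Random(seed)
    data = []
    for i, p in enumerate(persons):
        for j, it in enumerate(items):
            choice, rt = simulate_dlba_trial(p["theta"], it["b"], p["gamma"], it["a"],
                                             p["sigma"], it["psi"], it["tau"], rng)
            if rt < max_rt:
                data.append((i, j, choice, rt))
    return data


# True parameter values drawn as in Eq. (21); the sizes are kept small for illustration.
rng = random.Random(1)
items = [{"a": rng.uniform(0.5, 3.0), "b": rng.uniform(-3.0, 3.0),
          "psi": rng.uniform(1.0, 4.0), "tau": rng.uniform(0.1, 0.5)}
         for _ in range(5)]
persons = [{"gamma": rng.lognormvariate(0.0, 1.0), "theta": rng.gauss(0.0, 1.0),
            "sigma": rng.lognormvariate(0.0, 1.0)}
           for _ in range(20)]
data = simulate_dataset(persons, items, seed=2)
print(len(data), data[0])
```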

Table 1 summarizes the descriptive statistics of the RTs used in this simulation. As seen in the first column, the proportion of excluded data is very small. The following columns indicate the mean, 2.5% quantile, median, and 97.5% quantile of the data that were used after exclusion. The range of the RT can be considered adequate and relevant.
Table 1 Descriptive statistics of the RTs generated by the simulation

| Model | > 120 s | Mean | 2.5% | 50% | 97.5% |
|---|---|---|---|---|---|
| D-diffusion | 0.054% | 2.111 s | 0.254 s | 1.066 s | 10.587 s |
| D-LBA | 0.007% | 1.539 s | 0.282 s | 0.907 s | 6.635 s |

Prior distributions For estimation with the proposed D-LBA model, we used the priors in Eq. (20). We used the same priors for the D-diffusion model, except for \(\psi _j\) and \(\sigma _i\), which do not appear in that model.

For all of the estimation results presented below, we used R (3.4.0) and rstan 2.15.0 on a Windows 10 PC. Three MCMC chains were run for each dataset. The number of MCMC iterations per chain was 10,000, 9000 of which were discarded as warmup. The Stan code for LBA was obtained from the work of Annis et al. (2016) and extended to the D-LBA model. The Stan code used in this study is available on the Open Science Framework (https://osf.io/ck7fr/). We used the posterior means of the parameters as their point estimates.

4.1 Results

Parameter recovery (RMSE and bias) Table 2 lists the mean root-mean-square error (RMSE) values for each condition when both the generation and estimation models are the same. The RMSE for the parameter \(\theta _i\), for example, is calculated by
$$\begin{aligned} \text{RMSE}=\sqrt{\frac{1}{I}\sum _{i=1}^{I}(\hat{\theta _{i}}-\theta _i)^2}. \end{aligned}$$
(22)
Since \(\gamma _{i}\) and \(\sigma _{i}\) are log-normally distributed, the RMSEs between the log-transformed estimates and the log-transformed true values are shown for these parameters. A comparison of the absolute values between the two models may not be meaningful because the scales of the parameters are different. Nevertheless, it is evident that as the number of respondents I increases, the RMSEs for all item parameters decrease in both the D-diffusion and D-LBA models. On the other hand, the RMSEs for the respondent parameters did not decrease as a function of I. This can be attributed to the well-known “Neyman–Scott paradox” (Neyman and Scott 1948). Specifically, in some IRT models, the estimates of the respondent parameters do not converge to the true value even in the large-sample limit because the number of parameters also increases as the number of respondents increases.
Table 3 summarizes the mean biases for each condition. The bias for \(\theta _i\), for example, is calculated by
$$\begin{aligned} \text{bias}=\frac{1}{I}\sum _{i=1}^{I}(\hat{\theta _{i}}-\theta _i). \end{aligned}$$
(23)
From the table, the biases for all parameters were close to zero. This suggests that the proposed D-LBA model and D-diffusion model produce good parameter estimates on average.
From the viewpoint of the RMSEs and biases, it can be concluded that the proposed model parameters are properly estimated and that parameter recovery is as good as that of the D-diffusion model.
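Eqs. (22) and (23) translate directly into code. The Python sketch below is a minimal illustration with hypothetical function names and made-up values; for \(\gamma _{i}\) and \(\sigma _{i}\), both arguments would be log-transformed before the call.

```python
import math


def rmse(estimates, true_values):
    """Root-mean-square error between estimates and true values, Eq. (22)."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared) / len(squared))


def bias(estimates, true_values):
    """Mean signed difference between estimates and true values, Eq. (23)."""
    diffs = [e - t for e, t in zip(estimates, true_values)]
    return sum(diffs) / len(diffs)


# Made-up values: two estimates are exact and one is too small by 2.
theta_hat = [1.0, 2.0, 3.0]
theta_true = [1.0, 2.0, 5.0]
print(rmse(theta_hat, theta_true), bias(theta_hat, theta_true))
```

A negative bias, as in most cells of Table 3, indicates that the estimates are on average smaller than the true values.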
Table 2 Mean RMSE for each model

| Model | I | J | \(a_j\) | \(\log (\gamma _i)\) | \(b_j\) | \(\theta _i\) | \(\tau _j\) | \(\psi _j\) | \(\log (\sigma _i)\) |
|---|---|---|---|---|---|---|---|---|---|
| D-Diffusion | 100 | 10 | 0.062 (0.047) | 0.175 (0.075) | 0.144 (0.050) | 0.465 (0.046) | 0.003 (0.002) | | |
| D-Diffusion | 100 | 20 | 0.119 (0.054) | 0.224 (0.079) | 0.152 (0.050) | 0.391 (0.051) | 0.003 (0.001) | | |
| D-Diffusion | 100 | 30 | 0.213 (0.080) | 0.348 (0.103) | 0.144 (0.040) | 0.364 (0.052) | 0.003 (0.001) | | |
| D-Diffusion | 300 | 10 | 0.032 (0.022) | 0.150 (0.024) | 0.093 (0.031) | 0.482 (0.032) | 0.001 (0.000) | | |
| D-Diffusion | 300 | 20 | 0.039 (0.026) | 0.119 (0.034) | 0.094 (0.027) | 0.390 (0.028) | 0.001 (0.000) | | |
| D-Diffusion | 300 | 30 | 0.055 (0.033) | 0.128 (0.048) | 0.085 (0.028) | 0.338 (0.028) | 0.001 (0.000) | | |
| D-Diffusion | 500 | 10 | 0.029 (0.016) | 0.145 (0.017) | 0.070 (0.026) | 0.471 (0.023) | 0.001 (0.000) | | |
| D-Diffusion | 500 | 20 | 0.035 (0.018) | 0.115 (0.021) | 0.072 (0.021) | 0.389 (0.027) | 0.001 (0.000) | | |
| D-Diffusion | 500 | 30 | 0.037 (0.016) | 0.108 (0.024) | 0.065 (0.012) | 0.340 (0.018) | 0.001 (0.000) | | |
| D-LBA | 100 | 10 | 0.230 (0.128) | 0.302 (0.056) | 0.435 (0.158) | 0.670 (0.071) | 0.010 (0.004) | 0.525 (0.208) | 0.509 (0.046) |
| D-LBA | 100 | 20 | 0.323 (0.167) | 0.276 (0.054) | 0.429 (0.138) | 0.548 (0.058) | 0.010 (0.004) | 0.616 (0.219) | 0.401 (0.047) |
| D-LBA | 100 | 30 | 0.456 (0.171) | 0.288 (0.051) | 0.381 (0.096) | 0.523 (0.058) | 0.010 (0.003) | 0.674 (0.228) | 0.348 (0.045) |
| D-LBA | 300 | 10 | 0.120 (0.057) | 0.281 (0.034) | 0.285 (0.133) | 0.684 (0.055) | 0.005 (0.003) | 0.330 (0.180) | 0.485 (0.020) |
| D-LBA | 300 | 20 | 0.176 (0.099) | 0.252 (0.028) | 0.233 (0.089) | 0.564 (0.032) | 0.005 (0.001) | 0.317 (0.105) | 0.373 (0.026) |
| D-LBA | 300 | 30 | 0.199 (0.097) | 0.224 (0.035) | 0.228 (0.067) | 0.517 (0.037) | 0.004 (0.001) | 0.308 (0.103) | 0.321 (0.024) |
| D-LBA | 500 | 10 | 0.094 (0.056) | 0.284 (0.020) | 0.175 (0.106) | 0.650 (0.053) | 0.004 (0.002) | 0.224 (0.121) | 0.478 (0.018) |
| D-LBA | 500 | 20 | 0.106 (0.060) | 0.233 (0.018) | 0.176 (0.099) | 0.553 (0.031) | 0.003 (0.001) | 0.212 (0.083) | 0.360 (0.017) |
| D-LBA | 500 | 30 | 0.105 (0.062) | 0.209 (0.017) | 0.167 (0.072) | 0.495 (0.027) | 0.003 (0.001) | 0.209 (0.069) | 0.311 (0.020) |

The SD is stated within parentheses

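Table 3 Mean bias for each model

| Model | I | J | \(a_j\) | \(\log (\gamma _{i})\) | \(b_{j}\) | \(\theta _{i}\) | \(\tau _{j}\) | \(\psi _{j}\) | \(\log (\sigma _i)\) |
|---|---|---|---|---|---|---|---|---|---|
| D-Diffusion | 100 | 10 | −0.044 | −0.084 | 0.020 | 0.021 | 0.000 | | |
| D-Diffusion | 100 | 20 | −0.109 | −0.194 | −0.030 | −0.032 | 0.001 | | |
| D-Diffusion | 100 | 30 | −0.205 | −0.338 | 0.022 | 0.008 | 0.000 | | |
| D-Diffusion | 300 | 10 | −0.023 | −0.056 | −0.012 | −0.020 | 0.000 | | |
| D-Diffusion | 300 | 20 | −0.030 | −0.061 | 0.001 | −0.001 | 0.001 | | |
| D-Diffusion | 300 | 30 | −0.049 | −0.095 | 0.009 | 0.007 | 0.000 | | |
| D-Diffusion | 500 | 10 | −0.013 | −0.035 | −0.019 | −0.010 | 0.000 | | |
| D-Diffusion | 500 | 20 | −0.028 | −0.059 | −0.013 | −0.014 | 0.000 | | |
| D-Diffusion | 500 | 30 | −0.033 | −0.070 | −0.008 | −0.002 | 0.001 | | |
| D-LBA | 100 | 10 | −0.129 | −0.090 | 0.008 | −0.028 | 0.000 | −0.244 | −0.145 |
| D-LBA | 100 | 20 | −0.269 | −0.163 | 0.066 | 0.006 | 0.001 | −0.428 | −0.189 |
| D-LBA | 100 | 30 | −0.405 | −0.215 | 0.007 | 0.001 | 0.000 | −0.490 | −0.172 |
| D-LBA | 300 | 10 | −0.054 | −0.067 | 0.023 | −0.005 | 0.001 | −0.134 | −0.137 |
| D-LBA | 300 | 20 | −0.142 | −0.100 | 0.005 | −0.003 | 0.000 | −0.134 | −0.098 |
| D-LBA | 300 | 30 | −0.163 | −0.101 | 0.006 | 0.010 | 0.000 | −0.186 | −0.099 |
| D-LBA | 500 | 10 | −0.050 | −0.064 | 0.001 | 0.005 | 0.001 | −0.050 | −0.110 |
| D-LBA | 500 | 20 | −0.057 | −0.052 | 0.006 | 0.000 | 0.000 | −0.064 | −0.074 |
| D-LBA | 500 | 30 | −0.076 | −0.063 | −0.011 | −0.006 | 0.000 | −0.106 | −0.077 |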

Correspondence (correlations) Tables 4, 5, 6, 7 and 8 summarize the correlation of each parameter for each condition. With regard to the conditions in which data were generated with the D-diffusion model and estimated with the D-LBA model, the correlations for \(\log (\gamma _i)\), \(b_j\), and \(\tau _j\) were high (greater than 0.9), those for \(\theta _i\) were somewhat lower (approximately 0.72 to 0.89), and those for \(a_j\) were the lowest (approximately 0.41 to 0.67). The lower correlations for \(a_j\) are consistent with the results of Donkin et al. (2011). In addition, the proposed model expresses the relationship between the discriminability and the RT with two parameters, \(\alpha _{ij}\) and \(\eta _{ij}\), while the D-diffusion model uses only \(\alpha _{ij}\) to express this relationship. This would explain why the correlation of \(a_j\) is lower, particularly when the data were generated from the D-diffusion model and estimated with the D-LBA model. These results, especially those in Table 7, reveal an interesting point. When the data generation and estimation models were the same, the D-diffusion model exhibited a higher correlation than the D-LBA model. On the other hand, when the data generation and estimation models were different, the D-LBA model exhibited a higher correlation than the D-diffusion model. This suggests that when the data generation model is known to be the D-diffusion model, the true D-diffusion model performs better than the D-LBA model. However, when the data generation model is unknown, which is natural in a practical situation, the D-LBA model shows more stable performance regardless of the true data generation model.
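The reason for using correlations rather than absolute differences can be seen in a small Python sketch: the Pearson correlation is unchanged when the estimates live on a linearly rescaled version of the true scale. The function name and all numerical values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up true values and estimates that sit on a different scale.
true_a = [0.5, 1.0, 2.0, 3.0]
est_a = [0.31, 0.42, 0.58, 0.69]
rescaled_est_a = [2.0 * value + 1.0 for value in est_a]

print(pearson(true_a, est_a))
print(pearson(true_a, rescaled_est_a))  # same value despite the change of scale
```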
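Table 4 Correlation between the true parameter value of \(a_{j}\) in the data generating model and its estimate in the estimation model

| I | J | Generated: D-diffusion; estimated: D-diffusion | Generated: D-diffusion; estimated: D-LBA | Generated: D-LBA; estimated: D-diffusion | Generated: D-LBA; estimated: D-LBA |
|---|---|---|---|---|---|
| 100 | 10 | 0.973 | 0.407 | 0.920 | 0.990 |
| 100 | 20 | 0.982 | 0.598 | 0.905 | 0.991 |
| 100 | 30 | 0.981 | 0.665 | 0.890 | 0.989 |
| 300 | 10 | 0.994 | 0.519 | 0.942 | 0.996 |
| 300 | 20 | 0.993 | 0.547 | 0.939 | 0.996 |
| 300 | 30 | 0.993 | 0.553 | 0.936 | 0.996 |
| 500 | 10 | 0.996 | 0.530 | 0.937 | 0.998 |
| 500 | 20 | 0.996 | 0.582 | 0.932 | 0.998 |
| 500 | 30 | 0.996 | 0.603 | 0.948 | 0.998 |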

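Table 5 Correlation between the true parameter value of \(\log (\gamma _{i})\) in the data generating model and its estimate in the estimation model

| I | J | Generated: D-diffusion; estimated: D-diffusion | Generated: D-diffusion; estimated: D-LBA | Generated: D-LBA; estimated: D-diffusion | Generated: D-LBA; estimated: D-LBA |
|---|---|---|---|---|---|
| 100 | 10 | 0.992 | 0.947 | 0.917 | 0.960 |
| 100 | 20 | 0.996 | 0.968 | 0.939 | 0.976 |
| 100 | 30 | 0.997 | 0.977 | 0.942 | 0.982 |
| 300 | 10 | 0.992 | 0.936 | 0.921 | 0.964 |
| 300 | 20 | 0.996 | 0.964 | 0.932 | 0.973 |
| 300 | 30 | 0.997 | 0.974 | 0.940 | 0.980 |
| 500 | 10 | 0.991 | 0.938 | 0.918 | 0.962 |
| 500 | 20 | 0.996 | 0.966 | 0.931 | 0.974 |
| 500 | 30 | 0.997 | 0.974 | 0.941 | 0.980 |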

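Table 6 Correlation between the true parameter value of \(b_{j}\) in the data generating model and its estimate in the estimation model

| I | J | Generated: D-diffusion; estimated: D-diffusion | Generated: D-diffusion; estimated: D-LBA | Generated: D-LBA; estimated: D-diffusion | Generated: D-LBA; estimated: D-LBA |
|---|---|---|---|---|---|
| 100 | 10 | 0.998 | 0.958 | 0.944 | 0.981 |
| 100 | 20 | 0.998 | 0.971 | 0.947 | 0.984 |
| 100 | 30 | 0.998 | 0.976 | 0.934 | 0.982 |
| 300 | 10 | 0.999 | 0.964 | 0.946 | 0.990 |
| 300 | 20 | 0.999 | 0.978 | 0.963 | 0.993 |
| 300 | 30 | 0.999 | 0.980 | 0.952 | 0.993 |
| 500 | 10 | > 0.999 | 0.964 | 0.970 | 0.996 |
| 500 | 20 | > 0.999 | 0.980 | 0.961 | 0.996 |
| 500 | 30 | > 0.999 | 0.980 | 0.955 | 0.996 |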

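Table 7 Correlation between the true parameter value of \(\theta _{i}\) in the data generating model and its estimate in the estimation model

| I | J | Generated: D-diffusion; estimated: D-diffusion | Generated: D-diffusion; estimated: D-LBA | Generated: D-LBA; estimated: D-diffusion | Generated: D-LBA; estimated: D-LBA |
|---|---|---|---|---|---|
| 100 | 10 | 0.882 | 0.739 | 0.612 | 0.736 |
| 100 | 20 | 0.921 | 0.853 | 0.695 | 0.836 |
| 100 | 30 | 0.935 | 0.894 | 0.738 | 0.856 |
| 300 | 10 | 0.879 | 0.721 | 0.596 | 0.727 |
| 300 | 20 | 0.924 | 0.830 | 0.708 | 0.826 |
| 300 | 30 | 0.941 | 0.892 | 0.741 | 0.861 |
| 500 | 10 | 0.880 | 0.716 | 0.652 | 0.754 |
| 500 | 20 | 0.922 | 0.838 | 0.719 | 0.831 |
| 500 | 30 | 0.941 | 0.887 | 0.745 | 0.870 |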

Table 8 Correlation between the true parameter value of \(\tau _{j}\) in the data generating model and its estimate in the estimation model

| I | J | Generated: D-diffusion; estimated: D-diffusion | Generated: D-diffusion; estimated: D-LBA | Generated: D-LBA; estimated: D-diffusion | Generated: D-LBA; estimated: D-LBA |
|---|---|---|---|---|---|
| 100 | 10 | > 0.999 | 0.999 | 0.986 | 0.995 |
| 100 | 20 | > 0.999 | 0.999 | 0.988 | 0.996 |
| 100 | 30 | > 0.999 | 0.999 | 0.985 | 0.997 |
| 300 | 10 | > 0.999 | > 0.999 | 0.997 | 0.999 |
| 300 | 20 | > 0.999 | > 0.999 | 0.995 | 0.999 |
| 300 | 30 | > 0.999 | > 0.999 | 0.995 | 0.999 |
| 500 | 10 | > 0.999 | > 0.999 | 0.997 | 0.999 |
| 500 | 20 | > 0.999 | > 0.999 | 0.998 | > 0.999 |
| 500 | 30 | > 0.999 | > 0.999 | 0.997 | > 0.999 |

Estimation efficiency (\({\hat{R}}\) and effective sample size) We computed the Gelman–Rubin diagnostic statistic (\({\hat{R}}\); Gelman et al. 2014) as a convergence diagnostic measure. We found that when the number of respondents is small, both models converge quite fast (fewer than 1000 iterations when \(I=100\)). Therefore, in this paper, we show the results when the number of respondents is the largest under the conditions that we considered.

Figure 7 shows the results for the conditions \(I=500\) and \(J=30\) with the D-LBA model as the data generation model. The x axis represents the warmup iterations (1000 iterations after these 9000 warmup iterations were used for posterior estimation). The y axis represents the proportion for which \({\hat{R}}\) is below the threshold. Gelman et al. (2014) suggested a threshold of 1.1 for \({\hat{R}}\); therefore, we set the same threshold. These proportions were computed on the basis of the MCMC samples from zero to the x-axis value iterations in increments of 500. The solid and dashed lines represent the results estimated by the D-LBA and D-diffusion models, respectively.
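The quantity plotted on the y axis can be sketched as follows. The Python code below computes the basic (non-split) Gelman–Rubin statistic from several chains of one parameter and then the proportion of parameters below the threshold of 1.1. It is an illustration with hypothetical function names and simulated chains, not the routine used by rstan, which may apply refinements such as splitting each chain in half.

```python
import math
import random
from statistics import mean, variance


def r_hat(chains):
    """Basic Gelman-Rubin statistic for one parameter from equally long chains."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])   # W
    between = n * variance(chain_means)                    # B
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


def proportion_converged(draws_by_parameter, threshold=1.1):
    """Proportion of parameters whose R-hat is below the threshold."""
    values = [r_hat(chains) for chains in draws_by_parameter.values()]
    return sum(value < threshold for value in values) / len(values)


# Simulated chains: "mixed" shares one distribution, "stuck" has one shifted chain.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 0.0, 3.0)]

print(r_hat(mixed), r_hat(stuck))
print(proportion_converged({"mixed": mixed, "stuck": stuck}))
```

Repeating this computation on the draws from iteration zero up to each x-axis value would give one curve of the kind shown in Fig. 7.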
Fig. 7 Proportion of parameters for which \({\hat{R}}\) is lower than 1.1 when the data generation model is the D-LBA model. The solid line represents the results estimated by the D-LBA model. The dashed line represents the results estimated by D-diffusion

The results indicate that the D-LBA model converges much faster than D-diffusion, despite the fact that the number of parameters in the D-LBA model is larger. The D-LBA model took fewer than 1,000 warmup iterations to converge for more than 99% of the parameters, whereas D-diffusion seems to be more unstable. Figure 8, which shows the results when the data generation model was the D-diffusion model, also indicates that the D-LBA model converges several times faster than the D-diffusion model.
Fig. 8 Proportion of parameters for which \({\hat{R}}\) is lower than 1.1 when the data generation model is the D-diffusion model. The solid and dashed lines have the same meanings as those in Fig. 7

In addition, we checked the average effective sample size as a measure of the estimation efficiency. Table 9 summarizes the effective sample sizes under the conditions \(I=500\) and \(J=30\). It is evident that the D-LBA model obtained larger effective sample sizes than the D-diffusion model for all parameters regardless of the true data generation model. This result was consistent under all conditions. The results suggest that the proposed model can estimate parameters more efficiently than the D-diffusion model. Note that with regard to the empirical computational time per iteration, given the same computational environment, there exist no systematic differences between the two models. This means that the proposed D-LBA model provides a higher estimation efficiency both per iteration and per unit of wall-clock time.
Table 9 Average effective sample sizes for each parameter under the conditions \(I=500\) and \(J=30\)

| Parameter | Generated: D-Diffusion; estimated: D-Diffusion | Generated: D-Diffusion; estimated: D-LBA | Generated: D-LBA; estimated: D-Diffusion | Generated: D-LBA; estimated: D-LBA |
|---|---|---|---|---|
| \(a_j\) | 48.25 | 146.09 | 49.03 | 63.41 |
| \(\gamma _{i}\) | 333.72 | 938.00 | 320.33 | 781.89 |
| \(b_{j}\) | 188.29 | 2101.40 | 202.00 | 1164.00 |
| \(\theta _{i}\) | 969.67 | 3615.56 | 2411.72 | 4153.28 |
| \(\tau _{j}\) | 1253.42 | 2291.05 | 2739.38 | 3153.62 |
| \(\psi _{j}\) | | 508.43 | | 430.78 |
| \(\sigma _{i}\) | | 1516.56 | | 2171.01 |

Information criteria In addition to the results presented above, we assessed the fitness of the models in terms of information criteria (WAIC: widely applicable information criterion and WBIC: widely applicable Bayesian information criterion; Watanabe 2010, 2013; Vehtari et al. 2017). Figure 9 shows the results under the largest data conditions (\(I=500, J=30\)). In all of these graphs, the solid lines represent the results estimated by the D-LBA model, and the dot–dash lines represent the results estimated by the D-diffusion model. The upper half represents the WAIC, and the lower half represents the WBIC. The graphs on the left are for data generated by D-diffusion, whereas those on the right are for data generated by the D-LBA model. For all datasets generated by the D-LBA model, both indices become lower when estimated by the D-LBA model. On the other hand, for all datasets generated by the D-diffusion model, the D-LBA model shows worse values than D-diffusion. These results are expected, and we confirmed the same results under all conditions.
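For concreteness, the WAIC can be computed from a matrix of pointwise log-likelihood values as sketched below in Python. The sketch follows the deviance-scale definition described by Vehtari et al. (2017), which is assumed here; the function names and the toy input are hypothetical, and the WBIC, which requires sampling at a specific inverse temperature, is not covered.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values) / len(values))


def waic(log_lik):
    """WAIC on the deviance scale; log_lik[s][n] is draw s, observation n."""
    n_obs = len(log_lik[0])
    lppd = 0.0      # log pointwise predictive density
    p_waic = 0.0    # effective number of parameters
    for n in range(n_obs):
        column = [draw[n] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)


# Toy input: two posterior draws, two observations, no posterior variability.
toy_log_lik = [[-1.0, -2.0], [-1.0, -2.0]]
print(waic(toy_log_lik))
```

With this sign convention, lower values indicate better expected predictive performance, which matches the direction of comparison used for Fig. 9.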
Fig. 9 Results under the conditions \(I=500\) and \(J=30\). Solid lines: estimated with D-LBA; dot–dash lines: estimated with D-Diffusion; upper half: WAIC, lower half: WBIC; left side: data generated by D-Diffusion, right side: data generated by D-LBA

5 Real data application: extraversion data

In this section, we consider a more realistic situation using real data to examine the applicability of the proposed D-LBA model.

Data We used the extraversion data in the diffIRT R package. These data, obtained by Molenaar et al. (2015), comprise 146 respondents for 10 items. Each item is a particular word or phrase related to extraversion behavior (e.g., “active” or “noisy”). Respondents were asked whether each item is appropriate to their personalities. For all respondents and all items, the actual response (yes/no) and RT were recorded, some of which are missing.

5.1 Results

Table 10 summarizes the estimates of the item parameters obtained by both models. The correlations of the \(a_j\), \(b_j\), and \(\tau _j\) parameters between the D-diffusion and D-LBA estimates were 0.707, 0.806, and 0.851, respectively. Although they are slightly lower than the results of the simulation study, given the relatively small number of respondents (\(I=143\)), it can be said that the D-diffusion and D-LBA models are substantially similar models.
Table 10

Item parameters obtained by D-LBA and D-diffusion

| No. | Item | Prop. | MRT | \(a_j\) (D-LBA) | \(a_j\) (D-diff) | \(b_j\) (D-LBA) | \(b_j\) (D-diff) | \(\tau _j\) (D-LBA) | \(\tau _j\) (D-diff) | \(\psi _j\) (D-LBA) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Active | 0.741 | 1.486 | 0.966 | 0.520 | \(-2.019\) | \(-0.704\) | 0.451 | 0.575 | 1.503 |
| 2 | Noisy | 0.538 | 1.357 | 1.091 | 0.540 | \(-0.087\) | \(-0.117\) | 0.327 | 0.475 | 2.513 |
| 3 | Energetic | 0.846 | 1.120 | 1.573 | 0.597 | \(-2.032\) | \(-1.322\) | 0.394 | 0.502 | 2.813 |
| 4 | Enthusiastic | 0.916 | 1.000 | 2.090 | 0.579 | \(-2.440\) | \(-1.854\) | 0.398 | 0.458 | 2.957 |
| 5 | Impulsive | 0.539 | 1.298 | 1.240 | 0.551 | \(-0.240\) | \(-0.213\) | 0.339 | 0.464 | 2.751 |
| 6 | Jovial | 0.902 | 1.262 | 1.187 | 0.507 | \(-4.338\) | \(-1.380\) | 0.412 | 0.501 | 1.937 |
| 7 | Viable | 0.937 | 1.142 | 1.306 | 0.490 | \(-2.694\) | \(-1.788\) | 0.372 | 0.511 | 3.393 |
| 8 | Eupeptic | 0.958 | 1.090 | 1.419 | 0.454 | \(-3.448\) | \(-2.065\) | 0.352 | 0.434 | 2.955 |
| 9 | Communicative | 0.824 | 1.728 | 0.786 | 0.408 | \(-2.056\) | \(-0.904\) | 0.472 | 0.609 | 2.088 |
| 10 | Spontaneous | 0.860 | 0.986 | 1.848 | 0.632 | \(-2.488\) | \(-1.527\) | 0.370 | 0.462 | 2.750 |

Prop.: proportion of “yes” responses; MRT: mean response time
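The between-model correlations reported for Table 10 can be checked with a short sketch like the one below. It assumes that the reported correlations are Pearson correlations of the tabulated point estimates, which the text does not state explicitly; the variable names are illustrative only. Under that assumption, the \(a_j\) columns give a value of roughly 0.71, in line with the reported 0.707.

```python
import numpy as np

# Point estimates transcribed from Table 10 (items 1 to 10).
a_lba = [0.966, 1.091, 1.573, 2.090, 1.240, 1.187, 1.306, 1.419, 0.786, 1.848]
a_diff = [0.520, 0.540, 0.597, 0.579, 0.551, 0.507, 0.490, 0.454, 0.408, 0.632]
b_lba = [-2.019, -0.087, -2.032, -2.440, -0.240, -4.338, -2.694, -3.448, -2.056, -2.488]
b_diff = [-0.704, -0.117, -1.322, -1.854, -0.213, -1.380, -1.788, -2.065, -0.904, -1.527]
tau_lba = [0.451, 0.327, 0.394, 0.398, 0.339, 0.412, 0.372, 0.352, 0.472, 0.370]
tau_diff = [0.575, 0.475, 0.502, 0.458, 0.464, 0.501, 0.511, 0.434, 0.609, 0.462]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    return float(np.corrcoef(x, y)[0, 1])


correlations = {
    "a_j": pearson(a_lba, a_diff),
    "b_j": pearson(b_lba, b_diff),
    "tau_j": pearson(tau_lba, tau_diff),
}
for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```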

In addition, Fig. 10 shows the posterior density for each item parameter. In the figure, the left-hand side shows the plot for \(a_j\), and the right-hand side shows the plot for \(b_j\). Each density has only one peak, which suggests that the parameter estimates were properly obtained. Moreover, the parameter estimates of each model were highly correlated with the summary statistics. For items with a high proportion of “yes” responses (e.g., “viable” and “eupeptic”), \(b_j\) is lower. Likewise, \(a_j\) corresponds to the mean response time (MRT). These relationships can be considered evidence for the validity of the estimates.

Table 11 summarizes the mean effective sample sizes for both models. For all parameters except \(\tau _j\), the values were higher in the D-LBA model than in the D-diffusion model. This result suggests that the D-LBA model can estimate parameters more efficiently than the D-diffusion model, even for real data.

Furthermore, we conducted a posterior predictive check to validate the model. Specifically, we generated 15,000 random samples of responses and RTs from the posterior predictive distribution. Then, we calculated the proportion of responses that were the same as the observed data for each respondent. Figures 11 and 12 show the histograms of this proportion for all 143 respondents for the D-LBA and D-diffusion models, respectively. Similarly, Fig. 13 shows the posterior predictive distributions of the RT for the first respondent. Each line corresponds to the posterior predictive distribution for each item, and the black vertical line represents the probability density at the point of the observed RT. In Fig. 13, longer black lines mean that the model performs better in terms of posterior prediction of the RT. These results indicate that the proposed D-LBA model adequately explains the observed data and that its predictive performance is at least as good as that of the D-diffusion model.
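The agreement proportion used in this check can be sketched as follows. The array layout (draws by respondents by items), the coding of missing responses as NaN, and the function name are all assumptions made for illustration; the sketch also assumes every respondent has at least one observed response.

```python
import numpy as np


def agreement_by_respondent(y_obs, y_rep):
    """Proportion of posterior predictive responses equal to the observed ones.

    y_obs: (I, J) array of 0/1 responses, np.nan where a response is missing.
    y_rep: (S, I, J) array of 0/1 responses drawn from the posterior predictive.
    Returns an array of length I, one proportion per respondent.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_rep = np.asarray(y_rep, dtype=float)
    observed = ~np.isnan(y_obs)  # mask of non-missing responses, shape (I, J)
    # NaN never compares equal, but the mask makes the exclusion explicit.
    matches = (y_rep == y_obs[None, :, :]) & observed[None, :, :]
    n_match = matches.sum(axis=(0, 2))
    n_total = y_rep.shape[0] * observed.sum(axis=1)
    return n_match / n_total


# Tiny hypothetical example: 2 respondents, 2 items, 2 predictive draws.
y_obs_demo = np.array([[1.0, 0.0], [1.0, np.nan]])
y_rep_demo = np.array([[[1, 0], [0, 1]], [[1, 1], [1, 0]]])
print(agreement_by_respondent(y_obs_demo, y_rep_demo))
```

A histogram of the returned proportions over all respondents corresponds to the kind of display described for Figs. 11 and 12.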
Fig. 10

Posterior densities of the item parameters obtained by the D-LBA model. The posteriors of \(a_{j}\) are shown on the left-hand side, and the posteriors of \(b_j\) are shown on the right-hand side. The item indices correspond to those of Table 10

Table 11

Mean effective sample size for each parameter in the D-diffusion and D-LBA models

| Parameter | D-Diffusion | D-LBA |
|---|---|---|
| \(a_j\) | 101.84 | 260.60 |
| \(\gamma _{i}\) | 370.02 | 843.82 |
| \(b_{j}\) | 684.45 | 1255.58 |
| \(\theta _{i}\) | 2465.66 | 3201.51 |
| \(\tau _{j}\) | 2035.36 | 1706.55 |
| \(\psi _{j}\) | - | 720.89 |
| \(\sigma _{i}\) | - | 1414.07 |
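To make the efficiency comparison in Table 11 concrete, the following small sketch computes the ratio of the mean effective sample sizes (D-LBA divided by D-diffusion) for the parameters shared by both models. The dictionary names are illustrative; the numbers are transcribed from the table.

```python
# Mean effective sample sizes from Table 11 for parameters present in both models.
ess_diffusion = {"a_j": 101.84, "gamma_i": 370.02, "b_j": 684.45,
                 "theta_i": 2465.66, "tau_j": 2035.36}
ess_lba = {"a_j": 260.60, "gamma_i": 843.82, "b_j": 1255.58,
           "theta_i": 3201.51, "tau_j": 1706.55}

# A ratio above 1 means D-LBA produced more effective draws for that parameter.
ess_ratio = {name: ess_lba[name] / ess_diffusion[name] for name in ess_diffusion}
for name, ratio in ess_ratio.items():
    print(f"{name}: D-LBA / D-diffusion = {ratio:.2f}")
```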

Fig. 11

Histograms of the posterior predictive samples that correspond to the observed response in the D-LBA model (sample size 143)

Fig. 12

Histograms of the posterior predictive samples that correspond to the observed response in the D-diffusion model (sample size 143)

Fig. 13

Posterior predictive distributions of the RT in the D-LBA (left side) and D-diffusion (right side) models along with the observed RT (black vertical line) for the first respondent. The upper half shows the distributions when the response is “yes,” and the lower half shows the distributions when the response is “no.” This respondent only answered “no” for items one and six

One of the major advantages of using model-based parameters instead of much simpler descriptive statistics, such as the MRT or the proportion of respondents choosing the first category, is that the substantively informed model parameters directly reflect the underlying psychological process that elicited the observed responses, whereas the theory of psychological measurement suggests that the observed data contain random fluctuations or errors (Molenaar et al. 2015). The model also decomposes the observed information into several meaningful sources of variability. For instance, the MRT is used as a property of an item, although it may be influenced by respondents’ traits, which are represented by the parameter \(\gamma _i\). If more “deliberate” respondents happened to be sampled, the observed MRT may be longer even for the same items. However, we cannot distinguish the respondents’ traits from the item traits as long as the simple MRT is used. When \(\gamma _i\) and \(a_j\) are estimated simultaneously, \(a_j\) can be regarded as an item property that is not influenced by respondents’ traits. This also makes it reasonable to examine the item parameter estimates from a qualitative perspective without considering respondents’ traits. For example, it may be difficult to provide an answer quickly and intuitively for an item with a higher \(a_j\) (e.g., “communicative”). In other words, such items seem to carry more complicated meanings than those with a lower \(a_j\). In addition, more respondents seem to answer “yes” to items with a lower \(b_j\). From a qualitative viewpoint, items with a higher \(b_j\) tend to have negative meanings. For example, in the Oxford Advanced Learner’s Dictionary (Deuter et al. 2015), the word “noisy” is defined as “making a lot of noise,” and the word “impulsive” is defined as “acting suddenly without thinking carefully about what might happen because of what you are doing.” These two words arguably have more negative connotations than positive ones. Therefore, respondents may be reluctant to answer “yes” to these items.

One of the main interests in this application is comparative model fitting. Our primary objective here is to compare the proposed D-LBA model with the existing D-diffusion model. However, to check whether more parsimonious formulations perform better than the full models, we also examined several submodels. With regard to the boundary, we considered four conditions: the boundary parameter depends on both the respondents and the items, it depends on the items but is common across respondents, it depends on the respondents but is common across items, or it is common across both respondents and items. The specific parameterizations are listed in Table 12. Similarly, we considered these four conditions for the between-trial variance \(\eta _{ij}\) in the D-LBA model. As a result, we examined \(4\times 4=16\) conditions for the D-LBA model and four conditions for the D-diffusion model. Table 12 summarizes the obtained WAIC and WBIC for these 20 models. Among all models, the full D-LBA model exhibits the best values in terms of both the WAIC and the WBIC. In particular, the WAIC values of the full D-diffusion and full D-LBA models are 0.829 and 0.757, whereas the WBIC values are 1024.98 and 898.71, respectively. Therefore, the results indicate that the proposed D-LBA model fits this dataset better than the D-diffusion model in terms of these information criteria. In other words, the assumption that the item discriminability and expected RTs are completely correlated is unlikely to hold in real data. In addition, the results indicate that both the boundary parameter (\(\beta _{ij}\) in the D-LBA model, \(\alpha _{ij}\) in the D-diffusion model) and \(\eta _{ij}\) should be decomposed into item factors and person factors to achieve a better fit to real data.
Table 12

Information criteria values of the full model as well as more parsimonious submodels for the D-LBA and D-Diffusion models

| Model | Boundary | Between-trial variance | Number of parameters | WAIC | WBIC |
|---|---|---|---|---|---|
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(4J+3I\) | 0.757 | 898.71 |
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\sigma _{i}\) | \(3J+3I\) | 0.777 | 968.52 |
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\psi _{j}\) | \(4J+2I\) | 0.828 | 1023.02 |
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\eta\) | \(3J+2I+1\) | 0.784 | 982.39 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(4J+2I\) | 0.868 | 1123.83 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\sigma _{i}\) | \(3J+2I\) | 0.909 | 1234.57 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\psi _{j}\) | \(4J+I\) | 0.939 | 1234.60 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\eta\) | \(3J+I+1\) | 0.915 | 1241.96 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(3J+3I\) | 0.816 | 989.34 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\sigma _{i}\) | \(2J+3I\) | 0.818 | 1030.26 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\psi _{j}\) | \(3J+2I\) | 0.818 | 1030.26 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\eta\) | \(2J+2I+1\) | 0.820 | 1040.59 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(3J+2I+1\) | 0.880 | 1144.93 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\sigma _{i}\) | \(2J+2I+1\) | 0.931 | 1269.21 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\psi _{j}\) | \(3J+I+1\) | 0.962 | 1273.85 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\eta\) | \(2J+I+2\) | 0.944 | 1286.09 |
| D-Diffusion | \(\alpha _{ij}=\gamma _{i}/a_{j}\) | - | \(3J+2I\) | 0.829 | 1024.98 |
| D-Diffusion | \(\alpha _{ij}=a_{j}\) | - | \(3J+I\) | 0.902 | 1184.12 |
| D-Diffusion | \(\alpha _{ij}=\gamma _{i}\) | - | \(2J+2I\) | 0.849 | 1052.88 |
| D-Diffusion | \(\alpha _{ij}=\alpha\) | - | \(2J+I+1\) | 0.914 | 1204.62 |

When \(\alpha _{ij}=\alpha\), \(\beta _{ij}=\beta\), or \(\eta _{ij}=\eta\), these parameters do not depend on the respondent or item. Moreover, when \(\alpha _{ij}=\gamma _i\), \(\beta _{ij}=\gamma _i\), or \(\eta _{ij}=\sigma _{i}\), the prior distributions were set to \(\text{HalfCauchy}(0,5)\) instead of LN(0, 1) because we do not need identification constraints under these conditions
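The model grid in Table 12 can be tabulated programmatically, which makes the parameter counts and the ranking easy to inspect. The sketch below transcribes the 20 rows and evaluates the parameter-count formulas for \(J=10\) items and \(I=143\) respondents; the latter figure is taken from the results text and is an assumption about the analyzed sample. All identifiers are illustrative.

```python
N_ITEMS = 10         # J, from the data description
N_RESPONDENTS = 143  # I, as stated in the results (assumed to be the analyzed sample)

# (model, boundary, between-trial variance, (J coef, I coef, constant), WAIC, WBIC)
MODELS = [
    ("D-LBA", "gamma_i/a_j", "sigma_i/psi_j", (4, 3, 0), 0.757, 898.71),
    ("D-LBA", "gamma_i/a_j", "sigma_i", (3, 3, 0), 0.777, 968.52),
    ("D-LBA", "gamma_i/a_j", "psi_j", (4, 2, 0), 0.828, 1023.02),
    ("D-LBA", "gamma_i/a_j", "eta", (3, 2, 1), 0.784, 982.39),
    ("D-LBA", "a_j", "sigma_i/psi_j", (4, 2, 0), 0.868, 1123.83),
    ("D-LBA", "a_j", "sigma_i", (3, 2, 0), 0.909, 1234.57),
    ("D-LBA", "a_j", "psi_j", (4, 1, 0), 0.939, 1234.60),
    ("D-LBA", "a_j", "eta", (3, 1, 1), 0.915, 1241.96),
    ("D-LBA", "gamma_i", "sigma_i/psi_j", (3, 3, 0), 0.816, 989.34),
    ("D-LBA", "gamma_i", "sigma_i", (2, 3, 0), 0.818, 1030.26),
    ("D-LBA", "gamma_i", "psi_j", (3, 2, 0), 0.818, 1030.26),
    ("D-LBA", "gamma_i", "eta", (2, 2, 1), 0.820, 1040.59),
    ("D-LBA", "beta", "sigma_i/psi_j", (3, 2, 1), 0.880, 1144.93),
    ("D-LBA", "beta", "sigma_i", (2, 2, 1), 0.931, 1269.21),
    ("D-LBA", "beta", "psi_j", (3, 1, 1), 0.962, 1273.85),
    ("D-LBA", "beta", "eta", (2, 1, 2), 0.944, 1286.09),
    ("D-Diffusion", "gamma_i/a_j", "-", (3, 2, 0), 0.829, 1024.98),
    ("D-Diffusion", "a_j", "-", (3, 1, 0), 0.902, 1184.12),
    ("D-Diffusion", "gamma_i", "-", (2, 2, 0), 0.849, 1052.88),
    ("D-Diffusion", "alpha", "-", (2, 1, 1), 0.914, 1204.62),
]


def n_params(coefs, n_items=N_ITEMS, n_respondents=N_RESPONDENTS):
    """Evaluate a count of the form cJ*J + cI*I + c0."""
    c_items, c_resp, const = coefs
    return c_items * n_items + c_resp * n_respondents + const


# Lower is better for both criteria.
best_waic = min(MODELS, key=lambda m: m[4])
best_wbic = min(MODELS, key=lambda m: m[5])
print("best WAIC:", best_waic[:3], "parameters:", n_params(best_waic[3]))
print("best WBIC:", best_wbic[:3], "parameters:", n_params(best_wbic[3]))
```

Under these assumptions, the full D-LBA model has the most parameters of all 20 candidates and still attains the lowest WAIC and WBIC.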

6 Discussion

In this study, we proposed a new cognitively-based IRT model that can explain the flexible relationship between the item discriminability and the expected RTs for personality assessment using RT information. The likelihood function of the proposed D-LBA IRT model can be essentially seen as a reparameterization of the LBA model. Our argument is that this reparameterization is the point: the LBA framework for modeling the RT data, which has been proved to be useful in the field of cognitive and mathematical psychology, has not been applied to IRT models in psychometrics. The aims of this study are to clearly reveal the relationship between these two models, which are both popular in different fields, and to combine the strengths of both models to propose the D-LBA IRT model.

From the simulation results, we identified four advantageous properties of the proposed D-LBA model. First, the proposed model can recover parameters as well as the D-diffusion model. Second, each parameter in the proposed and D-diffusion models can be interpreted in nearly the same way. Third, when the estimation model differs from the true data-generating model, the correlations between the true values and the estimates are higher for the proposed model than for the D-diffusion model. Fourth, the proposed model converges much faster and estimates more efficiently than the D-diffusion model. These findings suggest that the proposed D-LBA model is a more realistic, efficient, and practical yet simpler alternative to the D-diffusion model of the item response and RT.

In addition, we applied the D-LBA and D-diffusion models to a real personality measurement dataset. Consequently, from the viewpoint of the information criterion, the D-LBA model was found to fit this dataset better than the D-diffusion model.

By introducing a new parameter and extending the simple LBA model, the proposed model can mitigate the problems that originate from the diffusion IRT model. Nevertheless, in empirical applications of the proposed D-LBA model, three potentially significant issues might persist.

First, the time required for the MCMC estimation algorithm of the proposed D-LBA model might be substantial. In our simulation study, the proposed model took around 7000 iterations to achieve full convergence, which was judged to have occurred when \({\hat{R}}\) was less than 1.1 for all parameters. This corresponds to a few hours (with \(I=500\) and \(J=30\)) in our computational environment (CPU: Intel Core i7-7700K; Memory: 64 GB; Operating system: Windows 10). Note that a greater computational time was needed for the existing D-diffusion model; at times, even 40,000 iterations were not sufficient for convergence. Nevertheless, for researchers who want to analyze real data, the estimation of the proposed model might not be sufficiently fast. In this case, one may be able to use variational Bayes (VB) inference instead of MCMC estimation for the proposed D-LBA model. Using rstan, it is easy to apply automatic differentiation variational inference (ADVI) without the need to specify the approximating variational distribution. With this approach, researchers need to satisfy only one condition: it should be possible to transform each parameter so that its distribution is approximately normal.
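The convergence criterion mentioned above can be illustrated with a minimal sketch of the classic Gelman-Rubin potential scale reduction factor for a single parameter. This is the textbook (non-split) version and is only an approximation of what Stan reports, since Stan computes a split-chain variant; the function name and the simulated chains are hypothetical.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for one parameter.

    chains: array of shape (M, N) with M chains of N post-warmup draws each.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    # Between-chain variance B and mean within-chain variance W.
    between = n * chains.mean(axis=1).var(ddof=1)
    within = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance.
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))


rng = np.random.default_rng(0)
# Four chains that sample the same distribution: R-hat should be close to 1.
mixed = rng.normal(0.0, 1.0, size=(4, 1000))
# Four chains stuck around different locations: R-hat should clearly exceed 1.1.
stuck = mixed + np.array([[0.0], [1.0], [2.0], [3.0]])

print("well-mixed chains:", gelman_rubin_rhat(mixed))
print("poorly mixed chains:", gelman_rubin_rhat(stuck))
```

Applying the 1.1 threshold to every parameter, as described in the text, amounts to checking that the largest of these values is below 1.1.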

The second potential issue is related to the test properties. In this study, we applied the D-LBA model to a single-dimensional personality scale to compare the results with those of the existing D-diffusion model. However, many existing personality scales have more than two dimensions. Therefore, an interesting direction for future research would be to extend the proposed D-LBA model to the multidimensional case. One simple approach to deal with multidimensionality is to adopt the Thurstonian IRT model (Brown and Maydeu-Olivares 2011). In the Thurstonian IRT model, each choice corresponds to a different dimension, and the respondent is asked to choose the option that best describes himself/herself. A typical forced-choice questionnaire consists of three or four choices; hence, LBA-based models would be appropriate for this case as compared to the diffusion-based model, which can only handle two-choice items. In this study, we focused on cases in which an item has only two choices. However, the LBA-based approach provides the possibility of extension to items that have more than two alternative choices. We have already started examining the possibility of this extension, and our preliminary findings are that we might need additional parameter constraints to achieve identification in this case. This may be because the LBA model considers only the first choice that reached the boundary, even when the item has more than two choices. Therefore, extension of the proposed model to more than two alternative choices is a vital area of future study.

Third, our proposed approach is model-based; therefore, strictly speaking, the advantages of the model only apply when the underlying model is correct (Tuerlinckx et al. 2016). However, as is often said, “all models are wrong, but some are useful.” We believe, in line with Box (1979), that the relevant question is not whether the model assumptions are met exactly but rather whether the model is illuminating and sufficiently useful as an approximation of reality. On the basis of the model comparison and posterior predictive check that we present in this paper, we are particularly positive about the empirical applicability of the model. That being said, a more thorough investigation regarding the fit and prediction of the model, such as the evaluation of person fit (e.g., Ferrando 2007), would be desirable in future studies.

It is noted that the D-LBA model cannot be used for ability measurements because the drift rate is expressed as the difference between the respondent and item parameters. As mentioned earlier, this model assumption corresponds to the nearness hypothesis (Kuncel 1973) and the distance–difficulty hypothesis (Ferrando and Lorenzo-Seva 2007) of personality measurement. On the other hand, under these hypotheses, the respondent tends to answer more quickly when he or she has an extremely low ability; this is obviously unlikely in ability measurement unless the respondents are simply guessing. Therefore, another interesting direction for future research would be to develop a Q-version of the LBA IRT model for ability measurements.

Finally, one of the major advantages of the IRT framework is that, thanks to the decomposition of the observed data variability into item and respondent parameters, items can be scaled independently of respondents; likewise, respondents can be scaled independently of items. As one of the anonymous reviewers pointed out, this characteristic would be particularly advantageous in situations where the observed data are accumulated from different sets of samples and items over time, as in the typical application of IRT for educational measurement. On the other hand, in typical personality assessment, items do not change across respondents. However, recent technological developments allow us to administer large-scale personality assessments in which different items are presented to different respondents and to model such data (e.g., Condon and Revelle 2015; Okada et al. 2018). Therefore, the proposed D-LBA IRT model may be applicable in such modern personality measurement studies in which the RT is also collected. This can be a fruitful direction of future research.

Notes

Funding

Funding was provided by Japan Society for the Promotion of Science (Grant nos. JP17J07674, JP17H04787) and Okawa Foundation Research Grant.

References

  1. Annis, J., Miller, B. J., & Palmeri, T. J. (2016). Bayesian inference with Stan: A tutorial on adding custom distributions. Behavior Research Methods, 48, 1–24.
  2. Box, G. E. P. (1979). Robustness in the strategy of scientific model building. In R. L. Launer & G. B. Wilkinson (Eds.), Robustness in statistics (pp. 201–236). New York: Academic Press.
  3. Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced choice questionnaires. Educational and Psychological Measurement, 71, 460–502.
  4. Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57, 153–178.
  5. Condon, D. M., & Revelle, W. (2015). Selected personality data from the SAPA-Project: On the structure of phrased self-report items. Journal of Open Psychology Data, 3, e6.
  6. Deuter, M., Bradbery, J., & Turnbull, J. (2015). Oxford advanced learner’s dictionary (9th ed.). London: Oxford University Press.
  7. Donkin, C., Brown, S., Heathcote, A., & Wagenmakers, E. J. (2011). Diffusion versus linear ballistic accumulation: Different models but the same conclusions about psychological processes? Psychonomic Bulletin & Review, 18, 61–69.
  8. Feller, W. (1968). Random walk and ruin problems. In W. Feller (Ed.), An introduction to probability theory and its applications (3rd ed., Vol. 1, pp. 342–371). New York: Wiley.
  9. Ferrando, P. J. (2007). A Pearson-type-VII item response model for assessing person fluctuation. Psychometrika, 72, 25–41.
  10. Ferrando, P. J., & Lorenzo-Seva, U. (2007). An item response theory model for incorporating response time data in binary personality items. Applied Psychological Measurement, 31, 525–543.
  11. Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). New York: CRC Press.
  12. Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics, 2, 1360–1383.
  13. Grice, G. R. (1968). Stimulus intensity and response evocation. Psychological Review, 75, 359–373.
  14. Heathcote, A., Brown, S. D., & Wagenmakers, E.-J. (2015). An introduction to good practices in cognitive modeling. In B. U. Forstmann & E.-J. Wagenmakers (Eds.), An introduction to model-based cognitive neuroscience. Berlin: Springer Science & Business Media.
  15. Heathcote, A., & Hayes, B. (2012). Diffusion versus linear ballistic accumulation: Different models for response time with different conclusions about psychological mechanisms? Canadian Journal of Experimental Psychology, 66, 125–136.
  16. Heathcote, A., & Love, J. (2012). Linear deterministic accumulator models of simple choice. Frontiers in Psychology, 3, 292.
  17. Kuncel, R. B. (1973). Response process and relative location of subject and item. Educational and Psychological Measurement, 33, 545–563.
  18. Laming, D. R. J. (1968). Information theory of choice reaction time. New York: Wiley.
  19. Leite, F. P., & Ratcliff, R. (2011). What cognitive processes drive response biases? A diffusion model analysis. Judgment and Decision Making, 6, 651–687.
  20. Luce, R. D. (1986). Response times. New York: Oxford University Press.
  21. Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. J. (2015). Fitting diffusion item response theory models for responses and response times using the R package diffIRT. Journal of Statistical Software, 66, 1–34.
  22. Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.
  23. Okada, K., Vandekerckhove, J., & Lee, M. D. (2018). Modeling when people quit: Bayesian censored geometric models with hierarchical and latent-mixture extensions. Behavior Research Methods, 50, 406–415.
  24. Palada, H., Neal, A., Vuckovic, A., Martin, R., Samuels, K., & Heathcote, A. (2016). Evidence accumulation in a complex task: Making choices about concurrent multiattribute stimuli under time pressure. Journal of Experimental Psychology: Applied, 22, 1–23.
  25. Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
  26. Ratcliff, R., Thapar, A., & McKoon, G. (2007). Application of the diffusion model to two-choice tasks for adults 75–90 years old. Psychology and Aging, 22, 56–66.
  27. Reddi, B. A. J., & Carpenter, R. H. (2000). The influence of decision time on performance. Nature Neuroscience, 3, 827–830.
  28. Roscam, E. E. (1987). Toward a psychometric theory of intelligence. In E. E. Roscam & R. Suck (Eds.), Progress in mathematical psychology (pp. 151–174). Amsterdam: North-Holland.
  29. Roscam, E. E. (1997). Models for speed and time-limit tests. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 187–208). Berlin: Springer Science & Business Media.
  30. Stone, M. (1960). Models for choice-reaction time. Psychometrika, 25, 251–260.
  31. Thissen, D. (1983). Timed testing: An approach using item response theory. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 179–203). New York: Academic Press.
  32. Tuerlinckx, F., & De Boeck, P. (2005). Two interpretations of the discrimination parameter. Psychometrika, 70, 629–650.
  33. Tuerlinckx, F., Molenaar, D., & van der Maas, H. L. J. (2016). Diffusion-based response-time models. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of item response theory, volume one: Models (pp. 283–300). Boca Raton: Chapman & Hall/CRC Press.
  34. van der Linden, W. J. (2016). Lognormal response-time model. In W. J. van der Linden (Ed.), Handbook of item response theory, volume one: Models (pp. 261–282). Boca Raton: Chapman & Hall/CRC Press.
  35. van der Maas, H. L. J., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118, 339–356.
  36. van der Maas, H. L. J., & Wagenmakers, E.-J. (2005). A psychometric analysis of chess expertise. The American Journal of Psychology, 118, 29–60.
  37. van Ravenzwaaij, D., & Oberauer, K. (2009). How to use the diffusion model: Parameter recovery of three methods: EZ, fast-dm, and DMAT. Journal of Mathematical Psychology, 53, 463–473.
  38. Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.
  39. Voss, A., Nagler, M., & Lerche, V. (2013). Diffusion models in experimental psychology: A practical introduction. Experimental Psychology, 60, 385–402.
  40. Wagenmakers, E.-J., van der Maas, H. L. J., & Grasman, R. P. P. P. (2007). An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14, 3–22.
  41. Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594.
  42. Watanabe, S. (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14, 867–897.

Copyright information

© Japanese Federation of Statistical Science Associations 2019

Authors and Affiliations

  1. Graduate School of Education, The University of Tokyo, Bunkyo-ku, Japan
  2. Japan Society for the Promotion of Science, Tokyo, Japan
