In a typical eyewitness lineup scenario, a witness observes a culprit commit a crime. After some delay, the witness is asked to identify the culprit from a set of faces, the lineup. One of the faces, the suspect, is the person the police believe committed the crime. The other faces, the fillers, are people known not to have committed the crime; they are typically selected to be similar to either the suspect (Wogalter, Malpass, & McQuiston, 2004) or a description of the culprit (Wells, Rydell, & Seelau, 1993; Tunnicliff & Clark, 2000). Lineups can vary in length. In the minimal lineup, a show-up, the suspect is shown without any fillers (e.g., Gonzalez, Ellsworth, & Pembroke, 1993). Lineups commonly contain six to eight faces, including the suspect (Police Executive Research Forum, 2013), although other lengths are possible (e.g., Levi, 2012). For lineups with at least one filler, the faces can be shown to the witness in different ways. The two most commonly employed methods are simultaneous presentation, in which the witness views all of the faces at the same time, and sequential presentation, in which the witness views the faces one at a time (Police Executive Research Forum, 2013). Although variations exist (e.g., Horry, Brewer, Weber, & Palmer, 2015; Wells, 2014), when discussing sequential lineups, we assume here that the witness can view each face only once and that the lineup ends after an identification is made. Regardless of the lineup specifics, the witness identification (ID) falls into one of three broad categories: suspect, in which the witness selects the suspect; filler, in which the witness selects one of the filler faces, if available; and reject (no identification), in which the witness does not identify any lineup member as the culprit.

To emphasize the obvious, the suspect and the culprit are not necessarily the same person (e.g., Wells & Olson, 2003). That is, the suspect may not have committed the crime. Thus, a suspect ID has the potential to endanger an innocent person, especially given the weight juries tend to place on eyewitness testimony (Brewer & Burke, 2002; Cutler, Penrod, & Dexter, 1990). It is therefore vital to make every effort to assess the accuracy of a suspect ID. One path forward is to collect confidence ratings. Although eyewitness confidence was initially disregarded as uninformative (e.g., Bothwell, Deffenbacher, & Brigham, 1987; Deffenbacher, 1980), recent evidence strongly suggests that there is a tight relationship between the confidence with which a witness makes an identification and the accuracy of that identification (e.g., Mickes, 2015; Wixted, Mickes, Dunn, Clark, & Wells, 2016; Wells, Yang, & Smalarz, 2015); in particular, a high-confidence suspect ID is a good indicator of guilt. Designated fillers are particularly useful because, unlike suspects, who may be guilty or innocent, fillers are known to be innocent, and so a filler identification is known to be an incorrect response (Wells & Turtle, 1986).

It is also important to know the probability that a suspect is guilty before the witness ID is obtained, that is, how often the culprit is put into the lineup (Wells et al., 2015). For example, Wells et al. (2015) thoroughly explored the impact of base rate and how it interacts with a number of other variables commonly investigated by eyewitness memory researchers, and they made a convincing argument that base rate is an important factor to consider when interpreting lineup identifications. At one extreme, in a police precinct in which the suspect is almost never guilty, even a highly confident suspect ID is not a strong indication of guilt. At the other extreme, a suspect ID in a precinct that almost always finds the culprit should be given considerable weight, regardless of confidence. In experimental settings, this base rate, i.e., the probability of a guilty suspect, is under experimenter control. Outside of the lab, however, the base rate is unknown and so must be estimated (Cohen, Starns, Rotello, & Cataldo, 2020; Wixted et al., 2016).

All of these experimental factors and statistical concerns can be naturally addressed using signal detection theory (SDT), which has become a common framework for analyzing lineup data over the last decade (e.g., Mickes, Flowe, & Wixted, 2012; Wixted & Mickes, 2012; Dobolyi & Dodson, 2013; Carlson & Carlson, 2014; Wetmore, Neuschatz, Gronlund, Wooten, Goodsell, & Carlson, 2015; Wixted et al., 2016; Colloff, Wade, Wixted, & Maylor, 2017). Such analyses can be difficult and often rely on numerical simulations. The goal of this paper is to introduce a new R package, sdtlu, that streamlines and automates the SDT analysis of lineup data. Furthermore, the package implements closed-form solutions for both simultaneous and sequential lineups that allow for model-based analyses.

In what follows, we first briefly describe signal detection theory and how it can be applied to lineup data. We then introduce sdtlu and illustrate its capabilities through a series of examples.

Signal detection theory for lineups

Within SDT, a lineup identification is conceived of as a memory task. The witness is asked to match a memory representation of the culprit to a currently viewed face in the lineup. Consider Fig. 1. The x-axis represents the strength of match between the culprit memory representation and an individual face in the lineup. The higher the value, the better the match. As represented by the two distributions, this match is generally higher for target faces (the culprit, i.e., a guilty suspect) than for lure faces (either a filler or an innocent suspect). The lure and target distributions are assumed to be normally distributed with means μl and μt, respectively, and standard deviations σl and σt, respectively. Differentiation between targets and lures increases with the distance between μl and μt, as measured in σl and σt units. The application of SDT to both simultaneous and sequential lineups relies on this basic framework, but differs in how a choice is determined.

Fig. 1 Signal detection model of lineup responses. The x-axis represents the strength of association between the culprit and either a lure or target item. Lure and target strengths are assumed to be normally distributed with means μl and μt, respectively, and standard deviations σl and σt, respectively. With three confidence levels, low, medium, and high, there are three response criteria, c1, c2, and c3

First, consider a simultaneous lineup with six faces. (This lineup size is not in any way critical to our conclusions, but is useful for expository purposes.) The basic idea is that a match strength is independently sampled for each of the six faces from the appropriate distribution in Fig. 1 and the highest sampled strength s* from face f* determines the witness’s decision. In a target absent lineup, the suspect is not the culprit and the match strengths for all six faces are sampled from the lure distribution. In a target present lineup, the suspect is the culprit, and the match strengths of the five filler faces are sampled from the lure distribution and the match strength for the suspect face is sampled from the target distribution. The ID is determined by comparing s* to a set of response criteria. There is one response criterion for each confidence level. Consider the common situation with three confidence levels: Low, medium, and high. If s* is lower than the first criterion or ID threshold, c1, the witness rejects the lineup, that is, none of the faces was a strong enough match to the culprit to warrant an ID. Any s* above c1, however, results in an ID. If s* falls between criteria c1 and c2, the witness identifies face f* with low confidence. Likewise, an s* between criteria c2 and c3 generates a medium confidence ID for f* and an s* above c3 produces a high confidence ID for f*. If f* is a suspect, the ID is a suspect ID (regardless of guilt), and if it is a filler, the ID is a filler ID. Because it is easier to make an ID with a lower criterion, lower criteria are considered more liberal. Similarly, because it is more difficult to make an ID with a higher criterion, higher criteria are considered more conservative.
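
To make this decision rule concrete, the following is a minimal sketch in R of a single simulated simultaneous-lineup trial. The parameter values are illustrative and the code is not the sdtlu implementation.

```r
# Minimal sketch of one simulated simultaneous-lineup trial (illustrative
# parameter values; not the sdtlu implementation). Lineup of six faces,
# three confidence levels; face 1 is designated as the suspect.
set.seed(1)
mu_t <- 1.5; sigma_t <- 1                 # target distribution (assumed values)
crit <- c(c1 = 0.5, c2 = 1.0, c3 = 1.5)   # ID threshold and confidence criteria
n_faces <- 6
target_present <- TRUE

# Fillers (and an innocent suspect) are sampled from the lure distribution
# N(0, 1); a guilty suspect is sampled from the target distribution.
strengths <- rnorm(n_faces, mean = 0, sd = 1)
if (target_present) strengths[1] <- rnorm(1, mean = mu_t, sd = sigma_t)

s_star <- max(strengths)        # the strongest match drives the decision
f_star <- which.max(strengths)

if (s_star < crit["c1"]) {
  decision <- "reject"
} else {
  confidence <- sum(s_star >= crit)       # 1 = low, 2 = medium, 3 = high
  id_type <- if (f_star == 1) "suspect" else "filler"
  decision <- paste0(id_type, " ID, confidence ", confidence)
}
decision
```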

Next, consider a sequential lineup with six faces. Recall that in a sequential lineup, the faces are seen exactly once, one at a time, and that the witness must stop after making an ID. Thus, the match strengths are likewise sampled one at a time. If the face belongs to the culprit, the match strength is sampled from the target distribution; otherwise, the match strength is sampled from the lure distribution. If the current match strength is less than c1, that face is rejected, the next face is viewed, and the process continues. If the current match strength exceeds c1, the current face is identified, confidence is determined in the same way as for the simultaneous lineup, i.e., by comparison to the response criteria, and the process ends. If none of the match strengths exceeds c1, the lineup is rejected. Note that, unlike the model of the simultaneous lineup, predictions from the sequential model depend on the order in which faces are presented (Footnote 1). For example, a suspect is more likely to be identified if viewed earlier in the lineup because there is less of a chance of a filler ID before the suspect is viewed. This detail is important because suspects are often not equally likely to appear in each position. For example, position is randomly selected in only about 60% of police lineups, and in many experimental designs and about 5% of surveyed agencies, the suspect is never in the first or last position (Police Executive Research Forum, 2013). Thus, to calculate the probability of an ID, this procedure must be run for each possible suspect position with the combined results weighted by the probability of each suspect position.
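
A corresponding sketch of a single simulated sequential-lineup trial is shown below, again with illustrative values and with the suspect position sampled uniformly; it is not the sdtlu implementation.

```r
# Minimal sketch of one simulated sequential-lineup trial (illustrative values;
# not the sdtlu implementation). The lineup stops at the first face whose match
# strength exceeds the ID threshold c1.
set.seed(2)
mu_t <- 1.5; sigma_t <- 1
crit <- c(c1 = 0.5, c2 = 1.0, c3 = 1.5)
n_faces <- 6
suspect_pos <- sample(1:n_faces, 1)     # uniform suspect position (assumption)
target_present <- TRUE

decision <- "reject"                    # default if no strength exceeds c1
for (pos in 1:n_faces) {
  is_target <- target_present && pos == suspect_pos
  s <- if (is_target) rnorm(1, mu_t, sigma_t) else rnorm(1, 0, 1)
  if (s >= crit["c1"]) {
    confidence <- sum(s >= crit)
    id_type <- if (pos == suspect_pos) "suspect" else "filler"
    decision <- paste0(id_type, " ID at position ", pos, ", confidence ", confidence)
    break
  }
}
decision
```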

Numerical simulations are typically used to generate model predictions, i.e., the probability of a rejection or a suspect or filler ID at different confidence levels. As discussed below, however, performing analyses on these models can be computationally demanding. It is therefore useful to be able to rely on closed-form solutions. Such equations are provided in Appendix A for a simultaneous lineup and in Appendix B for a sequential lineup (also see Wixted, Vul, Mickes, & Wilson, 2018). The equations work for lineups of an arbitrary size and an arbitrary number of confidence levels.
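
As an illustration of the closed-form approach, the probability of a suspect ID in a simultaneous lineup can be written as a single integral and evaluated with R's integrate function. The sketch below covers only this one quantity, with assumed parameter values; the full set of equations, including filler IDs, rejections, and confidence bins, is given in Appendix A.

```r
# Hedged sketch of the closed-form idea for a simultaneous lineup of size k:
# a suspect ID with strength above criterion crit requires the suspect's match
# strength to exceed crit and to exceed all k - 1 filler strengths.
p_suspect_id <- function(crit, k, mu_t = 0, sigma_t = 1, target_present = TRUE) {
  integrand <- function(x) {
    sus_dens <- if (target_present) dnorm(x, mu_t, sigma_t) else dnorm(x, 0, 1)
    sus_dens * pnorm(x, 0, 1)^(k - 1)   # all k - 1 fillers fall below x
  }
  integrate(integrand, lower = crit, upper = Inf)$value
}

# Illustrative values (mu_t = 1.5, lineup size 6, ID threshold 0.5):
p_suspect_id(0.5, k = 6, mu_t = 1.5, target_present = TRUE)   # target-present rate
p_suspect_id(0.5, k = 6, target_present = FALSE)              # target-absent rate
```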

As discussed previously, it is important to be able to estimate the base rate p, i.e., the probability that a given lineup includes a guilty suspect. The SDT modeling approach provides a method for estimating this value even when the guilt or innocence of each suspect is unknown (Cohen, Starns, Rotello, & Cataldo, 2020; Wixted, Mickes, Dunn, Clark, & Wells, 2015). In an eyewitness memory experiment, the experimenter knows whether each lineup was target present (i.e., included the guilty suspect) or target absent, what we will call full data. In a real-world lineup, what we will call restricted data, this classification is exactly the information of interest. In the latter case, the available data are how often each ID was made (suspect, filler, or reject) and the associated confidence of the ID, and these values are collapsed across both target-present and target-absent lineups. As shown in the appendices, calculating the predictions of the SDT model requires specifying the probability that a lineup will include a guilty suspect. For both simultaneous and sequential lineups, whether the lineup is target present or target absent determines whether the suspect match strength is drawn from the target or lure distribution, respectively. The critical insight, therefore, is that this base rate or target present probability becomes a parameter of the model which, in turn, can be estimated from data. Indeed, prior work has shown that such probabilities can be reliably estimated and applied to field data (Wixted et al., 2015). Cohen et al. (2020) extended these results and clarified how the model is able to estimate base rate by evaluating the relative probability and confidence distributions of suspect and filler IDs. Essentially, the model can produce a higher ratio of suspect to filler IDs by either assuming that a higher proportion of lineups have guilty suspects or assuming that witnesses have better memory (and thus are more likely to identify guilty suspects when they are present). These scenarios are distinguishable because higher memory increases confidence levels for suspect IDs to a greater degree than higher base rates.

In summary, to apply the SDT model, the following experimental design parameters are needed: l, the lineup size and n, the number of confidence levels. In addition, for sequential lineups, the model needs to know the probability that the suspect appears in each lineup position. In principle, the full SDT model has the following parameters: μl and σl, the mean and standard deviation of the lure distribution, respectively; μt and σt, the mean and standard deviation of the target distribution, respectively; c1-cn, the response criteria; and p, the probability that the suspect is guilty. Without loss of generality, it is typically assumed that μl = 0 and σl = 1. For full data, p can be directly set to the observed proportion of a target-present lineups in the data set. For restricted data, p must be estimated. Unfortunately, in this case, not all of the parameters are identifiable. In particular, it is not possible to estimate both p and σt. In this situation, we adopt the solution of estimating p and setting σt =σl = 1, i.e., assuming an equal-variance SDT model. In either the full or restricted case, the model has 2+n parameters. For example, with three confidence levels, the model has five parameters.

Receiver operating characteristic curves

Receiver operating characteristic (ROC) curves are a very convenient and common way to represent SDT data. For lineup data, an ROC curve plots the probability of a suspect ID when the target is present against the probability of a suspect ID when the target is absent at every possible confidence level. Example ROC curves are provided in Fig. 2. The left and right panels display ROC curves for a simultaneous and sequential lineup of size 6 using the SDT parameters from Fig. 1. For the sequential lineup, all suspect positions were assumed equally likely. The x- and y-axes represent the probability of a suspect ID in a target absent and target present lineup, respectively. The curves were determined from the SDT models described previously by sweeping the ID criteria across a wide range. Being able to easily produce such curves is one advantage of the closed-form solution of the SDT model. The dots represent the probabilities at the three response criteria from Fig. 1. ROC curves that lie on the diagonal represent chance performance at selecting the culprit. Performance improves as the ROC curve moves into the upper left corner, where correct IDs are more likely than errors. As the response criteria become more liberal or conservative, the dots will move up or down the curve, respectively.
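
For example, a model-based ROC for a simultaneous lineup can be traced by sweeping the ID criterion and computing the two suspect-ID probabilities at each value, reusing p_suspect_id() from the earlier sketch. The parameters are illustrative; the figures in this paper are produced by the package itself.

```r
# Sketch: trace a model-based ROC for a simultaneous lineup of size 6 by
# sweeping the ID criterion across a wide range (illustrative parameters).
criteria <- seq(-3, 4, by = 0.1)
fa  <- sapply(criteria, p_suspect_id, k = 6, target_present = FALSE)
hit <- sapply(criteria, p_suspect_id, k = 6, mu_t = 1.5, target_present = TRUE)

plot(fa, hit, type = "l", xlim = c(0, 1/6), ylim = c(0, 1),
     xlab = "P(suspect ID | target absent)",
     ylab = "P(suspect ID | target present)")
abline(0, 1, lty = 2)   # chance line
```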

Fig. 2 Example receiver operating characteristic (ROC) curves for a simultaneous (left) and sequential (right) signal detection model

Standard ROC curves typically end at the point (1, 1). Because suspect selection when the target is absent is assumed to be at chance, i.e., selected from the lure distribution, ROC curves for lineup data usually do not reach (1, 1), and so are partial ROCs. The simultaneous ROC ends at the inverse of the lineup size, in this case 1/6, which is the probability of selecting the suspect by chance. The ROC for the sequential lineup is somewhat more complex. If the response criterion is extremely liberal, i.e., any face is highly likely to be selected, only a suspect in the first position will realistically be selected. Thus, the sequential ROC curve will end with both the hit and false alarm rates equal to the probability that the suspect appears in the first position (Rotello & Chen, 2016). In this example, that is 1/6. However, if the suspect appears in the first position with probability less than 1/6, which can easily be the case in sequential lineups, the ROC can reverse and can even end at the origin (see Fig. 12).

The ROC curves in Fig. 2 are model-based. Data-based ROC curves are also common. In such plots, the points represent the probabilities of a suspect ID observed in the data and are usually simply connected by straight lines. An example is shown in Fig. 7.

Measuring performance

Within the SDT framework, there are a number of ways to measure subject performance. Perhaps the most common measure is d′, the distance between the distribution means in standard deviation units (Macmillan & Creelman, 2005). However, d′ is only defined for the equal-variance model, i.e., when the standard deviations of the two distributions are identical. As discussed previously, the equal-variance model in the sdtlu package assumes that μl = 0 and σl = σt = 1. Thus, under the equal-variance assumption, d′ is identical to the sdtlu estimate of μt, the mean of the target distribution.

Area under the curve (AUC) is commonly used in the lineup context (Mickes et al., 2012; Rotello & Chen, 2016). The AUC measure is derived from the ROC curve. Recall that the ROC curve lies on the diagonal when performance is at chance and moves into the upper-left corner when performance is good. Thus, a natural way to measure performance is to determine the area under the ROC curve. Higher values mean the curve is farther into the upper-left corner and therefore indicate better performance. Because lineup ROC curves do not reach (1, 1), the AUC measure is technically a partial AUC (McClish, 1989); for simplicity, we refer to it as AUC throughout. It is important to note, however, that the AUC is affected by lineup length for a simultaneous lineup (Rotello & Chen, 2016) and by the probability of a first-position suspect in a sequential lineup. Thus, AUC is a relative, not absolute, measure and care is required when comparing AUC across experiments with different designs. Analytic solutions do not exist, so AUC is calculated by numerical approximation. In the current work, iterative quadrature is used to compute AUC from model-defined ROC functions. Because the numerical approximation becomes unstable when part of the ROC curve is vertical, AUC should be treated with caution in situations like the right panel of Fig. 2. AUC is not defined when the ROC curve is not monotonic in p(suspect ID | target absent), as is the case for some sequential lineups.
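
As a simple point of reference, a partial AUC can be approximated from a set of ROC points with the trapezoidal rule, as in the sketch below, which reuses the fa and hit vectors from the ROC sketch above. sdtlu itself applies iterative quadrature to the continuous, model-defined ROC function rather than to discrete points.

```r
# Trapezoidal approximation to the partial AUC of the swept model ROC
# (fa and hit come from the earlier ROC sketch; illustrative only).
ord  <- order(fa)
pauc <- sum(diff(fa[ord]) * (head(hit[ord], -1) + tail(hit[ord], -1)) / 2)
pauc
```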

Diagnosticity is another common measure of lineup performance (Wells & Lindsay, 1980), although it has been shown to be confounded with response bias, leading some to conclude that AUC is a better measure of ID accuracy (Mickes et al., 2012; Rotello & Chen, 2016). Nevertheless, diagnosticity is a critical consideration in evaluating how lineup identifications provide evidence about the guilt or innocence of the suspect (e.g., Wells et al., 2015). Diagnosticity is the ratio of the probability of a suspect ID in a target present lineup to the probability of a suspect ID in a target absent lineup. A value of 1 means that guilty and innocent suspects are equally likely to be selected, i.e., chance performance. The higher the value, the more likely a selected suspect is guilty. Diagnosticity can be computed directly from data. Here we take a different approach and use the SDT model to estimate diagnosticity. Diagnosticity can be calculated from the SDT model in two ways. The first method collapses over all confidence levels; that is, the relevant probabilities are calculated without regard to confidence. The second method restricts the calculation to a confidence region; that is, the relevant probabilities are calculated for responses whose strengths fall between two response criteria. All other things being equal, diagnosticity increases with confidence level. Two other, related measures of performance are discussed below in relation to the sdtlu package figures.
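
The sketch below illustrates both model-based calculations using p_suspect_id() from the earlier sketch, again with illustrative parameter values.

```r
# Model-based diagnosticity, collapsed over confidence (everything above c1).
diag_overall <- p_suspect_id(0.5, k = 6, mu_t = 1.5, target_present = TRUE) /
                p_suspect_id(0.5, k = 6, target_present = FALSE)

# Model-based diagnosticity restricted to the medium-confidence band (c2 to c3).
band <- function(lo, hi, ...) p_suspect_id(lo, ...) - p_suspect_id(hi, ...)
diag_medium <- band(1.0, 1.5, k = 6, mu_t = 1.5, target_present = TRUE) /
               band(1.0, 1.5, k = 6, target_present = FALSE)

c(overall = diag_overall, medium_confidence = diag_medium)
```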

Equations for AUC and diagnosticity are provided in Appendix C.

The sdtlu package

Overview

The SDT model discussed in the previous section is implemented in the sdtlu (Signal Detection Theory - LineUp) R package, which is publicly available at https://osf.io/mfk4e. The package provides functions for preparing and processing lineup data, fitting SDT models to data, comparing models across two data sets, generating accuracy measures, graphing data and model results, and simulating experiments. In this section, we introduce how to use sdtlu for each of these tasks. To illustrate the functionality and use of this package, we walk through a simultaneous lineup example and then a sequential lineup example. All references to the sdtlu package, functions, variables, and output files are provided in Courier font. For clarity, the package functions are also shown in bold.

To preview, the main functions are sdtlu_process_data, which processes lineup data, sdtlu_fit, which fits the SDT model to lineup data, and sdtlu_compare_2, which uses bootstrapping to compare parameters from the SDT model across two data sets. The functions sdtlu_sim_sim and sdtlu_seq_sim can be used to simulate simultaneous and sequential SDT lineup data, respectively.

Data

As an example of a simultaneous lineup study, we will use Palmer, Brewer, Weber, and Nagesh (2013), which was also the data set used by Wixted et al. (2016). This is a field study in which 908 participants were asked to identify a culprit from a simultaneous, eight-person lineup in public. Approximately 50% of the lineups were target present and 50% were target absent. Participants provided a confidence rating on an 11-point scale, which the authors collapsed onto a five-point scale by combining some rating categories. For simplicity, we ignore the other manipulated factors, including exposure time and delay, although we revisit delay below. For the target absent trials, we randomly selected one of the eight faces to act as the innocent suspect.

For a sequential data set, we use Gronlund, Carlson, Dailey, and Goodsell (2009). This was a joint lab and online study in which subjects watched a video of a crime and then were asked to ID the culprit from a 6-person lineup and rate their confidence on a 1–7 scale. Because suspect IDs were rarely, if ever, made at the lowest confidence levels, some measures were undefined and the model fits were unstable. To demonstrate the full functionality of the package, we collapsed confidence levels 5, 6, and 7. Thus, there are five confidence levels in the analyzed data. The authors designated innocent suspects. Suspects (both guilty and innocent) were only tested in positions 2 and 5 of the lineups. We ignore all other factors including view quality, lineup bias, and suspect similarity to culprit. This data set includes 1250 trials.

Data format

For use in sdtlu, the data must be a comma-separated (csv) file with the following columns: id_type is the type of identification the participant made (suspect/filler/reject); conf_level is the participant's confidence level (e.g., 1, 2, 3, 4, 5), where 1 is the highest confidence level; culprit_present is whether the lineup was target present or absent (present/absent); and lineup_size is how many individuals were in the lineup. Ten randomly selected rows of the Palmer et al. (2013) data set are shown in Fig. 3. Sequential data would have an additional column, suspect_position, which provides the position of the suspect (an integer from 1 to lineup_size).

Fig. 3 Ten randomly selected rows of the Palmer et al. (2013) data set formatted for sdtlu. The whitespace is arbitrary
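
For readers who prefer to build the input programmatically, the following is a hedged sketch of a data frame in this format. The values are invented for illustration and do not come from Palmer et al. (2013), and how rejections are coded in conf_level may differ across data sets.

```r
# Hedged sketch of data in the required format (values are illustrative only;
# the conf_level entry for the reject row is an assumption).
lineup_data <- data.frame(
  id_type         = c("suspect", "filler", "suspect", "reject"),
  conf_level      = c(1, 3, 5, 5),               # 1 = highest confidence
  culprit_present = c("present", "absent", "present", "absent"),
  lineup_size     = c(8, 8, 8, 8)
)
write.csv(lineup_data, "example_lineup_data.csv", row.names = FALSE)
```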

Processing data

Raw lineup data, like that shown in Fig. 3, can be processed using the function sdtlu_process_data. You can pass the data either as a file name, with the file formatted as described previously, as is done in this example, or as a data frame with the same variables present in a data file. A sample call and output are shown in Fig. 4. We describe this output next.
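
A hedged sketch of such a call is given below; the file name is hypothetical, and the exact argument names and defaults are those documented in the package help (and shown in Fig. 4), not necessarily as written here.

```r
# Hedged sketch of processing a data file (file name hypothetical; see Fig. 4
# and the package help for the exact interface).
processed <- sdtlu_process_data("palmer_2013_data.csv")
str(processed)   # inspect the components described below
```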

Fig. 4 Output of sdtlu_process_data when applied to the Palmer et al. (2013) simultaneous lineup data set. Values were rounded to 3 decimals

resp_data_restr provides the response counts when collapsed across target present and target absent conditions. This output is useful for real-world studies when it is unknown whether the suspect was guilty, and so it is unknown whether a datum is from a target present or target absent condition. The output is in the following order: suspect ID highest confidence, …, suspect ID lowest confidence, filler ID highest confidence, …, filler ID lowest confidence, reject. In all inputs and outputs, confidence responses range from the highest confidence level to the lowest confidence level, left to right. Because this data set has five confidence levels, there are 11 counts (five suspect, five filler, and one reject; Footnote 2). For example, 100 participants identified the suspect with the highest confidence level, regardless of whether the suspect was the culprit or not.

resp_data_full provides the response counts categorized by target absent and target present condition if those data are available, as they are in this data set and most experimental work. The counts are in the same order as for resp_data_restr, but are shown twice, first for target present trials and then for target absent trials. For example, 98 participants identified the suspect at the highest confidence level, when the suspect was the culprit, and two participants identified the suspect at the highest confidence level, when the suspect was not the culprit.

Although we do not show an example here, lineups of different lengths can be processed together. This functionality is useful for meta-analyses in which data from multiple experiments are combined. When multiple lineup sizes are present, resp_data_restr and resp_data_full provide one row of counts per lineup size.

The next three outputs provide the proportion of response categories, rather than counts. If multiple lineup sizes exist, they collapse across the lineup sizes, i.e., lineup size is ignored. If there is only a single lineup size, the proportions are for that lineup size. overall_resp_data_restr_prop shows resp_data_restr, but as proportions. For example, 11% of participants, i.e., 100/908, provided a high confidence suspect ID. overall_resp_data_full_prop_joint and overall_resp_data_full_prop_cond provide the count data from resp_data_full as proportions. The former shows them as joint probabilities, e.g., P(resp = sus, conf = i, tar = pres), and the latter as conditional probabilities, e.g., P(resp = sus, conf = i | tar = pres). For example, from overall_resp_data_full_prop_joint we learn that 11% of trials were high confidence, target present, suspect IDs and from overall_resp_data_full_prop_cond we learn that 22% of target present trials were high confidence suspect IDs.
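
The relation between the joint and conditional proportions can be verified directly from the counts reported above; the number of target-present trials is approximated here from ptp = .502 and 908 trials.

```r
# Joint vs. conditional proportions for high-confidence suspect IDs on
# target-present trials (counts taken from the processed output above;
# the target-present trial count is approximate).
n_trials    <- 908
n_tp_trials <- round(0.502 * 908)     # approximately 456 target-present trials
n_sus_hi_tp <- 98                     # high-confidence suspect IDs, target present

n_sus_hi_tp / n_trials      # joint: P(resp = sus, conf = 1, tar = pres), about .11
n_sus_hi_tp / n_tp_trials   # conditional: P(resp = sus, conf = 1 | tar = pres), about .22
```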

lu_sizes is a vector of the lineup sizes in the data set. In the Palmer et al. (2013) example data, lineups were only of size 8.

pos_prop provides the proportion of times the suspect was present in each lineup position. The distribution of suspects over positions is assumed to be the same for both guilty and innocent suspects. This output is only relevant to sequential lineups and will be discussed below.

ptp is the base rate, i.e., the probability of a target present lineup, if known. In this case, 50.2% of lineups were target present.

ntrials is the number of data points, trials, or participants. This data set had 908 participants.

acc_data provides a measure of accuracy, i.e., the proportion of correct IDs for suspect picks and rejects (e.g., Wixted et al., 2016). This measure is similar to diagnosticity. There is one proportion for each confidence level (five in this example, starting with the highest confidence level) and one for rejections. For suspect IDs, this measure is P(tar = pres | resp = sus, conf = i). For rejects, this measure is P(tar = abs | resp = rej). For example, out of all high confidence suspect IDs, the suspect was actually the culprit 98% of the time, and out of all rejections, the culprit was not present 72% of the time. This measure is only available for full data sets.

sus_g_id_data provides, for each confidence level, the probability that an ID was a suspect ID, excluding rejections (e.g., Wixted et al., 2016). Each value is computed at a single confidence level (starting with the highest). For example, out of all high confidence IDs, 77% were suspect IDs.

n_resp_cats shows the number of possible response categories. With five confidence levels, this data set had 11 response categories (five suspect + five filler + one reject).

The output for the sequential Gronlund et al. (2009) study is shown in Fig. 5. The output format is identical to Fig. 4. Because suspect position was specified, we learn that approximately 50% of the suspects were shown in position 2 and 50% in position 5.

Fig. 5 Output of sdtlu_process_data when applied to the Gronlund et al. (2009) sequential lineup data set. Values were rounded to 3 decimals. Confidence levels 5-7 were combined

Fitting SDT data

Perhaps the most fundamental aspect of sdtlu is its ability to fit the SDT model to experimental data. Fitting is done using the sdtlu_fit function. This function takes as arguments experimental data and a set of options.

The data can be in one of three forms. First, the data can be the output from sdtlu_process_data. This method is used in the following example. Second, you can provide a file name and sdtlu_fit will then internally analyze the data using sdtlu_process_data. Third, the data can be a matrix of count data of each response at each confidence level, of the same form as resp_data_full or resp_data_restr. When count data are provided, the function requires the lineup size, i.e., the number of people in the lineup. An example of this third method is also provided below.

There are numerous options available, which are grouped in a list. model_type specifies a simultaneous ('sim', default) or sequential ('seq') SDT model. fit_fcn allows you to select a method for calculating the fit of the model to the data. The default is to use G2 ('G2'), but χ2 ('chi-square') is also available (Footnote 3). If use_restr_data is TRUE, the data are assumed to be collapsed over target present and absent, as in resp_data_restr. If FALSE (the default), target present and absent are separated, as in resp_data_full. n_fits tells the function how many times to restart the parameter search with a different starting parameter set, which can be useful for avoiding local minima. save_file_name (sdtlu_save.RData, default) and fig_file_name (sdtlu_figs.png, default) allow you to specify where the numeric and graphic outputs are stored.

The arguments fix_p and fix_sigma_t control the behavior of the parameters for the proportion of target present trials p and the standard deviation σt of the target distribution. If set to 'free', the associated parameters are free to vary. These arguments can also be set to a value, which fixes the associated parameter to that value. If fix_p is 'data', and the base rates are known, p (also referred to as ptp, the probability of target present trials, depending on context) is set to the proportion of target present trials in the data (for restricted data, p is allowed to vary). If fix_sigma_t is set to 'sigma_f' (Footnote 4), then σt = σl = 1. Recall that, for restricted data, it is not possible to simultaneously estimate the proportion of target present trials p and the standard deviation σt of the target distribution. Thus, when restricted data are used, fix_p and fix_sigma_t should not both be set to 'free'. For restricted data, the defaults are fix_p = 'free' and fix_sigma_t = 'sigma_f'. For full data, the defaults are fix_p = 'data' and fix_sigma_t = 'free'.

To provide bounds on the fit, parameter, and accuracy values, sdtlu_fit can also bootstrap the data. That is, the data are resampled with replacement (within target present/absent conditions, if available), the fit is repeated on this new sample, and all measures are recalculated. Bounds are then determined from quantiles of these samples. The bootstrap is run if run_bootstrap is TRUE (the default). If you only want point estimates, it is faster to set run_bootstrap to FALSE. The number of bootstrap samples is determined by n_bootstrap_samps. Bootstrapping can be slow, so we suggest trying it with a small n_bootstrap_samps first before running a longer version. The default is 1000 samples. The function returns the .01, .025, .05, .25, .5, .75, .95, .975, and .99 quantiles of the bootstrapped samples. If all samples are required, not just the quantiles, set output_bootstrap_samps to TRUE (default is FALSE). Only one fit is run per bootstrap sample, with starting parameters determined by the best-fitting parameters to the non-bootstrapped data. If a bootstrap sample creates an error in the fit function, e.g., the sample is impossible under the model, a warning is provided and a new sample is drawn.
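
Putting these pieces together, a hedged sketch of an options list and fit call is shown below. The option names are those described above, but exactly how the options list and the processed data are passed to sdtlu_fit is an assumption here; the authoritative form is the call shown in Fig. 6 and the package help.

```r
# Hedged sketch of a fit to the full (target present/absent) Palmer et al.
# (2013) data; option names from the text, calling convention assumed.
opts <- list(
  model_type        = "sim",      # simultaneous SDT model
  fit_fcn           = "G2",       # default fit statistic
  use_restr_data    = FALSE,      # full data
  n_fits            = 1,
  fix_p             = "data",     # p set to the observed target-present proportion
  fix_sigma_t       = "free",
  run_bootstrap     = TRUE,
  n_bootstrap_samps = 50          # small value for a quick check; use >= 1000 in practice
)
fit <- sdtlu_fit(processed, opts)   # 'processed' comes from sdtlu_process_data
```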

A sample run for the Palmer et al. (2013) simultaneous data is provided in Fig. 6 along with partial output. First, the data are processed, as in Fig. 4. The options are then set. For this example, we set many of the options to their default values but, for convenience, use only 50 bootstrap samples. In general, we suggest at least 1000 bootstrap samples.

Fig. 6 Partial output from sdtlu_fit with the Palmer et al. (2013) simultaneous lineup data

The output provides means and bootstrapped quantiles for the fit measure, all model parameters, AUC, and diagnosticity. The output also provides the data (in the same order as resp_data_full) and the best-fitting model prediction. Left out of the figure are the fit measures for each individual fit, the full set of bootstrapped results, and the function options. These results are stored, by default, in sdtlu_save.RData in the current directory.

The function also produces a set of figures as shown in Fig. 7, also saved in the current directory. The upper-left panel shows the SDT model as in Fig. 1 with the best-fitting model parameters. The color/brightness of the distribution lines represents the probability of a target present or absent sample. Because the base rate is near .5, the two colors are indistinguishable in this example. The upper-center panel shows the model-based ROC as in Fig. 2. The thinner lines are generated from the parameter sets that produced AUC values at the .05 and .95 quantiles of the distribution across bootstrap samples, and thus provide a form of equal-tailed interval on the ROC curve. The upper-right panel shows the data ROC. The bottom-left panel shows how well the model fits the data and provides both the data and model-predicted proportions for each response category in each condition. The bottom-center panel provides the data and model-predicted accuracy, i.e., the probability of a correct response given either a suspect ID at the different confidence levels or a rejection. The bottom-right panel shows the data and model-predicted probability of a suspect ID at different confidence levels. To be consistent with previous work, in the latter three graphs, confidence increases from left to right. The model does a good job overall of accounting for the experimental data, both qualitatively and quantitatively.

Fig. 7 Figure generated from sdtlu_fit with the Palmer et al. (2013) simultaneous lineup data. The figure layout has been modified to remove whitespace

We can use the same data to simulate restricted data, i.e., real-world data in which the proportion of target present trials is not known. To do this, we simply collapse the counts at a confidence level across target present and absent conditions. In the sdtlu_fit function, you can do that by setting the use_restr_data option argument to TRUE. Now p is free to vary, and we set σt = σl = 1. Example output is provided in Fig. 8, and the associated figures are provided in Fig. 9. Note that the estimated value of p, .509, is very close to the actual value from Fig. 4 of .502. Because it is impossible to separate the data into target present and absent categories for restricted data, a data ROC curve cannot be drawn.
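
A hedged sketch of the corresponding call, reusing the opts list and processed data from the earlier sketches, might look as follows (calling convention assumed; modifyList is base R).

```r
# Refit the same data as restricted data (collapsed over target present/absent).
# Per the defaults described above: p is free and sigma_t is fixed via 'sigma_f'.
opts_restr <- modifyList(opts, list(use_restr_data = TRUE,
                                    fix_p          = "free",
                                    fix_sigma_t    = "sigma_f"))
fit_restr <- sdtlu_fit(processed, opts_restr)
```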

Fig. 8 Partial output from sdtlu_fit with the Palmer et al. (2013) restricted simultaneous lineup data

Fig. 9 Figure generated from sdtlu_fit with the Palmer et al. (2013) restricted simultaneous lineup data. The figure layout has been modified to remove whitespace

As mentioned previously, sdtlu_fit can also be applied directly to counts. This application of the function is useful when fitting previously analyzed data, data from tables, or simulated data. We provide a sample call for the Palmer et al. (2013) restricted data in Fig. 10. Because these are restricted data, the counts are in the same order as resp_data_restr; full data would be in the same order as resp_data_full. Because these are count data, the lineup size now needs to be specified as an argument. The output is identical to Figs. 8 and 9.

Fig. 10 Call to sdtlu_fit for counts from the Palmer et al. (2013) restricted simultaneous lineup data

We can also apply sdtlu_fit to sequential data, in this case the Gronlund et al. (2009) data. The call and output are shown in Fig. 11 and the associated figures are in Fig. 12. The suspect position proportions, which are needed for a sequential model, are included in the processed data, as was shown in Fig. 5 (Footnote 5). In our experience, the parameter search for the sequential model is more difficult. To avoid local minima, we increased the number of model fits to 10. Because of the unusual shape of the sequential ROC curve, it is also more difficult to compute a stable AUC. AUC is computed by default, but should be checked carefully for numerical stability and non-monotonicity. For similar reasons, it may be necessary to lower the maximum number of recursion steps (max_steps) in the computation of AUC in the sdtlu_auc function. Although the data do show increasing performance with increasing confidence, the model does not fit these data nearly as well as the simultaneous Palmer et al. (2013) data. Indeed, the model fits very poorly. Note the oddly shaped, non-monotonic ROC curve, which is a clear indication that AUC should not be used here. We should note that the sequential model can accurately recover parameters generated from the sequential SDT model; thus, these results strongly suggest that these data were not generated from the sequential SDT model we have implemented. Different assumptions about sequential lineups might result in a better fit.

Fig. 11 Partial output from sdtlu_fit with the Gronlund et al. (2009) sequential lineup data

Fig. 12 Figure generated from sdtlu_fit with the Gronlund et al. (2009) sequential lineup data. The figure layout has been modified to remove whitespace

It is likely that this poor fit is due, in large part, to the Gronlund et al. (2009) experimental paradigm, in which the memory strength of the innocent and guilty suspects was manipulated (see Cohen et al., 2020, for an in-depth discussion). Specifically, manipulating innocent suspect strength likely violated the signal detection model's assumption that innocent suspects and fillers come from the same memory distribution (Footnote 6). These data, however, serve as an illustration that the model is not overly flexible and cannot fit any data set, which is a desirable quality because it means that finding a good fit is a better indication that the processes generating the data conform to the model's assumptions. When fit to a different sequential data set from Horry, Palmer, & Brewer (2012), the fit was greatly improved, with a G2 of approximately 21 (see OSF for results). This is perhaps not surprising, given that these researchers did not have the goal of manipulating how much the innocent suspect matched the culprit, and thus took careful measures to ensure lineup fairness as assumed by the model, i.e., innocent suspects are no more likely than fillers to strongly match the culprit.

As mentioned previously, sdtlu_fit can be used on data with multiple lineup sizes. When there are multiple lineup sizes, a single set of parameters is estimated across lineup sizes; however, by necessity, the accuracy measures and predictions are provided separately for each lineup size. In the figures, there will also be one model ROC curve per lineup size. When fitting the model, the model predictions are computed for each lineup size and combined across lineup sizes, weighted by the number of trials at each lineup size. These combined predictions are used to compute the fit measure.

Comparing data

The sdtlu package also contains a function, sdtlu_compare_2, to compare parameter values from an SDT model across two different data sets. This function is useful, for example, to researchers who want to know how an experimental manipulation affects the SDT parameters. This function takes many of the same fitting and model parameters as sdtlu_fit. The major difference is that there are now two sets of data (data_1 and data_2) and lineup sizes (lineup_sizes_1 and lineup_sizes_2). You can also name each of the data sets (data_name_1 and data_name_2), which changes labels in the figure. The options are also similar to sdtlu_fit; however, you can also change the output and figure file names, including those for the two individual data sets (save_file_name_1, save_file_name_2 and fig_file_name_1, fig_file_name_2) and the comparison (save_file_name_compare and fig_file_name_compare).

Recall that Palmer et al. (2013) varied delay, i.e., the time between viewing the culprit and participating in the lineup. There were two levels of delay, immediate testing and a week’s delay, what we will refer to as short and long delays, respectively. We compare the recovered SDT model parameter values for the Palmer et al. (2013) short and long delay data. Because the results relied on very small differences, we increased the number of bootstrap samples to 500.

The example calls and output are provided in Fig. 13 and a subset of the figure is provided in Fig. 14. This example sends in the data as counts and lineup sizes. The sdtlu_compare_2 function starts by calling sdtlu_fit on both data sets and provides all of the associated output and figures, as described previously. To avoid repetition, this output is not provided here. The new output is shown in Figs. 13 and 14. To provide a measure of statistical differences, the 2.5, 50, and 97.5% quantiles for the differences in bootstrapped fit measures, parameters, and performance measures are provided along with the proportion of differences that are less than and greater than 0. The differences are for the long condition (data set 1) minus the short condition (data set 2). We consider there to be a significant difference if zero is not included in the 2.5 to 97.5% interval. The range of this interval is determined by the alpha parameter (the default alpha is .05).
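
A hedged sketch of what such a comparison call might look like is given below; the argument names are those described above, while the count matrices and the calling convention are assumptions.

```r
# Hedged sketch of comparing the long- and short-delay conditions (Fig. 13
# shows the actual call). counts_long and counts_short stand for count
# matrices in the resp_data_full order, e.g., taken from sdtlu_process_data
# output for each delay condition; they are hypothetical placeholders here.
comp <- sdtlu_compare_2(data_1         = counts_long,
                        data_2         = counts_short,
                        lineup_sizes_1 = 8,
                        lineup_sizes_2 = 8,
                        data_name_1    = "long delay",
                        data_name_2    = "short delay")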

Fig. 13 Partial output from an example use of the sdtlu_compare_2 function, comparing parameter values across the short and long delay conditions of Palmer et al. (2013)

Fig. 14 Figure generated from sdtlu_compare_2 with the Palmer et al. (2013) short and long delay simultaneous lineup data. Only the results for μt and σt are shown. The full figure is shown in Fig. 16 in Appendix D

For the parameters, only c4 definitively differs across data sets, although there is a strong trend toward overall more conservative responding in the short delay condition. There are also marginal differences in μt and σt, with μt tending to be larger and σt tending to be smaller in the short delay condition. These results are consistent with overall better performance at the short delay. Histograms of the differences for μt and σt are shown in Fig. 14 (the full figure is shown in Fig. 16 in Appendix D). For the performance measures, AUC is larger in the short delay condition, a sensible outcome. Diagnosticity is also larger in the short delay condition, but only definitively at confidence levels 2–4 and marginally at confidence level 5.

Simulating SDT models

The sdtlu package provides the ability to simulate the SDT models described previously and specified in Appendix A and Appendix B. The sdtlu_sim_sim and sdtlu_seq_sim functions simulate data from the simultaneous and sequential SDT models, respectively.

Both functions were run using the parameters from Fig. 1. Recall that the full set of parameters is the proportion of target present trials p, the mean μt and standard deviation σt of the target distribution, and the response criteria c1-cn. These functions also need to know the lineup size (lineup_sizes) and how many trials are being simulated (n_trials). If use_restr_data is TRUE, restricted data are simulated. In addition, the sequential model is provided with a distribution of suspect positions (pos_prop). The function calls and output are provided in Fig. 15. In this case we simulate two experiments (specified by n_sims).
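
A hedged sketch of such a simulation call is given below; the argument names are those mentioned above, but how the SDT parameters themselves are passed is an assumption here (Fig. 15 and the package help show the exact interface).

```r
# Hedged sketch of simulating two sequential-lineup experiments of 1000 trials
# each. The 'params' argument name and format are assumptions; lineup size 6,
# suspect equally likely in every position, illustrative parameter values.
sim_params <- c(p = 0.5, mu_t = 1.5, sigma_t = 1,
                c1 = 0.5, c2 = 1.0, c3 = 1.5)
sim_data <- sdtlu_seq_sim(params         = sim_params,
                          lineup_sizes   = 6,
                          n_trials       = 1000,
                          n_sims         = 2,
                          pos_prop       = rep(1/6, 6),
                          use_restr_data = FALSE)
```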

Fig. 15 Examples of the use of sdtlu_sim_sim and sdtlu_seq_sim to simulate experimental data

Help and other functions

There are other useful sdtlu functions. To get additional help, after installing the package, typing ??sdtlu will list all of the available functions and a “how to” file that includes another set of examples. Each function also has its own help file that includes examples.

Discussion

Signal detection theory is a powerful framework for analyzing data, and several existing R packages implement it for the analysis of ROC data. The sdtlu package leverages the power of the signal detection framework specifically for the analysis of lineup data, or other similarly structured data such as the identification of the location of a tumor within a radiological image (Starr, Metz, Lusted, & Goodenough, 1975; Swets & Pickett, 1982). sdtlu provides functions to process lineup data, determine the best-fitting SDT parameters, compute model-based performance measures such as AUC and diagnosticity, use bootstrapping to determine intervals around these parameters and measures, and compare parameters across two different data sets. Both simultaneous and sequential lineups are supported, as well as show-ups. The model predictions are computed from closed-form solutions. The package can also produce a full set of graphs, including data and model-based ROC curves and the underlying SDT model.

To our knowledge, the sdtlu package is the first R package to implement the closed-form (integral) equations that define the predictions of the lineup SDT model. This form offers a computationally more efficient way to determine precise model predictions than the simulation methods often used in past studies (although see Wixted et al., 2018). That said, the package can also simulate data by randomly sampling observations from the model, which makes it easy to perform parameter recovery simulations and to explore position effects in sequential lineups. sdtlu returns a variety of performance measures used by lineup researchers, and it is, to our knowledge, the first package to calculate theoretical AUC measures from the lineup SDT model.

Thus, the sdtlu package offers eyewitness researchers a number of specialized functions that are not available in packages designed for more general applications of ROC analysis. The most downloaded such packages are ROCR (Sing, Sander, Beerenwinkel, & Lengauer, 2005; downloaded ~63 k times in the month prior to 8/10/19) and pROC (Robin et al., 2011; downloaded ~46 k times), with the more functionally limited sROC in third place (Wang, 2012; downloaded ~5 k times). ROCR and pROC both offer AUC estimation, including for partial AUCs, as well as statistical comparison of two ROC curves, smoothing of data, and a range of plotting options. They also both provide tools for the analysis of ROC data based on continuously valued measures; in psychological research, the reaction-time based ROC is one example (Thomas & Myers, 1972). pROC easily implements bootstrapping of samples and plotting of confidence intervals; ROCR has the advantage of generating predictions for how a new sample will be classified (i.e., as target or lure). One key difference between these packages and sdtlu is that the latter does not just estimate AUC from the empirical ROC points, but defines the theoretical ROC function based on an SDT model designed for lineup tasks and uses iterative quadrature to find the area under this continuous function. Thus, sdtlu provides an option for eyewitness memory researchers who wish to use more theoretically motivated performance measures. Another important consideration is that, among these packages, only sdtlu can estimate the base rate of lineups that include a guilty suspect. sdtlu also includes functions for generating simulated data sets from an SDT model with either a simultaneous or a sequential lineup design, which makes it easy for researchers to conduct parameter recovery simulations or explore model predictions when planning a new study.

In conclusion, the sdtlu package offers a number of unique tools for lineup researchers, and we hope that it will contribute to the growing sophistication in the analysis and interpretation of both empirical and real-world eyewitness identification data.