Background

Balanced decision-making is crucial for adaptive daily functioning. Consequently, anomalies of decision-making processes are present in a range of psychopathological conditions [1,2,3,4]. In more direct relation to the aims of the present study, addiction has been described as a disorder of the ability to make good decisions, namely to make choices regarding the potentially addictive behavior (using drugs, gambling) that overcome impulses, for the sake of more beneficial long-term goals [5,6,7]. Individuals suffering from an addictive disorder thus persist in a harmful behavior in spite of its negative consequences (e.g. economic losses, health, family and work problems) (DSM-5 [8]). In concordance with this observation, a number of studies have tried to test the prediction that addicted patients present domain-general decision-making alterations that could predate the disorder onset, contribute to its chronicity or complication, or occur as a consequence of the addictive process itself [9,10,11].

Among the aspects of decision making that are relevant to understanding addictive processes is behavioral flexibility, namely the ability to readjust preferences in response to reward contingency changes in decision-making-under-ambiguity tasks. Inflexibility would be manifested in a transient inability to stop choosing a given option that is no longer advantageous (perseveration), or, more generally, in difficulty relearning action-outcome contingencies that depart from the ones that were learnt in the initial acquisition phase.

The relevance of decision-making inflexibility for addictive behaviors stems from the fact that it is hypothesized to reflect domain-general proneness to compulsivity [12, 13], which, in turn, has been theorized to be a transdiagnostic vulnerability factor for addictive disorders and other psychopathologies [14, 15]. In different theories, (a) addictive behaviors transition from being goal-driven to being stimuli or context-driven (habitual/compulsive, [16, 17]), or, alternatively, (b) addiction-related rewards acquire a disproportionate motivational value [18]. Independently of which of these approaches is correct, it seems obvious that being unable to fine-tune the associations between decision options and outcomes, or to behave accordingly, would render the individual more vulnerable to the progression of loss of control over potentially addictive behaviors.

There are several ways to operationalize decision-making inflexibility in the laboratory [19, 20]. However, none of the available protocols is sensitive only to inflexibility. Crucially, reacquisition after contingency change is inextricably linked to more general contingency learning differences, i.e. any individual differences in acquisition will contaminate gross reacquisition differences. So, detecting inflexibility first requires experimentally dissociating acquisition learning from artifact-free (in)flexibility (for a detailed discussion on this matter and its methodological subtleties, see [21]).

This distinction is not only of methodological importance. There is some consensus that acquisition learning is computationally simpler than adjustment to unsignaled contingency changes. Extinction, for instance, is not just the disappearance of previous conditioning, but a context-dependent learning process about the omission of the reinforcer [22]. Similarly, reversal learning seems to require high-order mechanisms to restructure the set of learned associations (for an updated view, see [23]). So, once the dissociation is granted, computational modelling is required to identify the cognitive processes that give rise to the two parts of such a dissociation [21]. Although the present study concerns only the first stage of this process, the constraints it imposes on computational modelling will be sketched in the discussion section.

The probabilistic learning task and problematic gambling

The most widely used task to investigate decision-making inflexibility in the lab is the Probabilistic Reversal Learning Task (henceforth, PRLT). In each trial of this task, two choice options are presented to the learner, one advantageous (more likely to yield reward; e.g. virtual points or money), and the other disadvantageous (more likely to yield some kind of punishment). Initially, the individual has no option but to choose at random, but her choices grow attuned to reward and punishment contingencies as the task progresses. However, at some point, and without prior notice, the contingencies are reversed, and the individual needs to update her preferences on the basis of the new contingencies.

As noted above, decision-making inflexibility can be used as an individual-differences measure of compulsivity [24, 25]. Unfortunately, most previous attempts to compare PRLT performance across groups of individuals with and without addictive behaviors are not free of methodological and interpretational problems. In general, there is no unanimity regarding the best way to measure inflexibility in the PRLT. In a recent meta-analysis and systematic review, van Timmeren et al. [19] found that the studies that used the PRLT do not reveal significant levels of behavioral inflexibility in individuals with gambling disorder. However, this could be due to the diversity of procedures and measures used to operationalize PRLT performance. Different studies used, for instance, the amount of money or points earned [26, 27], the number of correct choices [28,29,30] or the number of consecutive errors after each reversal (i.e. perseverative errors, [31, 32]).

In the present work, we will follow Perandrés-Gómez et al.’s [21] approach to analyze full acquisition and re-acquisition curves in a PRLT with four phases: one in which preferences are first established, and three more resulting from two contingency reversals [33]. The first aim was to determine whether group affiliation (patients with a substance use disorder with symptoms of comorbid problematic gambling vs healthy controls) has any effect on the form of learning curves in each phase, or their variation across phases. Inflexibility can be corroborated by detecting any reacquisition disadvantage in phases with reversed contingencies (2 and 4), relative to the ones with contingencies in the original direction (1 and 3), that is experimentally dissociable from global differences in task performance (e.g. phase-independent learning rates or asymptotes). More importantly, we will test whether the effect of contingency reversal depends itself on group (i.e. whether patients show more signs of decision-making inflexibility than controls). Subsequently, we will specifically analyze participants’ performance in relation to the severity of their problematic gambling symptoms, regardless of (or controlling for) group affiliation. By assessing the relationship between gambling severity and PRLT performance, independently of group, we intended to dissociate the effect of problematic gambling from that of other addictive behaviors.

Decision-making inflexibility in reversed contingency phases of the PRLT has been previously reported to be associated with disordered gambling [21, 29]. This proneness towards compulsivity would explain to some extent the ease with which disordered gamblers adhere to an initially favorable reinforcement contingency, but are later incapable of abandoning it [34,35,36]. Therefore, from this perspective, PRLT inflexibility would be expected to be linked to higher severity of disordered gambling symptoms, independently of the diagnosis of other addictive disorders.

Methods

Participants

Twenty-five patients under treatment for an addictive disorder were recruited from the centers Centro de Recuperación Nueva Luz and Centro de Recuperación Integral de Alcoholismo y Drogadicción (CRIAD), from Guayaquil, Ecuador. Convenience sampling was used to recruit 25 healthy controls, as closely matched as possible with patients on relevant covariates. Some of the control participants were contacted using announcements in the School of Psychology of the University of Guayaquil, and others were recruited among acquaintances of the patients.

All patients were under treatment for at least one addictive disorder (most of them, for alcohol use disorder), and were diagnosed according to DSM-IV-TR diagnostic criteria. The inclusion criteria for both groups were: (1) being between 18 and 65 years old, and (2) having no history of head trauma or neurological problems, and no diagnosis of any psychiatric or psychological disorder (apart from the addictive disorder in the group of patients). Patients were included in the sample and considered for further assessments and analyses only if they reported a previous history of significant problems as a consequence of gambling. Severity of gambling symptoms was assessed with the South Oaks Gambling Screen (SOGS, Spanish version; [37]). Nineteen of the 25 patients actually scored above the SOGS threshold for gambling disorder. The 6 patients who reported having suffered gambling problems in the past but did not meet the criteria for current gambling disorder were kept in the study sample. The sociodemographic and clinical profile of each group is reported in Table 1.

Table 1 Sociodemographic and clinical features: means, standard deviations, and Bayes factors, expressing support for the alternative hypothesis

Procedure

Each participant was assessed in a single session lasting ~ 2 h. Patients were assessed in the rehabilitation clinics, and control participants in the premises of the School of Psychology of the University of Guayaquil. All the assessments were performed by an Ecuadorian clinical psychologist with a master’s degree in neuroscience. The assessment protocol was divided into four blocks (cognitive tests, computer tasks, paper-and-pencil emotion and personality tests, and a clinical interview). The order of blocks, and of tasks within each block, was counterbalanced across participants. The instruments used were the Wechsler Intelligence Scale for Adults-III (WAIS-III: vocabulary and matrices [38]), an impulsive behavior scale (UPPS-P [39]), the Sensitivity to Punishment and Sensitivity to Reward Questionnaire (SPSRQ-20 [40]), the South Oaks Gambling Screen (SOGS [38]), the MultiCAGE [41], and the Probabilistic Reversal Learning Task (PRLT [28]). Some of these, however, were not relevant for the purposes of this study and will not be described here (see [42]; there is an overlap of 12.6% between the samples of both studies).

Instruments

The South Oaks Gambling Screen (SOGS, Spanish version; [37]) was originally based on DSM-III-R diagnostic criteria for pathological gambling, but it was later shown to have good convergence with DSM-IV-TR and DSM-5 [43]. The clinical threshold for gambling disorder has been established at a score ≥ 5. The Spanish version of this instrument has shown good psychometric properties (Cronbach’s α = 0.94 [37]).

Alcohol and drug misuse were assessed with the eight dichotomous alcohol and drug-related items of the MultiCAGE questionnaire (Spanish version; [41]). The Spanish version of this questionnaire has shown good internal consistency as measured by Cronbach’s α (i.e. all scales presented scores higher than 0.70 [41]). Risk of alcohol misuse was computed as the average response for the 4 alcohol-related items (0–1), and risk of drug misuse as the average for the 4 drug-related items (0–1) of the scale. For the two sub-scales, the threshold for significant risk of misuse has been established at two positive responses (mean score ≥ 0.5) [41].
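As an illustration of the scoring rule just described, the following Python sketch computes a sub-scale score as the mean of four dichotomous items and flags scores at or above 0.5. The function names and the example responses are hypothetical and are not taken from the study materials.

```python
from statistics import mean

RISK_THRESHOLD = 0.5  # two positive responses out of four, as stated in the text


def subscale_score(items):
    """Mean of four dichotomous (0/1) item responses, ranging from 0 to 1."""
    if len(items) != 4 or any(i not in (0, 1) for i in items):
        raise ValueError("expected four responses coded 0 or 1")
    return mean(items)


def at_risk(items):
    """True when the sub-scale mean reaches the threshold for significant risk."""
    return subscale_score(items) >= RISK_THRESHOLD


# Hypothetical respondent: two positive alcohol items, one positive drug item
alcohol_items = [1, 0, 1, 0]
drug_items = [0, 0, 1, 0]
print(subscale_score(alcohol_items), at_risk(alcohol_items))
print(subscale_score(drug_items), at_risk(drug_items))
```

In this hypothetical case the alcohol sub-scale reaches the threshold (two positive responses) whereas the drug sub-scale does not (one positive response).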

The Probabilistic Reversal Learning Task (PRLT, [28]) is a computer-based decision-making task in which participants have to choose, in each trial, between two different stimuli (by mouse-clicking on one of them). The options consist of two squares of different colors, which randomly shift their positions. The task consists of four phases of 40 trials each. Within each phase, one of the options was “correct” and, when the participant chose it, a symbolic reward was given on most occasions (probabilistically). The other option was “incorrect”, and the participant was notified of the error after choosing it. Participants were rewarded with virtual points, and punished by subtracting points from their account. Phases 1 and 3 had the original contingency sign, and phases 2 and 4 had reversed contingencies. In phases 1 and 2, the proportion of true/false feedback was 80/20%, whereas in phases 3 and 4 it was 70/30%. This degradation of contingency was introduced to increase uncertainty and thus to avoid close-to-perfect performance in the late phases of the task.
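The task structure can be made concrete with a short simulation. The Python sketch below is not the software used in the study; it rests on two assumptions that the text does not spell out: that the correct option switches at every phase boundary, and that "true" feedback rewards the correct option and punishes the incorrect one, whereas "false" feedback does the opposite. The option labels "A" and "B" are hypothetical.

```python
import random

N_PHASES = 4
TRIALS_PER_PHASE = 40
# Probability that feedback is "true" in each phase, taken from the text
FEEDBACK_VALIDITY = {1: 0.8, 2: 0.8, 3: 0.7, 4: 0.7}


def correct_option(phase):
    """Assumed layout: 'A' is correct in phases 1 and 3, 'B' in phases 2 and 4."""
    return "A" if phase % 2 == 1 else "B"


def feedback(phase, choice, rng):
    """Return +1 (points gained) or -1 (points lost) for a single choice.

    Assumption: true feedback rewards the correct option and punishes the
    incorrect one; false feedback does the opposite.
    """
    valid = rng.random() < FEEDBACK_VALIDITY[phase]
    chose_correct = choice == correct_option(phase)
    return 1 if chose_correct == valid else -1


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
for phase in range(1, N_PHASES + 1):
    # Hypothetical participant who always picks the currently correct option
    outcomes = [feedback(phase, correct_option(phase), rng)
                for _ in range(TRIALS_PER_PHASE)]
    print(phase, correct_option(phase), outcomes.count(1) / TRIALS_PER_PHASE)
```

Even a participant who always chooses the correct option is expected to lose points on roughly 20% of trials in phases 1 and 2 and on roughly 30% of trials in phases 3 and 4, which is the source of the added uncertainty mentioned above.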

Statistical analyses

The two groups (patients and controls) were first compared on relevant sociodemographic and clinical features using Bayesian Mann–Whitney tests, with default priors as implemented in the open-source JASP software.

PRLT performance was coded trialwise. Each response in each trial was classified as correct (R = 1) if the colored square with the higher probability of reward (in the ongoing phase) was chosen, and incorrect (R = 0) if the color with the lower probability of reward was chosen.
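A minimal Python sketch of this trialwise coding is given below. The raw records, the color names, and the mapping from phase to the higher-probability color are hypothetical; only the coding rule itself (R = 1 for the higher-probability option in the ongoing phase, R = 0 otherwise) comes from the text.

```python
# Hypothetical raw records: (participant, phase, trial, chosen color)
raw = [
    ("p01", 1, 1, "blue"),
    ("p01", 1, 2, "yellow"),
    ("p01", 2, 1, "yellow"),
    ("p01", 2, 2, "blue"),
]

# Assumed mapping from phase to the color with the higher reward probability
HIGH_PROBABILITY_COLOR = {1: "yellow", 2: "blue", 3: "yellow", 4: "blue"}


def code_response(phase, chosen):
    """Return R = 1 for the higher-probability option in this phase, else R = 0."""
    return 1 if chosen == HIGH_PROBABILITY_COLOR[phase] else 0


# One row per trial (long format), the layout a trialwise binomial model expects
long_format = [
    {"participant": pid, "phase": ph, "trial": tr, "R": code_response(ph, ch)}
    for pid, ph, tr, ch in raw
]
for row in long_format:
    print(row)
```

Note that the same choice ("yellow") is coded as correct in phase 1 and as incorrect in phase 2, because correctness is always defined relative to the contingencies of the ongoing phase.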

The first PRLT analysis followed a Phase (1, 2, 3, 4) × Trial (1–40) × Group (HC, Patients) design. Response was modeled as a binomial variable with a logit link, using Generalized Linear Mixed-effects Models (GLME), with the glmer function implemented in the lme4 R software package [44]. This analysis is conceptually similar to a logistic regression, but includes both random and fixed-effects factors. In the present case, Phase and Trial were used as within-participant fixed factors, and Group as a between-participant fixed factor. Participant was considered as a random-effects factor, and Trial also as a random slope at the participant level (trial|participant). Additionally, in order to reduce the number of parameters in the model, Trial was treated as a quantitative variable, and was (natural) log-transformed in order to incorporate into the models the general principle that acquisition processes are curvilinear (in relation to Trial), and can thus be modelled as approximately linear in relation to Log-trial. Log-trial was zero-centered with the standard deviation as unit (so Log-trial was expressed on a − 3.16 to 1.07 scale). Standardization is just a linear transformation of the original variable, and is generally recommended for quantitative predictors in this type of model to prevent convergence problems. For presentation purposes, the scale in all figures was restored to the 1–40 scale.
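The log-transformation and standardization of Trial can be sketched in a few lines. The snippet below is a Python illustration rather than the R code used in the study; the variable names and the `to_trial` helper are assumptions introduced here.

```python
import math
from statistics import mean, stdev

trials = range(1, 41)                      # 40 trials per phase
log_trial = [math.log(t) for t in trials]  # natural logarithm of trial number

# Zero-center and scale by the standard deviation (stdev uses n - 1)
m, s = mean(log_trial), stdev(log_trial)
z_log_trial = [(x - m) / s for x in log_trial]

print(round(z_log_trial[0], 2), round(z_log_trial[-1], 2))


def to_trial(z):
    """Map a standardized Log-trial value back to the 1-40 trial scale."""
    return math.exp(z * s + m)


print(round(to_trial(z_log_trial[0])), round(to_trial(z_log_trial[-1])))
```

With the n − 1 denominator the endpoints come out close to the − 3.16 and 1.07 reported above, and the back-transformation recovers trials 1 and 40, which is how figure axes can be returned to the original scale.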

The effects in this analysis were thus the main effects of Log-trial, Phase, and Group, and the interactions among them. The effect of Phase was decomposed into three orthogonal contrasts [C1 (− 1, − 1, 1, 1), C2 (1, − 1, 1, − 1), and C3 (− 1, 1, 1, − 1)]. The contrast portraying evidence regarding learning inflexibility is C2, as it represents the differential performance in phases with reversed-sign contingencies (2nd and 4th), relative to phases with the original contingency sign (1st and 3rd). Global learning differences, however, are portrayed by the Group and Group × Log-trial effects.
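The orthogonality of the three Phase contrasts can be checked directly. The Python sketch below uses the contrast weights given in the text; the helper names are assumptions introduced for illustration.

```python
from itertools import combinations

# Contrast weights for phases 1-4, as given in the text
CONTRASTS = {
    "C1": (-1, -1, 1, 1),   # phases 1-2 weighted -1, phases 3-4 weighted +1
    "C2": (1, -1, 1, -1),   # original-sign phases (1, 3) vs reversed phases (2, 4)
    "C3": (-1, 1, 1, -1),   # phases 2-3 weighted +1, phases 1 and 4 weighted -1
}


def dot(a, b):
    """Sum of element-wise products of two weight vectors."""
    return sum(x * y for x, y in zip(a, b))


# Each contrast sums to zero ...
for name, weights in CONTRASTS.items():
    assert sum(weights) == 0

# ... and every pair of contrasts has a zero dot product (orthogonality)
for (n1, w1), (n2, w2) in combinations(CONTRASTS.items(), 2):
    print(n1, n2, dot(w1, w2))


def contrast_codes(phase):
    """Contrast values assigned to a given phase (1-4)."""
    return {name: w[phase - 1] for name, w in CONTRASTS.items()}


print(contrast_codes(2))
```

Because the contrasts are mutually orthogonal, the C2 estimate (original-sign vs reversed phases) is not confounded with the other two Phase comparisons in a balanced design such as this one.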

In order to isolate the contribution of each main effect to model fitting, a saturated model was first fitted. This was contrasted against a simplified one without the Phase × Log-trial × Group interaction, using the Akaike Information Criterion (AIC) and a likelihood ratio test. If the simplified model did not lose fit, it was established as the reference model for further comparisons, and was further simplified by removing the two-way interactions one by one. The same procedure was repeated with marginal effects (Phase, Log-trial, and Group, with the restriction that a marginal effect cannot be removed if it is involved in any of the interactions left in the model in previous steps). Once the best-fitting model was identified, significance of each of the effects in the model was determined using the z statistic, with a p < 0.05 significance level.
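A single step of this backward procedure can be sketched as follows. The Python snippet is illustrative only: the log-likelihoods and parameter counts are hypothetical, `chi2_sf` is a hand-written helper based on the closed-form series for the chi-square upper tail, and the decision rule combining the two criteria (non-significant likelihood ratio test and no increase in AIC) is an assumption, since the text does not specify how the two were weighed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the closed-form finite series."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    if df % 2 == 0:
        total, term = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.sqrt(2 * math.pi) * normal_pdf * total
    total, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * total


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 log-likelihood."""
    return 2 * n_params - 2 * log_lik


def likelihood_ratio_test(full, reduced):
    """Return (statistic, df, p) for two nested models."""
    stat = 2 * (full["log_lik"] - reduced["log_lik"])
    df = full["k"] - reduced["k"]
    return stat, df, chi2_sf(stat, df)


# Hypothetical values, NOT taken from the study
full = {"log_lik": -5230.0, "k": 19}
reduced = {"log_lik": -5232.1, "k": 16}

stat, df, p = likelihood_ratio_test(full, reduced)
aic_full = aic(full["log_lik"], full["k"])
aic_reduced = aic(reduced["log_lik"], reduced["k"])

# Assumed rule: keep the simpler model if it does not fit significantly worse
# and its AIC is not higher than that of the fuller model
keep_reduced = p >= 0.05 and aic_reduced <= aic_full
print(round(stat, 2), df, round(p, 3), aic_full, aic_reduced, keep_reduced)
```

In the procedure described above, the retained model then becomes the reference for the next removal, and the step is repeated until no further term can be dropped without loss of fit.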

A second analysis was aimed at assessing the contribution of SOGS gambling severity to PRLT performance (both regardless of and controlling for group affiliation). A similar backwards hierarchical model fitting procedure was followed, but including SOGS Severity as an individual differences factor (instead of, or along with, Group).

Results

Preliminary analyses

Each group consisted of 18 males and 7 females. Education years and Income were not available for one participant, and those two missing data points were imputed using group means. Mean and standard deviation for each group in Age, Education years, and the Monthly income scale are shown in Table 1.

Bayes factors (for the Mann–Whitney U test) yielded support for the alternative hypothesis (BF10 > 3) for Education years, and for the null (BF10 < 1/3) for Age and Monthly income. In other words, the two groups were well matched in Age and Monthly income, but differed in Education years. The same analyses were performed for the SOGS, MultiCAGE drugs, and MultiCAGE alcohol. As expected, given the group sampling procedure, patients were, on average, well above the clinical threshold in the three scales, whereas controls scored clearly within the non-problematic range. Complementarily, neither Monthly income (r = − 0.073, BF10 = 0.201), nor Education years (r = − 0.194, BF10 = 0.429) substantially correlated with SOGS severity across groups.

Before proceeding to the main analyses, we also checked whether log-transforming trial number within phase served the aim of adequately capturing the hypothetical curvilinear shape of the learning function (i.e. whether the learning process underlying the probability of a correct choice is a linear function of Log-trial), and whether the shape restrictions imposed by this assumption are not so tight as to prevent capturing variations of learning curves across levels of the other factors. With that aim in mind, three baseline models were compared (identical to the ones described in the statistical analyses section, except for the non-inclusion of Group, and how Trial was modelled). In the first one (linear), trial was not transformed; in the second one (logarithmic) trial was (natural) log-transformed before entering the model; and in the third one (polynomial) the effect of trial was decomposed into a quadratic and a linear component. Both the logarithmic and the polynomial model clearly outperformed the linear one (AIC = 10,509, 10,492, and 10,505, for the linear, logarithmic, and polynomial models, respectively), which indicates that the learning process is better conceptualized as a curvilinear function of trial. However, despite being more flexible (having fewer shape restrictions and thus being able to capture a broader range of curves), the polynomial model was outperformed by the logarithmic model. Or, what amounts to the same, the increase in explained variance does not compensate for the increase in complexity of the polynomial model (13 vs 9 parameters).
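The comparison of the three baseline models can be summarized with AIC differences. The Python sketch below uses the AIC values reported above; the Akaike weights are a supplementary illustration added here and were not part of the reported analyses.

```python
import math

# AIC values reported in the text for the three baseline models
AIC = {"linear": 10509, "logarithmic": 10492, "polynomial": 10505}

# Lower AIC indicates a better trade-off between fit and complexity
best = min(AIC, key=AIC.get)
delta = {name: value - AIC[best] for name, value in AIC.items()}

# Akaike weights: relative support for each model within this candidate set
raw = {name: math.exp(-d / 2) for name, d in delta.items()}
norm = sum(raw.values())
weights = {name: r / norm for name, r in raw.items()}

for name in AIC:
    print(name, delta[name], round(weights[name], 4))
```

The logarithmic model has the lowest AIC, with differences of 17 and 13 points relative to the linear and polynomial models, respectively.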

Fig. 1 Predicted values (and confidence intervals) from the saturated model in Table 2, for controls (HC) and patients, across Phase and Log-trial. The vertical axis represents the predicted probability of a correct choice

Fig. 2 Predicted values (and prediction confidence intervals) for all Phase × Block conditions in the PRLT, for low and high SOGS level, from the saturated model in Table 4. SOGS reference values were automatically selected as high (+ 1 SD), and low (− 1 SD)

Between-groups differences in PRLT performance

Table 2 shows results for the hierarchical GLME analysis. Removing the three-way interaction from the saturated model (Model 0.a vs Model 1) did not hamper model fit. Removing the Group × Phase interaction or the Group × Log-trial interaction did not affect model fit either (Model 2.1 and 2.3 vs Model 1). However, removing the Phase × Log-trial interaction from Model 1 (Model 2.2 vs Model 1) did hamper model fit, so that two-way interaction was retained. Further removal of Group from Model 2.4 did not hamper model fit either, so the final, best-fitting model (Model 3) did not contain any marginal or interactive effect of Group.

Table 2 Model selection for PRLT performance in the two-groups sample

Table 3 presents estimates for all effects (OR, odds ratios) in models 1 and 3, along with their confidence intervals and significance levels. The directions of these effects are shown in Fig. 1. Predicted values from the saturated model show, first, that, as expected, the proportion of correct responses increased with Log-trial within phases; and, second, that performance in reversed phases decreased relative to acquisition-sign-contingency phases, regardless of Group. There was no significant evidence that patients made fewer correct choices in general, or in phases with reversed contingencies relative to non-reversed ones (hence the absence of Group marginal or interactive effects in the best-fitting model, and particularly of interactions involving Group × C2).

Table 3 Effect estimates for Model 1 and the best-fitting model (Model 3) of correct choices in the PRLT

PRLT performance as a function of SOGS gambling severity

The patients group consisted of individuals receiving treatment for substance use disorders who also presented gambling problems. So, in order to test gambling problems in a more specific way, the impact of SOGS gambling severity on PRLT was analyzed. We did so by including SOGS score as a continuous predictor, along with its interactions with Phase and Trial, in the best-fitting model from the previous analyses, and reached the best-fitting model following the hierarchical procedure described earlier (Table 4, left panel). Although Group did not have any marginal or interactive effects in previous analysis, for the sake of robustness, effects were also estimated for SOGS effects while controlling for Group and Group × Phase (i.e. Group-related effects that could explain SOGS and SOGS × Phase effects away; Table 4, right panel).

Table 4 Model selection for PRLT performance, including SOGS severity

Interestingly, the interaction of SOGS with Phase involved only contrast C2 (see Table 5), namely the one reflecting learning inflexibility (reacquisition during phases with reversed contingencies, relative to the one in phases with the original contingency sign). The C2 contrast is significant in all models considered so far, i.e. learning is poorer in reversed contingency phases than in non-reversed ones. However, this pattern was more pronounced in high-SOGS individuals. As can be seen in Fig. 2, high-SOGS individuals showed a larger drop in the probability of making a correct choice in Phase 2 relative to Phase 1, and in Phase 4 relative to Phase 3, compared with low-SOGS individuals.

Table 5 Effect estimates for the saturated (0.c) and best-fitting (7.1) models of correct choices in the PRLT (controlling for Group)

Discussion

The first aim of the present study was to test the existence of PRLT differences, and, more specifically, signs of decision-making inflexibility, in a group of patients with addictive disorders and gambling problems, relative to matched controls. As depicted in Fig. 1 (see Additional file 1: Figure S1 for a depiction of observed, instead of predicted, responses), although patients showed less steep within-phase learning functions, between-group differences did not reach significance. There were no specific learning efficiency drops either in phases with reversed contingencies (phases 2 and 4), relative to non-reversed ones (phases 1 and 3). Some studies have found that addiction may have a generalized deleterious effect on feedback-based decision-making [27, 28], but this does not necessarily make learning more inflexible.

Results were markedly different when PRLT performance was analyzed as a function of gambling severity (measured with the SOGS questionnaire). As shown in Fig. 2 (and Additional file 1: Figure S2), participants with stronger disordered gambling symptoms made substantially fewer correct choices in phases 2 and 4, relative to phases 1 and 3, regardless of group. Or, more precisely, the more severe gambling was, the more marked this pattern grew. This result also aligns with the ones from the study by Torres et al. [29], in which gambling intensity (monthly use) was associated with increased reversal costs, restricted to reversed-contingency phases, and, especially, with the study by Perandrés-Gómez et al. [21], in which patients with gambling disorder were found to behave more inflexibly in the PRLT, independently of drug use. To our knowledge, the only study in which this pattern has been reported in patients with substance use disorders is the one by Moreno-López et al. [28], where reversal learning deficits were observed to be associated with cocaine use severity and diminished cerebellar gray matter volume.

In summary, according to our results, restricted effects on reversal deficits (namely, specific difficulties in learning reversed contingencies), can arise independently of differences in general learning deficiencies in the PRLT. As shown here, only the former seem to be associated with degrees of gambling severity; value updating of choice options seems to be more hindered in more severe gamblers.

Beyond this effect, this work also presents some methodological advances. So far, PRLT performance had been assessed either by extracting summary performance indices (e.g. number of perseverative errors; [30, 31]) or by analyzing learning curves in a blockwise fashion (number of correct responses per 5-trial or 10-trial block: [28, 29]). These summary parameters, however, present interpretation problems. For instance, individuals reaching higher learning asymptotes in the preceding phase tend to perform transitorily worse in the first trials of the ongoing phase. This means that assessing decision-making inflexibility by means of perseverative errors is likely to confound ‘true’ and ‘apparent’ perseverative errors, with the latter being attributable to pre-reversal differences. Blockwise analyses, in turn, are likely to be insensitive to effects that occur on the trial-by-trial scale. Moreover, our trialwise analyses of responses allow us to model them as they really are, dichotomous (0/1) responses, instead of response counts (number of perseverative errors, number of correct choices per block) with distributional features that are seldom taken into account in standard, general linear model-based analyses.

Taken together, results fit well in the Gambling Space Model formulated by Navas et al. [45] (see also [42, 46]). In this model, articulated as a development of the seminal Pathways Model [47], transition from recreational to disordered gambling is driven by the kind of reinforcement schedules that have been experimentally shown to also facilitate transition from goal-driven to compulsive behaviors. This transition towards ‘gambling-specific’ compulsivity can be speeded or made more likely in vulnerable individuals showing trait-like signs of compulsivity (as also shown by animal translational research; [48]). In view that in gambling disorder there is no chemical agent to hijack reinforcement circuits, individual differences in compulsivity proneness could play a larger role than in substance use disorders. Tentatively, this could explain why signs of compulsivity are easier to detect in patients with gambling disorder than in other populations of addicted individuals, and also why, in the present study, decision-making inflexibility did not emerge in the between-group comparison, but did when gambling severity was specifically taken into consideration.

As noted in the introduction, modelling inflexibility in computational terms is beyond our current aims. However, inflexibility, as operationalized here, arises as a behavioral pattern these models must be able to accommodate. For instance, the experience-weighted attraction model [EWA; 49, 50] includes an experience weight parameter (ρ) to capture the well-known fact that updating becomes slower as experience accumulates. Our finding that high-SOGS individuals are more prone to persevere in the face of negative feedback (during reversed phases) is likely to be accounted for by an increased experience decay factor, as opposed to more general aspects of reward learning (i.e. learning rate, α). Alternatively, our and Perandrés-Gómez et al.’s [21] results may be explained by differences in more complex (sequential) exploratory behavior as captured by another recent model [VSE; 51]. An example of how parameters included in computational models, estimated from behavior in decision-making-under-ambiguity tasks (e.g. the Iowa Gambling Task), can be used as individual differences variables to predict clinically-relevant gambling behavior can be found in a recent work by Kildahl et al. [52].

Limitations and final remarks

The present work is not free of limitations. First, convenience sampling did not allow carrying out an a priori power analysis. Although taking all observations into account increases power (relative to, for example, block-wise analyses), 25 participants per group are probably still insufficient to reach adequate power for all relevant effects.

Second, the cross-sectional nature of the study does not allow strong claims about causal directionality either. More specifically, the fact that inflexibility is associated with gambling severity is no direct proof that gambling severity and inflexibility are causally related in one way or the other.

And third, and relatedly, controlling for group (that is, for the diagnosis of a substance use disorder) when testing the SOGS-inflexibility association does not stand as a strong control of other potentially relevant variables such as, for example, cognitive deterioration or drug use beyond the established diagnosis. Although MultiCAGE measures are available for all participants, this questionnaire is a screening test, and thus not adequate as a continuous measure of addiction severity.

In summary, the evidential value of the present findings must be assessed in combination with previously reported signs of proneness to learning inflexibility (i.e. domain-general compulsivity) in patients with disordered gambling. Seemingly, inconsistency in previous research can be attributed to differences in the way in which learning inflexibility in the Probabilistic Reversal Learning Task (PRLT) is operationalized. In the present study, we adopted a mostly data-driven approach to identify specific signs of learning inflexibility (anomalies restricted to reacquisition in phases with a reversed contingency sign). Only patients with more severe symptoms of problematic gambling showed specific signs of learning inflexibility. These results align with gambling disorder models in which domain-general compulsivity is linked to vulnerability to develop gambling-specific problems with exposure to gambling opportunities.