Evidence suggests that stress negatively correlates with cognitive functioning (e.g., Banks & Boals, 2016; Hyun, Sliwinski, & Smyth, 2018; Klein & Barnes, 1994; Klein & Boals, 2001a; Shields et al., 2017; Sliwinski, Smyth, Hofer, & Stawski, 2006). In a highly cited and continually influential article (cited 330 times overall and 122 times since 2015 on Google Scholar as of December 28, 2019), Klein and Boals (2001a) found that subjects who perceived more negative stressful life events (over the course of their lifetime) had lower working memory capacity than did subjects who perceived fewer negative stressful life events. That is, increased self-reports of events as both negative and impactful on the Life Experiences Survey (LES; Sarason, Johnson, & Siegel, 1978) were negatively correlated with working memory capacity, as measured by a complex span task: in Experiment 1, r(20) = −.46, p = .03 (the original manuscript reported p < .01, which appears to be an error); in Experiment 2, r(64) = −.36, p < .01. Klein and Boals (2001a) speculated that perceiving negative life stress increases the number of unwanted thoughts. These unwanted thoughts compete with other ongoing cognition for cognitive resources, thus limiting cognitive abilities. In the current paper, we investigated Klein and Boals’ (2001a) claim and speculation based on their Experiments 1 and 2.

Klein and Boals (2001a) are not alone in reporting evidence for a link between negatively perceived life event stress measured by the LES and cognition. Yee, Edmondson, Santoro, Begg, and Hunter (1996) found that cumulative negative life event stress was significantly associated with poorer accuracy on a sentence verification task, r(84) = −.23, p = .03, but not with performance on an explicit memory task, r(84) = −.21, p = .052, or an implicit memory task, r(84) = −.13, p = .23, suggesting that life event stress may disrupt active information processing carried out by the working memory system. However, other studies using perceived negative stress as measured by the LES have failed to find a statistically significant relation with working memory capacity (Banks & Boals, 2016). More recently, Korten, Sliwinski, Comijs, and Smyth (2014), using a modified LES that asked if events had occurred in the last 12 months (the unmodified LES asks about events in three time periods: within the last 6 months, 7 months–1 year ago, and over 1 year ago), tested if two temporal variants of perceived life event stress (past [when the event occurred] vs. current severity; the unmodified LES asks about severity “at the time of the event”) were associated with measures of working memory capacity. Contrary to Klein and Boals (2001a), Korten et al. found that only the current rated severity of the life event stress related to working memory capacity. The number of events and the past severity ratings (what Klein and Boals used) did not.

Some evidence has accrued suggesting that mind wandering may be a link between life event stress and reduced working memory capacity. A large body of work has found a negative relation between the propensity to mind wander and working memory capacity (McVay & Kane, 2009, 2012a, 2012b; Meier, 2019; Robison & Unsworth, 2015; Unsworth & McMillan, 2013). In addition, Yee et al. (1996) included a one-item self-report measure of mind wandering (a 7-point Likert-type scale with higher numbers indicating more mind wandering) to examine the relationship between mind wandering and life stress. Consistent with Klein and Boals’ (2001a) speculation on the process linking negative life event stress and unwanted thoughts, responses to Yee et al.’s (1996) mind wandering item were positively associated with perceived negative life event stress, r(84) = .32, p < .01. Moreover, Banks and Boals (2016) provided evidence that negative life events impact working memory capacity by increasing mind wandering. In a model with a measure of the frequency of avoidant and intrusive thoughts experienced during the past 7 days (Impact of Events Scale; Horowitz, Wilner, & Alvarez, 1979) and mind wandering (measured with probes during working memory tasks) mediating the relation between negatively perceived life stress and working memory capacity, a (numerically) small but statistically significant effect was reported for the indirect path linking negative life event stress and working memory capacity.

It seems reasonable that experiencing stress related to negative life events may increase the amount of negatively valenced mind wandering and thus hurt working memory performance. Indeed, Banks, Welhaf, Hood, Boals, and Tartar (2016) found that only negatively valenced mind wandering reports related to impaired performance on both working memory and sustained attention tasks. The tendency to ruminate is one way in which previous life stress could be transported to the current moment to affect cognition. Rumination has been linked to deficits in controlling the contents of working memory (Joormann & Gotlib, 2008). Notably, Joormann, Levens, and Gotlib (2011) found that working memory control deficits were specific to negatively valenced items.

While the evidence suggests that intrusive thoughts may be a mediator of a relation between perceived life event stress and working memory capacity, dispositional mindfulness has the potential to moderate the relation along such a path. Mindfulness training has been shown to improve working memory functioning, limit instances of mind wandering, and increase sustained attention (Chambers, Lo, & Allen, 2008; Mrazek, Franklin, Phillips, Baird, & Schooler, 2013; Zeidan, Johnson, Diamond, David, & Goolkasian, 2010). Mindfulness may act as a buffer against the negative effects of stress on working memory (Banks, Welhaf, & Srour, 2015; Jha, Stanley, Kiyonaga, Wong, & Gelfand, 2010). For example, Jha et al. (2010) compared the working memory capacity of predeployment Marines who completed 8 weeks of mindfulness training with that of those who had not. Measures of working memory capacity were obtained at the beginning and end of the study. Subjects in the mindfulness training group who spent more time practicing showed increased working memory performance. Military subjects who did not participate in mindfulness training showed decreases in working memory performance from Time 1 to Time 2.

The current study

The current study tested if cumulative perceived life event stress is associated with working memory capacity and if intrusive thoughts may lead to such an association. We had four main goals: (1) Because the evidence is mixed for an association between cumulative perceived negative life event stress and working memory capacity, but the Klein and Boals (2001a) study still exerts a strong influence on research, we wanted to get a precise estimate of this association. (2) We tested whether overall mind wandering, the sum of the negative thought reports, or the propensity to ruminate linked perceived life event stress and working memory capacity. (3) We tested whether dispositional mindfulness moderates the association between life stress and working memory capacity. (4) And finally, we wanted to replicate and extend the work on the association between mind wandering valence and task performance. To these ends, we had subjects complete measures of rumination, dispositional mindfulness, life event stress, working memory capacity, and a sustained attention to response task with embedded valence thought probes followed by probes asking about the depth of mind wandering.

Method

We report how we determined the sample size, all data exclusions, all manipulations, and all measures in the study (Simmons, Nelson, & Simonsohn, 2011). This study was preregistered on September 7, 2018 (https://aspredicted.org/zp2w6.pdf).

Subjects

A total of 371 undergraduate students at Western Carolina University (mean incoming student SAT scores range from 1085 to 1118 for cohorts entering Fall 2016 through Fall 2018) completed the informed consent for this study (one subject withdrew during the first task [i.e., operation span task]). We collected demographic data from 368 of these subjects (data from three subjects were lost due to technical errors; one student erroneously reported an age of 1 in the demographics; that age was not included in these statistics; one subject did not respond when asked their age). Of these 368 subjects, 59% were female (four subjects did not respond). Subjects had a mean age of 19 years (SD = 1.5; three subjects did not respond). Of the subjects who gave ethnicity information, 278 identified as White, 41 as Black, 18 as multiracial, 19 as other, and 10 as Asian (two subjects did not respond). As compensation for their participation, subjects received partial credit for a course requirement. The stopping rule for data collection was the end of the semester in which we had collected data from at least 225 subjects. This sample size was chosen on the basis that correlations as weak as ρ = .10 stabilize within a narrow window when approaching 250 subjects (Schönbrodt & Perugini, 2013), thus allowing precise estimates. Data collection terminated at the end of the Spring 2019 semester.

General procedure

In one 90-minute session, either individually or in groups of two, subjects completed measures in the following order: operation span, symmetry span, sustained attention to response task (SART), Five Facet Mindfulness Questionnaire–Short Form, Ruminative Response Scale, Life Experiences Survey, and a brief demographics questionnaire. All tasks were administered on computers using E-Prime software (Psychology Software Tools, Pittsburgh, PA). Before completing these measures, subjects completed an informed consent, and after completing the measures, subjects were debriefed and given an opportunity to ask questions.

Measures

Complex span tasks

To limit the amount of time for each experimental session and therefore maximize the number of subjects we could test, we used shortened complex span tasks (Foster et al., 2015) to measure working memory capacity. Subjects completed one block each of two complex span tasks (the traditional variants of these tasks use three blocks). Using two one-block complex span tasks provides superior measurement properties over using one complete (three-block) span task (Foster et al., 2015). More specifically, Foster et al. (2015) found that using one block of a symmetry and operation span task (what is used in the current study) accounted for 79% of the variance in fluid intelligence (a latent variable consisting of variance from three measures) accounted for by a battery of three complete (i.e., three-block) complex span tasks while a three-block operation span task (like that used by Klein and Boals, 2001a) only accounted for 64% of this variance.

Operation span (Unsworth, Heitz, Schrock, & Engle, 2005)

Following practice with math problems and remembering letters first individually and then combined, subjects began the scored task. In the scored task, subjects were presented with a math problem that they had to judge as true or false (e.g., [2 × 2] + 1 = 5; half were true). After the math problem, subjects were presented with a letter to remember from a set of 12 possible letters (presented for 1 second each). This sequence was repeated three to seven times resulting in five trials (i.e., each set size was presented once). The order of trials was random. At the end of each trial, subjects were presented with a grid of all 12 possible letters and asked to select the letters in the correct order (that they were presented in) by clicking on them using a computer mouse. Scores were calculated by adding the total number of letters recalled in the correct serial order (Conway et al., 2005). The maximum score for this task was 25. Cronbach’s alpha for one block of this task has been reported as .69 (Foster et al., 2015).
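
The scoring rule described above can be illustrated with a short sketch. It assumes one plausible reading of that rule, namely that a letter earns credit only when it is recalled in the serial position in which it was presented. The function names, the letter set, and the responses are hypothetical and are not taken from the task software.

```python
def score_span_trial(presented, recalled):
    """Count items recalled in the serial position in which they were presented."""
    return sum(1 for p, r in zip(presented, recalled) if p == r)


def score_span_block(trials):
    """Sum position-correct items across every trial in one block."""
    return sum(score_span_trial(p, r) for p, r in trials)


# Hypothetical block: one trial at each set size from three to seven.
trials = [
    (list("FHJ"), list("FHJ")),          # 3 of 3 correct
    (list("KLNP"), list("KLPN")),        # 2 of 4 (last two letters swapped)
    (list("QRSTY"), list("QRSTY")),      # 5 of 5
    (list("FHJKLN"), list("FHJKL")),     # 5 of 6 (final letter omitted)
    (list("PQRSTYF"), list("PQRSTYF")),  # 7 of 7
]

max_score = sum(len(presented) for presented, _ in trials)  # 3 + 4 + 5 + 6 + 7
total = score_span_block(trials)
print(total, "of", max_score)
```

With one trial per set size, the maximum is 3 + 4 + 5 + 6 + 7 = 25, which matches the maximum score stated above.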

Symmetry span (Kane et al., 2004)

In this task, subjects were asked to recall a series of red squares presented within a matrix while simultaneously engaging in a symmetry-judgement task. Subjects were first presented with a symmetry-judgement task in which they were shown an 8 × 8 black matrix with some of the squares colored black and asked to decide if the image was symmetrical about its vertical axis. This was followed immediately by a 4 × 4 matrix, with one cell filled in (red), for 650 ms. During the recall phase, subjects indicated the location and sequence of red squares in the previous displays by clicking on the cells in an empty matrix. There was one trial of each set size, ranging from two to five. Set sizes were presented in random order. Scores were calculated by obtaining the number of correct items recalled in the correct position. The maximum score for this task was 14. Cronbach’s alpha for one block of this task has been reported as .61 (Foster et al., 2015).

We created a working memory composite score by averaging z scores on both the symmetry and operation span tasks. Using a composite score for working memory is consistent with evidence indicating that measured differences in working memory performance are largely due to domain-general processes (Kane et al., 2004).
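
As an illustration of the composite described above, the following sketch standardizes each span score across subjects and averages the two z scores per subject. It assumes the sample standard deviation (n − 1 denominator) and complete data on both tasks. The function names and the scores are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample mean and sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def wm_composite(ospan, symspan):
    """Average each subject's operation span and symmetry span z scores."""
    return [(zo + zs) / 2 for zo, zs in zip(z_scores(ospan), z_scores(symspan))]


# Hypothetical scores for five subjects (maximum 25 and 14, respectively).
ospan = [25, 18, 12, 21, 9]
symspan = [14, 9, 7, 11, 4]

composite = wm_composite(ospan, symspan)
print([round(c, 2) for c in composite])
```

Because each z-score vector has a mean of zero, the composite is also centered on zero. A subject missing either span score would have no composite under this scheme.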

Sustained attention to response task (SART; Kane et al., 2016)

The SART is a go/no-go task designed to measure sustained attention. Subjects were required to respond quickly to all nontarget stimuli and withhold responses to target stimuli (Robertson, Manly, Andrade, Baddeley, & Yiend, 1997). Subjects responded to nontarget stimuli by pressing the space bar. The nontarget stimuli were words from one category (animals; 89% of trials). Words from a second category, vegetables (11% of trials), served as the target stimuli (see McVay & Kane, 2012a). The SART consisted of 540 total trials, divided into four blocks, each consisting of three miniblocks of 45 trials. During each miniblock, the task presented 40 nontarget stimuli (animal names) and five target stimuli (vegetable names). To the subjects, this task appeared to be one continuous block. On all trials, subjects were presented with a word for 300 ms followed immediately by a mask for 1,500 ms. Dependent measures for this part of the task were d' (i.e., hit rate to animals minus false-alarm rate to vegetables) and the standard deviation (SD) of RTs to “go” (animals) trials.

Additionally, we used the SART to measure the frequency and valence of mind wandering by having subjects complete embedded thought probes (nine probes per block, 36 probes total). During the presentation of the thought probe, subjects were asked to respond to the prompt, “What were you just thinking about?” Response options included: “a. Task-related thoughts pertaining to the current task”; “b. Task-related evaluative thoughts–positive”; “c. Task-related evaluative thoughts–negative”; “d. Task-unrelated thoughts, neutral content”; “e. Task-unrelated thoughts, positive content”; and “f. Task-unrelated thoughts, negative content.” We operationalized mind wandering as thoughts unrelated to the task. We created a mind wandering score by summing all off-task response options selected (d, e, and f) and then dividing by the number of probes presented to calculate a percentage of mind wandering. We calculated scores for positive, negative, and neutral mind wandering reports by summing each off-task response option individually (d, e, and f). Following these thought probes, subjects were asked about how off-task or on-task they were on a 5-point Likert scale (1 = completely on-task, 2 = mostly on-task, 3 = both on-task and off-task, 4 = mostly off-task, 5 = completely off-task). Within the manuscript, we refer to these as depth ratings.
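
The probe scoring described above can be sketched as follows. The sketch assumes responses are stored as the option letters a through f, with options d, e, and f counted as off-task. The function name, the dictionary, and the example responses are hypothetical.

```python
from collections import Counter

# Off-task response options and the valence label each one carries.
OFF_TASK = {"d": "neutral", "e": "positive", "f": "negative"}


def summarize_probes(responses):
    """Return the proportion of off-task reports and a count per valence."""
    counts = Counter(responses)
    valence_counts = {label: counts[key] for key, label in OFF_TASK.items()}
    off_task_total = sum(valence_counts.values())
    return {"mw_proportion": off_task_total / len(responses), **valence_counts}


# Hypothetical subject with 36 probe responses.
responses = ["a"] * 18 + ["b"] * 2 + ["c"] * 4 + ["d"] * 6 + ["e"] * 3 + ["f"] * 3

summary = summarize_probes(responses)
print(summary)
```

Multiplying `mw_proportion` by 100 gives the percentage of mind wandering. The denominator here is the number of probes with a recorded response, which is an assumption about how invalid responses would be handled.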

Questionnaires

Life Experiences Survey (LES; Sarason et al., 1978)

The LES is a self-report questionnaire designed to measure the amount of life stress that a subject has experienced. The LES presented 47 events (e.g., death of close friend, serious injury of close family member). In this computerized version, the first screen contained the prompt: “Have and when did you experience [the event].” This screen provided the following answer options: 1 = Never; 2 = 0–6 months ago; 3 = 7–12 months ago; 4 = over 1 year ago. Following this screen, subjects saw this prompt: “Indicate the extent to which you view the event as having either a positive or negative impact on your life,” with the answer options of numbers ranging from −3 to +3 and “never experienced.” Subjects responded to these prompts by pressing a number on the keyboard that corresponded with their intended answer. Sarason et al. (1978) report test–retest reliabilities from 0.56 to 0.88. We calculated perceived negative life event stress by summing all negative ratings made by the subject (as was done in Klein and Boals, 2001a). We created two scores. One score was for events that occurred 0–6 months ago (i.e., recently perceived life event stress), and the other was a total score for all reported events regardless of the time since the event.
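
A minimal sketch of the two LES scores follows. It assumes that each event is stored as a (timing, impact) pair using the answer codes above, and that negative ratings are summed as magnitudes so that higher scores indicate more perceived negative stress. That sign convention is an assumption, and the function name and data are hypothetical.

```python
def les_negative_scores(events):
    """Sum the magnitudes of negative impact ratings, overall and for recent events.

    events: list of (timing, impact) pairs, where timing is 1 = never,
    2 = 0-6 months ago, 3 = 7-12 months ago, 4 = over 1 year ago, and
    impact is an integer from -3 to +3 (None if never experienced).
    """
    total = 0
    recent = 0
    for timing, impact in events:
        if timing == 1 or impact is None or impact >= 0:
            continue  # not experienced, or not rated as negative
        total += abs(impact)
        if timing == 2:  # occurred within the last 6 months
            recent += abs(impact)
    return {"total": total, "recent": recent}


# Hypothetical responses to six of the 47 events.
events = [(2, -3), (4, -2), (3, 2), (1, None), (2, 1), (2, -1)]

scores = les_negative_scores(events)
print(scores)
```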

Ruminative Response Scale (Nolen-Hoeksema & Morrow, 1991)

Subjects were presented with 22 items related to ruminative response styles. These statements were either self-focused, focused on the consequences/causes of the subject’s moods, or symptom focused. Subjects indicated how frequently each statement occurs on a scale of 1 (almost never) to 4 (almost always). We obtained scores by summing all responses, with higher scores indicating higher levels of rumination. For this scale, the internal consistency has been reported as .90 and the test–retest correlation as .67 (Treynor, Gonzalez, & Nolen-Hoeksema, 2003).

Five Facet Mindfulness Questionnaire–Short Form (Bohlmeijer, ten Klooster, Fledderus, Veehof, & Baer, 2011)

This 24-item questionnaire is designed to measure dispositional levels of mindfulness and encompasses five facets: observing (four items), describing (five items), acting with awareness (five items), nonjudging (five items), and nonreactivity (five items). Subjects were asked to rate their responses on a 5-point Likert-type scale ranging from 1 (never or very rarely true) to 5 (very often or always true). Scores were then added to form a total score. High scores indicate more mindfulness. We used the total score (and not the facet scores) of this task for all analyses. The internal consistency (alpha) of this measure (overall) has been reported with a range of .70 to .91 (Bohlmeijer et al., 2011).

Results

We conducted analyses in the R system for statistical analysis (R Core Team, 2017). Data, analysis code, and outputs are available at the following link: https://osf.io/shxb6/ . We carried out the linear mixed models (LMMs) using the “lme4” package (Bates, Maechler, Bolker, & Walker, 2015). P values in the LMMs were computed using the Satterthwaite approximation contained in the “lmerTest” package (Kuznetsova, Brockhoff, & Christensen, 2017). The Satterthwaite approximation has been shown to produce p values in line with actual false positive rates (Luke, 2017). Bayes factors (BFs) for correlations were computed with the “BayesFactor” package (Morey & Rouder, 2018).

Data loss

We dropped all data for seven subjects based on notes in our subject log: two subjects who were noncompliant with instructions across tasks; one subject who fell asleep during the SART task; three subjects who reported hitting incorrect response keys for part of the Life Experiences Survey Questionnaire; and one subject for whom English was a second language. We made these decisions without consulting the subjects’ data. We removed seven subjects for exceeding our preregistered criterion for SART nontarget response time standard deviation (NTSD). After accounting for NTSD criteria and subject log exclusions, 356 subjects were included in analyses. Due to computer error, we were missing data from four of these subjects in the operation span task and one subject for the Rumination questionnaire. One additional subject’s operation span data were lost due to a fire alarm. Additionally, 18 thought probe responses (across all subjects) were removed for falling outside of the range of acceptable values (1–6), and for one subject the responses to the depth of mind wandering probes were not recorded. Not all subjects had a span score for each complex span task. For these subjects, working memory capacity (WMC) composites could not be formed, so they were not included in analyses using this composite. For all analyses, we used the maximum amount of data available after accounting for data loss and exclusions; therefore, Ns differ across analyses. Descriptive statistics and correlations for key variables can be seen in Tables 1 and 2, respectively.

Table 1 Descriptive statistics
Table 2 Correlations among measures with coefficient alphas (for uncombined measures) presented on the diagonal

Estimating the association between life event stress and working memory capacity

Correlations

In addition to the frequentist interpretation of these correlations, we also examined them using BFs. The BF allowed us to assess if the correlation estimate is more likely from a point-null distribution (i.e., null hypothesis) or from a Cauchy distribution where 50% of the distribution lies between −.33 and .33 (i.e., the alternative hypothesis). This specification of the alternative model was chosen because it best represented the magnitude of the estimates provided in Klein and Boals (2001a). Here, numbers greater than one supported the alternative hypothesis, and numbers less than one supported the null hypothesis (of no association). Total negative life event stress was not significantly correlated with working memory capacity, r(349) = −.02, p = .70, 95% CI [−.13, .08], BF10 = .13, with data being more consistent with the null hypothesis by a factor of 7.7 (see Fig. 1), or with mind wandering, r(354) = .06, p = .28, 95% CI [−.05, .16], BF10 = .22, with data being more consistent with the null hypothesis by a factor of 4.5 (see Fig. 2). We thought recently experienced life stress (i.e., experienced within the last 6 months) may exert a greater influence on working memory capacity than total life event stress would, so we estimated the associations with recent life event stress as well. Like our findings with total life event stress, recent life event stress was not associated with working memory capacity, r(349) = −.04, p = .48, 95% CI [−.14, .06], BF10 = .17, or with mind wandering, r(354) = .07, p = .17, 95% CI [−.03, .18], BF10 = .31, with both estimates favoring the null hypothesis.
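
The quantities reported above can be related to one another with a short sketch. It shows a Pearson correlation, a confidence interval based on the Fisher z transformation (an assumption about how the reported intervals were obtained), and the conversion of BF10 into the factor by which the data favor the null hypothesis. The function names are hypothetical, and the Bayes factor itself is not recomputed here.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r using the Fisher z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def bf01(bf10):
    """Factor by which the data favor the null over the alternative."""
    return 1 / bf10


# r(349) = -.02 implies n = 349 + 2 subjects.
lo, hi = fisher_ci(-0.02, 349 + 2)
print(round(lo, 3), round(hi, 3))
print(round(bf01(0.13), 1), round(bf01(0.22), 1))
```

Because the inputs are the rounded values from the text, the interval agrees with the reported one only to within rounding error.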

Fig. 1

Correlation between total negative life event stress and working memory capacity (WMC). Histograms for each variable are presented across from each axis. BF10 = Bayes factor with numbers less than one favoring the null hypothesis and numbers greater than one favoring the alternative hypothesis

Fig. 2

Correlation between total negative life event stress and SART mind wandering (MW). Histograms for each variable are presented across from each axis. BF10 = Bayes factor with numbers less than one favoring the null hypothesis and numbers greater than one favoring the alternative hypothesis

Klein and Boals (2001a) suggested (but did no statistical tests to confirm) that negative life event stress is more strongly associated with working memory capacity when the working memory task is more difficult. That is, they reported that performance on the longest sets of the working memory task was more associated with perceived negative life event stress (i.e., r = −.38 for a set size of seven vs. r = −.21 for set sizes of five and six). Although not listed in our preregistration as part of the proposed analyses, we explored this relationship by calculating subjects’ scores on the highest two set sizes per span task (operation span set sizes of six and seven; symmetry span set sizes of four and five). The correlation between the working memory composite for these most difficult items and total life event stress was estimated as r(349) = −.02, p = .67, 95% CI [−.13, .08], BF10 = .14. The correlation between these difficult working memory items and recent life event stress was r(349) = −.03, p = .52, 95% CI [−.14, .07], BF10 = .15.

Mediation analyses

Despite the lack of a significant direct association between life event stress and working memory capacity, it remained possible that an indirect effect, through a mediating variable, was present. Specifically, it was possible that this indirect effect was masked by unidentified countervailing effects (Zhao, Lynch, & Chen, 2010). That is, life event stress may have had an effect on working memory that was masked by opposing forces. In separate models (using the “lavaan” package; Rosseel, 2012), we tested if overall mind wandering propensity, the sum of negatively valenced thought reports, or rumination statistically mediated the relation between negatively perceived life events and working memory capacity. We ran these models twice: once with total life event stress and once with recent life event stress as the predictor. We requested 5,000 nonparametric bootstrap samples to provide estimation of both direct and indirect effects (see Footnote 1). We judged mediation present if the bootstrap confidence interval for the indirect effect did not include zero (Hayes & Rockwood, 2017). As can be seen in Table 3, none of the indirect paths met our criterion for statistical significance.
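
The logic of a bootstrapped indirect effect can be sketched in a few lines. This is not the analysis code used for the reported models. It is a simplified illustration that assumes a single mediator, ordinary least squares estimates, and a percentile confidence interval, with X standing for life event stress, M for the mediator, and Y for working memory capacity. The function names and the simulated data are hypothetical, and the sketch uses 1,000 resamples to keep the run time short.

```python
import random


def _centered_sums(x, m, y):
    """Sums of squares and cross-products around the means."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    return sxx, smm, sxm, sxy, smy


def indirect_effect(x, m, y):
    """Product a * b: a is the slope of M on X, b the slope of Y on M given X."""
    sxx, smm, sxm, sxy, smy = _centered_sums(x, m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects
        estimates.append(indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    tail = (1 - level) / 2
    return estimates[int(tail * n_boot)], estimates[int((1 - tail) * n_boot) - 1]


# Simulated data with a strong built-in indirect effect, for illustration only.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
m = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.8 * mi + rng.gauss(0, 0.5) for mi in m]

lo, hi = bootstrap_ci(x, m, y)
print("indirect effect CI excludes zero:", not (lo <= 0 <= hi))
```

Under the decision rule stated above, mediation would be judged present only when the interval excludes zero, as it does for these simulated data by construction.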

Table 3 Mind wandering propensity, sum of negatively valenced thought reports, and rumination as mediators of perceived negative life event stress

Does mindfulness moderate the relation between life event stress and working memory capacity?

To determine if the indirect effects tested above of negative life event stress on working memory capacity were moderated by mindfulness, we first conducted a linear model with life event stress, the total mindfulness score, and their interaction as predictors of working memory capacity. We did this for both temporal variants of life event stress. In the model with total life event stress, none of the predictors accounted for unique variance in working memory capacity: total life event stress, b = −0.04, SE = 0.05, t = −0.7, p = .46, mindfulness total, b = 0.003, SE = 0.008, t = 0.4, p = .68, and the total life event stress by mindfulness total interaction, b = 0.0005, SE = 0.0007, t = 0.7, p = .47. Again, in the model with recent life event stress, none of the predictors accounted for unique variance in working memory capacity: recent life event stress, b = −0.006, SE = 0.08, t = −0.1, p = .94, mindfulness total, b = 0.008, SE = 0.006, t = 1.2, p = .21, and the recent life event stress by mindfulness total interaction, b = 0.000003, SE = 0.001, t = 0.0, p = .99.
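
The structure of the linear model with an interaction term can be illustrated with a small least-squares sketch. It is not the analysis code for the reported models. It assumes uncentered predictors, estimates coefficients only (no standard errors or p values), and uses hypothetical function names and made-up data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_moderation(stress, mindfulness, wmc):
    """Least-squares fit of wmc on stress, mindfulness, and their product.

    Returns [intercept, b_stress, b_mindfulness, b_interaction].
    """
    rows = [[1.0, s, m, s * m] for s, m in zip(stress, mindfulness)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, wmc)) for i in range(k)]
    return solve(xtx, xty)


# Made-up, noise-free data generated from known coefficients.
stress = [0, 1, 2, 0, 1, 2, 0, 1, 2]
mindfulness = [0, 0, 0, 1, 1, 1, 2, 2, 2]
wmc = [1 + 2 * s - 0.5 * m + 0.25 * s * m for s, m in zip(stress, mindfulness)]

coefs = fit_moderation(stress, mindfulness, wmc)
print([round(c, 3) for c in coefs])
```

The last coefficient is the interaction term. Moderation would be indicated by an interaction estimate that is reliably different from zero.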

We also assessed the potential of mindfulness to moderate the effects of perceived life event stress on working memory capacity by adding mindfulness to the same mediation models as those run above (this time using the “processr” package; White, 2019). That is, we added mindfulness (i.e., total mindfulness score) as a moderator to the mediation models where the relations between life event stress indexes and working memory capacity were mediated by overall mind wandering propensity, the sum of negatively valenced thought reports, or rumination scores. Because we had no prior conviction about which path mindfulness should affect, we ran two models for every previously tested mediator. In the first model (depicted in Fig. 3a), for each mediator we tested if mindfulness moderated the path between life event stress and the mediator. In the second model (depicted in Fig. 3b), we tested if mindfulness moderated the path between the mediator and working memory capacity.

Fig. 3

Moderated mediation models. a Shows moderation between predictor and mediator. b Shows moderation between mediator and dependent variable. LES = life event stress

Because in the prior mediation models we already reported parameter estimates for all paths, here we report only the index of moderated mediation (all parameter values are available at https://osf.io/shxb6/). In none of the 12 models run (six models for total life event stress and six models for recent life event stress) did we find evidence for moderated mediation. In the models with total life event stress (estimate from the model in Fig. 3a reported first), all the confidence intervals for the index of moderated mediation contained 0: mind wandering propensity ([−.0001, .00008], [−.0000001, .000008]), sum of negatively valenced thoughts ([−.00006, .0002], [−.00004, .0002]), and rumination scores ([−.0001, .00004], [−.0003, .0008]). In the models with recent life event stress, all the confidence intervals for the index of moderated mediation contained 0 as well: mind wandering propensity ([−.00005, .0004], [−.00001, .000009]), sum of negatively valenced thoughts ([−.00009, .0002], [−.0003, .0002]), and rumination scores ([−.00007, .0002], [−.002, .002]).

Is negatively valenced mind wandering more disruptive to performance than other mind wandering?

To test if negatively valenced mind wandering is more disruptive to performance relative to positively valenced and neutral mind wandering as found in Banks et al. (2016), we conducted separate LMMs predicting nontarget response time standard deviation (RTSD) and SART target accuracy. In the model predicting RTSD, we included all trials with a mind wandering response that were preceded by four consecutive nontarget trials where an RT was recorded. The RTSD in these models is the standard deviation of these four trials. We limited this analysis to subjects who had at least five trials where four nontarget RTs preceded the thought probe screen. One reason to examine RTSD in addition to accuracy is the concern that mind wandering reports may be reactive to how the subject performed on the preceding target trial. Subjects are generally aware that they have made an error, but it is assumed that subjects are not aware of the RT variability preceding the target trial. Thus, to be more confident that the valence of the thought report is driving the relation with impaired performance (and not the other way around), negatively valenced thought probes should negatively predict performance in both models (i.e., accuracy and RTSD) above and beyond the other off-task reports.
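
The RTSD computation described above can be sketched as follows. The sketch assumes each trial is stored as a record with a target flag and an RT in milliseconds (missing when no response was recorded), and that the window is the four trials immediately before a given position in the trial sequence. The function name, the record layout, and the RTs are hypothetical.

```python
from statistics import stdev


def pre_probe_rtsd(trials, end, window=4):
    """SD of RTs over the `window` trials immediately before index `end`.

    Returns None unless all of those trials are nontargets with a recorded RT.
    """
    if end < window:
        return None
    preceding = trials[end - window:end]
    if any(t["is_target"] or t["rt"] is None for t in preceding):
        return None
    return stdev([t["rt"] for t in preceding])


# Hypothetical run: four nontarget trials followed by a target trial.
trials = [
    {"is_target": False, "rt": 400},
    {"is_target": False, "rt": 420},
    {"is_target": False, "rt": 380},
    {"is_target": False, "rt": 440},
    {"is_target": True, "rt": None},
]

rtsd = pre_probe_rtsd(trials, end=4)     # window covers the four nontargets
missing = pre_probe_rtsd(trials, end=5)  # window would include the target trial
print(rtsd, missing)
```

Windows that contain a target trial or a missing RT return no value, which mirrors the filtering rule that reduces the number of usable trials and subjects in the RTSD model.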

The model predicting accuracy (also restricted to subjects who indicated mind wandering at least once and trials where a mind wandering response was made) was a generalized LMM to account for the dichotomous outcome variable (i.e., accurate or not). LMMs accommodate unbalanced data without a loss of power and account for the non-independence of data by using subjects as a random variable (Kliegl, Wei, Dambacher, Yan, & Zhou, 2010). We planned on using random intercepts and slopes in these models, as a more maximal random effects structure prevents alpha inflation (Barr, 2013). When we ran these models with random slopes and intercepts, we received warnings suggesting that the models had been overfit and thus that the estimates may not be stable. To remedy this, we changed the random effect structure so that only the intercept was random (estimates from both models minimally differed and are presented in our supplementary materials). In these models, neutrally valenced probes are the reference level.

In the model predicting RTSD (N = 206; this smaller n is the result of the filtering conditions described in the previous paragraph; see Fig. 4 for subjects’ mean RTSD for each probe response), negatively valenced thought probe responses did predict unique variance above and beyond neutrally valenced, b = 20.3, SE = 5.6, t = 3.6, p < .001, and positively valenced mind wandering thought reports, b = 15.48, SE = 6.6, t = 2.4, p = .02. Positively valenced thought probes did not predict significantly more variance than neutrally valenced thought reports, b = 4.8, SE = 5.4, t = 0.9, p = .38. In the model predicting accuracy (N = 330; see Fig. 5 for subjects’ mean accuracy for each probe response), negatively valenced responses predicted unique variance in accuracy compared to neutrally valenced responses, b = −0.84, SE = .11, z = −7.6, p < .001, and positively valenced thought probes, b = −0.67, SE = .13, z = −5.3, p < .001. Neutrally and positively valenced thought reports did not significantly differ from one another, b = −.17, SE = .11, z = −1.6, p = .11.

Fig. 4

Green-filled circles represent subject means. Triangles mark the distribution mean. The horizontal line in the middle of each box is the median. The lower hinge marks the 25th percentile of the distribution, and the upper hinge marks the 75th percentile. The whiskers extend 1.5 times the interquartile range from the upper and lower hinges. To the right of the boxes and circles are density distributions of subject means. MW = mind wandering. (Color figure online)

Fig. 5

Green-filled circles represent subject means. Triangles mark the distribution mean. The horizontal line in the middle of each box is the median. The lower hinge marks the 25th percentile of the distribution, and the upper hinge marks the 75th percentile. The whiskers extend 1.5 times the interquartile range from the upper and lower hinges. To the right of the boxes and circles are density distributions of subject means. MW = mind wandering. (Color figure online)

To better understand why negatively valenced thought probes were more detrimental to performance than were neutral and positively valenced thought probes, we examined depth ratings. We fit a model to all mind wandering responses with thought probe valence predicting the depth ratings. In the model, neutrally valenced mind wandering was the baseline condition. As can be seen in Fig. 6, subjects reported being the most off-task for negatively valenced mind wandering responses, followed by positively and then neutrally valenced mind wandering responses. This was confirmed by the model, in which both positively valenced, b = .46, SE = .06, t = 8.1, p < .001, and negatively valenced mind wandering reports, b = .82, SE = .06, t = 13.5, p < .001, were associated with greater depth ratings than neutrally valenced mind wandering reports. Furthermore, negatively valenced mind wandering reports were associated with greater depth ratings than positively valenced reports, b = .36, SE = .07, t = 5.1, p < .001.

Fig. 6

Plum-filled circles represent subject means. Triangles mark the distribution mean. The horizontal line in the middle of each box is the median. The lower hinge marks the 25th percentile of the distribution, and the upper hinge marks the 75th percentile. The whiskers extend 1.5 times the interquartile range from the upper and lower hinges. To the right of the boxes and circles are density distributions of subject means. MW = mind wandering. (Color figure online)

Moreover, we conducted within-subject models to test if depth probe responses statistically mediate the relation between mind wandering valence and performance on the SART. We did this with separate models for RTSD and accuracy. Here, we focused on the difference between negatively valenced and neutrally valenced thought probes. That is, only thought probe responses of neutrally valenced and negatively valenced mind wandering were considered. For these analyses, we used the “bmlm” package (Vuorre & Bolger, 2018). This package allows for Bayesian estimation of parameters in multilevel models.

In the model of RTSD, the total effect of thought probe response valence (i.e., the difference between neutrally and negatively valenced thought probes) was estimated as 9 ms, with 95% of the most plausible values of this parameter falling between 3 ms and 15 ms (such ranges are often called credible intervals). The mediation effect (i.e., the indirect effect) was 1 ms, with 95% of the most plausible values falling between −.1 ms and 3 ms. After taking depth probes into account, the path between valence thought probe responses and RTSD was estimated at 8 ms, with 95% of the most plausible values falling between 1 ms and 14 ms. Because the interval for the indirect effect includes zero, we concluded that we did not detect depth probe responses as a mediator of the relation between valence thought probe response and RTSD.

For accuracy, the total effect of thought probe response valence (tracking the difference in accuracy between targets when negatively versus neutrally valenced mind wandering was reported) was estimated at −.44, with 95% of the most plausible values falling between −.58 and −.31. The estimate of the mediation effect was −.15, with 95% of the most plausible values falling between −.22 and −.09. When taking depth probe response into account, the path between valence thought probe responses and accuracy was −.29, with 95% of the most plausible values falling between −.43 and −.15. Because the most plausible values of the mediation effect fell between −.22 and −.09 (an interval that excludes zero) and the effect between valence thought probe responses and accuracy was reduced but not eliminated, we interpreted this as providing evidence for partial mediation.
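The reported point estimates can be checked against the usual mediation decomposition, in which the total effect equals the indirect effect plus the direct effect. The short sketch below does this arithmetic for both outcomes. The function name `decompose` and the "proportion mediated" summary are our illustrative additions, not quantities reported in the analyses, and only the point estimates (not the intervals) are used.

```python
def decompose(total, indirect, direct):
    """Return (indirect + direct, indirect / total) for one mediation model.

    The first value should match the reported total effect up to rounding;
    the second is an illustrative "proportion mediated" (our assumption,
    not a statistic reported in the text).
    """
    return indirect + direct, indirect / total


# Point estimates as reported in the text.
reported = {
    "RTSD (ms)": dict(total=9.0, indirect=1.0, direct=8.0),
    "accuracy": dict(total=-0.44, indirect=-0.15, direct=-0.29),
}

for outcome, estimates in reported.items():
    path_sum, proportion = decompose(**estimates)
    print(
        f"{outcome}: indirect + direct = {path_sum:.2f}, "
        f"proportion mediated = {proportion:.2f}"
    )
```

On these numbers, the paths sum to the reported totals for both outcomes. Roughly a third of the accuracy effect runs through depth ratings, compared with about a ninth of the RTSD effect. This is in line with the conclusion of partial mediation for accuracy only.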

Discussion

Klein and Boals (2001a) provided evidence that experiencing negative life events is associated with lower working memory capacity. Here, we found evidence inconsistent with this claim. In the current study, the correlations between both total and recently perceived negative life event stress and working memory capacity favored the null hypothesis (i.e., r = 0) over the alternative hypothesis (i.e., r ≠ 0) by factors of 7.7 and 5.9, respectively. Because Klein and Boals (2001a) speculated that the effects of life event stress may be particularly damaging to more difficult cognitive operations, we correlated both total and recent negative life event stress with subjects’ performance on the two longest set sizes of the working memory tasks. Here again, we found no evidence of an association between life event stress and working memory capacity. In addition to the bivariate correlations, we conducted mediation analyses that allowed us to assess whether life event stress had an indirect effect on working memory capacity through the potential mediators of mind wandering propensity, sum of negative thoughts, or ruminative response style. In none of these analyses did negative life event stress have an indirect effect. In short, although we conducted multiple analyses in a sincere attempt to detect the association, no association was found.
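To make the Bayes factors of 7.7 and 5.9 concrete, they can be converted into posterior probabilities for the null hypothesis. The worked equation below is an illustration only and assumes equal prior odds for the null and alternative hypotheses. That assumption is ours and is not part of the reported analyses.

```latex
\[
\frac{P(H_0 \mid D)}{P(H_1 \mid D)} = \mathrm{BF}_{01} \times \frac{P(H_0)}{P(H_1)},
\qquad
P(H_0 \mid D) = \frac{\mathrm{BF}_{01}}{1 + \mathrm{BF}_{01}}
\quad \text{when } P(H_0) = P(H_1).
\]
\[
\mathrm{BF}_{01} = 7.7 \;\Rightarrow\; P(H_0 \mid D) = \frac{7.7}{8.7} \approx .89,
\qquad
\mathrm{BF}_{01} = 5.9 \;\Rightarrow\; P(H_0 \mid D) = \frac{5.9}{6.9} \approx .86.
\]
```

Under different prior odds the posterior probabilities would shift accordingly. The Bayes factors themselves do not depend on that choice.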

There are many reasons why an effect may not replicate. One potential reason is that even in a perfectly replicated study there is a chance that a true effect can be missed because of a lack of statistical power. Here, however, our sample sizes used in the correlations have been demonstrated to provide precise estimates. Another reason that a finding can fail to replicate is that the effect found in the original work may have been a false positive or smaller than the original estimate (related to the power issue mentioned above). Finally, an effect may fail to replicate if the conditions of the original study were not recreated in the replicating study.

Although we cannot definitively adjudicate between these alternative explanations, Gelman’s (2016) time-reversal heuristic offers a guide for thinking about the differences in evidential weight between the two studies (i.e., the present study and Klein & Boals, 2001a). Suppose the current study had been reported first, with evidence supporting no relation between working memory capacity and cumulative negative life event stress from a substantially larger sample, better measurement practices (i.e., two measures of working memory capacity versus one), a preregistered analysis plan, and both frequentist and Bayesian analyses. Suppose Klein and Boals (2001a) had then reported their findings of a negative association between life stress and working memory capacity with inadequate sample sizes (N = 22 and N = 66) and no preregistered analysis plan. How would someone adjust their beliefs about the association? We believe the answer is clear: The findings from Klein and Boals would not change the belief that there is no association between negative life event stress and working memory capacity. More directly, the current finding should be afforded more evidential weight. Bolstering the findings reported here, Banks and Boals (2016) did not show a statistically significant total effect of life event stress on working memory capacity, suggesting that the unmodified LES measure of negative life event stress is not (or only very weakly) associated with working memory capacity.

The second goal of this study was to test potential mechanisms through which negative life event stress may hurt working memory capacity (we fully appreciate the limitations of cross-sectional mediation analyses for identifying causal mechanisms; Bullock, Green, & Ha, 2010). As reported in the mediation models, neither mind wandering propensity, the sum of negatively valenced mind wandering reports, nor ruminative response style carried a significant amount of variance between negative life event stress (for both temporal variants) and working memory capacity. These analyses allowed us to assess the weaker claim (than that put forth by Klein & Boals, 2001a) that although direct associations (i.e., the total effect) between negative life event stress and working memory were not found, there may be some indirect effects that are canceled out by other unknown variables. As stated above, we found no evidence for indirect effects. Additionally, a measure of dispositional mindfulness did not moderate any of these indirect effects.

We also examined the effects of negatively valenced off-task responses on SART task performance. Banks et al. (2016) produced some evidence that negatively valenced thought reports were more deleterious to performance in working memory and sustained attention tasks relative to positive but not neutral thought reports. In the models predicting both RTSD and accuracy, negatively valenced thought probes uniquely predicted variance in the outcomes. That is, negatively valenced probe responses were associated with lower accuracy and greater RTSD. These results are consistent with previous work suggesting unique deleterious effects of negatively valenced mind wandering. If negatively valenced mind wandering is uniquely disruptive to cognition, it could be that negatively valenced thought brings someone further off-task than other-valenced thoughts. Here, we found some evidence for this assertion: Negatively valenced thought was associated with a greater reported depth of mind wandering, and this greater reported depth mediated the association between negatively valenced thought probes and accuracy (but not RTSD). The current results are consistent with findings from Klein and Boals (2001b) that increases in working memory task performance followed an expressive writing manipulation for individuals who wrote about a negative, but not a neutral or positive, life event. Specifically, increases in working memory task performance were predicted by decreases in intrusive thoughts, which seem similar to the negatively valenced mind wandering measured in the current study.

However, when looking at the LMMs, the within-subjects mediation analyses, and the broader pattern of thought probe responses and their relations with accuracy and RTSD in Figs. 5 and 6, a directional confound may exist and should be ruled out in future research. More specifically, the patterns observed are consistent with an association between negatively valenced off-task reports and accuracy being at least partially influenced by reactivity to target performance. That is, failing to withhold a response to a target (thus being inaccurate) induces negative affect, which may guide the subject to (at least sometimes) report being off-task with a negatively valenced thought and greater depth. Moreover, because RTSD and accuracy are associated, r(352) = −.46, BF10 = 1.173 × 10^17, any association between RTSD and negatively valenced thought may be an artifact of the relation between RTSD and accuracy.

As a first attempt at addressing this issue, we conducted a linear mixed model on RTSD, looking only at the RTSD preceding accurate target trials (negatively valenced thought probe responses were the reference level). In this analysis, negatively valenced thought probe responses did not predict more RTSD than neutrally valenced, b = −8.3, SE = 7, t = −1.2, p = .24, or positively valenced thought probes, b = −3.8, SE = 8.3, t = −.5, p = .65. We regard this evidence as preliminary because of the less than ideal sample size (N = 101), the total number of trials that result when only looking at correct trials (959 trials: 187 negatively valenced, 573 neutrally valenced, and 199 positively valenced), and the fact that these trials took place in the context of a task where subjects are assumed to be aware of their accuracy in responding to targets. To rule out this confound, future work should focus on estimating the association between RTSD, depth of mind wandering, and negatively valenced mind wandering using tasks that do not have an accuracy criterion that is discernible to the subjects. The metronome response task (Seli, Cheyne, & Smilek, 2013) is one task that meets this criterion.

Conclusion

Our findings are inconsistent with the claim that cumulative perceived negative life event stress is associated with working memory capacity. To be clear, we are not suggesting that no form of stress is associated with working memory capacity, but rather that the evidence base for cumulative perceived negative life event stress experienced over relatively long time intervals (as reported in Klein & Boals, 2001a, and here) is suspect and (unless it becomes more established) should not be used unquestioningly as the motivation for or justification of future work. Recent work over shorter time intervals (i.e., over the course of two weeks preceding the testing; Shields et al., 2017; Shields, Ramey, Slavich, & Yonelinas, 2019) seems promising but needs to be independently replicated. Currently, the evidence base for acute in-the-moment stress negatively affecting working memory capacity seems much more secure (Banks & Boals, 2016; Qin, Hermans, van Marle, Luo, & Fernández, 2009; Schoofs, Preuß, & Wolf, 2008; Sliwinski et al., 2006). If the LES is modified to ask about current perceived stress, as was done in Shields et al. (2017) and Shields et al. (2019), or when the nonmodified LES happens to be associated with current intrusive thoughts, as in Banks and Boals (2016), an association with working memory capacity may be detected. Additionally, we found support for the claim that negatively valenced mind wandering impairs cognitive task performance, but because of a potential directional confound in which accuracy influences valence reports, causal claims are premature.