Retrieving information from memory is not a neutral event; it shapes memory. On the one hand, retrieval generally benefits the subsequent accessibility of the retrieved information (Allen, Mahler, & Estes, 1969; Bjork, 1975; Roediger & Karpicke, 2006). On the other hand, retrieval can also cause forgetting of related, nonretrieved information. However, these opposing effects of retrieval have mostly been investigated separately. Here, we examined whether retrieval-induced forgetting (RIF) depends on the strengthening caused by retrieval.

Typically, RIF is analyzed in a paradigm that consists of three main phases (Anderson, Bjork, & Bjork, 1994). In the learning phase, participants study several sets of items in combination with a shared (category) cue that defines the specific set of items. In the subsequent retrieval-practice phase, participants are cued to recall half of the studied items from half of the sets. In the test phase, recall performance for all items is tested. The recall of practiced items (Rp+ items), nonpracticed items from practiced sets (Rp− items), and nonpracticed items from nonpracticed sets (Nrp items) is compared. Rp+ items usually profit from retrieval practice and are better recalled than are Nrp items. RIF manifests itself in significantly lower recall of Rp− items as compared with Nrp items. This effect has been demonstrated for a wide variety of materials, for example, for different kinds of verbal materials (Anderson & Bell, 2001; Anderson et al., 1994; Carroll, Campbell-Ratcliffe, Murnane, & Perfect, 2007; Tempel & Frings, 2015), images (Ciranni & Shimamura, 1999; Koutstaal, Schacter, Johnson, & Galluccio, 1999; Shaw, Bjork, & Handal, 1995), or motor actions (Reppa, Worth, Greville, & Saunders, 2013; Tempel & Frings, 2013, 2014b, 2015; Tempel, Loran, & Frings, 2015).
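The comparisons in this paradigm reduce to recall percentages per item type and two difference scores, RIF (Nrp − Rp−) and Rp+ enhancement (Rp+ − Nrp), as also defined in the Figure 1 caption. A minimal Python sketch of that scoring, assuming one record per final-test trial; the class and field names are hypothetical, chosen for illustration. It uses the single Nrp baseline of the standard paradigm, not the Nrp−/Nrp+ split used in the present experiment.

```python
from dataclasses import dataclass


@dataclass
class TestedItem:
    """One final-test trial; field names are hypothetical, chosen for illustration."""
    practiced_category: bool  # was the item's category used in retrieval practice?
    practiced_item: bool      # was this item itself retrieval-practiced?
    recalled: bool            # was the item recalled in the test phase?


def status(item: TestedItem) -> str:
    """Classify an item as Rp+, Rp-, or Nrp."""
    if not item.practiced_category:
        return "Nrp"
    return "Rp+" if item.practiced_item else "Rp-"


def percent_recalled(items: list[TestedItem]) -> dict[str, float]:
    """Percentage of items recalled, separately for each retrieval-practice status."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for item in items:
        s = status(item)
        totals[s] = totals.get(s, 0) + 1
        hits[s] = hits.get(s, 0) + int(item.recalled)
    return {s: 100 * hits[s] / totals[s] for s in totals}


def effect_scores(items: list[TestedItem]) -> dict[str, float]:
    """RIF (Nrp - Rp-) and Rp+ enhancement (Rp+ - Nrp), in percentage points."""
    pct = percent_recalled(items)
    return {"RIF": pct["Nrp"] - pct["Rp-"], "Rp+ enhancement": pct["Rp+"] - pct["Nrp"]}
```

With toy data in which both Rp+ items, neither Rp− item, and one of two Nrp items are recalled, both scores come out at 50 percentage points: a positive RIF score means Rp− items were recalled less often than the baseline.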

Many studies on RIF have aimed to examine whether RIF is caused by memory inhibition (for an overview, see Murayama, Miyatsu, Buchli, & Storm, 2014). The inhibitory account of RIF (Anderson, 2003) assumes that during retrieval practice of one item, the other items of the same set interfere by competing for conscious recollection. To resolve this interference, the Rp− items are inhibited. The inhibition persists until the final test phase and accounts for the lower accessibility of Rp− items. Several properties of RIF correspond to predictions of the inhibitory account. First, RIF emerges not only in tests using the same cues as during retrieval practice but also in tests probing memory with independent cues (cue independence; Anderson & Spellman, 1995; Weller, Anderson, Gómez-Ariza, & Bajo, 2013). Second, only selective retrieval induces forgetting, whereas retrieval-free forms of selective practice (restudy) do not (retrieval specificity; Ciranni & Shimamura, 1999; Staudigl, Hanslmayr, & Bäuml, 2010; Tempel & Frings, 2016). Third, the strength of interference predicts RIF (competition dependence; Chan, Erdman, & Davis, 2015; Tempel, Aslan, & Frings, 2016), whereas, fourth, its occurrence is independent of the strengthening of Rp+ items (strength independence; Hulbert, Shivde, & Anderson, 2012; Storm & Nestojko, 2010). In a recent paper, however, Raaijmakers (2016) criticized the existing evidence in favor of strength independence, concluding that, so far, this property has not been examined adequately.

Raaijmakers (2016) reports simulation studies demonstrating that even if recall of Rp− items did depend on the strength of Rp+ items, a significant correlation between difference scores representing RIF (Nrp − Rp−) and Rp+ strengthening (Rp+ − Nrp) would be highly unlikely. Only very large sizes of both effect measures would yield a modest correlation, whereas the effect sizes commonly obtained in RIF studies would seldom produce a significant correlation between RIF and Rp+ enhancement scores. Therefore, reports of nonsignificant correlations (Hanslmayr, Staudigl, Aslan, & Bäuml, 2010; Hulbert et al., 2012; Staudigl et al., 2010; Tempel & Frings, 2014a, 2016; Weller et al., 2013) are not sufficient evidence in favor of strength independence. In fact, they are equally compatible with an alternative, noninhibitory theory that attributes the low recall of Rp− items in the test phase to blocking by Rp+ items. Blocking here refers to any theoretical explanation assuming that forgetting results from previously strengthened traces getting in the way of retrieval routes.
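The difficulty Raaijmakers (2016) raises can be explored with a small Monte Carlo sketch. It assumes a hypothetical blocking model in which each simulated participant's Rp− loss is tied to that participant's Rp+ gain, adds binomial sampling noise to every recall proportion, and estimates how often the across-participant correlation between RIF and Rp+ enhancement scores would reach significance. All parameter values (participants, items per condition, baseline recall, gain distribution, blocking weight) and all function names are illustrative assumptions, not values or code from Raaijmakers (2016) or any other study.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def observed_proportion(p, k, rng):
    """Proportion of k independent recall attempts that succeed with probability p."""
    p = min(max(p, 0.0), 1.0)
    return sum(rng.random() < p for _ in range(k)) / k


def simulate_r(n=100, k=12, base=0.5, mean_gain=0.2, sd_gain=0.1, block=0.5, seed=0):
    """Correlation between RIF and Rp+ enhancement scores across n simulated participants.

    Hypothetical blocking model: each participant's Rp+ gain lowers Rp- recall
    by block * gain, so true RIF depends on strengthening by construction.
    """
    rng = random.Random(seed)
    rif, enhancement = [], []
    for _ in range(n):
        gain = rng.gauss(mean_gain, sd_gain)  # latent Rp+ strengthening of this participant
        rp_plus = observed_proportion(base + gain, k, rng)
        rp_minus = observed_proportion(base - block * gain, k, rng)
        nrp = observed_proportion(base, k, rng)
        rif.append(nrp - rp_minus)          # RIF score: Nrp - Rp-
        enhancement.append(rp_plus - nrp)   # Rp+ enhancement: Rp+ - Nrp
    return pearson(rif, enhancement)


def share_significant(reps=200, n=100, **model):
    """Share of simulated experiments whose positive r exceeds the two-tailed .05 cutoff.

    The cutoff uses the Fisher-z normal approximation, r_crit = tanh(1.96 / sqrt(n - 3)).
    """
    r_crit = math.tanh(1.959964 / math.sqrt(n - 3))
    hits = sum(simulate_r(n=n, seed=s, **model) > r_crit for s in range(reps))
    return hits / reps
```

In this sketch the Nrp proportion enters both difference scores with opposite signs, so its sampling noise alone contributes a negative covariance that works against detecting a true positive relation. Calling `share_significant()` with different `k`, `sd_gain`, or `block` values shows how strongly the detection rate depends on these assumptions.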

Additionally, Raaijmakers (2016) criticizes experiments that compared conditions affecting RIF but not the level of Rp+ recall. Such studies contrast a standard retrieval-practice task with either selective restudy or a modified retrieval practice (retrieving a category’s name instead of an item) that is not considered to produce competition among items of one category. Restudy and noncompetitive retrieval practice have been found not to induce forgetting, in line with the properties of retrieval specificity and competition dependence (see above; Anderson, Bjork, & Bjork, 2000; Ciranni & Shimamura, 1999; Staudigl et al., 2010; Tempel & Frings, 2016). Yet both kinds of practice resulted in levels of Rp+ recall equal to those after standard retrieval practice, a finding that has been interpreted as support for strength independence (e.g., Storm & Levy, 2012). However, Raaijmakers and Jakab (2013) explain that the recall of Rp+ items may underestimate the strengthening produced by standard retrieval practice, because only successfully retrieved items are strengthened, whereas all items receiving restudy or noncompetitive retrieval practice (which is typically very easy, with retrieval success close to 100%) are strengthened. The same average recall in the test phase thus implies that standard retrieval practice strengthens the successfully retrieved Rp+ items more than restudy or noncompetitive retrieval does, at the cost that unsuccessfully retrieved items are not strengthened at all. Hence, the occurrence of RIF only after standard retrieval practice might indicate that RIF is not independent of Rp+ strengthening; on the contrary, sufficient strengthening might be necessary to induce forgetting.
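The arithmetic behind the Raaijmakers and Jakab (2013) argument can be made explicit. Under the simplifying assumption that each Rp+ item is either strengthened (if retrieved successfully) or left at a baseline level, mean test recall is a weighted average, target = s · r_retrieved + (1 − s) · r_unstrengthened, where s is retrieval success during practice. Solving for r_retrieved shows how strongly retrieved items must have been strengthened to match restudy. The function name and all numbers below are hypothetical.

```python
def required_retrieved_recall(target_recall, retrieval_success, unstrengthened_recall):
    """Final-test recall that successfully retrieved Rp+ items must reach so that
    the average over all Rp+ items equals target_recall.

    Solves target = s * r_retrieved + (1 - s) * r_unstrengthened for r_retrieved.
    """
    unretrieved_share = 1 - retrieval_success
    return (target_recall - unretrieved_share * unstrengthened_recall) / retrieval_success


# Hypothetical example: restudy yields 75% recall for all items; standard retrieval
# practice succeeds on 60% of trials, and failed items stay at 50% recall.
print(round(required_retrieved_recall(0.75, 0.6, 0.5), 3))  # prints 0.917
```

In this toy case, equal mean recall (75%) requires the retrieved items to reach about 92% recall, that is, more strengthening per strengthened item than restudy provides; with retrieval success of 100% the two conditions would coincide.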

We followed an approach different from those studies: We neither examined correlations between difference scores nor aimed to manipulate the occurrence of RIF while holding Rp+ recall constant. Instead, we manipulated Rp+ strengthening and examined whether this manipulation affected RIF. This approach was inspired by an incidental observation in a recent investigation of the influence of retrieval modality on RIF (Tempel & Frings, 2017). Using motor sequences as to-be-learned material, three experiments showed that RIF occurred only when the modalities at retrieval practice and test matched. Aside from the core manipulation of retrieval modality, retrieval practice with or without feedback was used across the experiments. Looking at the overall pattern of results across experiments, we noticed that feedback increased Rp+ recall, whereas the size of RIF effects remained about the same. A post hoc reanalysis of the data confirmed that feedback during retrieval practice significantly influenced Rp+ recall but not Rp− recall. Correspondingly, the meta-analysis by Murayama et al. (2014) did not identify feedback during retrieval practice as a moderator of RIF. In fact, the mean weighted effect size in studies with feedback was slightly smaller than in studies without feedback.

Moreover, a study by Erdman and Chan (2013) reported a similar finding. In two experiments, they manipulated the provision of feedback during retrieval practice. Whereas feedback improved memory for practiced items, it did not affect RIF. However, these results could be accommodated by the blocking account as well, because the first experiment used a category-cued recall test and the second a recognition test. Unlike item-specific cued recall, presenting only a category name as a recall cue does not control for the output position of recalled items within that category. In Erdman and Chan’s first experiment,

“providing feedback during retrieval practice could have eliminated inhibition by reducing retrieval competition, but a retrieval-induced forgetting effect was nonetheless observed because feedback differentially strengthened the tested items, causing these items to block recall of the nontested items during the category cued recall test” (Erdman & Chan, 2013, p. 697).

Regarding their second experiment, the contribution of blocking in an old–new recognition test, which provides copy cues of the studied items, is generally much smaller than in recall tests, if not null. Thus, the absence of a significant moderation of RIF might simply reflect that blocking did not substantially affect Rp− items in either condition, with or without feedback during retrieval practice. This interpretation would match the inhibitory account but does not speak to whether RIF is strength independent.

In the present study, we therefore used an item-specific cued recall test and, based on the previous observations suggesting that feedback during retrieval practice was unrelated to the size of RIF effects, we manipulated the provision of feedback during retrieval practice to directly test whether an experimental manipulation of Rp+ strengthening influenced the strength of RIF.

Method

Participants

One hundred eight undergraduate students at the University of Trier participated in the experiment. They received either course credit or €4 for their participation. Sample size was determined on the basis of effect-size estimates from the meta-analysis by Murayama et al. (2014), assuming small-to-medium RIF effects in the present study.
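The paper does not report the exact effect size and power that entered the sample-size calculation, so the following is only a generic sketch of how such a calculation works for a within-participant contrast such as Nrp− versus Rp−. It uses the normal approximation n = ((z₁₋α/₂ + z_power) / d_z)², where d_z is the standardized mean of the participants' difference scores. The function name and the example values of d_z and power are hypothetical; exact noncentral-t computations give slightly larger n.

```python
import math
from statistics import NormalDist


def required_n(d_z, alpha=0.05, power=0.80):
    """Participants needed for a two-tailed paired (within-participant) contrast.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d_z) ** 2,
    where d_z is the standardized mean of the difference scores.
    """
    z = NormalDist()  # standard normal distribution
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d_z) ** 2
    return math.ceil(n)


# Hypothetical inputs, not those used by the authors.
for d in (0.3, 0.5):
    print(d, required_n(d))
```

Because n scales with 1/d_z², assuming a small rather than a medium effect roughly triples the required sample, which is why small-to-medium effect assumptions lead to samples of around a hundred participants.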

Design

Retrieval-practice status (Rp+, Rp−, Nrp) was manipulated within participants, as was provision of feedback during retrieval practice (feedback, no feedback). Rp− items were tested before Rp+ items in order to preclude any output interference by Rp+ items (Anderson et al., 1994; Roediger & Schmidt, 1980). Accordingly, Rp− items were compared with the first half of Nrp items tested (Nrp−), whereas Rp+ items were compared with the second half of Nrp items tested (Nrp+).

Material

The experiment was conducted using Dell Optiplex 755 PCs with Eizo FlexScan S1901 monitors and standard German QWERTZ keyboards. The software PXLab (Irtel, 2007) served for running the experiment. The items consisted of 72 experimental items (eight exemplars each from nine categories) plus eight filler items (four exemplars each from two further categories; see Appendix Table 2).

Procedure

The experiment consisted of four phases (study, retrieval practice, distractor task, test). Instructions were given on the screen. In the study phase, participants were instructed to memorize exemplars from several categories in connection with their corresponding category names. On each trial, the category was displayed together with one exemplar for 6,000 ms, followed by an empty screen for 1,000 ms. The study phase started with four filler items serving as primacy buffers. Subsequently, the 72 experimental items were presented in random order. The study phase ended with another four filler items serving as recency buffers.

In the retrieval-practice phase, participants had to retrieve half the items of six categories. First, the category name appeared on the screen for 2,000 ms. Then, a one-letter word stem of an exemplar of that category appeared. Participants were supposed to type in the missing letters and confirm their input by pressing the return key. They were encouraged to guess if they were uncertain about a response but also had the option of proceeding without giving one. For three categories, feedback appeared after responding; that is, after an empty screen for 500 ms, the correct response was shown for 3,000 ms, followed by another empty screen for 500 ms. For the other three categories, an empty screen appeared for 4,000 ms instead. Counterbalancing the assignment of items to the retrieval-practice conditions (with feedback, without feedback, no practice) resulted in 12 retrieval-practice sets, which were counterbalanced between participants. Experimental items were retrieval-practiced in five cycles. Trials with four filler items at the beginning and at the end of the retrieval-practice phase again served as primacy and recency buffers.

The subsequent distractor task was a sudoku puzzle of medium difficulty printed on a sheet of paper. After 5 minutes, instructions for the test phase appeared on the screen, at which point participants stopped working on the sudoku.

In the test phase, each trial presented a category name together with a one-letter word stem of an exemplar. Participants were supposed to type in the missing letters and confirm their input by pressing the return key. The first four trials tested filler items. Then, the experimental items were tested blocked by category, with Rp− items tested before Rp+ items. Nrp categories and the two types of retrieval-practiced categories were tested alternately. Three test sequences, beginning with an Nrp category, with a retrieval-practiced category that had received feedback, or with a retrieval-practiced category that had not received feedback, were counterbalanced between participants.
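The ordering constraints of the test phase (blocking by category, Rp− before Rp+, alternation of category types, three counterbalanced starting types) together with the Nrp−/Nrp+ split from the Design section can be sketched as a list-building routine. This is a minimal sketch under stated assumptions: the category types are assumed to rotate in a fixed cycle (Nrp, feedback, no feedback), the Nrp split is assumed to be made within each Nrp category, and the function and label names are hypothetical. Filler trials are omitted.

```python
import random

TYPE_CYCLE = ("Nrp", "FB", "noFB")  # assumed rotation order of category types


def rotated(start):
    """Cycle of category types beginning with `start` (one counterbalanced sequence)."""
    i = TYPE_CYCLE.index(start)
    return TYPE_CYCLE[i:] + TYPE_CYCLE[:i]


def build_test_list(categories, start, rng):
    """Return (category type, category name, item, label) tuples in final-test order.

    categories maps each type to a list of (name, unpracticed_items, practiced_items);
    for Nrp categories the two lists together simply hold all of the category's items.
    """
    test_list = []
    for k in range(len(categories["Nrp"])):
        for ctype in rotated(start):
            name, unpracticed, practiced = categories[ctype][k]
            if ctype == "Nrp":
                # Shuffle all exemplars; the first half tested counts as Nrp-, the rest as Nrp+.
                items = list(unpracticed) + list(practiced)
                rng.shuffle(items)
                half = len(items) // 2
                blocks = [("Nrp-", items[:half]), ("Nrp+", items[half:])]
            else:
                # Rp- items are tested before Rp+ items to avoid output interference.
                blocks = [("Rp-", rng.sample(list(unpracticed), len(unpracticed))),
                          ("Rp+", rng.sample(list(practiced), len(practiced)))]
            for label, block in blocks:
                test_list.extend((ctype, name, item, label) for item in block)
    return test_list
```

With three categories per type and four practiced plus four unpracticed exemplars per category, the routine yields 72 test trials; passing `"Nrp"`, `"FB"`, or `"noFB"` as `start` produces the three counterbalanced sequences.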

Results

A one-factor ANOVA (item type: Rp− with feedback, Rp− without feedback, Nrp−) examined RIF. The main effect of item type was significant, F(2, 214) = 4.67, p = .010, ηp² = .04. Planned contrasts showed that recall of Nrp− items was significantly higher than recall of Rp− items, F(1, 107) = 8.35, p = .005, ηp² = .07, whereas recall of Rp− items from categories with and without feedback during retrieval practice did not differ significantly, F < 1 (see Table 1, Fig. 1). A Bayes factor analysis comparing Rp− items from categories with and without feedback yielded BF01 = 10.712, indicating that the present data are about 10.7 times more likely under the null hypothesis than under the alternative hypothesis.

Table 1. Mean percentages of items recalled as a function of retrieval-practice status and provision of feedback
Fig. 1. Retrieval-induced forgetting (RIF; percentage of recalled Nrp− items − percentage of recalled Rp− items) and Rp+ enhancement (percentage of recalled Rp+ items − percentage of recalled Nrp+ items) as a function of feedback (FB) during retrieval practice. Error bars depict standard errors of the mean.

A one-factor ANOVA (item type: Rp+ with feedback, Rp+ without feedback, Nrp+) examined Rp+ enhancement. The main effect of item type was significant, F(2, 214) = 207.62, p < .001, ηp² = .66. Planned contrasts showed that recall of Nrp+ items was significantly lower than recall of Rp+ items, F(1, 107) = 224.46, p < .001, ηp² = .68, and that Rp+ items that had received feedback during retrieval practice were recalled significantly better than Rp+ items that had not, F(1, 107) = 184.03, p < .001, ηp² = .63.

A 2 (retrieval-practice status: Rp−, Rp+) × 2 (feedback: with, without) ANOVA showed a significant interaction, F(1, 107) = 82.53, p < .001, ηp² = .44; that is, the significant effect of feedback on Rp+ items differed significantly from the nonsignificant effect of feedback on Rp− items.
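When both factors of a 2 × 2 design vary within participants, the interaction test is equivalent to a one-sample t test on each participant's double difference score (feedback effect on Rp+ minus feedback effect on Rp−), with F(1, n − 1) = t²; with 108 participants this gives the df of (1, 107) reported above. A minimal sketch, assuming one recall score per participant and cell; the function name and argument layout are hypothetical, and the example data are toy values, not the study's data.

```python
import math
import statistics


def interaction_f(rp_plus_fb, rp_plus_nofb, rp_minus_fb, rp_minus_nofb):
    """F(1, n - 1) for the status x feedback interaction in a 2 x 2 within-participant design.

    Each argument holds one recall score per participant, in the same participant order.
    """
    # Per-participant interaction contrast: feedback effect on Rp+ minus feedback effect on Rp-.
    contrasts = [(a - b) - (c - d) for a, b, c, d in
                 zip(rp_plus_fb, rp_plus_nofb, rp_minus_fb, rp_minus_nofb)]
    n = len(contrasts)
    t = statistics.fmean(contrasts) / (statistics.stdev(contrasts) / math.sqrt(n))
    return t ** 2, (1, n - 1)


# Toy data for three participants whose contrasts are 1, 2, and 3 points.
f, df = interaction_f([1, 2, 3], [0, 0, 0], [0, 0, 0], [0, 0, 0])
print(round(f, 6), df)  # prints 12.0 (1, 2)
```

Framing the interaction as a contrast makes its meaning explicit: it tests whether feedback changed Rp+ recall by a different amount than it changed Rp− recall.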

In addition, we compared Nrp− and Nrp+ items. Significantly fewer Nrp+ than Nrp− items were recalled, F(1, 107) = 5.14, p = .025, ηp² = .05, indicating that the earlier retrieval of Nrp− items impaired access to Nrp+ items (replicating the typical pattern of output interference).
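Each partial eta squared (ηp²) in the Results above follows from its F ratio and degrees of freedom through the standard identity ηp² = F · df_effect / (F · df_effect + df_error). The short Python check below recomputes every reported value from the reported F and df; the function name is illustrative.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) as given in the Results above.
REPORTED = [
    (4.67, 2, 214, 0.04),
    (8.35, 1, 107, 0.07),
    (207.62, 2, 214, 0.66),
    (224.46, 1, 107, 0.68),
    (184.03, 1, 107, 0.63),
    (82.53, 1, 107, 0.44),
    (5.14, 1, 107, 0.05),
]

for f, df1, df2, reported in REPORTED:
    value = partial_eta_squared(f, df1, df2)
    print(f"F({df1}, {df2}) = {f}: partial eta squared = {value:.3f} (reported {reported})")
```

All recomputed values round to the reported ones, so the effect sizes are internally consistent with the test statistics.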

Discussion

RIF occurred independently of feedback during retrieval practice. If RIF were strength dependent, it should have been stronger in categories receiving feedback, because feedback substantially increased Rp+ strengthening. However, RIF in categories that had received feedback did not differ reliably from RIF in categories that had not; numerically, it was even slightly smaller.

It is not surprising that feedback boosted Rp+ recall so drastically. Although research on testing effects has shown that retrieval practice without feedback benefits accessibility after relatively long retention intervals (compared with restudy), after only a short interval (of a few minutes) the benefit is much smaller or absent. Providing a restudy opportunity after retrieval-practice trials, however, boosts memory immediately. Studies on test-potentiated learning (Arnold & McDermott, 2013; Izawa, 1966; Tempel & Kubik, 2017) show that restudy enhances memory more if it is preceded by retrieval practice; that is, more learning takes place during repeated study trials if they are interleaved with test trials. Combining this indirect testing effect with the direct effect of testing that benefits successfully retrieved items in retrieval-practice trials is the method of choice for maximizing strengthening by retrieval practice.

The present findings complement previous studies suggesting strength independence of RIF. In particular, the study by Erdman and Chan (2013) and the meta-analysis by Murayama et al. (2014) found that RIF effects were not larger when retrieval practice involved feedback. The experimental manipulation of feedback in the present study corroborates this observation, raising serious doubts about the central claim of the blocking account of RIF: A dramatic boost of Rp+ strengthening did not measurably affect RIF. These results also correspond to studies demonstrating that RIF occurs even in the absence of any strengthening. Storm, Bjork, Bjork, and Nestojko (2006) used an impossible retrieval task during the retrieval-practice phase; that is, they provided word stems as cues that did not correspond to any existing exemplars of the categories used in the learning phase (see also Storm & Nestojko, 2010). Moreover, studies examining the retrieval specificity and competition dependence of RIF document moderators of RIF that do not affect Rp+ recall. Although those studies were not designed to examine strength independence, their findings are consistent with it.

Taken together, the present data corroborate previous results concerning the effects of feedback on RIF. We set out to investigate whether the occurrence of RIF is independent of the magnitude of Rp+ strengthening. In an experiment designed a priori to test this hypothesis, feedback affected only Rp+ items, not Rp− items, demonstrating that RIF is strength independent, a key prediction of the inhibitory account of RIF.