It is often while explaining our opinions that we find ourselves questioning, developing, and strengthening them. We may learn, for example, that the arguments we generate are not as convincing as we expected or that they rest on assumptions that are no longer valid. Alternatively, we may discover additional arguments that provide newfound support for our position, leading us to give it even more credence in the future than we would have otherwise. Research on self-persuasion, for example, has shown that attempting to generate new arguments in favor of a position can make people more likely to agree with that position than they would have been had they simply been exposed to the same arguments (Cialdini & Petty, 1981; Janis & King, 1954; King & Janis, 1956). Similarly, generating explanations for why an event might occur or why a theory might be true can cause people to judge that event or theory to be more likely to occur or to be true (Campbell & Fairey, 1985; Hirt & Sherman, 1985; Hoch, 1985; Sherman, Zehner, Johnson, & Hirt, 1983).

In the context of memory, the generation of new arguments and explanatory stances can enhance a person’s ability to remember relevant information (Chi, Bassok, Lewis, Reimann, & Glaser, 1989; Chi, De Leeuw, Chiu, & LaVancher, 1994; Crowley & Siegler, 1999; Murphy & Allopenna, 1994; Renkl, 1997; Rittle-Johnson, 2006; Ryoo & Linn, 2012). When participants generate arguments or self-explanations related to categories or scientific phenomena, for example, they often remember relevant concepts better than they would have otherwise. Such consequences seem particularly tied to the way arguments and explanations are generated. Specifically, generations that involve referencing and elaborating upon concepts and knowledge representations that are already well learned seem to be particularly powerful.

Certainly, using existing information in the creation of new arguments and explanations can be beneficial for learning and remembering new information. To what extent, however, does the generation of new arguments and explanations alter the accessibility of existing information? Coming up with new arguments and explanations involves the retrieval and generation of relevant information, processes that have been shown to serve as powerful memory modifiers (Bjork, 1975). Information that is retrieved or generated tends to become more recallable in the future than it would have been otherwise (Roediger & Karpicke, 2006; Slamecka & Graf, 1978), and related information that is not retrieved or generated tends to become less recallable in the future than it would have been otherwise (Anderson, Bjork, & Bjork, 1994; Bäuml, 2002). Such dynamics suggest that the generation of new arguments or explanations has the potential, at least in some instances, to cause existing information in memory to become less accessible.

Retrieval-Induced Forgetting

The finding that the retrieval of some information can cause the forgetting of other information has been studied extensively in the literature on retrieval-induced forgetting (Anderson et al., 1994). In the typical paradigm, participants are exposed to a series of category-exemplar pairs (e.g., Drink-Bourbon, Drink-Gin, Animal-Rabbit, Animal-Parrot) before practicing the retrieval of half of the exemplars from half of the categories (e.g., Drink-Bo___). After a brief delay, participants are given a final memory test. On this test, practiced items (i.e., Bourbon) are typically recalled best, thus demonstrating the benefits of retrieval practice. Nonpracticed items from practiced categories (i.e., Gin), however, are typically recalled worst, and significantly less well than items from non-practiced categories (i.e., Rabbit, Parrot). It is this negative consequence of retrieval—causing related items to become less accessible than baseline items—that is referred to as retrieval-induced forgetting.
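To make the arithmetic of this comparison concrete, the short Python sketch below (our illustration only, using hypothetical recall outcomes for the example items above; none of the variable names come from the original studies) shows how the retrieval-induced forgetting effect is typically computed as the difference between recall of baseline (Nrp) items and recall of nonpracticed items from practiced categories (Rp−):

# Minimal sketch: computing a retrieval-induced forgetting effect from
# hypothetical final-test recall outcomes (True = recalled, False = not recalled).
recall = {"Bourbon": True, "Gin": False, "Rabbit": True, "Parrot": True}
item_type = {"Bourbon": "Rp+",   # practiced item
             "Gin": "Rp-",       # nonpracticed item from a practiced category
             "Rabbit": "Nrp",    # baseline item from a nonpracticed category
             "Parrot": "Nrp"}    # baseline item from a nonpracticed category

def mean_recall(label):
    items = [item for item, t in item_type.items() if t == label]
    return sum(recall[item] for item in items) / len(items)

# A positive difference indicates retrieval-induced forgetting.
rif_effect = mean_recall("Nrp") - mean_recall("Rp-")
print(f"Rp+ recall: {mean_recall('Rp+'):.2f}, Rp- recall: {mean_recall('Rp-'):.2f}, "
      f"Nrp recall: {mean_recall('Nrp'):.2f}, forgetting effect: {rif_effect:.2f}")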

Retrieval-induced forgetting has been shown to be a remarkably robust and general phenomenon, occurring in a variety of contexts and with many types of materials (for reviews, see Murayama, Miyatsu, Buchli, & Storm, 2014; Storm et al., 2015). Indeed, effects similar to retrieval-induced forgetting have been observed in almost any situation in which someone must attempt to retrieve or generate some specific subset of information while not retrieving or generating other information (e.g., Bäuml, 2002; Healey, Campbell, Hasher, & Ossher, 2010; Johnson & Anderson, 2004; Levy, McVeigh, Marful, & Anderson, 2007; Storm, Angello, & Bjork, 2011; Storm, Bjork, Bjork, & Nestojko, 2006; Storm & Patel, 2014).

A number of theoretical accounts have been proposed to explain retrieval-induced forgetting. The most supported account, however, or at least the one that has received the most attention in the literature, is the inhibitory account (Anderson, 2003; Storm & Levy, 2012). According to the inhibitory account, cues presented during retrieval practice activate both target and nontarget items. “Drink-Bo____,” for example, might activate the target item, “Bourbon,” as well as the nontarget item, “Gin.” Inhibition is assumed to counteract the competition caused by the inappropriate activation of nontarget items by suppressing them, thus facilitating access to the target item while rendering the nontarget items less accessible in the future than they would have been otherwise. Other researchers have explained retrieval-induced forgetting as the consequence of other mechanisms, such as associative interference, strategy disruption, and inappropriate contextual cuing (Jonker, Seli, & MacLeod, 2013; Raaijmakers & Jakab, 2013; Verde, 2012). According to the interference account, for example, nonpracticed items are forgotten not because they are inhibited during retrieval practice but because the strengthening of practiced items causes interference at test.

Importantly, there are conditions under which retrieval fails to cause forgetting. When items within categories are highly integrated or semantically similar (Anderson & Bell, 2001; Anderson & McCulloch, 1999; Goodmon & Anderson, 2011; Smith & Hunt, 2000), for example, or when participants are explicitly instructed to use nontarget information to help them generate target information (Ditta & Storm, 2016; Storm & Patel, 2014), retrieval and generation have been shown not to cause the forgetting of other information. In fact, when materials are designed to be particularly well integrated, the retrieval of some information can cause related nonretrieved information to become better recalled in the future than it would have been otherwise, a phenomenon referred to as retrieval-induced facilitation (Chan, 2009, 2010; Chan, McDermott, & Roediger, 2006; Cranney, McKinnin, Morris, & Watts, 2009). Although seemingly at odds with the work on retrieval-induced forgetting, such findings can be readily accommodated by associative theories of memory, particularly with respect to assumptions about spreading activation (Anderson, 1996; Collins & Loftus, 1975; Raaijmakers & Shiffrin, 1981). In many instances, the activation of one item or subset of items in memory should facilitate the activation of other related items.

Modifying Memory through Argument Generation

Generating new arguments to explain one’s position on a given issue seems likely to affect memory for preexisting arguments related to that issue. Whether such generation causes preexisting arguments to be forgotten or facilitated, however, is unclear. People may generate arguments that are highly related or even explicitly integrated with existing arguments. They might use existing arguments to help them come up with new arguments, thus linking new information with old information in the same way that explanation has been shown to form new relationships with existing knowledge in semantic memory (Chi et al., 1989; 1994). In other words, generating new arguments may prompt people to refer back to existing arguments to respond to them or build upon them, leading to the construction of new arguments that are well integrated with the existing arguments. If this is the case, then one would expect the generation of new arguments not to require inhibition or result in fan interference (Anderson & Bell, 2001; Anderson & McCulloch, 1999; Radvansky, 1999; Radvansky & Zacks, 1991), and thus that existing arguments would be either unaffected or facilitated by the generation of new arguments.

There are also reasons to think that the generation of new arguments would cause forgetting. Looking across the literature on retrieval-induced forgetting, for example, evidence of forgetting is far more common than evidence of facilitation (Murayama et al., 2014; Storm et al., 2015), and people may not spontaneously generate arguments in a way that prevents existing arguments from suffering an effect similar to that of retrieval-induced forgetting. Indeed, most research designs showing evidence of facilitation either used materials carefully designed to foster integration or explicitly instructed participants to use related information to help them retrieve or generate other information. In the work by Storm and Patel (2014) and Ditta and Storm (2016) using the Alternative Uses Task, it was only when participants were directly and explicitly instructed to use old ideas in their efforts to generate new ideas that generating new ideas failed to cause the forgetting of old ideas. These findings suggest that most participants may not use or refer back to information from existing arguments when generating new arguments and thus that existing arguments might be relatively more likely to suffer forgetting than to benefit from facilitation.

Current Study

To investigate the consequences of generating new arguments on memory, we devised a paradigm similar to that used in the study of retrieval-induced forgetting. First, participants were exposed to a list of six proposal statements, each paired with four associated arguments. One proposal statement was “Colleges and universities should require their students to spend at least one semester studying in a foreign country.” Participants were shown the statement along with four arguments (for a given proposal, all four arguments either supported or opposed the stated position): (1) Too expensive for many students. (2) New environment might cause anxiety. (3) Students will treat it like a vacation. (4) Less tuition paid to American schools (see the Appendix for a list of all proposal statements and associated arguments). Having participants study four arguments associated with each proposal statement is important, because participants are likely to come into the lab with different opinions and therefore have different existing arguments in mind to draw upon while generating new arguments. By having participants study arguments related to each proposal, we can ensure that all participants are at least aware of such arguments, and more importantly, we can assess whether such arguments become more or less recallable as a consequence of generating new arguments.

After studying all of the proposal statements and associated arguments, participants generated new arguments for four of the six studied proposals, thus creating three levels of a within-subjects manipulation. For two proposals, participants were asked to generate new arguments that agreed with the position of the studied arguments (Agree condition). For two other proposals, participants were asked to generate new arguments that disagreed with the position of the studied arguments (Disagree condition). For the two remaining proposals, participants were not asked to generate any new arguments (Baseline condition). Finally, after a brief delay, participants were tested on their ability to recall the studied arguments. By comparing recall performance for the studied arguments in the Agree and Disagree conditions versus the Baseline condition, we were able to assess the consequences of generating new arguments on memory for existing arguments, with above-baseline performance indicating that the generation of new arguments caused argument-induced facilitation, and below-baseline performance indicating that the generation of new arguments caused argument-induced forgetting.

Based on the logic put forth in the Introduction, we expected the generation of new arguments to cause the forgetting (or diminished accessibility) of the associated arguments that had been studied. Moreover, because memory is organized and updated in such a way that helps people maintain a sense of consistency or self-coherence (Conway, 2005; Conway & Pleydell-Pearce, 2000; Levine, 1997; Ross, 1989), we might expect the generation of disagreeing arguments to cause more forgetting than the generation of agreeing arguments. Specifically, by considering and attempting to generate arguments in favor of one position, participants may come to take ownership of that position, thus rendering the studied arguments that disagree with that position relatively more susceptible to forgetting than studied arguments that agree with it. Indeed, research using both the retrieval-practice paradigm and the directed forgetting paradigm has shown that people are more likely to forget information that is inconsistent with their attitudes or beliefs than they are to forget information that is consistent with their attitudes or beliefs (Dunn & Spellman, 2003; Waldum & Sahakyan, 2012).

Another important factor likely to influence whether forgetting is observed is the nature of the relationship between the studied arguments and the arguments that are generated by the participants. In consideration of the prior work on integration and retrieval-induced forgetting, participants may be less likely to forget the studied arguments when they generate new arguments that are highly related or integrated with the studied arguments than when they generate new arguments that are less related or not well integrated with the studied arguments. Note that in this context we are defining relatedness as the nature of the association between the studied and generated arguments (e.g., the explicit overlap in content) and that the generated arguments may vary in relatedness to the studied arguments regardless of whether they agree or disagree with the studied arguments. In either condition, participants may use the studied arguments as prompts to help them come up with new arguments, for example, either to reinforce the original arguments or to argue against them. Based on the prior literature, if the generated arguments directly reference or explicitly overlap with studied arguments, then the integration associated with such interrelatedness should prevent the studied arguments from suffering forgetting. Thus, we predicted that participants who generate arguments highly related to the studied arguments would exhibit significantly reduced forgetting effects relative to participants who generate arguments less highly related to the studied arguments.

One benefit of measuring relatedness as opposed to manipulating it is that we can assess whether people spontaneously build upon pre-existing arguments (in this case, arguments provided by the experimenter) while attempting to generate new arguments. Moreover, it allows us to assess whether the tendency to do so varies as a function of whether participants attempt to generate new arguments that agree or disagree with a given position. It is possible, for example, that participants will be more likely to build upon existing arguments that agree with the position for which they are arguing than those that disagree with the position for which they are arguing. More generally, the study provides insight into the consequences of generating new arguments without explicitly controlling the way in which arguments are generated, thus also allowing us to examine how differences in the way individuals go about generating new arguments affect the fate of existing information in memory.

Method

Participants

Sixty-nine undergraduates participated in exchange for credit in a psychology course. Data associated with four participants were misplaced and thus could not be included in the analysis.

Materials

Six proposal statements were chosen from the pool of issues provided by the Educational Testing Service for the GRE Analytical Writing section (“Pool of Issue Topics,” 2015). The proposals consisted of statements for which college undergraduates are expected to be able to generate opinions and to explain their positions by generating persuasive arguments.

Four arguments were identified for each proposal statement (see the Appendix for a list of all proposals and arguments). The arguments were constructed to be brief (3 to 8 words) and specific to each proposal. For some proposals, supporting arguments were identified. For other proposals, opposing arguments were identified. In this way, all participants were exposed to some proposals with supporting arguments and other proposals with opposing arguments.

Procedure

The experiment consisted of three phases: Study, Argument Generation, and Final Test.

Study

Participants were first asked to read the six proposal statements, which were each presented one at a time on the computer along with four associated arguments. Participants were told that they might be asked to remember the arguments later in the experiment. Reading was self-paced and the order of the proposals (and associated arguments) was randomized for each participant. Participants pressed a key on the keyboard to advance through the proposals and arguments until all six were completed.

Argument Generation

Participants were given the following instructions at the start of the Argument Generation phase:

“You will now be asked to come up with some arguments of your own. In a moment, you will be presented with a statement and a position you will be taking. PAY CLOSE ATTENTION TO THE POSITION, it may or may not match the position of the arguments you just read. Write down as many arguments for that position as you can. Try to make your arguments as convincing as possible. Do not worry about correct grammar or spelling and please number your arguments. You will have 5 minutes to come up with arguments for each statement you are shown.”

After reading the instructions, participants were presented with one of the proposal statements along with directions to generate new arguments either in support of it or in opposition to it. For the first proposal statement, half of the participants were instructed to generate arguments that agreed with the position put forth by the arguments they had studied, whereas the other half of the participants were instructed to generate arguments that disagreed with the position put forth by the arguments they had studied. They were told to use the entire 5 minutes and to keep trying if they felt like they were stuck. It was also stressed that the goal was to generate new arguments. The process was then repeated for three additional proposal statements, such that, in total, participants generated agreeing arguments for two of the proposal statements and disagreeing arguments for two other proposal statements. By having participants generate new arguments in response to four of the six proposal statements, we created a three-level within-subjects manipulation (Agree vs. Disagree vs. Baseline). The studied arguments associated with proposal statements for which participants attempted to generate agreeing and disagreeing arguments are referred to as Agree and Disagree items, respectively. The studied arguments associated with proposal statements for which participants did not generate any new arguments are referred to as Baseline items. The order of the proposal statements used in the argument generation task was randomized across participants, and counterbalancing ensured that each of the proposal statements and associated arguments served equally often in the three conditions.
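As a rough illustration of this counterbalancing logic (a sketch under our own assumptions; the proposal labels and function are hypothetical and do not reflect the actual experiment software), each pair of proposals can be rotated through the Agree, Disagree, and Baseline conditions across participants:

import random

PROPOSALS = ["P1", "P2", "P3", "P4", "P5", "P6"]  # placeholder labels for the six proposals

def assign_conditions(participant_index):
    # Rotate the six proposals so that, across participants, each proposal serves
    # equally often in the Agree, Disagree, and Baseline conditions.
    shift = 2 * (participant_index % 3)
    order = PROPOSALS[shift:] + PROPOSALS[:shift]
    assignment = {"Agree": order[0:2], "Disagree": order[2:4], "Baseline": order[4:6]}
    # The four generation proposals are presented in a random order.
    generation_order = assignment["Agree"] + assignment["Disagree"]
    random.shuffle(generation_order)
    return assignment, generation_order

assignment, generation_order = assign_conditions(participant_index=7)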

Final Test

After a 2-min delay (solving a maze), participants were tested on their ability to remember the originally studied arguments. Participants were never tested on their ability to remember the arguments they generated.

The final test cues were created by taking each of the arguments associated with a given proposal statement and removing a single word. For example, when asked about the proposal Universities should require every student to take a variety of courses outside the student's field of study, participants were shown “Encourages _________ thinking.” (Answer: creative). We chose to remove specific words that were deemed to be essential to the meaning of the arguments and that would be relatively difficult to guess using only the sentence context, thus requiring participants to rely on their memory for the originally studied arguments. The specific words that were removed to create the final test cues are shown in bold in the Appendix. Each cue was shown along with the proposal statement for 15 seconds, and participants were asked to type the missing word into a box on the screen. They were allowed to move on to the next trial after completing their response.
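A minimal sketch of how such fill-in-the-blank cues could be constructed (our own illustration; in the actual materials the removed words were selected by hand, as described above):

def make_cue(argument, target_word):
    # Replace the hand-selected target word with a blank to create the test cue.
    return argument.replace(target_word, "_________", 1)

cue = make_cue("Encourages creative thinking", "creative")
# cue -> "Encourages _________ thinking"; the answer scored at test is "creative"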

We chose to employ item-specific test cues for several reasons. Most importantly, research on retrieval-induced forgetting has shown that such cues provide advantages over more general, nonspecific cues, such as simply asking participants to recall all of the studied arguments associated with a given proposal (Anderson & Levy, 2007; Murayama et al., 2014; Schilling, Storm, & Anderson, 2014). First, item-specific cues control the order of output, thus reducing the likelihood that any differences in forgetting are a consequence of output interference at test. Second, they direct participants to the appropriate phase of the experiment, helping to ensure that participants attempt to recall items from the study phase rather than the generation phase. This bypasses a potential source of confusion and should provide a better measure of the extent to which participants can actually remember the content of the arguments they studied.

Results

Analysis of Argument Generation

Participants generated a total of 1,361 new arguments. Each argument was rated, by an experimenter blind to condition and recall performance, as to whether it agreed or disagreed with the studied arguments. These ratings were compared against the assigned argument positions to ensure that participants had followed directions. Importantly, only 1 of the 65 participants generated new arguments inconsistent with the assigned positions (i.e., this participant generated arguments that went against the positions they were assigned). All data from this participant were removed before analysis.

Overall, participants generated more agreeing arguments than disagreeing arguments, t(63) = 2.03, p = 0.047, d = 0.25, CI 95% of the diff = [0.01, 0.73]. Specifically, participants generated 5.44 (SE = 0.23) arguments per proposal when attempting to generate arguments that agreed with the studied position, whereas they generated 5.07 (SE = 0.22) arguments per proposal when attempting to generate arguments that disagreed with the studied position.

The generated arguments were also coded to assess their relationship to the studied arguments. Specifically, each argument was compared to the list of studied arguments seen by a given participant and judged to be related if it either directly referred to one or more of the studied arguments or contained the gist of one or more of the studied arguments. To be considered related, an argument needed to overlap explicitly in content with one of the studied arguments and not simply share some loose semantic relationship. Using this coding scheme, an average of 2.4 arguments per participant (11% of all generated arguments) met the threshold of relatedness. We predicted that participants would be more likely to generate new arguments related to existing arguments in the Agree condition than in the Disagree condition. Consistent with this prediction, relatedness varied across conditions, t(63) = 6.99, p < 0.001, d = 0.83, CI 95% of the diff = [1.3, 2.4], with participants generating an average of 2.1 (SE = 0.3) related arguments in the Agree condition and an average of only 0.3 (SE = 0.1) related arguments in the Disagree condition.

Analysis of Performance at Final Test

To briefly sum up the predictions made in the Introduction, we expected to observe an overall forgetting effect such that the generation of new arguments would cause the forgetting of the studied arguments. Critically, however, we also expected significantly less forgetting to be observed when participants generated agreeing arguments than when they generated disagreeing arguments. We also expected participants to show less forgetting when they generated arguments that were highly related or integrated with the studied arguments than when they generated arguments that were less related to the studied arguments.

Final test responses were scored as correct if they either matched the missing word verbatim or were consistent with the gist of the missing word. With regard to the item “Encourages _________ thinking,” for example, performance was coded as correct if the participant responded with the same word that had been studied (i.e., “creative”) or with a word that denoted a similar meaning (e.g., “innovative”). Performance was coded as incorrect if the participant either failed to respond or responded with a word that denoted a different or less specific meaning (e.g., “better”). As in the analysis of argument generation, this coding was completed by an experimenter blind to condition.

The proportion of items recalled correctly was analyzed using a repeated-measures Analysis of Variance (Agree vs. Disagree vs. Baseline). A significant effect was observed, with participants recalling 58% (SE = 3%), 57% (SE = 3%), and 64% (SE = 3%) of the items in the Agree, Disagree, and Baseline conditions, respectively, F(2, 126) = 3.15, MSE = 0.03, p = 0.046. As confirmed by paired-samples t-tests, recall performance was significantly below baseline when participants generated agreeing arguments, t(63) = 2.05, p = 0.044, d = 0.26, CI 95% of the diff = [0%, 13%], as well as when they generated disagreeing arguments, t(63) = 2.12, p = 0.032, d = 0.27, CI 95% of the diff = [1%, 13%]. Contrary to expectations, the difference between the Agree and Disagree conditions was not significant, t(63) = 0.13, p = 0.89, d = 0.02, CI 95% of the diff = [−5%, 6%]. These results suggest that generating new arguments can cause the forgetting of studied arguments, but that the effect does not seem to differ as a function of whether participants generate new arguments that agree or disagree with the studied arguments.
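For readers who want to see the shape of this analysis, the following Python sketch (using simulated toy proportions and the statsmodels and scipy libraries; it is an illustration of the analysis structure, not the authors' analysis code) runs a one-way repeated-measures ANOVA followed by paired-samples t-tests:

import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Long-format table: one row per participant x condition (toy recall proportions).
df = pd.DataFrame({
    "participant": ["s01"] * 3 + ["s02"] * 3 + ["s03"] * 3 + ["s04"] * 3,
    "condition": ["Agree", "Disagree", "Baseline"] * 4,
    "recall": [0.50, 0.55, 0.70, 0.75, 0.70, 0.80, 0.60, 0.50, 0.65, 0.55, 0.60, 0.70],
})

# Repeated-measures ANOVA across the three within-subject conditions.
print(AnovaRM(df, depvar="recall", subject="participant", within=["condition"]).fit())

# Paired-samples t-tests comparing each generation condition against baseline,
# and the two generation conditions against each other.
wide = df.pivot(index="participant", columns="condition", values="recall")
print(stats.ttest_rel(wide["Agree"], wide["Baseline"]))
print(stats.ttest_rel(wide["Disagree"], wide["Baseline"]))
print(stats.ttest_rel(wide["Agree"], wide["Disagree"]))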

Final Test Performance as a Function of Relatedness between Studied Arguments and Generated Arguments

We predicted that participants who tended to generate arguments related to the studied arguments would exhibit significantly smaller forgetting effects than participants who generated relatively unrelated arguments. To test this prediction, we calculated a forgetting score for each participant by subtracting average recall across the two argue conditions (Agree and Disagree) from recall in the Baseline condition. Thus, a positive forgetting score indicated that performance was lower, on average, in the two argue conditions than in the Baseline condition. A relatedness score was also calculated for each participant as the total number of generated arguments judged to be related to at least one of the studied arguments. Measured in this way, the relatedness score allowed us to assess whether participants who generated more related arguments were less likely to forget the studied arguments than participants who generated fewer related arguments.

Using these overall measures of forgetting and relatedness, a negative correlation was observed such that participants who generated more related arguments exhibited smaller forgetting effects than participants who generated fewer related arguments (Pearson’s r = −0.35, p = 0.005; Spearman’s rho = −0.33, p = 0.008). Indeed, the negative correlation remained significant even when baseline levels of recall were controlled and when the forgetting effects were examined separately in the Agree and Disagree conditions (all p values < 0.05). These results suggest that whether participants forget the studied arguments as a consequence of generating new arguments can be significantly predicted by the nature of the relationship between what they studied and what they generated.
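A sketch of how the forgetting and relatedness scores could be computed and correlated (again an illustration with toy values rather than the actual data; column and variable names are our own):

import pandas as pd
from scipy import stats

# One row per participant: recall proportions by condition (toy values) and the
# number of generated arguments coded as related to at least one studied argument.
data = pd.DataFrame({
    "Agree": [0.50, 0.75, 0.60, 0.55, 0.70],
    "Disagree": [0.55, 0.70, 0.50, 0.60, 0.65],
    "Baseline": [0.70, 0.65, 0.75, 0.70, 0.60],
    "related_count": [0, 5, 1, 2, 4],
}, index=["s01", "s02", "s03", "s04", "s05"])

# Forgetting score: baseline recall minus mean recall in the two argue conditions.
data["forgetting"] = data["Baseline"] - (data["Agree"] + data["Disagree"]) / 2

print(stats.pearsonr(data["related_count"], data["forgetting"]))
print(stats.spearmanr(data["related_count"], data["forgetting"]))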

This pattern of results is illustrated further by splitting participants into groups as a function of relatedness. At one extreme, 14 participants failed to generate a single argument coded as related to any of the studied arguments (Low Relatedness). At the other extreme, 18 participants generated four or more arguments coded as related to the studied arguments (High Relatedness). The remaining 32 participants fell in between (Moderate Relatedness). Consistent with the correlational analyses reported above, a 2 (Condition: Argue vs. Baseline) × 3 (Group: Low Relatedness vs. Moderate Relatedness vs. High Relatedness) ANOVA revealed a significant interaction, F(2, 61) = 4.32, MSE = 0.02, p = 0.018. Participants in the Low-Relatedness group exhibited a substantial forgetting effect (Argue: M = 54%, SE = 7%; Baseline: M = 68%, SE = 7%; d = 0.53, CI 95% of the diff = [−1%, 29%]), as did participants in the Moderate-Relatedness group (Argue: M = 52%, SE = 3%; Baseline: M = 63%, SE = 4%; d = 0.50, CI 95% of the diff = [3%, 18%]). Participants in the High-Relatedness group, however, exhibited a trend towards facilitation (Argue: M = 69%, SE = 5%; Baseline: M = 64%, SE = 5%; d = −0.33, CI 95% of the diff = [−14%, 3%]). As shown in Figure 1, the pattern of results did not vary as a function of whether participants attempted to generate agreeing or disagreeing arguments.

Figure 1

Mean percentage of studied items recalled at final test is shown as a function of group (Low Relatedness vs. Moderate Relatedness vs. High Relatedness) and condition (Agree vs. Disagree vs. Baseline). Error bars indicate the standard error of the mean.

The effect of relatedness was examined further by analyzing the data at the level of individual arguments, thereby allowing us to determine whether a particular studied argument was less likely to be forgotten by a particular participant if that participant happened to generate a new argument directly related to that studied argument. Specifically, a binary logistic regression was run with three predictor variables: relatedness (whether a related argument was generated by the participant), condition (Agree vs. Disagree), and the interaction between relatedness and condition. Consistent with the results reported above, a given studied argument was significantly less likely to be forgotten when the participant generated a new argument related to it than when they did not, exp(β) = 0.62, p = 0.02. Significant effects were not observed for the condition variable, exp(β) = 0.85, p = 0.71, or the interaction, exp(β) = 2.06, p = 0.11. Moreover, the same pattern of results was observed when several other variables (i.e., baseline recall performance, proposal topic, and the particular studied argument) were included in the model (Relatedness: exp(β) = 0.58, p = 0.02; Condition: exp(β) = 1.49, p = 0.49; Interaction: exp(β) = 1.68, p = 0.32). A generalized estimating equation (GEE) analysis was also conducted to account for repeated measures within participants, with relatedness, condition, and their interaction entered as possible predictors. The results were consistent with those of the logistic regression, with relatedness being the only significant predictor of argument recall, Wald χ² = 3.90, p = 0.020.
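The structure of these item-level analyses can be sketched as follows (simulated toy data; the logit and gee interfaces from statsmodels are used for illustration, and all variable names are assumptions rather than the authors' actual code):

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_items = 20, 8  # toy dimensions: 8 studied arguments per participant

items = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_items),
    "condition": np.tile(["Agree"] * 4 + ["Disagree"] * 4, n_participants),
    "related": rng.integers(0, 2, n_participants * n_items),
})
# Toy outcome: related items are somewhat less likely to be forgotten.
items["forgotten"] = rng.binomial(1, np.where(items["related"] == 1, 0.25, 0.45))

# Item-level binary logistic regression with relatedness, condition, and their interaction.
logit = smf.logit("forgotten ~ related * condition", data=items).fit(disp=False)
print(np.exp(logit.params))  # exp(beta), i.e., odds ratios

# Generalized estimating equations to account for the clustering of items within participants.
gee = smf.gee("forgotten ~ related * condition", groups="participant",
              data=items, family=sm.families.Binomial()).fit()
print(gee.summary())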

Discussion

Generating arguments related to a position, as often occurs in the context of explaining one’s opinions or beliefs, can have important consequences for the accessibility of associated information in memory. As shown in the present study, generating new arguments pertaining to an issue can cause other arguments (e.g., arguments studied earlier in the experiment) to become less recallable in the future than they would have been otherwise. This finding, which adds to the accumulating body of work showing that the selective retrieval or generation of a subset of information can cause the forgetting of other information (for a review, see Storm et al., 2015), suggests that generating arguments affects not only one’s memory for the arguments and stances being advocated, but memory for other information as well. In this way, generating arguments and providing explanations does not merely reflect the passive consideration or read-out of the contents of one’s memory and knowledge base, but rather the action of processes that have the power to alter the accessibility of those contents.

Importantly, generating new arguments did not cause forgetting when the generated arguments were highly related to the studied arguments, suggesting that integration, or the way in which people attempted to come up with new arguments, served as a moderating variable. This finding fits well with previous work on retrieval-induced forgetting and fan interference, which has shown that the selective strengthening of some information—whether through retrieval practice, generation, or some other process—does not result in the impaired accessibility of other information when the two subsets are integrated or highly related (Anderson & McCulloch, 1999; Chan et al., 2006; Radvansky, 1999; Storm & Patel, 2014). In the present context, this finding suggests that the way in which people go about attempting to generate new arguments in support of a position can determine the mnemonic consequences of such attempts.

Due to the inherent limitations of correlational analysis, it is unclear whether the moderating effect of relatedness can be attributed to the particular arguments generated by participants or to some other variable that covaried with relatedness. It is possible that the generation of related arguments facilitated the recall of studied arguments directly, perhaps because the studied arguments mediated the process of generation. Alternatively, it is possible that individuals who generated related arguments simply went about the task differently than participants who did not, and that it was this difference in approach that moderated the forgetting effect. Participants who attempted to generate new arguments by responding to the studied arguments, for example, might have been more likely to engage in the task in a way that protected the studied arguments from forgetting even if the outputting of specific arguments per se did not confer that protection. It stands to reason that the arguments generated by participants (and their relationship to the studied arguments) had at least some effect on recall performance, but there are good reasons to think that a participant’s general approach to the task played a role as well. Indeed, participants who failed to generate even a single disagreeing argument related to a studied argument still exhibited a reduced forgetting effect in the Disagree condition (compared with other participants) if they tended to generate related arguments in the Agree condition. Perhaps these participants consciously considered the studied arguments when attempting to generate disagreeing arguments, for example, or used the studied arguments as a starting place in their generative efforts, even if they refrained from outputting related responses when it came time to write out their arguments.

Considering the finding that generating related arguments protected the studied arguments from forgetting, it is somewhat puzzling that more forgetting was not observed in the Disagree condition than in the Agree condition. Participants, after all, generated a higher proportion of related arguments in the Agree condition than in the Disagree condition. One possibility is that, even in the Agree condition, participants did not generate enough related arguments to provide substantial protection from forgetting. Another possibility is that, because of the way in which relatedness was coded, it may simply have been easier to identify generated arguments as related to studied arguments when they were constructed to agree with the studied arguments than when they were constructed to disagree with them. Indeed, participants may have considered or responded to the studied arguments just as often while generating disagreeing arguments as they did while generating agreeing arguments. Although this strategy may have been sufficient to reduce the amount of forgetting that was observed, thus leading to similar overall forgetting effects in the two conditions, it may not have resulted in participants outputting the type of generated arguments in the Disagree condition that would have been clearly identified as related to the studied arguments.

Future work may attempt to investigate the role of factors such as relatedness and strategy more directly through experimental manipulation. If participants are instructed to employ an integrative or responsive strategy that promotes the generation of arguments more directly related to the studied arguments, then the magnitude of the forgetting effect may be greatly diminished or even reversed. It would also be interesting to see whether the correlation between relatedness and forgetting persists when the approaches and strategies employed by participants during argument generation are controlled. One advantage of the present study's design is that it provides insight into what people do spontaneously when generating arguments. That is, by not giving participants explicit instructions on how to generate arguments, we could assess whether the way in which people naturally generate arguments involves the sort of integrative behaviors or strategies that prevent forgetting. Clearly, for the majority of participants in the current study, it did not. Even though many participants did generate related arguments, an overall forgetting effect was observed when the data were analyzed across the entire sample.