Don’t believe everything you hear: Routine validation of audiovisual information in children and adults

Piest, Benjamin A.; Isberner, Maj-Britt; Richter, Tobias

doi:10.3758/s13421-018-0807-7


Published: 05 April 2018

Volume 46, pages 849–863, (2018)


Memory & Cognition




Abstract

Previous research has shown that the validation of incoming information during language comprehension is a fast, efficient, and routine process (epistemic monitoring). So far, this research has focused on epistemic monitoring during reading. The present study extended this research by investigating epistemic monitoring of audiovisual information. In a Stroop-like paradigm, participants (Experiment 1: adults; Experiment 2: 10-year-old children) responded to the probe words correct and false by keypress after the presentation of auditory assertions that could be either true or false with respect to concurrently presented pictures. Results provide evidence for routine validation of audiovisual information. Moreover, the results show a stronger and more stable interference effect for children compared with adults.


Many situations in which language is used require not only the comprehension but also the validation of incoming linguistic information—that is, judging whether the comprehended information is true or false. Recent studies investigating the relationship between comprehension and validation support the assumption that validation occurs immediately and routinely during language comprehension (e.g., Isberner & Richter, 2013, 2014a; Richter, Schroeder, & Wöhrmann, 2009; Singer, 2006). All of these studies have used written materials as stimuli, thus limiting the extant empirical evidence for routine validation to the domain of reading. However, if validation is an inherent component of language comprehension (Cook & O’Brien, 2014; O’Brien & Cook, 2016; Richter et al., 2009; Singer, 2013), it should not be restricted to the processing of written language. Spoken language is often used in face-to-face communications that are characterized by a richer pragmatic context, which includes the physical environment in which the communication is situated. This context potentially forms the basis for validation. Moreover, for communication to be successful, listeners need to align their mental representation with that of the speaker. The comprehension of definite expressions (e.g., sentences with demonstrative pronouns such as This is a car) is a case in point. For comprehending such expressions, listeners need to identify the intended referents that the speaker has in mind (Chafé, 1976). We propose that validation plays a crucial part in this process, as it allows for monitoring the consistency of the content of a spoken message with the visual information that is in the focus of the listener’s visual attention. In this way, validation of audiovisual information might play a major role in establishing and maintaining common ground (Clark & Brennan, 1991) during conversation.

In order to fulfil this function, validation of audiovisual information should proceed in a similarly passive and involuntary manner as the validation of written information. To test this assumption, we conducted two experiments using a Stroop-like paradigm adapted from Isberner and Richter (2014a), one with adults and the other with children. In the following, we will give a short overview of the theoretical background of our study and of previous research regarding language comprehension and validation.

Validation during language comprehension

Language comprehension involves more than the analysis of words, sentences, and texts. The meaning of a sentence must be integrated with information from prior sentences as well as with pertinent background knowledge. This integration process results in a mental representation of the state of affairs described in the text (situation model: van Dijk & Kintsch, 1983; Zwaan & Radvansky, 1998). However, there is a growing consensus that comprehension involves routine and nonstrategic validation processes (O’Brien & Cook, 2016; Richter, 2015; Singer, 2013). Validation processes are assumed to check incoming information for inconsistencies and implausibility and have an effect on whether or not a particular piece of information becomes part of the current situation model (epistemic monitoring: Richter et al., 2009; Schroeder, Richter, & Hoever, 2008). Conscious and strategic validation of incoming information, like that assumed by two-step models of comprehension and validation (Gilbert, 1991), would be unsuited for this purpose. In contrast to the kind of validation highlighted in these models, epistemic monitoring processes are assumed to require little cognitive resources because they rely on knowledge that is activated through passive memory-based processes (e.g., McKoon & Ratcliff, 1995; Myers & O’Brien, 1998; O’Brien & Albrecht, 1992) and are themselves passive and involuntary. Therefore, no conscious and resource-demanding strategies are needed to activate the knowledge that incoming information is validated against, and no such strategies are needed to use the activated knowledge for validating incoming information.

A growing body of research supports the idea of routine and nonstrategic validation during language comprehension (e.g., Isberner & Richter, 2013, 2014a; O’Brien & Cook, 2016; Richter et al., 2009; Singer, 2006; see Isberner & Richter, 2014b, for an overview). Some of these studies have used an epistemic version of the Stroop paradigm (Stroop, 1935). In applications of this paradigm, participants judged whether the last word of a sentence was spelled correctly or incorrectly (Isberner & Richter, 2013; Richter et al., 2009), whether the last word of a sentence had changed color (Isberner & Richter, 2013), or which one of two possible probe words (correct or false) appeared immediately after the presentation of a sentence (Isberner & Richter, 2014a; a task originally introduced by Wiswede, Koranyi, Müller, Langner, & Rothermund, 2013). The experimental sentences were either true (e.g., Mountains are high) or false (e.g., Soft soap is edible) and presented word-for-word at a fixed rate of presentation (e.g., 300 ms/word). Importantly, the truth value of the sentences was irrelevant for responding to the focal task (i.e., orthographic judgments, color change judgments, or probe-word identification), in analogy to the original Stroop task, where the meaning of a word is irrelevant for the task of naming the color in which it is printed. One difference from the original Stroop task is the (sometimes) asynchronous presentation of the sentence that is validated and the stimulus for the focal task (e.g., the probe word correct or false that requires pressing one of two keys). However, it is important to note that the validation response can be formed only at the point where the truth value of the sentence can be computed, and almost all applications of the paradigm (including the present experiments) have presented the stimulus for the focal task immediately after this point (the only exception being the experiment by Wiswede et al., 2013).

In all applications of the paradigm, a congruity effect between the validity of a sentence and the required response in the judgment or identification task occurred (epistemic Stroop effect: Richter et al., 2009). Participants showed slower response times for conditions in which the validity of the sentence and the required response in the task were incongruent compared with congruent conditions. For example, spelling judgments requiring the response “yes” (Is the word spelled correctly?) were slower after invalid sentences (e.g., Soft soap is edible) compared with valid sentences (e.g., Mountains are high; Richter et al., 2009, Experiments 3 and 4). These results may be interpreted as supporting the idea of routine and nonstrategic validation processes during language comprehension, as participants were not able to ignore the validity of the sentences even when it was irrelevant to the task. Other studies using reading times or event-related potentials as indicators of validation (e.g., Ferretti, Singer, & Patterson, 2008; Singer, 2006) similarly support the idea of routine and nonstrategic validation of incoming information during language comprehension (although they, unlike studies using the epistemic Stroop paradigm, do not directly test the involuntary nature of validation; i.e., whether it can be suppressed if necessary). However, common to all of the abovementioned studies is that they are concerned with validation during reading, although the assumption of routine and nonstrategic validation processes, from a theoretical perspective, is not limited to comprehension in a specific modality. So far, there are no studies that have tested the assumption of routine validation processes during the comprehension of spoken language. Therefore, given the potential relevance of validation in face-to-face communications, one goal of the present study is to examine the epistemic Stroop effect (Richter et al., 2009) for oral language comprehension.

Integrating and validating linguistic information with visual information

In many situations involving oral language comprehension, such as watching TV or engaging in conversations, the incoming auditory information is accompanied by information from the listener’s visual environment. Very often, comprehension requires processing these different sources of information in conjunction, which necessitates directing visual attention to (potential) referents of the linguistic input in the real world. Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy (1995) were among the first to investigate systematically the interplay between visual and linguistic information. They had participants listen to auditory sentences like “Put the apple on the towel in the box,” in which one phrase (in this example, “on the towel”) was ambiguous (it could be either a modifier or a destination), while concurrently presenting them with sets of objects and recording their eye movements. The results showed that listeners presented with ambiguous sentences use visual information immediately to avoid a syntactic misanalysis (Tanenhaus et al., 1995). This paradigm used by Tanenhaus and colleagues is now known as the visual world paradigm and has inspired a large body of research during the past decades (for a detailed review, see Huettig, Rommers, & Meyer, 2011). In later experiments, the visual world paradigm has been used to show that people draw on both the visual context and their world knowledge to anticipate upcoming linguistic input (e.g., Altmann & Kamide, 1999, 2007, 2009; Knoeferle & Crocker, 2006, 2007). Altmann and Kamide (1999) presented participants with semirealistic pictures of scenes (e.g., a boy, a cake, toys) and auditory sentences regarding these scenes (e.g., “The boy will eat/move the cake”). They showed that participants started to focus significantly earlier on the cake in the “eat” condition compared with the “move” condition. Even though there was no explicit experimental instruction to focus on the screen or, as in the study by Tanenhaus et al. (1995), to move a particular object, participants focused on the relevant objects mentioned in the sentence.

Studies using the visual world paradigm provide compelling evidence that comprehenders immediately and incrementally integrate linguistic and visual information, enabling them to both disambiguate and anticipate linguistic input. However, this also suggests the existence of an efficient mechanism to detect discrepancies between these two sources of information. We propose that validation serves as such a mechanism, just as it allows detecting discrepancies between linguistic information and prior knowledge. In the present study, we tested this assumption with a variant of the epistemic Stroop paradigm, which combined pictures with auditorily presented sentences that either matched or mismatched the content of the picture. The paradigm emulated a basic requirement in face-to-face communication (i.e., for listeners to monitor whether their visual attention is focused on the correct target of demonstrative referential expressions).

How Stroop-like is the epistemic Stroop effect?

Most of the previous studies investigating validation during language comprehension using a Stroop-like paradigm have reported an interference effect between the required response in a simple judgment or decision task and the task-irrelevant truth value of an assertion presented immediately before (e.g., Isberner & Richter, 2013, 2014a; Richter et al., 2009). However, one may wonder whether this effect is indeed evidence for routine and nonstrategic validation processes or whether it could just be an artifact of the task that might induce an evaluative mindset (as criticized by Wiswede et al., 2013). To address this question, it is useful to examine how the epistemic Stroop effect compares to the classical Stroop effect (Stroop, 1935). The classical Stroop paradigm, a commonly used and well-evaluated paradigm, has proven to be a good instrument for investigating routine and automatic processes. Although the Stroop effect is known to be robust, studies have shown that participants can learn to control Stroop interference with practice (Dulaney & Rogers, 1994; Ellis & Dulaney, 1991; Ellis, Woodley-Zanthos, Dulaney, & Palmer, 1989). Therefore, the magnitude of the Stroop effect seems to be a function of the time on task. Thus, if the epistemic Stroop effect is indeed due to routine validation processes during language comprehension, the interference effect should be strongest at the beginning of the experiment and decrease over the course of the experiment. This prediction is rooted in the assumption that individuals will be able to learn to suppress the response tendency resulting from the validation process, as they are able to learn to suppress the reading response in the classical Stroop color-naming task.

Is the epistemic Stroop effect stronger for children than for adults?

Developmental studies investigating the classical Stroop effect reported an inverted U-shaped trajectory of Stroop interference. The Stroop effect appeared in elementary school children already able to read and increased during the first 2 to 3 years of reading practice, and then decreased continuously during adolescence (e.g., Comalli, Wapner, & Werner, 1962; Dash & Dash, 1982; Peru, Faccioli, & Tassinari, 2006; Rand, Wapner, Werner, & McFarland, 1963; Schadler & Thissen, 1981; Schiller, 1966). It has been suggested that the Stroop interference is a function of reading practice, but that this positive relationship is overshadowed by the concurrent development of inhibitory capacity, which improves during early adolescence (Comalli et al., 1962). That would explain the increase of Stroop interference in the first years after reading acquisition, and the decrease during adolescence. In line with this idea, a number of studies have shown larger Stroop interference for children than for adults (e.g., Carter, Mintun, & Cohen, 1995; Comalli et al., 1962; Guttentag & Haith, 1978; Vurpillot & Ball, 1979), and there are other studies suggesting that children have relatively weaker inhibitory control (Ridderinkhof, Band, & Logan, 1999; Tipper, Bourque, Anderson, & Brehaut, 1989; cf. Bub, Masson, & Lalonde, 2006). Therefore, if the validation of oral language against information from the visual context is indeed a routine process, similar age-related differences should occur for the audiovisual epistemic Stroop effect as well. Specifically, we expected children to exhibit a larger and more stable epistemic Stroop effect than adults. If the epistemic Stroop effect is already present (and even stronger) in children, this would also constitute evidence for the assumption that validation is not merely a learned higher-level reading process that becomes automatized over time, but indeed inherent to and a fundamental component of language comprehension.

Rationale of the present experiments

The present research aimed at answering three related research questions that revolve around the assumed routine validation of linguistic information. First, given that earlier research on validation focused on readers’ knowledge-based validation in the processing of written information only, we sought to establish an epistemic Stroop effect with audiovisual information. We base this endeavor on the assumption that validation processes play an important role in aligning linguistic and visual information in communication situations. In two experiments, we tested this assumption by showing participants pictures of objects and concurrently presenting auditory assertions about these objects that were either true or false. After the presentation of each picture-assertion combination, participants responded to one of two probe words (correct or false) by pressing one of two different keys. The probe words could be either congruent or incongruent with the truth value of the picture-assertion combination (audiovisual epistemic Stroop task). More specifically, the experimental sentences were simple assertions with a demonstrative that required a visual context for interpretation (e.g., This is a car). However, the task was not to interpret or validate the assertion or to relate its content to the picture but only to identify the probe word that appeared after the assertion by pressing the corresponding key. For conditions in which the truth value of the picture-assertion combination and the probe word were incongruent, we expected slower responses to the probe word and higher error rates compared with conditions in which the truth value of the picture-assertion combination and the probe word were congruent (audiovisual epistemic Stroop effect).

The second aim was to show that the audiovisual epistemic Stroop effect is strongest in the beginning of an experiment and decreases over the course of an experiment as the number of incongruent picture-assertion combinations that a participant has seen increases. This pattern would provide additional evidence for the notion of routine validation processes that occur spontaneously but whose interference can be suppressed through strategies learned over the course of an experiment, just like the interference by the reading response in the original Stroop task.

The third aim was to investigate whether the magnitude of the audiovisual epistemic Stroop effect differs between children and adults. We conducted two experiments, Experiment 1 with adults and Experiment 2 with children (fourth graders), to test the assumption that the epistemic Stroop effect is stronger for children compared with adults.

Experiment 1

Experiment 1 used the audiovisual version of the epistemic Stroop paradigm with adults. The trials were divided into three blocks to investigate potential changes in the magnitude of the epistemic Stroop effect over the course of the experiment.

Method

Participants

Sixty-nine undergraduate psychology students from the University of Kassel (Germany) participated in the experiment in exchange for course credit. All participants (54 females and 15 males) reported normal or corrected-to-normal vision and were either native German speakers or had spoken German since the age of 6. Their average age was 25.8 years (SD = 7.1).

Stimulus material

The stimuli were valid or invalid auditory assertions about simple pictures. All pictures depicted a colored or black object, for example, a car, on a white background. The pictures were simple, schematic pictures for which conflicts with participants’ world knowledge were highly unlikely. The auditory assertions had the structure “This is [a/an] [concept noun]” (e.g., This is a car) and were 2,000-ms long with a flow time between 150 and 250 ms after the last audible sound. The final stimulus set consisted of 240 assertions and 240 pictures. Two pictures of the same category (e.g., car/bike of the category vehicles) and their corresponding valid assertions (e.g., This is a car/This is a bike) were combined to create an item with four versions (two valid and two invalid versions; see Fig. 1). To make sure that no picture or assertion was presented more than once to the same participant, we created four lists with 120 items each (60 valid and 60 invalid versions), including one version of each item.
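The item construction and counterbalancing described above can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus script: the function names, the placeholder nouns, and the Latin-square rotation are assumptions, and the article choice ("a/an") is simplified to "a".

```python
from itertools import product


def build_versions(item_id, noun_a, noun_b):
    """Return the four versions of one item: each picture paired with each assertion."""
    versions = []
    for picture, asserted in product((noun_a, noun_b), repeat=2):
        versions.append({
            "item": item_id,
            "picture": picture,
            "assertion": f"This is a {asserted}",
            "valid": picture == asserted,  # valid only if the assertion names the picture
        })
    # Resulting order: valid, invalid, invalid, valid
    return versions


def build_lists(pairs):
    """Assumed Latin-square rotation: list k receives version (i + k) % 4 of item i."""
    lists = [[] for _ in range(4)]
    for i, (noun_a, noun_b) in enumerate(pairs):
        versions = build_versions(i, noun_a, noun_b)
        for k in range(4):
            lists[k].append(versions[(i + k) % 4])
    return lists


# Hypothetical placeholder nouns standing in for the 120 same-category picture pairs.
pairs = [(f"object{2 * i}", f"object{2 * i + 1}") for i in range(120)]
lists = build_lists(pairs)
```

With 120 items, this rotation gives every list one version of each item and an even split of 60 valid and 60 invalid versions, which matches the list structure reported in the text.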

Norming study

The complete material was pilot tested. The 14 participants of the norming study were asked to indicate for an original pool of 182 items whether the assertion about the picture was valid (correct) or invalid (incorrect). The items were presented in random order on a computer screen, using four different item lists to counterbalance the four versions of each item. All items with more than two incorrect responses across all four versions were dropped from the item pool. Furthermore, all items with response latencies that deviated more than three standard deviations from a participant’s overall mean, a participant’s mean reaction time for valid items or a participant’s mean reaction time for invalid items were dropped. This resulted in a set of 121 items, 120 of which were selected as experimental items; 62 further items from the original item pool were used as example or icebreaker items.

Procedure

Participants were tested in groups of up to five people and instructed to press one of two keys in response to the probe words correct (German: richtig) or false (German: falsch). Participants responded to the probe word correct by pressing the K key with the index finger of the right hand and to the probe word false by pressing the D key with the index finger of the left hand. Thus, the validity of the picture-assertion combination was irrelevant for the probe-word task. In 30 of the 120 trials, participants were prompted to categorize the object presented in the picture by pressing a key (four response options). These control questions were used to ensure that the participants paid attention to the pictures (and could not, for example, simply close their eyes to suppress the assumed interference). Here, again, the validity of the picture-assertion combination was irrelevant for successfully solving the task.

The sequence of each trial was as follows: After a fixation point displayed for 500 ms, the picture was displayed for 2,200 ms. One hundred ms after the onset of the picture, the participants heard the auditory assertion about the picture via headphones. Each assertion had a length of 2,000 ms. One hundred ms after the offset of the assertion, the picture disappeared and one of the two probe words was presented until participants provided a response to the presented probe word (see Fig. 2). In the 30 trials that contained an additional categorization task, the prompt to categorize the picture and four alternative responses appeared on the screen immediately after the response to the probe word. The items were presented in three blocks of 40 items each. After each block, participants were allowed to take a short break before starting the next block. Furthermore, they received feedback regarding their response latencies and accuracy in the previous block. If a person’s accuracy was lower than 80% in the previous block, the person was reminded that the task was not to validate the truth value of the picture-assertion combination, but to simply respond correctly to the presented probe word. The experiment took on average between 20 and 30 minutes.
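The trial timing can be checked with simple arithmetic. The sketch below uses only the durations stated in the paragraph above; the constant and function names are illustrative, and event times are expressed relative to fixation onset as an assumption.

```python
FIXATION_MS = 500
PICTURE_MS = 2200
ASSERTION_DELAY_MS = 100  # assertion starts 100 ms after picture onset
ASSERTION_MS = 2000
PROBE_DELAY_MS = 100      # probe appears 100 ms after assertion offset


def trial_timeline():
    """Return event times in ms, relative to fixation onset."""
    picture_on = FIXATION_MS
    assertion_on = picture_on + ASSERTION_DELAY_MS
    assertion_off = assertion_on + ASSERTION_MS
    probe_on = assertion_off + PROBE_DELAY_MS
    picture_off = picture_on + PICTURE_MS
    return {
        "picture_on": picture_on,
        "assertion_on": assertion_on,
        "assertion_off": assertion_off,
        "probe_on": probe_on,
        "picture_off": picture_off,
    }


timeline = trial_timeline()
```

Under these values the picture offset and the probe onset coincide (both 2,700 ms after fixation onset), consistent with the description that the picture disappeared when the probe word was presented.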

Design

The design was a 2 (validity: valid vs. invalid picture-assertion combination) × 2 (probe word: correct vs. false) × 3 (block: 1 vs. 2 vs. 3) within-subjects design. The dependent variables were the response latencies and the response accuracy in the epistemic Stroop task.
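As a small illustration of how the design cells map onto congruency, the following sketch enumerates the 2 × 2 × 3 cells and flags which ones pair a probe word with a matching truth value. The coding scheme and names are assumptions made for illustration; the analysis in the paper tests the interaction of validity and probe word rather than a separate congruency factor.

```python
from itertools import product


def is_congruent(valid, probe):
    """A cell is congruent when the probe word matches the assertion's truth value."""
    return (probe == "correct") == valid


# Enumerate all cells of the validity x probe word x block design.
cells = [
    {"valid": valid, "probe": probe, "block": block,
     "congruent": is_congruent(valid, probe)}
    for valid, probe, block in product((True, False), ("correct", "false"), (1, 2, 3))
]
```

This yields 12 cells, half of them congruent, so the epistemic Stroop effect corresponds to the validity by probe word interaction within each block.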

Results and discussion

Response latencies and error rates were analyzed with linear mixed-effects models by using the lmer and glmer functions of the R package lme4 Version 1.12 (Bates, Mächler, Bolker, & Walker, 2015). Interactions were further analyzed using the lsmeans function in the lsmeans package (Lenth, 2016). In all significance tests, Type I error probability was set at .05 (two-tailed).

Data cleaning

In a first step of data cleaning, responses within 10 ms after stimulus onset or exceeding 5 s were removed from the data set (0.07% of the data points). In the second step, participants and items were screened for unnaturally high error rates. Data from participants with an error rate of more than 40% in the epistemic Stroop task were removed from the data set, resulting in the exclusion of two participants. This cutoff was chosen because even though the task was an easy one, we expected participants to make more errors in incongruent conditions compared with congruent conditions. Thus, an error rate of 50% could be a result of random responses, a very strong epistemic Stroop effect, or a misunderstanding of the task. The cutoff was chosen to be low enough to exclude participants who responded randomly and high enough to keep participants who showed a strong epistemic Stroop effect. The average error rate for the experimental items in the epistemic Stroop task was 1.8%, with no item exceeding an error rate of 8%. Therefore, no items needed to be removed. All participants made fewer than 40% errors when responding to the control questions, and no participants were removed based on this criterion. Overall, the data cleaning resulted in a data set with 8,038 data points. This data set was used for the analysis of the error rates.
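The first two cleaning steps could be implemented roughly as follows. This is a sketch under stated assumptions, not the authors' script: the trial format, the function name, the handling of boundary values (exactly 10 ms or exactly 5 s), and the choice to compute error rates after the latency filter are all assumptions, and the toy data are invented.

```python
from collections import defaultdict


def clean_trials(trials, min_rt_ms=10, max_rt_ms=5000, max_error_rate=0.40):
    """trials: list of dicts with keys 'subject', 'rt_ms', 'correct'."""
    # Step 1: drop anticipations and overly slow responses.
    kept = [t for t in trials if min_rt_ms < t["rt_ms"] <= max_rt_ms]

    # Step 2: compute each participant's error rate on the remaining trials.
    n_trials = defaultdict(int)
    n_errors = defaultdict(int)
    for t in kept:
        n_trials[t["subject"]] += 1
        if not t["correct"]:
            n_errors[t["subject"]] += 1
    excluded = {s for s in n_trials if n_errors[s] / n_trials[s] > max_error_rate}

    return [t for t in kept if t["subject"] not in excluded], excluded


# Toy data, invented for illustration only.
toy = (
    [{"subject": "s1", "rt_ms": 520, "correct": True}] * 9
    + [{"subject": "s1", "rt_ms": 5, "correct": True}]       # anticipation
    + [{"subject": "s2", "rt_ms": 600, "correct": False}] * 5
    + [{"subject": "s2", "rt_ms": 610, "correct": True}] * 5  # 50% errors overall
)
cleaned, excluded = clean_trials(toy)
```

In the toy data, the 5-ms response is dropped in the first step and participant s2 is excluded in the second step because the error rate of 50% exceeds the 40% cutoff.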

For the analysis of the response latencies, a third step of data cleaning was applied to the data set of correct responses resulting from the first data cleaning. An inspection of the distribution revealed the positive skew typical for response latencies, which often leads to a nonnormal distribution of the residuals (and thus, a violation of the assumptions of linear mixed models). To find the most adequate transformation for achieving a more symmetrical distribution, a Box-Cox analysis was used, revealing a lambda close to zero (λ = −0.11). Based on the ladder of powers (Mosteller & Tukey, 1977), a log transformation was determined to be the most adequate transformation to reduce the nonnormality. Response latencies deviating more than two standard deviations from the log-transformed mean of each participant (4.5% of the data points) were treated as outliers and removed from the data set. This final step of the data cleaning procedure resulted in a data set with 7,545 data points.
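The reasoning behind the log transformation and the outlier criterion can be illustrated with a short sketch. The Box-Cox transform converges to the natural logarithm as lambda approaches zero, which is why a lambda near zero points to a log transformation. The trimming function below is an assumed implementation: it uses the sample standard deviation of each participant's log latencies, which the text does not specify, and the toy latencies are invented.

```python
import math
import statistics


def box_cox(x, lam):
    """Box-Cox power transform; its limit for lam -> 0 is the natural log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def trim_log_outliers(rts_by_subject, n_sd=2.0):
    """Drop latencies whose log deviates more than n_sd SDs from the subject's log mean."""
    trimmed = {}
    for subject, rts in rts_by_subject.items():
        logs = [math.log(rt) for rt in rts]
        mean = statistics.mean(logs)
        sd = statistics.stdev(logs)  # sample SD; an assumption
        trimmed[subject] = [
            rt for rt, lg in zip(rts, logs) if abs(lg - mean) <= n_sd * sd
        ]
    return trimmed


# Toy latencies in ms, invented for illustration only.
toy_rts = {"s1": [480, 500, 510, 520, 530, 540, 550, 560, 580, 2500]}
trimmed = trim_log_outliers(toy_rts)
```

In the toy data, only the 2,500-ms latency lies more than two standard deviations from the participant's log mean and is removed.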

Response latencies

Response latencies of correct responses were analyzed with a linear mixed model with random effects (random intercepts) of subjects and items.

Table 1 provides estimates and significance tests of the fixed effects. Here and in the remainder of the manuscript, we describe only the effects relevant for our hypotheses. (The other significant effects are displayed in Tables 1, 2, 3, and 4; none of them affected the interpretation of hypothesis-relevant results.)

Table 1 Estimated coefficients, standard errors, degrees of freedom, and t values for the linear mixed model of the log-transformed response latencies in Experiment 1


Table 2 Estimated coefficients, standard errors, and z values for the generalized mixed model of the error rates in Experiment 1


Table 3 Estimated coefficients, standard errors, degrees of freedom and t values for the linear mixed model of the log-transformed response latencies in Experiment 2


Table 4 Estimated coefficients, standard errors, and z values for the generalized mixed model of the error rates in Experiment 2


First, the predicted epistemic Stroop effect in terms of an interaction of probe word and validity of the picture-assertion combination was significant. Planned comparisons revealed that this interaction was due to responses to the probe word false (M = 535 ms, SE = 10 ms) being slower than responses to the probe word correct (M = 515 ms, SE = 9 ms) after valid picture-assertion combinations, t(91.7) = −6.19, p < .001, whereas no significant difference was observed for responses after invalid picture-assertion combinations, t(91.2) < 1.

Furthermore, the three-way interaction of probe word, validity, and block was significant (see Fig. 3). Separate follow-up tests for each block revealed a disordinal interaction between validity and probe word, t(7,443.8) = −4.43, p < .001, in Block 1: After valid picture-assertion combinations, responses to the probe word false (M = 553 ms, SE = 11 ms) were slower compared with the probe word correct (M = 536 ms, SE = 10 ms), t(91.4) = 2.83, p < .05. For invalid picture-assertion combinations, the reverse effect occurred. Here, responses to the probe word correct (M = 571 ms, SE = 11 ms) were slower compared with the probe word false (M = 550 ms, SE = 11 ms), t(92) = −3.31, p < .01. Thus, responses were overall slower when the probe word mismatched than when it matched the task-irrelevant validity of the picture-assertion combination.
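The size of the Block 1 effect can be read off the four cell means reported above. The following sketch is purely descriptive arithmetic on those reported means; it is not the model-based test from the paper, and the variable and function names are illustrative.

```python
# Block 1 cell means in ms, as reported in the text: (validity, probe word) -> mean RT
block1_means_ms = {
    ("valid", "correct"): 536,
    ("valid", "false"): 553,
    ("invalid", "correct"): 571,
    ("invalid", "false"): 550,
}


def congruency_effect(means):
    """Mean RT of incongruent cells minus mean RT of congruent cells."""
    congruent = (means[("valid", "correct")] + means[("invalid", "false")]) / 2
    incongruent = (means[("valid", "false")] + means[("invalid", "correct")]) / 2
    return incongruent - congruent


# Simple effects of probe word within each validity condition.
effect_valid = block1_means_ms[("valid", "false")] - block1_means_ms[("valid", "correct")]
effect_invalid = block1_means_ms[("invalid", "correct")] - block1_means_ms[("invalid", "false")]
overall_effect = congruency_effect(block1_means_ms)
```

On these means, incongruent probe words were answered 17 ms slower after valid combinations and 21 ms slower after invalid combinations, for an average congruency cost of 19 ms in Block 1.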

In Block 2, the follow-up analysis again revealed a significant two-way interaction of probe word and validity, t(7,443.8) = −3.41, p < .001. This interaction was now semidisordinal, with responses to the probe word false (M = 531 ms, SE = 10 ms) still being slower compared with responses to the probe word correct (M = 509 ms, SE = 10 ms) after valid picture-assertion combinations, t(92.1) = 3.84, p < .01. However, the difference between the probe words after invalid picture-assertion combinations disappeared, t(91.7) = −0.93, p = .787.

In Block 3, the two-way interaction of probe word and validity was no longer significant, t(7,443.8) = −0.53, ns. Instead, there was a strong main effect of probe word, t(7,446.5) = 5.27, p < .001, with responses to the probe word false (M = 522 ms, SE = 10 ms) now being generally slower than to the probe word correct (M = 499 ms, SE = 10 ms), regardless of the validity of the picture-assertion combination.

Error rates

The error rates in the epistemic Stroop task were analyzed with generalized linear mixed models with subjects and items included as random effects (random intercepts). Table 2 provides estimates and significance tests of the fixed effects.

First of all, the predicted epistemic Stroop effect (i.e., the interaction of probe word and validity of the picture-assertion combination) only showed a tendency in the predicted direction and did not become significant, z = −1.77, p < .1. However, planned comparisons revealed that as predicted, the probability of false responses to the probe word correct (probability = .02, SE = .00) was slightly higher than for the probe word false (probability = .01, SE = .00) after invalid picture-assertion combinations, z = −3.01, p < .01. After valid picture-assertion combinations, error probability did not differ between the probe words false (probability = .01, SE = .00) and correct (probability = .01, SE = .00), z = −0.51, ns. The three-way interaction of probe word and validity with block was not significant. Separate follow-up tests for each block yielded no significant interaction of probe word and validity. Only in Block 1, the interaction between probe word and validity approached significance, z = −1.92, p < .1, again due to a higher error probability for the probe word correct (probability = .02, SE = .01) compared with the probe word false (probability = .00, SE = .00) after invalid picture-assertion combinations, z = −2.64, p < .01 (see also Fig. 4).

In sum, these results support the assumption of routine validation of audiovisual information. An epistemic Stroop effect occurred for the response latencies in the overall analyses across all three blocks of the experiment (and a similar, though not significant, pattern emerged for the error rates). Furthermore, separate analyses for each block showed that the epistemic Stroop effect was present from the very beginning of the experiment. However, it appears that participants were able to develop strategies against the interference of the task-irrelevant validity of the picture-assertion combinations with the task of responding to the probe words. At the beginning of the experiment, participants showed a symmetrical epistemic Stroop effect in the response latencies with slower responses to validity-incongruent probe words after both valid and invalid picture-assertion combinations. However, this effect decreased in Block 2 and was no longer significant in Block 3.

These results suggest that adults are able to avoid the interference by strategically inhibiting the response tendency resulting from the validation process, which somewhat distorts the overall effect. For this reason, in Experiment 2, we applied the paradigm used in Experiment 1 to children in Grade 4. Due to the weaker inhibitory capacity in this population (e.g., Carter et al., 1995; Comalli et al., 1962; Guttentag & Haith, 1978; Ridderinkhof et al., 1999; Tipper et al., 1989; Vurpillot & Ball, 1979) compared with the adult participants of Experiment 1, we expected the audiovisual epistemic Stroop effect to be even stronger and more stable in Experiment 2, while the general pattern of results should be preserved.