1 Introduction

Psychological research at least implicitly takes for granted that scientific results from the laboratory generalize to real-world cognition. Despite this implicit claim of ecological validity, experimental designs often rely on simple and static stimuli in artificial settings lacking many of the potentially important aspects of real-world experience. An iconic example of the investigation of attentional mechanisms by means of artificial but well-controlled methods is the study by Posner and colleagues (Posner et al. 1978). The ecological approach, in turn, is assumed to lack internal validity and experimental control (e.g., Parsons 2015).

Virtual reality (VR) is a methodology that offers the possibility of bridging the gap between the real world and the laboratory. VR-based paradigms could substantially increase the ecological validity of psychological research under controlled laboratory conditions: the experiment takes place in a real-life environment, and every subject experiences precisely the same three-dimensional scenario, without the variance that naturally occurs, e.g., whenever other persons are involved in an experiment (Parsons 2015). Previous studies have shown that VR elicits lifelike psychological responses. Encoding mechanisms in virtual reality closely resemble real-life mnemonic processing (Kisker et al. 2019a). VR experiences lead to the formation of profound autobiographical memories as opposed to rather superficial episodic memories (Schöne et al. 2017). Wayfinding (Skorupka 2009) as well as orientation (Kuliga et al. 2015) in VR is similar to that under real-world conditions. Furthermore, virtual experiences correspond to real-life experiences with regard to psychophysiological reactions such as electrodermal activity, heart rate, and heart rate variability (Higuera-Trujillo et al. 2017; see also Kisker et al. 2019b).

In the present experiment, we intended to extend these findings to the realm of attentional processing. To examine possible differences in attentional processing between laboratory and realistic VR experiences, we opted for a well-known paradigm with a low level of abstraction. Each year, thousands of psychology students are left perplexed when confronted with Simons’ and Chabris’ invisible gorilla paradigm (Simons and Chabris 1999), which is an essential part of almost every introductory psychology course. In the paradigm, a gorilla walks through a scene of two teams passing basketballs. Unexpectedly, the gorilla remains unnoticed by most observers, as they are engaged in a primary monitoring task, namely counting the passes (Simons and Chabris 1999). The fact that most people miss something as salient as a gorilla even though it walks right in front of their eyes contradicts our long-held introspective beliefs about our cognition and attentional processes. So either the data obtained by psychological research do not correspond to real-world cognition, or our introspection is misguided.

In fact, previous studies employing different real-life paradigms have provided evidence that the inattentional blindness effect is also pronounced under realistic conditions and not restricted to conventional laboratory settings using a monitor (e.g., Hyman et al. 2010; Chabris et al. 2011; Furley et al. 2010; Pammer et al. 2017). Up to now, however, no study has been able to directly investigate whether attentional processes exhibit the same functional principles under laboratory and real-life conditions based on the same paradigm.

Lifelike responses might play a crucial role in the invisible gorilla paradigm. Specifically, attentional mechanisms could change their mode of operation in a more realistic environment, in which the participant is under the impression of being physically present. A gorilla appearing in the observer’s proximity provides an opportunity for physical interaction, which is not given in the original study’s screen condition; consequently, the gorilla should not go unnoticed in our VR study. This applies in particular because a gorilla, as an unknown entity, poses a potential threat to the observer.

Taken together, we hypothesized that the immersive nature of VR, which creates a high sense of being present in the scene, modulates not only other cognitive processes but also attentional processes, especially when a salient object, i.e., the gorilla, is placed within the observer’s proximity. This should manifest in a higher noticing rate in the realistic VR condition. In order to investigate the effect of three-dimensionality in VR on attentional processes, as well as upon request by a reviewer, an additional monoscopic VR (2D/360°) condition was included. The rationale behind this approach was to determine whether three-dimensionality or the salience of the gorilla has a more profound effect on the noticing rate.

2 Methods

2.1 Participants

The study was conducted in accordance with the Declaration of Helsinki and approved by the local ethics committee of Osnabrück University; participants gave their written informed consent. As we anticipated difficulties in recruiting enough naïve participants, we provided 60 possible slots and tested everybody who enrolled in a slot during the measurement period. Psychology students were responsible for recruiting participants and collecting data in the course of a study project. Eventually, 48 subjects participated in the experiment, and due to the surprisingly low exclusion rate, 41 subjects were included in the analysis. Participants received course credit or were paid 6€ per hour for participation. All of them had normal or corrected-to-normal vision. No subject had to be excluded due to psychiatric or neurological disorders.

All participants were students at Osnabrück University or the Osnabrück University of Applied Sciences and came from a wide variety of fields of study. In all groups, some participants had experienced VR before, though none of them extensively; most subjects reported no prior experience with VR.

Participants were randomly assigned to the video or the realistic VR (3D/360°) condition (video: N = 19, Mage = 21.47, SD = 2.412, 16 right-handed, 9 female; realistic VR: N = 22, Mage = 22.5, SD = 3.203, 19 right-handed, 13 female). Following a reviewer’s request after the first review, 57 additional people participated in the subsequently conducted monoscopic VR (2D/360°) condition. Thirty-four of them were excluded from the analysis due to prior knowledge of the inattentional blindness effect, leaving 23 (monoscopic VR: N = 23, Mage = 23.13, SD = 3.334, 21 right-handed, 19 female).

2.2 Materials

We created a video as similar as possible to Simons’ and Chabris’ original. It shows two teams of three players passing standard orange basketballs. The color of their t-shirts (white or black) indicated which team the players belonged to. The basketballs were never exchanged between the teams. Each team passed its basketball, using either bounce passes or aerial passes, in a regular order (player one would pass to player two, who would pass to player three, who would pass to player one, and so forth). The players were instructed to make movements consistent with the overall pattern of action, e.g., wave their arms, dribble the ball, and move around in a relatively random fashion. As in the original study, the players were filmed in front of a bank of three elevators in an open area (approximately 3 m deep × 5 m wide). The only difference was that in our video the area was bounded by a pillar on each side. After 50 s, a person in a gorilla costume walked from the right pillar to the left pillar, fully visible for roughly 5 s, and turned to face the camera in the center of the scene. Before and after this, it was hidden behind the pillars and could not be seen by the observers. The players continued their actions during and after the event.

A 3D/360° camera, mounted at a height of 1.63 m (the average eye level of Europeans), was used to create a 4K video at 60 fps with a length of 78 s. The footage was taken with an Insta360 Pro (https://www.insta360.com/product/insta360-pro) and rendered with the Insta360 Stitcher (https://www.insta360.com/download/insta360-pro?inspm=77c1c2.89f76e.0.0). The camera setup was stationary; hence, participants in the VR conditions could not change their perspective by walking around, e.g., to discover objects hidden behind the pillars (Fig. 1).

Fig. 1 Screenshot from the video. The person in the gorilla costume is in the middle of the scene. The participants in the VR conditions were instructed to keep the two pillars in sight at all times

2.3 Procedure

Participants were kept blind to the hypothesis. Before watching the video, they were informed that they would see two teams of three players passing basketballs and were instructed to keep a silent mental count of the total number of passes made by the white team. They were asked to write down their count on paper afterwards. Following the procedure of Simons and Chabris (1999), we then asked the participants a surprise series of additional questions. I: While you were doing the counting, did you notice anything unusual on the video? II: Did you notice anything other than the six players? III: Did you see anyone else (besides the six players) appear on the video? IV: Did you see a gorilla walk across the screen? If any of the questions were answered with “yes”, the experimenter asked for more details. If at any point the participant mentioned the gorilla, the remaining questions were skipped. Subsequently, the participants were asked whether they had ever participated in an experiment similar to this one or had heard of such an experiment or of the general phenomenon of inattentional blindness. Finally, the participants were debriefed by replaying the video. Each testing session lasted approximately 10 min. Eligibility criteria for including a participant in the final sample were ignorance of the general phenomenon of sustained inattentional blindness and of similar experiments. Furthermore, the subject’s total pass count was not allowed to differ by more than one standard deviation from the mean of the other participants in that condition.
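
To make the pass-count criterion concrete, the following R sketch shows one plausible reading of the rule, in which each count is compared against the mean and standard deviation of the remaining participants in the same condition; the function name and example values are purely illustrative and not part of the original analysis.

```r
# Hypothetical implementation of the leave-one-out exclusion rule: flag a
# participant whose pass count differs from the mean of the *other*
# participants in the same condition by more than one standard deviation.
exclude_by_pass_count <- function(counts) {
  sapply(seq_along(counts), function(i) {
    others <- counts[-i]
    abs(counts[i] - mean(others)) > sd(others)
  })
}

# Illustrative counts for one condition; only the count of 50 is flagged.
exclude_by_pass_count(c(34, 36, 35, 50, 33))
#> FALSE FALSE FALSE  TRUE FALSE
```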

Three different conditions were implemented: one condition using conventional 2D stimuli and two VR conditions, a realistic (3D/360°) and a monoscopic (2D/360°) one.

The laboratory video condition (2D) was very similar to the original study. The participants sat in front of a monitor (screen diagonal 40 cm, visual angle: 10°) and watched a 2D video. In both VR conditions, the participants stood in the middle of the room wearing an HTC Vive head-mounted display (https://www.vive.com/de/product/). They stood because, given the positioning of the camera and the filmed scene, standing seemed more natural and thus increased immersion.

Additionally, they were instructed to avoid looking around, i.e., to keep the area between the two pillars in sight. This instruction was given to make sure the participants would not be distracted by the surroundings of the scene and were able to count the passes correctly, and hence could notice the passing gorilla. No instructions regarding the participants’ eye movements were given. Visual observation of the participants during the experiment ensured that these instructions were followed. As in Simons’ and Chabris’ (1999) study, neither sounds nor music were added to the video.

We implemented the task with the lowest noticing rate (42%) from the original study (Simons and Chabris 1999), i.e., participants counted the passes of the white team without differentiating between bounce and aerial passes.

The general idea of the experiment was to investigate whether attentional processes in VR differ from those under conventional laboratory conditions. To this end, we implemented a typical holistic VR setup, emphasizing foremost the visual divergences from the typical laboratory condition. Taken together, differences such as stereoscopic vision, object size, the visual angle of objects, and the placement of the participant within the scene create a sense of presence, i.e., of being in the virtual scene. Consequently, VR and laboratory conditions are not comparable with respect to these attributes; however, since our aim was to investigate attentional processes under laboratory and realistic conditions, we did not attempt to hold these factors constant.

The original study was preregistered at the Open Science Framework (OSF) in November 2017 (https://osf.io/n724j/). Data and stimulus material, as well as the scripts for the Bayesian data analysis, can be downloaded from: https://osf.io/e8x7t/?view_only=160a7d759e9f47a9a2df2d9b3ff72b35/.

2.4 Statistical analysis

The data were analyzed conventionally using chi-square tests (χ²) as well as with a Bayesian approach. The Bayesian analysis was conducted using the brms package for R (Bürkner 2017), which provides an interface for fitting Bayesian generalized multilevel models, together with the tidyverse package (Wickham 2017). The models were estimated by means of Markov chain Monte Carlo sampling within the Stan computational framework (http://mc-stan.org/) (Gelman et al. 2015).
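
As an illustration, a Bayesian model of the dichotomous noticing outcome can be fitted in brms roughly as follows. This is a minimal sketch under assumed variable names (noticed, condition), default priors, and default sampler settings; the authors’ actual analysis scripts are available on the OSF.

```r
library(brms)       # Bayesian regression models, fitted with Stan
library(tidyverse)  # data handling

# gorilla_data (assumed): one row per participant, with a 0/1 outcome
# 'noticed' and a factor 'condition' coding the three groups
# (video, monoscopic VR, realistic VR).
fit <- brm(
  noticed ~ 0 + condition,            # one noticing probability per condition
  data   = gorilla_data,
  family = bernoulli(link = "logit"),
  chains = 4, iter = 2000, seed = 1   # MCMC sampling via Stan
)
summary(fit)
```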

3 Results

Out of 105 participants, 41 were excluded from the analysis for one of three reasons: (I) the subject had a priori knowledge of the experiment or the general phenomenon (total: n = 39; video: n = 1, monoscopic VR: n = 34, realistic VR: n = 4); (II) the subject’s total pass count differed by more than one standard deviation from the mean of the other participants in that condition (total: n = 1; video: n = 1, both other conditions: n = 0); or (III) the subject did not follow the instructions while watching the video (total: n = 1; realistic VR: n = 1, both other conditions: n = 0).

In total, the results of 64 participants were included in the analyses. Simons and Chabris (1999) aggregated the results of the four questions that were asked after the presentation of the video. In our study, subjects’ responses were always consistent across the four questions; thus, we also report overall rates of noticing. As a consequence, the outcome variable was dichotomous (gorilla noticed vs. not noticed); hence, the χ²-test was used. As a side note, aggregating the data might lead to false-positive “noticing” answers and therefore make it more difficult to find an effect, which works against rather than in favor of our hypothesis.

In total, 56.3% of the participants in our study noticed the gorilla, whereas in Simons’ and Chabris’ (1999) study 42% noticed it. In the laboratory 2D condition, more participants failed to notice the gorilla than in the realistic VR condition (χ² = 5.467, p = 0.019, n = 41). Specifically, in the realistic VR group, 68.2% noticed the gorilla, whereas in the laboratory video group only 31.6% did, meaning the overall pattern of noticing was nearly reversed. In the monoscopic VR condition (2D/360°), 65.2% noticed the gorilla. This did not differ from the realistic VR condition (3D/360°) (χ² = 0.044, p = 0.833, n = 45), but more participants noticed the gorilla than in the laboratory video condition (χ² = 4.709, p = 0.03, n = 42). The subjects’ total pass counts did not differ significantly between the laboratory 2D, realistic VR, and monoscopic VR conditions (F(2,61) = 0.878, p = 0.421; Mvideo = 35.58, MMonoscopicVR = 35.48, MRealisticVR = 35.14). Therefore, it can be concluded that sustained attention was comparable across groups (Fig. 2).
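
The reported chi-square statistics can be checked against the cell counts implied by the group sizes and noticing percentages. The R sketch below is our reconstruction, not part of the original analysis; Pearson’s chi-square test without continuity correction reproduces the reported values.

```r
# Contingency counts reconstructed from group sizes and noticing percentages.
video <- c(noticed =  6, missed = 13)  # 31.6% of n = 19
real  <- c(noticed = 15, missed =  7)  # 68.2% of n = 22
mono  <- c(noticed = 15, missed =  8)  # 65.2% of n = 23

# Pairwise Pearson chi-square tests without Yates' continuity correction.
chisq.test(rbind(real, video), correct = FALSE)  # X-squared = 5.47, p = .019
chisq.test(rbind(real, mono),  correct = FALSE)  # X-squared = 0.04, p = .83
chisq.test(rbind(mono, video), correct = FALSE)  # X-squared = 4.71, p = .03
```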

Fig. 2 Rates of noticing the gorilla in percent per group. Whereas the conventional laboratory condition replicates the original results, the noticing rates in the VR conditions are much higher

The Bayesian analysis revealed that the probability of detecting the gorilla in the realistic VR and monoscopic VR conditions (68.35% and 65.19%, respectively) was notably higher than in the laboratory video condition (31.60%). Specifically, the probability of detecting the gorilla was 37.08% [CI 0.07, 0.64] higher in the realistic VR condition than in the laboratory video condition. This claim can be made with 98.9% certainty. The comparison of the monoscopic VR condition with the laboratory video condition showed a 34.0% [CI 0.04, 0.60] higher probability of detecting the gorilla, with a certainty of 98.65%. The seemingly wide credible intervals result from the sample size; however, the effect is evident given the ~ 99% certainty in both cases. Between the realistic VR condition and the monoscopic VR condition, only a small difference was found: 3.2% [CI − 0.25, 0.31] with a certainty of 58.9%. Hence, detecting the gorilla was only 3.2% more likely in the realistic VR condition than in the monoscopic VR condition, and the wide credible interval containing negative values renders this difference negligible.
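
The reported posterior quantities (per-condition detection probabilities, pairwise differences with 95% credible intervals, and the certainty that a difference is positive) could be derived from a fit like the one sketched in the Methods section roughly as follows; the coefficient names depend on the factor levels and are assumptions here.

```r
draws <- as_draws_df(fit)  # posterior draws from the brms model sketched above

# Transform per-condition log-odds into detection probabilities.
p_video <- plogis(draws$b_conditionvideo)
p_real  <- plogis(draws$b_conditionrealisticVR)
diff    <- p_real - p_video

mean(diff)                        # posterior mean difference (reported: ~0.37)
quantile(diff, c(0.025, 0.975))   # 95% credible interval
mean(diff > 0)                    # certainty that realistic VR > video (~0.99)
```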

4 Discussion

The present study aimed to investigate differences in attentional processing between virtual reality and conventional laboratory conditions, using the example of sustained inattentional blindness as observed in the famous invisible gorilla paradigm (Simons and Chabris 1999). In particular, we confronted our participants with stimuli similar to the original study either in a conventional laboratory setting or in immersive VR environments. First of all, under laboratory conditions we were able to replicate the experiment of Simons and Chabris (1999), obtaining an even lower noticing rate (31.6%) than the original 42%. The noticing rates for the unexpected event were comparable to those found in similar laboratory experiments (Becklen and Cervone 1983; Stoffregen et al. 1993; 35% and 15%, respectively). Taking all these findings into account, inattentional blindness can be considered a stable effect across varying experimental settings. However, the pattern of noticing was nearly reversed when we examined inattentional blindness under more realistic VR conditions. Whereas under laboratory video conditions approximately 70% of the subjects missed the gorilla, in realistic VR this rate decreased to roughly 30%. Furthermore, the monoscopic VR condition yielded a similar miss rate of about 35%. It should be noted that, although considerably diminished, a substantial proportion of the inattentional blindness effect remains intact even under (realistic) VR conditions.

Most importantly, our noticing rates are comparable to those from experiments investigating inattentional blindness under various real-life conditions (e.g., Chabris et al. 2011; Simons and Schlosser 2017). However, there is no evidence that real-life inattentional blindness is generally less pronounced than laboratory inattentional blindness. It rather seems that the magnitude of the effect varies as a function of the experimental design rather than of location (i.e., inside or outside the laboratory). To the best of our knowledge, no published study has systematically addressed the differences between the inattentional blindness effect under real-life and under laboratory conditions, i.e., investigated the same design both in and outside the laboratory. Our experiment could give the impression that apples are being compared to oranges. We would therefore like to point out that modality effects are investigated in a comparable way: encoding takes place either visually or auditorily, but retrieval takes place under the same conditions (Murdock and Walker 1969; Penney 1989).

The immersive nature of VR allows a high degree of reality to be ascribed to these settings (Diemer et al. 2015; Kisker et al. 2019a; Kuliga et al. 2015; Schöne et al. 2017; Skorupka 2009), as participants generally report a high sense of presence, i.e., a strong feeling of actually being in the scene. Hence, in terms of ecological validity, VR allows cognitive and emotional processes to be investigated directly under realistic conditions and compared to laboratory settings. Although there has been research on 3D attentional processing (e.g., Paletta et al. 2013) as well as comparisons between 2D and 3D experiences (e.g., Rooney and Hennessy 2013), to our knowledge this is the first experiment to compare real life and the laboratory using the exact same experimental design. This means that we showed all participants the same video, thereby excluding any variance in the playing sequence, which would inevitably occur in reality.

In the following, we propose some explanatory approaches based on existing models of attention that may account for the behavioral differences under realistic as opposed to laboratory conditions. We are aware that further research is needed to explain the different modes of operation of the attentional system, but we hope to provide an initial line of thought.

At first glance, an obvious explanation for these findings is VR’s 3D effect. In particular, the three-dimensionality of VR could lead to a more natural distribution of attentional and cognitive resources compared to the video condition. Lavie’s load theory is commonly employed to explain the inattentional blindness effect (Lavie et al. 2004; Palmer 1999). Under high perceptual load, focused attention on the task at hand consumes all available resources and thus prevents the perception of task-irrelevant stimuli (i.e., early selection). When the perceptual load is low and the task-relevant stimuli do not consume the vast majority of attentional resources, the remaining capacity is involuntarily used for the perception of irrelevant stimuli (i.e., late selection).

However, the perceptual load per se did not differ between the monoscopic VR and the realistic VR condition, as the video material, and thus the task difficulty, was the same. Nevertheless, attentional resources were likely to be exhausted in the video condition, since participants constantly had to infer depth information from the relative size and occlusion of the players in order to track the ball, thereby increasing the computational demand (see Benoni and Tsal 2013). However, if such a difference in computational load occurred, it did not affect the noticing rate, as we found no difference between monoscopic (2D) and stereoscopic (3D) VR.

Taking three-dimensionality out of the equation, the immersive character of realistic and monoscopic VR remains the most important factor, suggesting two further explanatory approaches. First, visual information constitutes the recreation of a 3D default space, an inner representation of the visible and non-visible space (Jerath et al. 2015). This default space is the recreation of a continuous world, including hypotheses about ‘what would I see if I looked there’ (Parr and Friston 2017). The decisive feature of VR is that it does not restrict attentional processing to a 2D plane within this default space but constitutes the entire space itself. Although the 3D effect seems to play a negligible role in the present experiment, the surrounding nature of VR might consequently aid the attentional system in tracking the course of the basketball while leaving attentional resources free for the detection of further events.

Another possible explanation is that a difference in object size, i.e., the size of the gorilla, could account for the increased noticing rates in the VR conditions. This could especially be the case because the visual angle was restricted to 10° in the laboratory condition but spanned the whole visual field in the VR conditions. Object size has previously been shown to have an impact on detection rates, e.g., in visual search paradigms, in which larger objects automatically capture more attention (Proulx 2010). However, the relation between the size of an unexpected object and the magnitude of the inattentional blindness effect has not been systematically investigated by varying the object size. Furthermore, the body of scientific literature does not allow conclusions to be drawn about the impact of object size on the inattentional blindness effect, as most studies omit information about the size and/or visual angle of the unexpected object. Nonetheless, studies investigating the inattentional blindness effect in real life, and thus using life-sized objects, do not report increased noticing rates (Pammer et al. 2017, taxis and motorcycles; Chen and Pai 2018, a clown walking by; Hyman et al. 2010, a unicycling clown; Furley et al. 2010, life-sized basketball players). This also holds for a laboratory study by Furley and Heller (2010), who had subjects stand in front of a large screen and asked them to make a tactical decision in a recorded basketball game. The reported noticing rate was 42% (non-expert condition), even though the unexpected object was a life-sized person in close proximity. Taken together, these experiments provide evidence that object size per se does not necessarily modulate the inattentional blindness effect, at least not to an extent that would account for the enhanced noticing rate in our VR conditions, though an effect due to the relative difference between our conditions cannot be ruled out.

Although the size of the gorilla as well as the visual angle differed between the VR and the laboratory conditions, our study design does not allow conclusions to be drawn on how the larger size of the gorilla in the VR conditions might have promoted an increased noticing rate. The goal of our study was to compare a classical laboratory experiment to a real-world scenario. To this end, we aimed to retain the typical perceptual characteristics of both the VR settings and the conventional laboratory setting. Consequently, we equated neither the size of the gorilla or the players, nor the distribution of objects across the visual field, nor any other comparable factor between the conditions.

Furthermore, attentional processes might not so much benefit from the enhanced realism of the VR conditions as suffer from adverse effects of the setup in the laboratory condition. In the VR conditions, the stimulus material spans the whole visual field, creating a continuous and homogeneous visual impression. Conversely, in the conventional monitor setup the visual impression is more fragmented, potentially drawing on attentional resources and thereby leading to a decreased noticing rate (see Greene et al. 2017).

Aside from these rather perceptual explanations, the salience of the gorilla might also account for the increased noticing rates in both virtual reality conditions. Objects might be processed without attention, but it is the salience of an object that leads to it being consciously perceived (Palmer 1999). Novel and salient events are processed with high priority and automatically attract attention, especially when they are potentially dangerous (Gable and Harmon-Jones 2010; Corbetta and Shulman 2002). A person in a gorilla costume is an unexpected event, and a masked entity behaving in an unpredictable way poses a threat. In the VR conditions, this potential threat is placed within the vicinity of the participants’ personal space, making it a salient event that automatically captures attention. Such elevated emotional responses are a typical feature of virtual reality due to the high sense of being physically present in the scene. Specifically, previous research on VR has shown that affective responses to immersive environments are much stronger than to conventional screen experiences (Gorini et al. 2010; Higuera-Trujillo et al. 2017). Thus, the perceived self-relevance and the resulting salience could be the decisive factor of virtual experiences when it comes to modulating cognitive and emotional processing (cf. Schöne et al. 2018). To this end, we propose using more realistic settings in order to further enhance the predictive power for real-life attentional effects. Especially as VR might trigger affective self-relevance processes, it seems imperative to use more plausible real-life scenes in future studies. The fact that we found no differences between the monoscopic and realistic VR conditions speaks in favor of this salience account, since detection rates did not increase with the amount of available sensory information.

As mentioned above, the study aimed to compare the results of a classical laboratory experiment to results obtained in a more realistic scenario resembling real life. It is in the very nature of this particular design approach that the conclusions drawn regarding the factors modulating the inattentional blindness effect are limited in their explanatory power. A future study employing a fully computer-generated VR environment with the possibility to manipulate saliency, proximity, three-dimensionality, and the computational load of the “invisible object” would be needed to shed light on this issue. Our results imply that laboratory results do not necessarily generalize to the real world; the magnitude of the inattentional blindness effect seems to decrease with the realism of the scenario. Accordingly, participants were more likely to notice an unexpected event in their vicinity than on a computer screen.

As a final note, we would like to add that the nomenclature of VR across experimental designs and scientific disciplines is sometimes fuzzy. We propose that VR is necessarily characterized by the hardware (HMD/CAVE) and by the ability to generate the feeling of being in another place, either by shielding the actual physical space or by masking it (mixed reality). Furthermore, VR is in essence interactive, which is the case only to a limited degree in the present study, as participants could only visually explore the scene. Thus, the applied experimental design could also be classified as immersive visualization or immersive media.

5 Conclusion

As hypothesized, our data show that attentional processes exhibit different properties under natural conditions than under rather artificial laboratory conditions. Specifically, the inattentional blindness effect seems to be diminished in settings that resemble reality (i.e., VR). This effect might be explained by the immersive nature of virtual reality settings triggering self-relevant attentional processing due to the increased salience of the sensory input. Thus, VR settings can be considered a useful tool for investigating real-life experiences while maintaining strict experimental control.