Introduction

The Shepard illusion is a powerful illusion in which a parallelogram appears to have a different width and height when it is rotated by a quarter turn (Fig. 1A). As far as we know, the illusion was first described by Roger Shepard in one of his books (Shepard, 1990). The illusion was presented as a curious phenomenon. No mechanisms were proposed to explain it, nor were any quantitative data presented to demonstrate its strength. Subsequent research has shown that the illusion is quite strong. In one of our previous studies, we measured the strength of 13 optical illusions in typically developing adults and found that the Shepard illusion yielded stronger effect sizes than all the other illusions examined, including better-known illusions such as the Müller-Lyer, Ponzo, and Ebbinghaus illusions (Chouinard et al., 2016). Despite its strength, few studies have examined the Shepard illusion since it was first described nearly 30 years ago. A PubMed search in August 2019 using the terms "Shepard"[All Fields] AND "illusion"[All Fields] yielded only four papers related to the illusion. The lack of published research on such a strong illusion is clearly a gap in the literature that needs to be filled. This study represents the first examination of the Shepard illusion in a developmental context.

Fig. 1

Illusion and control matching tasks. The illusion task consisted of the Shepard illusion, in which a parallelogram appears to differ in its dimensions when it is rotated to a different angle (A). The right side of the top panel (A) shows the full visual display, with the buttons at the bottom that the participants used to adjust the comparison stimulus (in this case, the vertical parallelogram on the left) to match the standard (in this case, the horizontal parallelogram on the right). The different tasks (A-C) had similar buttons and differed in the stimuli presented in the black area and the dimensions that had to be matched. The control tasks consisted of the size (B) and shape (C) matching tasks

Optical illusions, such as the Shepard illusion, provide opportunities to learn about the mechanisms underlying perception, particularly those that are helpful for seeing the world in a predictable manner but trick us given the right set of circumstances, correcting where a correction is not necessary (Geisler & Kersten, 2002; Gregory, 1980, 2015; Helmholtz, 1867; Sperandio & Chouinard, 2015). According to Richard Gregory’s theory of inappropriate constancy scaling (Gregory, 1980, 2015), a number of geometrical illusions simulate depth cues that trick the brain into perceiving depth on a two-dimensional surface when there is none, causing us to perceive certain shapes as having different sizes. For example, in the Ponzo illusion, the converging lines are thought to simulate linear perspective cues, which normally inform the brain that the upper part of the visual field is farther away. Previous experience dictates that the farther of two objects with identical retinal image sizes is larger in a real three-dimensional environment, which causes us to perceptually rescale the two identical stimuli superimposed over a Ponzo illusion background as having different sizes.

The Ponzo illusion is one of several illusions that illustrate how perception can be an active process involving memory and other internal processes and not just a passive acceptance of stimuli (Gregory, 1980, 2015; Sperandio & Chouinard, 2015). These mechanisms are important to understand because they teach us about the underlying assumptions and rules that the brain uses to make sense of retinal information – particularly with illusions such as the Shepard illusion that are less known and not well understood. Indeed, there is a growing trend towards using Bayesian models to explain illusions and visual perception in general (Geisler & Kersten, 2002). Under these models, the brain derives a percept by computing an optimal combination of the sensory evidence captured by the retina and knowledge acquired from previous experience.

Both Alfred Binet (1895) and Jean Piaget (1969, 1999) argued that illusions offer opportunities to understand cognitive development and tease apart mechanisms that are innate from those that are acquired. Their reasoning arose from their research demonstrating how the strength of illusions can either decrease or increase with age depending on the illusion (Binet, 1895; Piaget, 1969, 1999). Both reasoned that illusions showing decreases in strength with age are more likely driven by innate mechanisms that become attenuated as certain abilities in cognition and a better understanding of the world are acquired. These emerging abilities and a better understanding of the world are used to attenuate an innate percept that is not real and might be maladaptive if it persisted strongly during development. Conversely, both Binet and Piaget also reasoned that illusions showing increases in strength with age are more strongly driven by acquired mechanisms obtained during cognitive development that are normally informative in the real world.

For these reasons, we included other measurements to determine what other abilities might develop in parallel with illusion strength. Our choice of tests was exploratory given that the mechanisms of the Shepard illusion are not entirely known and that there is no precedent for examining this illusion in a developmental context. Based on this preliminary work, future studies could home in on more specific abilities. We assessed perceptual abilities in discriminating shape and size with non-illusory matching tasks for the following reason. A less developed perceptual system in younger children may either exaggerate illusions as a way to compensate for not being able to account for sensory noise (Duffy et al., 2006) or diminish them by not properly processing features that are crucial for seeing them.

We also examined verbal and other non-verbal abilities using the Peabody Picture Vocabulary Test (PPVT) (4th Edition) (Dunn & Dunn, 2007) and Raven’s Progressive Matrices (RPM) (Raven et al., 2003), respectively. Both measures are quick to administer and have been used in other developmental studies as a way to provide an estimate of verbal and non-verbal intelligence in children (Chouinard et al., 2018, 2019; Landry et al., 2019). The RPM also offered the advantage of testing Piaget’s ideas that reasoning skills need to be developed before a child can experience illusions in the same way as adults (Piaget, 1969, 1999). Specifically, an understanding of the world, such as the meaning of contextual cues, can only be achieved after a certain level of reasoning skill is developed. It then follows, according to this line of reasoning, that the effects of contextual cues on perception can also only emerge when these skills are sufficiently developed. Although we had no theoretical reason to think that language skills are important for the development of the Shepard illusion, the PPVT is primarily used as a measure of receptive language. Therefore, it also offered the advantage of ensuring that the participants could understand verbal instructions.

With this in mind, the present investigation had two aims. The first was to determine whether the Shepard illusion would fall in the innate or acquired class of illusions. The second was to determine whether individual differences in perceptual discrimination and cognitive development might explain changes in illusion strength. To this end, we recruited children between the ages of 6 and 14 years and quantified the degree to which they experienced the illusion. We also measured abilities in matching the size and shape of stimuli, receptive language, and abstract reasoning to determine if changes in illusion strength were associated with these additional factors. We did not favour any specific hypotheses given the exploratory nature of this study and the lack of precedent for investigating this illusion in children. It could equally be the case that the illusion is an innate illusion, whereby its strength decreases with age, or an acquired illusion, whereby its strength increases with age. Likewise, because the mechanisms of the Shepard illusion are ill-defined, it could equally be the case that individual differences in perceptual abilities, as measured by shape and size discrimination tasks, and abstract reasoning, as measured by the RPM, explain the variability in illusion strength.

Methods

Overview

As part of a larger study, typically developing children from primary schools in a regional Australian city (Bendigo, Victoria) completed computerised tasks that assessed the strength of the Shepard illusion (illusion task) and abilities in matching the size and shape of non-illusory stimuli (control-matching tasks). In addition, we assessed receptive language using the Peabody Picture Vocabulary Test (PPVT) (4th Edition) (Dunn & Dunn, 2007) and abstract reasoning using the Raven’s Progressive Matrices (RPM) (Raven et al., 2003). A small subset of the data has been published previously for the purposes of matching an ASD sample who performed similar tasks (Chouinard et al., 2018). This earlier study did not examine the effects of age, which was the primary purpose of the present investigation. Task order was counterbalanced across participants to reduce practice or carryover effects. All procedures were approved by the La Trobe University Human Ethics Committee, the Department of Education and Training of Victoria, Australia, and the local schools. Legal guardians of all participants provided informed written consent and confirmed that their child was never diagnosed with a psychological, psychiatric, neurological or neurodevelopmental disorder, as determined by a questionnaire prior to testing. The questionnaire was completed by the legal guardian at home while all other tests were administered at the child’s school in cooperation with classroom teachers to minimise disruption.

Participants

One hundred and seven typically developing children participated in the study (57 males, age range: 6.0–14.7 years, mean age = 10.0 years). One female was excluded from the analyses based on having a standard score lower than 70 on the RPM. Two females were excluded from the analyses on the basis that they did not complete the illusion task due to availability or time constraints. One male was excluded from the analyses on the basis of a raw score on the size matching task exceeding ± 3 SD from the mean. Four males and two females were excluded from the analyses on the basis of having an illusion susceptibility score exceeding ± 3 SD from the mean. Removing these outliers helped to systematically remove noise from the data that would otherwise reflect various aspects of non-compliance, and/or non-reported problems in vision. This resulted in a final sample size of 97 participants (52 males, age range: 6.0–14.7 years, mean age = 10.1 years). Younger children were not tested because we were concerned that they may not understand instructions and/or be able to perform the required tasks.
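
The ± 3 SD exclusion rule described above can be illustrated with a minimal sketch. The function name `exclude_outliers` and the example scores are hypothetical, and the use of the sample standard deviation computed over all scores (including the candidate outlier) is an assumption, since the text does not specify these details.

```python
from statistics import mean, stdev


def exclude_outliers(scores, n_sd=3.0):
    """Split scores into (kept, excluded) using a mean +/- n_sd * SD rule.

    Assumption: the SD is the sample SD (n - 1 denominator) computed over
    all scores, including any candidate outliers.
    """
    m = mean(scores)
    sd = stdev(scores)
    kept, excluded = [], []
    for s in scores:
        # A score is excluded when it lies more than n_sd SDs from the mean.
        if abs(s - m) > n_sd * sd:
            excluded.append(s)
        else:
            kept.append(s)
    return kept, excluded


# Hypothetical scores: 20 typical values plus one extreme value.
example_scores = [0.20, 0.22, 0.18, 0.21, 0.19] * 4 + [0.95]
kept, excluded = exclude_outliers(example_scores)
```

With very small samples a single extreme value can never exceed 3 SD of a mean and SD that include it, so a rule of this kind is only informative for reasonably large groups.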

Procedures

All participants performed the size and shape control-matching tasks before the illusion task. Half the participants completed the PPVT and RPM before the control-matching and illusion tasks while the other half did so afterwards – the order being randomly assigned for each individual. The illusion and control-matching tasks were programmed in ActionScript (Adobe Systems, San Jose, CA, USA) and presented using Flash Player (Adobe Systems, San Jose, CA, USA).

For both the illusion and control-matching tasks, the participants had to adjust a comparison stimulus to appear the same along a physical dimension as a standard stimulus by pressing the Decrease and Increase buttons displayed on the bottom-left and bottom-centre of the computer screen (right panel in Fig. 1A). The participants were given as much time as they needed to complete each trial and were asked to press the Done button displayed on the bottom-right of the computer screen when they felt they had matched the comparison stimulus to the standard one. The participants completed one trial for each of the control-matching tasks and four trials for the illusion task. The order of the control-matching tasks was generated randomly for each participant. All displays had a black background.

The participants were encouraged to base their adjustments on how the stimuli appeared and were discouraged from using additional strategies that might help them match the stimuli (e.g., imagining a grid on the computer screen, estimating the stimuli with their fingers, etc.). At the start of each task, the experimenter would say “In this activity, we will be matching different shapes. We want you to make the two shapes [experimenter would say and point to what features needed to be matched] appear the same. You’re going to make this one [experimenter points to the comparison] smaller or larger so it looks like that one [experimenter points to the standard]. This button makes it larger [experimenter points to appropriate button] and this button [experimenter points to appropriate button] makes it smaller. Once you’re happy that the two shapes [experimenter would again say and point to what features needed to be matched] look the same, press this button [experimenter points to the Done button].” Further clarification was provided if required.

The illusion task consisted of two yellow parallelograms (Fig. 1A). The parallelogram on the left was oriented vertically while the one on the right was oriented horizontally. On each trial, one of the parallelograms was designated as the standard while the other was designated as the comparison. The width of the comparison stimulus was initially presented either 50% smaller or larger than the standard. The order of the trials, each representing one of four possible starting combinations (i.e. smaller comparison on the left, larger comparison on the left, smaller comparison on the right, larger comparison on the right), was generated randomly. The length of both parallelograms remained fixed at 180 pixels. The width of the standard remained fixed at 75 pixels while the width of the comparison stimulus was adjusted by the participants. The perceived width of the vertical parallelogram on the left was expected to be smaller than the horizontal one on the right when both were physically identical. Scores for illusion strength were obtained from the adjusted widths of the comparison stimuli (in pixels) using the following equation:

$$ \mathrm{Illusion\ strength} = \frac{\mathrm{Vertical\ Parallelogram\ (Adjusted\ Width)} - \mathrm{Horizontal\ Parallelogram\ (Adjusted\ Width)}}{\mathrm{Vertical\ Parallelogram\ (Adjusted\ Width)} + \mathrm{Horizontal\ Parallelogram\ (Adjusted\ Width)}} $$

This method of normalising is used in many illusion studies (Chouinard et al., 2013, 2016, 2017, 2018, 2019; Schwarzkopf et al., 2011; Sherman & Chouinard, 2016) and allows for meaningful comparisons across studies.
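
As an illustration of the normalised index above, the following minimal sketch computes the score from two adjusted widths. The function name `illusion_strength` and the example widths are hypothetical. How the adjusted widths from the four trials were combined before entering the equation is not specified here, so the sketch simply assumes that one adjusted width per orientation is supplied.

```python
def illusion_strength(vertical_width, horizontal_width):
    """Normalised difference between two adjusted widths (in pixels).

    Assumption: each argument is the adjusted width (or an average of
    adjusted widths) for the parallelogram in that orientation.
    """
    total = vertical_width + horizontal_width
    if total == 0:
        raise ValueError("the two widths must not sum to zero")
    return (vertical_width - horizontal_width) / total


# Hypothetical adjusted widths: 90 px (vertical) and 60 px (horizontal).
score = illusion_strength(90, 60)  # (90 - 60) / (90 + 60) = 0.2
```

The index is zero when the two adjusted widths are identical and is bounded between -1 and 1 for positive widths, which is what makes it comparable across displays of different absolute size.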

In the size-matching control task, which assessed size discrimination, the display consisted of two yellow squares (Fig. 1B). The square on the right was designated as the standard, which remained fixed at 120 pixels in length, while the participant adjusted the overall size of the square on the left, which was designated as the comparison stimulus. The size of the comparison stimulus began at 180 pixels in length. Scores were obtained by calculating the final absolute difference in pixels between the fixed length of the standard and the adjusted length of the comparison stimulus, which in turn provided an index of accuracy with larger scores indicating worse performance.

In the shape-matching control task, which assessed shape discrimination, the display consisted of two yellow four-sided shapes (Fig. 1C). The rectangle on the left was designated as the comparison stimulus while the square on the right was designated as the standard stimulus. The height and width of the standard remained fixed at 120 pixels. The width of the comparison remained fixed at 120 pixels while the height was adjusted by the participants so that it matched the standard. The height of the comparison stimulus began at 60 pixels. Scores were obtained by calculating the final absolute difference in pixels between the fixed height of the standard and the adjusted height of the comparison stimulus, providing an index of accuracy with larger scores indicating worse performance.

We also measured receptive language with the PPVT (Dunn & Dunn, 2007). During the test, the participant was presented with a series of pages containing four pictures and was asked to indicate which one they thought best described the item word spoken by the experimenter. The test was administered in accordance with instructions from the test manual (Dunn & Dunn, 2007). Raw scores reported in our study reflect the total number of correct trials plus credit for all trials not administered below the basal start point. Standard scores were also calculated based on normative data from the test manual to characterise verbal intelligence in our sample.

We also measured abstract reasoning with the RPM (Raven et al., 2003). In this test, the participant was provided with a booklet of different patterns with a piece missing in each one. For each item, the child was required to select which piece from an array of different options best matched the missing piece. We administered two versions of the RPM – each designed for a different age group. The coloured version was administered to children aged 5–9 years and consisted of 36 trials while the standard version was used with the older individuals and consisted of 60 trials. Raw scores reflected the number of correct trials. For the purposes of data analysis and reporting, all raw scores on the coloured version were converted to the scale of the standard one using the conversion table in the RPM manual (Raven et al., 2003). Standard scores were also calculated based on normative data from the test’s manual to characterise non-verbal intelligence in our sample.

Statistical analyses

We analysed the data in three different ways using GraphPad Prism version 8 (La Jolla, CA, USA), JASP software version 0.8 (University of Amsterdam, Amsterdam, Netherlands), and the Statistical Package for the Social Sciences version 25 (SPSS; IBM Corporation; Armonk, NY, USA). Analysing data using different approaches allowed us to examine slightly different questions and also determine the degree of convergence among them. Greater agreement in findings among different approaches strengthens the validity of interpretations.

The first approach consisted of comparing means between age groups. To this end, we first divided our participants into tertile groups. We chose a tertile split so we could have at least 30 participants in each age group, enabling appropriate sample sizes for between-subject comparisons. Allocating the participants in this way also ensured that the groups had similar degrees of variance, which is another important assumption for analysis of variance (ANOVA) (Field, 2017). The Youngest Age Group ranged from 6.0 to 8.7 years (n = 32, span = 2.7 years). The Middle Age Group ranged from 8.8 to 11.5 years (n = 33, span = 3.0 years). The Oldest Age Group ranged from 11.5 to 14.7 years (n = 32, span = 3.2 years). We randomly assigned the Middle Age Group to have an extra participant given that the overall sample size was not divisible by three. We then performed ANOVA with Age as a between-subject factor on the illusion strength scores, the size and shape matching measures, and the raw scores on the PPVT and RPM tests. Raw scores were chosen for the two latter tests so we could chart how these skills develop with age.
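
The tertile split and the one-way ANOVA described above can be sketched as follows. This is not the software used in the study (which relied on GraphPad Prism, JASP, and SPSS), and the names `tertile_split` and `one_way_anova` are hypothetical. The sketch assumes that any leftover participant goes to the middle group, mirroring the allocation reported for the present sample.

```python
from statistics import mean


def tertile_split(records):
    """Sort (age, score) records by age and split them into three groups.

    Assumption: with a remainder of one, the middle group receives the
    extra record; with a remainder of two, the oldest group also gets one.
    """
    ordered = sorted(records, key=lambda r: r[0])
    base, rem = divmod(len(ordered), 3)
    sizes = [base, base + (1 if rem >= 1 else 0), base + (1 if rem == 2 else 0)]
    groups, start = [], 0
    for size in sizes:
        groups.append(ordered[start:start + size])
        start += size
    return groups


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    # With a single factor, eta squared and partial eta squared coincide.
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared
```

For a sample of 97 records, `tertile_split` yields groups of 32, 33, and 32, matching the group sizes reported above.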

Post hoc pairwise comparisons using Tukey’s honestly significant difference (HSD) tests (Tukey, 1949), which corrected for multiple comparisons, were performed to test for differences between the different age groups when a main effect of Age was obtained. We also performed one-sample t-tests to determine if illusion strength scores differed from zero, which provides an indication as to when the illusion might first emerge in development. To account for multiple comparisons, we applied a Bonferroni correction to the reported p values (i.e. \( p_{\mathrm{corr}} = p_{\mathrm{uncorr}} \times \) the number of tests comparing differences against zero).
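
The Bonferroni correction described above amounts to a single multiplication, as the following minimal sketch shows. The function name `bonferroni` and the example p values are hypothetical, and capping corrected values at 1 is an assumption (a common convention, consistent with corrected values reported as p > .999).

```python
def bonferroni(p_values):
    """Multiply each uncorrected p value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical uncorrected p values from four tests against zero.
corrected = bonferroni([0.0001, 0.01, 0.02, 0.5])
```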

The second approach consisted of a backward stepwise multiple regression analysis. This analysis began with a full model to explain illusion strength, containing all the other measures that correlated with illusion strength as independent variables, and gradually eliminated them from the model until a reduced model that best explained illusion strength was found. This type of regression was favoured over others for its exploratory and unbiased nature for determining which variables might account for the most variance. Participants with missing values were excluded from the analysis. Tolerance values and variance inflation factors were computed at each step. The former was never less than 0.37 and the latter never exceeded 2.72, which indicates that multicollinearity was never an issue (O’Brien, 2007). The resulting standardised beta coefficients (β) and corrected p values arising from the multiple regression analysis are reported. A correlation matrix between age, illusion strength, size-matching abilities, shape-matching abilities, receptive language, and abstract reasoning is presented as a complement to this analysis.
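
The two multicollinearity diagnostics mentioned above are reciprocals of one another, which the following minimal sketch makes explicit. The function name `tolerance_and_vif` is hypothetical, and the input is assumed to be the R² obtained by regressing one predictor on all the remaining predictors; the value 0.63 in the example is illustrative rather than taken from the data.

```python
def tolerance_and_vif(r_squared):
    """Return (tolerance, VIF) for one predictor.

    Assumption: r_squared is the R^2 from regressing this predictor on
    all the other predictors in the model.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in the interval [0, 1)")
    tolerance = 1.0 - r_squared
    vif = 1.0 / tolerance
    return tolerance, vif


# Illustrative value: a predictor sharing 63% of its variance with the others.
tolerance, vif = tolerance_and_vif(0.63)
```

Because the variance inflation factor is the reciprocal of the tolerance, a tolerance of 0.37 corresponds to a factor of roughly 2.70, which agrees with the reported maximum of 2.72 to within rounding.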

The third approach calculated linear regression equations that best fit how illusion strength, size-matching abilities, and shape-matching abilities changed as a function of age using the least squares method. The primary purpose of doing this was to determine at what age our sample might reach adult levels based on data gathered from a previous study we published in typically developing adults who performed the same tasks (Chouinard et al., 2016). From this earlier data set, we extracted the means and 95% confidence intervals in adults and determined at what age our children sample from the present investigation might reach these ranges based on the calculated linear regression equations.
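
The logic of this third approach can be sketched as an ordinary least-squares fit followed by solving the fitted line for the age at which it reaches a target value. The function names, the toy data, and the target value below are all hypothetical. The sketch also assumes that "reaching" the adult range means crossing the bound of the adult confidence interval nearest to the children's scores, which is not stated explicitly in the text.

```python
from statistics import mean


def fit_line(ages, scores):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mean_age, mean_score = mean(ages), mean(scores)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_age
    return slope, intercept


def age_reaching(slope, intercept, target):
    """Age at which the fitted line equals the target score."""
    if slope == 0:
        raise ValueError("a flat line never reaches a different target")
    return (target - intercept) / slope


# Toy data (hypothetical): scores rising by 0.015 per year of age.
toy_ages = [6, 8, 10, 12, 14]
toy_scores = [0.10, 0.13, 0.16, 0.19, 0.22]
slope, intercept = fit_line(toy_ages, toy_scores)
# Hypothetical target: the lower bound of an adult confidence interval.
crossing_age = age_reaching(slope, intercept, target=0.202)
```

An estimate of this kind extrapolates a straight line, so a crossing age that falls outside the sampled age range should be read with particular caution.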

All reported p values were corrected for multiple comparisons based on an alpha level of .05 unless specified otherwise.

Results

Table 1 provides descriptive statistics for age, illusion strength, performance on the size and shape matching tasks, PPVT (both raw and standardised scores), and RPM (both raw and standardised scores) for the overall sample and the different age groups. Three of the 97 children included in the analysis did not complete the PPVT due to either availability or time constraints.

Table 1 Descriptive statistics for age, illusion strength, and the other measurements

Analyses of variance

Overall, the ANOVA revealed main effects of Age for every dependent variable (Fig. 2; Table 1) – denoting increases with age in illusion strength and in abilities in size matching, shape matching, receptive language, and abstract reasoning.

Fig. 2

Analyses of variance. We performed ANOVA to determine if there were changes in illusion strength (A), size matching abilities (B), shape matching abilities (C), receptive language (D), and abstract reasoning (E) between three different age groups. Lower scores on the size and shape matching tasks and higher scores on the other tasks reflect better performance. Daggers (†) denote significant differences against zero after correcting for multiple comparisons using the Bonferroni method (p < .05) while asterisks (*) denote a significant difference between age groups after correcting for multiple comparisons using the Tukey’s HSD method (p < .05). The Youngest Age Group (in orange) ranged from 6.0 to 8.7 years, the Middle Age Group (in green) ranged from 8.8 to 11.5 years, and the Oldest Age Group (in blue) ranged from 11.5 to 14.7 years

For illusion strength, ANOVA revealed a main effect of Age (F(2,94) = 4.65, p = .012, \( {\eta}_p^2 \) = .09) (Fig. 2A; Table 1) driven by the Middle Age Group experiencing a stronger illusion than the Youngest Age Group (p = .012). No other pairwise comparisons were significant (all p ≥ .078). One-sample t-tests revealed that illusion strength was different from zero in the overall sample (p < .001) and in each age group (all p < .001).

For size matching, ANOVA showed a main effect of Age (F(2,94) = 4.52, p = .013, \( {\eta}_p^2 \) = .09) (Fig. 2B; Table 1). This effect was explained by the Oldest Age Group performing better than the Youngest Age Group (p = .016). No other pairwise comparisons were significant (all p ≥ .060).

For shape matching, ANOVA demonstrated a main effect of Age (F(2,94) = 5.82, p = .004, \( {\eta}_p^2 \) = .11) (Fig. 2C; Table 1). Pairwise comparisons found better performance in the Oldest Age Group relative to the Youngest Age Group (p = .003). No other pairwise comparisons were significant (all p ≥ .057).

For receptive language, ANOVA found a main effect of Age on the raw PPVT scores (F(2,91) = 46.82, p < .001, \( {\eta}_p^2 \) = .51) (Fig. 2D; Table 1). Pairwise comparisons revealed that these scores increased with each age group (all p ≤ .004).

For abstract reasoning, ANOVA indicated a main effect of Age on the raw RPM scores (F(2,94) = 22.07, p < .001, \( {\eta}_p^2 \) = .32) (Fig. 2E; Table 1). This was driven by higher scores in the Middle and Oldest Age Groups relative to the Youngest Age Group (both p < .001). The two older age groups did not differ from each other in their scores (p = .964).
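
The effect sizes reported above can be recovered from the F statistics and their degrees of freedom, since partial eta squared equals F × df_effect / (F × df_effect + df_error). The following minimal sketch applies this identity; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported in the text above.
reported = {
    "illusion strength": (4.65, 2, 94),
    "size matching": (4.52, 2, 94),
    "shape matching": (5.82, 2, 94),
    "receptive language": (46.82, 2, 91),
    "abstract reasoning": (22.07, 2, 94),
}
effect_sizes = {
    name: round(partial_eta_squared(*stats), 2)
    for name, stats in reported.items()
}
```

Rounded to two decimals, these values agree with the effect sizes reported for each measure (.09, .09, .11, .51, and .32, respectively).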

Multiple regression analysis

A correlation matrix between age, illusion strength, size matching abilities, shape matching abilities, receptive language, and abstract reasoning is presented in Table 2. Illusion strength correlated with age, receptive language, and abstract reasoning (all p ≤ .028) but not with size and shape matching scores (both p > .999) even though these latter measures correlated with age (both p ≤ .046). Examining various non-linear functions between age and each of the other variables did not provide discernible improvement in fit (Table 3). Thus, linear relationships were assumed to provide appropriate fit, and only age, receptive language, and abstract reasoning were entered as predictors into the multiple regression analysis.

Table 2 Correlation matrix (r) between the different dependent variables
Table 3 Correlation coefficients (r) for linear and non-linear relationships between age and the other dependent variables

Table 4 presents a summary of the outcome of the multiple regression analysis. The first model, which included all three predictors, was significant (F(3,93) = 4.80, p = .004) and explained 13.8% of the variance in illusion strength. None of the predictors alone were significant (β range: 0.07–0.18, all p ≥ .172). The second model, which removed receptive language as a predictor, was also significant (F(2,93) = 7.16, p = .001) and explained 13.6% of the variance in illusion strength. Again, none of the predictors alone were significant (β range: 0.19–0.23, both p ≥ .054). The third model, which removed receptive language and abstract reasoning as predictors, was also significant (F(1,93) = 11.54, p = .001) and explained 11.1% of the variance in illusion strength. The remaining predictor, age, was significant (β = .33, p = .001). Taken together, age appears to be the strongest of the three predictors. Any additional contribution from receptive language and abstract reasoning, beyond those already shared with age, was relatively small (2.5%) and did not improve model fit significantly.

Table 4 Outcome of the backward stepwise multiple regression analysis

Reaching adult levels analyses

The mean scores with confidence intervals for adult performance from our previous study (Chouinard et al., 2016) were 0.215, 95% CI [0.202, 0.227] for the illusion task, 1.250, 95% CI [1.116, 1.384] for the size-matching task, and 1.255, 95% CI [1.047, 1.463] for the shape-matching task. Figure 3 displays the linear regressions, along with their equations, that best fit how illusion strength, size-matching abilities, and shape-matching abilities changed as a function of age. Combining this information, we determined that our sample would reach within the confidence range for adult performance by 11.5 years on the illusion task, 18.7 years on the size-matching task, and 14.2 years on the shape-matching task.

Fig. 3

Correlations with age and estimates of when adult levels might be reached. Linear regression equations were calculated that best fit how illusion strength (A), size matching abilities (B), and shape matching abilities (C) changed as a function of age. Based on adult levels of performance obtained in a different study (Chouinard et al., 2016), it was determined that the children would reach within the adult 95% confidence range by 11.5 years for the illusion task, 18.7 years for the size matching task, and 14.2 years for the shape matching task. The linear regression equations are shown on the graphs. The p values on the graphs are corrected for the number of bivariate correlations performed in this study, which are listed in Table 2, using the Bonferroni method

Discussion

Our study’s aims were twofold. First, we sought to determine whether the strength of the Shepard illusion decreases or increases with age in typically developing children. Based on our findings, we conclude that the illusion falls in the acquired class of illusions described in the Introduction. Specifically, the strength of the illusion increased with age, reaching adult level at 11.5 years.

Second, we sought to determine what other abilities might develop with age and could explain the developmental time course of the Shepard illusion. To this end, we also tested for abilities in size and shape matching, verbal abilities with a focus on receptive vocabulary, and other non-verbal abilities with a focus on abstract reasoning. Abilities in size and shape matching increased with age but did not correlate with the strength of the Shepard illusion. Receptive language and abstract reasoning correlated with age and also correlated with the strength of the Shepard illusion. However, a multiple regression analysis revealed that these variables did not contribute beyond their shared variance with age.

Based on these findings, we propose that the illusion requires the maturation of high-level processes before it is experienced at adult levels at 11.5 years. In the ensuing discussion, we first discuss what is currently known about the mechanisms of the Shepard illusion. Then, we deliberate on how this illusion develops in typically developing children. We end our paper by discussing limitations and providing recommendations for future research.

Mechanisms underlying the Shepard illusion

As indicated at the start of our article, a recent PubMed search yielded only four papers on the Shepard illusion. Consequently, its mechanisms are not well understood. However, we do know a few things, which provide some insight as to how the illusion might work. First, we know the illusion has a strong top-down component to it given that it is strongly influenced by the addition of contextual cues. For example, earlier work has demonstrated that the illusion is enhanced when table legs are added below the parallelograms to make the stimuli look like tables (Fig. 4B vs. Fig. 4A) (Mitchell et al., 2010). The longer and shorter legs as projected on the retina inform the brain as to what parts of the table are in the foreground and background. This in turn causes an enhancement in perceptual rescaling that can only be explained by an acquired conceptual understanding of table legs. Similarly, texture and shading gradients specifying depth also enhance perceptual rescaling in the Shepard illusion (Fig. 4C vs. Fig. 4A,B) (Tyler, 2011).

Fig. 4

The Shepard illusion is strengthened by the addition of contextual cues. Earlier studies demonstrate that adding contextual elements to the Shepard illusion results in an even stronger perceptual rescaling of the parallelograms (Mitchell et al., 2010; Tyler, 2011). This is illustrated in the figure. Panel A shows the typical Shepard illusion with two identical parallelograms presented in two different orientations. Panel B shows a strengthening of the Shepard illusion with the addition of table legs and panel C shows a further strengthening of the illusion with the addition of textures. The cumulative effects of these cues demonstrate how the illusion is influenced by top-down modulation

Second, we know that the strength of the illusion increases as participants actively attend to and scan the display – namely, the illusion strengthens with the number of saccades made between different elements in the display (Chouinard et al., 2018). This contrasts with other illusions, such as the vertical-horizontal illusion, that increase in strength as participants make fewer saccades and the image of the display is more stable on the retina (Chouinard et al., 2017). These differential effects of scanning patterns suggest different mechanisms. As we argued in the past, the latter may depend more on low-level visual processing, whereby greater retinal stability facilitates the processing of low-level perceptual effects (Chouinard et al., 2018). In contrast, the Shepard illusion may depend more on higher-level visual processing in which the registration of multiple contextual elements by actively scanning different locations of the scene is important in driving the illusion.

Third, we know the Shepard illusion is not as strong in adults (Mitchell et al., 2010) and children (Chouinard et al., 2018) with an autism spectrum disorder (ASD), and in typically developing adults with high levels of autistic traits (Chouinard et al., 2016). It is doubtful that this reduction in illusion strength can be explained by the widely held belief that people with an ASD see the world more veridically. A large-scale systematic meta-analysis reveals that there have been more reports of illusory susceptibility being equal to or greater in persons with an ASD relative to control participants than reports of reduced illusory susceptibility in ASD (Van der Hallen et al., 2015), providing more evidence to counter than to support a perceptual style of seeing the world more objectively. The Shepard illusion appears to be one of only a few illusions that are consistently reduced in ASD across multiple studies carried out independently by different investigators (Chouinard et al., 2018; Mitchell et al., 2010).

Fourth, differences in illusion strength in ASD are more likely related to group differences in higher cognitive functioning. Our earlier work reveals decreases in illusion strength as a function of autistic traits representing atypical styles of meta-cognition, including imagination (namely, the illusion is weaker in people with reduced imagination) and social communication (namely, the illusion is weaker in people with greater difficulties in social reciprocity) (Chouinard et al., 2016).

The development of the Shepard illusion

Taken together, the evidence so far indicates that the illusion is primarily driven by high-level cognitive processes rather than low-level sensory ones. If this is the case, then one would predict that age-related changes in illusion strength coincide with the development of cognitive faculties. Indeed, our correlation-based analyses reveal that age-related changes in illusion strength correlated with cognitive development as measured by the PPVT and RPM but not with perceptual development as measured by the size- and shape-matching tasks (Tables 2, 3 and 4). This is not to say that perceptual skills are not necessary for the Shepard illusion. On the contrary, the ability to perceive size and shape is certainly required and likely emerges before the necessary cognitive faculties are sufficiently developed to see the illusion at adult levels. The key point is that changes in the illusion coincide more strongly with cognitive than perceptual abilities.

According to Binet (1895) and Piaget (1969, 1999), illusions may either decrease or increase in strength as cognitive faculties develop, which gives rise to two classes of illusions – those that are innate, which decrease with age, and those that are acquired, which increase with age. Our results reveal that the Shepard illusion is an acquired illusion that increases in strength with age, and they also provide support for some of Piaget’s ideas on acquired illusions. Namely, if illusions depend on the interpretation of contextual cues, then one can begin to see illusions as reasoning and other cognitive skills become sufficiently developed. In a different study, we demonstrated how the size-weight illusion (see Footnote 2) is also an acquired illusion that increases with age (Chouinard et al., 2019), which is consistent with several other studies that investigated the development of the size-weight illusion in children (Dresslar, 1894; Flournoy, 1894; Gilbert, 1894; Philippe & Clavière, 1895; Rey, 1930). Our study on the size-weight illusion revealed that cognitive development was also an important predictor for the illusion.

In contrast, the Müller-Lyer (Binet, 1895; Brosvic et al., 2002; Frederickson & Geurin, 1973; Hanley & Zerbolio, 1965; Pollack, 1970; Porac & Coren, 1981) and Poggendorff (Girgus & Coren, 1987; Leibowitz & Gwozdecki, 1967; Mallenby, 1974; Pressey & Sweeney, 1970; Vurpillot, 1957) illusions have repeatedly been shown to be innate illusions that decrease in strength with age. To explain these developmental changes in the opposite direction, Binet (1895) originally proposed that it could be maladaptive to continue to falsely perceive something as extremely different from what it truly is and that children learn to suppress these innate illusions to a level that is more adaptive – but only as cognitive faculties and an understanding of the world develop. However, it is still not clear what precise cognitive faculties, or other abilities, need to be developed before children can see the Müller-Lyer and Poggendorff illusions as adults do.

Methodological considerations and future directions

The study demonstrates that the Shepard illusion has an important acquired component to it given that its strength increases with age. However, one should also consider that even the youngest age group included in our sample perceived the Shepard illusion as determined by one-sample t-tests (Fig. 2A). Thus, we cannot rule out the possibility that it is also innate. To resolve this issue, future studies will need to examine younger children to determine when the illusion emerges in earlier development. However, an even simpler paradigm than the one used in the present investigation will need to be designed. Younger children were not tested because we were concerned that they may not understand and comply with instructions or may lack the required sustained attention to complete the task.

Another consideration is the possibility that what is changing with age is really perceptual decision-making processes rather than the perceived qualia of the illusion (Firestone & Scholl, 2016). Unfortunately, this concern looms over most psychophysical research and is difficult to dismiss entirely. More sophisticated and complicated methods have been developed to diminish the effects of response and cognitive biases (Finlayson et al., 2018; Finlayson et al., 2017; Jogan & Stocker, 2014; Morgan et al., 2013; Patten & Clifford, 2015), but these are not feasible to perform in children. The Method of Adjustment approach was chosen because it was appropriate to administer to young children. However, it offers little protection against response and cognitive biases. The question then arises as to whether these biases were present and could have influenced the results.

Important safeguards were in place to diminish response and cognitive biases. First, all participants received the same carefully worded instructions, as indicated in the methods. There is nothing in these instructions that we think might lead to response or cognitive biases. Second, all participants were discouraged from using strategies other than those instructed to them. Third, all participants were naïve to the illusion and the purpose of our study. The children believed they were playing games aimed at understanding vision better. In contrast, performing similar procedures in adults would be more problematic. Adults may apply prior knowledge about illusions, particularly if recruited conveniently from an undergraduate psychology course. Fourth, the order of combinations for which of the two stimuli would be the comparison and standard, as well as the starting size of the comparison stimulus, was randomised for each participant. Thus, although the Method of Adjustment approach offers little protection against response and cognitive biases, we believe the above safeguards diminished the likelihood that these biases were present to confound the results.

Another consideration is the lack of a significant difference in illusion strength between the youngest and oldest age groups – even though the middle age group demonstrated stronger illusion strength than the youngest one (Fig. 2A). Nevertheless, the difference was trending in the appropriate direction (p = .078). We attribute the lack of significance to increased variability in the oldest age group (Table 1). Although it would have been preferable to have the oldest age group differ from the youngest one in terms of illusion strength, the correlation analyses made up for any shortcomings of statistical power in the ANOVA. In addition, four trials appeared to provide sufficient sampling power to obtain meaningful results. The study would not have yielded significant effects had this not been the case.

Another consideration is that the study did not allow us to determine what specific cognitive skills need to be developed before children can perceive the Shepard illusion as adults do. Future experiments could test abilities related to the manipulation of size and shape of objects. A sensible test might involve mental rotation, which has already been well studied in children across different age groups (Kail et al., 1980; Kosslyn et al., 1990; Lutke & Lange-Kuttner, 2015) and happens to be an area of research for which Roger Shepard, who came up with this illusion, is renowned (Shepard & Metzler, 1971). The ability to use imagination to rotate stimuli in the mind’s eye is thought to be required for the task (Kosslyn, 1996) and could correlate with illusion strength (Chouinard et al., 2016). We would predict that children may need to master this ability before they can perceive the Shepard illusion at adult levels. Kosslyn et al. (1990) demonstrated that children reach adult levels of mental rotation somewhere between the ages of 8 and 14 years, which corresponds roughly to when we estimated children to reach adult levels in the Shepard illusion.

In addition, the standard scores on the PPVT and RPM indicate that our sample was higher functioning than what is representative of the general population, particularly in the youngest age group (Table 1). This may relate to our recruitment procedures that used an opt-in approach to obtaining parental consent. Parents of higher functioning children were perhaps more likely to agree to participate, which included taking children out of class time, than those with lower functioning children, particularly younger ones. The higher scores in the youngest age group are not too concerning. If this imbalance did affect the results then it would have reduced rather than inflated group differences and the strength of correlations. Thus, there is the possibility that the effects we report in this study are not as strong as they would otherwise have been had we obtained a sample more representative of the general population. Future work can look into this possibility.

The final consideration is the late estimated ages at which our sample is expected to reach adult levels of performance in the size and shape matching tasks, which were 18.7 and 14.3 years, respectively (Fig. 3). These results do not imply that the children had difficulties matching size and shape on our computerised tasks. On the contrary, performance was quite good if one considers that the average deviation from achieving perfect accuracy was about 4 pixels in the size matching task and about 2 pixels in the shape matching task – which confirms the validity of our procedures. Perhaps the reason for the estimated delay in maturation had more to do with the fact that the adults recruited in our previous study (Chouinard et al., 2016) consisted mainly of university students capable of achieving higher levels of precision than those expected from a more representative community sample. Had our procedures for measuring perceived sizes and shapes lacked the required sensitivity for measuring perception in children, then the estimated age for reaching adult levels on the illusion task would have been equally delayed. This was not the case. It was estimated that our sample would reach adult levels on the illusion task much earlier (11.5 years) than the size- (18.7 years) and shape- (14.3 years) matching tasks (Fig. 3).
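To make concrete how an age of reaching adult levels can be estimated, and why the estimate shifts when the adult reference group is unusually precise, the sketch below fits a straight line to performance as a function of age and solves for the age at which the line meets an adult mean. This is only an illustration under stated assumptions: the estimation procedure behind Fig. 3 is not restated in this section and may differ, the straight-line model is an assumption, and all numbers and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def age_at_adult_level(ages, scores, adult_mean):
    """Age at which the fitted line reaches the adult mean (may be an extrapolation)."""
    slope, intercept = fit_line(ages, scores)
    return (adult_mean - intercept) / slope


# Hypothetical matching errors (in pixels) that fall by one pixel per year of age.
ages = [6.0, 8.0, 10.0, 12.0]
errors = [10.0, 8.0, 6.0, 4.0]

# Hypothetical adult reference means: a very precise group versus a less precise one.
estimate_precise_adults = age_at_adult_level(ages, errors, adult_mean=1.0)
estimate_typical_adults = age_at_adult_level(ages, errors, adult_mean=3.0)
print(estimate_precise_adults, estimate_typical_adults)
```

With these made-up values, lowering the adult reference error from 3 to 1 pixel moves the estimate from 13 to 15 years, which is beyond the oldest child in the hypothetical sample. The same reasoning applies to any estimate that depends on the adult comparison group.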