1 Introduction

Working memory serves to temporarily maintain and manipulate information for further use [1]. Recent studies have revealed that attention can be guided or biased by the contents of working memory [2,3,4,5], a phenomenon termed memory-driven attentional capture. In these studies, participants were required to maintain an item while they performed a visual search task. Search reaction time was slowed by a distractor that matched the working memory content relative to an unrelated distractor, or was speeded by a target that matched the working memory content relative to an unrelated target. The biased competition model of visual attention has been used to explain why the contents of working memory capture attention [6]. It proposes that maintaining items in working memory pre-activates their representations. When the visual search display appears, visual attention gives preference to items that share the same or similar representations with those held in working memory, so that these items win the competition. In line with this account, many psychological and neural studies have revealed that working memory and attention (or perception) show a striking degree of overlap in the neural resources they recruit, indicating that they may share the same representation [7,8,9,10].

If working memory can be regarded as a process of maintaining recently encountered stimuli, mental imagery can be seen as a process of mentally generating a stimulus from short- or long-term memory. Mental imagery is the representation of the physical world in a person’s mind when that world is not actually being perceived [11]. Visual imagery is accompanied by the experience of “seeing with the mind’s eye”. Likewise, many studies support the view that mental imagery also activates representations in the visual system. In a study by Farah [12], participants were instructed to imagine either an H or a T. Then, in a temporal two-interval forced-choice detection task, one of these letters was presented faintly in one of two successive observation intervals, and participants reported which interval contained the letter. The letters were better detected when they matched the imagined letter, suggesting a shared visual representation between mental imagery and attention (or perception). In a follow-up study, Farah instructed participants to mentally project an image of an H or a T onto a grid of empty squares [13]. Participants were then asked to detect a probe dot that fell on or off the image. Dots falling on the image were detected better than dots falling off the image. Pashler and Shiu also showed the connection between mental imagery and visual representation in an attentional blink task [14]. Participants were asked to imagine a specified object (e.g., a tiger) and then to search for a target digit in a rapid sequential stream of pictures. Digit detection was impaired when a picture of the imagined object was presented before the digit.

Mental imagery and visual attention also activate largely overlapping brain regions [15, 16]. These common regions include occipitoparietal and occipitotemporal visual association areas [17,18,19], the primary visual cortex [20,21,22], and the lateral geniculate nucleus [23]. In addition, studies have found that kinesthetic imagery can activate the primary motor cortex [24] and that tactile imagery can activate the somatosensory cortex [25]. In a meta-analytic study, McNorgan compared activation during mental imagery and perception, and revealed that in most studies, mental imagery in a given sensory modality activated the corresponding primary sensory cortex [26]. However, some studies showed that the primary visual cortex was not activated in the imagery condition [27, 28], especially when participants imagined a moving object or scene [29, 30].

Based on the studies above, it is still unclear whether mental imagery and attention share the same visual representation. In the present study, we examined whether a mentally imagined item can capture attention during visual search. We compared a working memory condition, in which participants maintained a shape and then searched for a tilted line target, with a mental imagery condition, in which they generated a shape according to instructions and then performed the same search. In the mental imagery condition, a big shape and a small shape were presented at the beginning of the trial. During a 2,000 ms blank interval, participants were required to mentally subtract the smaller shape from the bigger one and then maintain the new shape. After a 100 ms mask, a search display of four shapes was presented. One of the shapes was the same as the shape the participants had generated. Each shape contained a line: one line was tilted and the others were vertical. Participants were asked to discriminate the orientation of the tilted line; the generated shape could therefore be a valid or an invalid cue for the discrimination task. At the end of the trial, a probe shape was presented to test participants' memory. In the working memory condition, all stimuli were identical to those in the mental imagery condition, except that participants simply maintained the shape that was presented at the beginning of the trial. If visual search is slowed by the mentally imagined item when it is an invalid cue for the search target, then we can infer that mental imagery draws on a visual representation similar to that used by visual attention.

2 Method

2.1 Participants

Twenty undergraduate and graduate students were paid for their participation. Two were excluded due to low memory accuracy (55% and 66%, respectively). Therefore, a total of 18 participants (15 females; 17–24 years) were included in the final analyses. All participants were right-handed and had normal or corrected-to-normal vision, and none reported color blindness.

2.2 Materials

The to-be-memorized or to-be-generated items were derived from 5 original shapes: a hexagon (radius 1° visual angle), a parallelogram (length 2.5°, height 2° visual angle), a pentagon (radius 1° visual angle), a rhombus (length 2.2°, height 2.2° visual angle), and a square (length 2°, height 2° visual angle). All shapes appeared in gray (RGB: 85, 85, 85) and were presented on a black background. In the mental imagery condition, an original shape and a part of that shape (a small triangle) were presented. Participants were required to mentally generate a new shape by subtracting the smaller shape from the bigger one. In the working memory condition, the resulting shape was presented directly. Example stimuli are shown in Fig. 1. In the formal test, each original shape had 4 variations.

Fig. 1. Example stimuli for this study. Left and right shapes were presented at the beginning of the working memory condition and the mental imagery condition, respectively.

In the search display, all but one of the shapes contained a black vertical line (length 0.8° visual angle, width 2 lb). The remaining shape contained a black tilted line, which was rotated 15° to the left or right about the center of the vertical line.

2.3 Apparatus

The experiment was programmed using E-Prime 2.0 and was run on a 17-in. LCD monitor at a viewing distance of approximately 60 cm without a chin rest. The monitor was set to a resolution of 1024 × 768 with an 85 Hz refresh rate and 32-bit color.
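Stimulus sizes in this paper are given in degrees of visual angle at a 60 cm viewing distance. The sketch below shows the standard conversion from visual angle to physical size on the screen. The function names and the pixel density value (`px_per_cm`) are hypothetical and are not reported in the paper; the actual density depends on the physical dimensions of the monitor.

```python
import math


def visual_angle_to_cm(angle_deg: float, distance_cm: float = 60.0) -> float:
    """Size (cm) of a stimulus centered on the line of sight that
    subtends angle_deg at a viewing distance of distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_to_px(angle_deg: float, distance_cm: float, px_per_cm: float) -> float:
    """Same size expressed in pixels, given the screen's pixel density.
    px_per_cm is an assumption that must be measured for the actual monitor."""
    return visual_angle_to_cm(angle_deg, distance_cm) * px_per_cm


if __name__ == "__main__":
    # The 2 degree square from the Materials section at 60 cm.
    side_cm = visual_angle_to_cm(2.0, 60.0)
    print(f"2 deg at 60 cm = {side_cm:.2f} cm")
    # Hypothetical density of 30 px/cm, for illustration only.
    print(f"= {visual_angle_to_px(2.0, 60.0, 30.0):.0f} px at 30 px/cm")
```

Because no chin rest was used, the viewing distance is approximate, so any converted size is approximate as well.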

2.4 Procedure

The procedure is illustrated in Fig. 2. The background of the experiment was black. At the beginning of each trial in the working memory condition, a shape was presented at the center of the screen for 2,000 ms. Participants were asked to remember this shape until the end of the trial. After the shape disappeared, a 2,000 ms fixation display was presented, during which participants could consolidate the shape they were maintaining. In the mental imagery condition, an original shape and a part of this shape were presented. During the fixation display, participants were required to mentally subtract the smaller shape from the bigger one and then maintain the new shape they had generated. Next, a 100 ms mask (length 14.82°, height 11.11° visual angle) was presented to prevent participants from refreshing the item.

Fig. 2. Experimental procedure and example stimuli for this study.

The search display appeared after the mask disappeared. This display consisted of 4 shapes placed on the vertices of an imaginary square (length 12°, height 12° visual angle) centered on the fixation point. One of the shapes was the same as the shape the participants were maintaining. Each shape contained a line at its center. Three of the lines were vertical and only one was tilted, either to the left or to the right. Participants were asked to find the tilted line and to press the left arrow key when it was left-tilted or the right arrow key when it was right-tilted. The maintained shape could therefore be a valid or an invalid cue for the search task.

After a 500 ms blank screen, a probe shape was presented at the center of the screen and participants judged whether it matched the shape they were maintaining. If the probe shape matched, participants pressed the left arrow key; otherwise, they pressed the right arrow key. The probe shape matched on half of the trials. In each trial, the probe shape and the maintained shape were always derived from the same original shape.

2.5 Design

The experiment used a 2 (Experiment type: working memory condition vs. mental imagery condition) × 2 (Cue type: valid vs. invalid) within-participants design. There were 160 trials in total, with 40 trials per treatment combination. All trials were randomly intermixed across the whole experiment. Each participant first received 16 practice trials and then completed 4 blocks of 40 trials each, with a 1-min break between blocks. The experiment lasted less than 30 min.
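The trial structure described above can be sketched as follows. This is a minimal illustration, not the authors' E-Prime implementation: the function and field names are hypothetical, and the sketch does not balance tilt direction, probe match, or shape identity, which the actual experiment would also need to control.

```python
import random
from itertools import product

EXPERIMENT_TYPES = ("working_memory", "mental_imagery")
CUE_TYPES = ("valid", "invalid")


def build_trials(trials_per_cell=40, n_blocks=4, seed=None):
    """Return n_blocks lists of trials for the 2 x 2 design.

    All cells are intermixed randomly across the whole session,
    then the shuffled list is cut into equally sized blocks.
    """
    rng = random.Random(seed)
    trials = [
        {"experiment_type": exp, "cue_type": cue}
        for exp, cue in product(EXPERIMENT_TYPES, CUE_TYPES)
        for _ in range(trials_per_cell)
    ]
    rng.shuffle(trials)
    block_size = len(trials) // n_blocks
    return [trials[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


if __name__ == "__main__":
    blocks = build_trials(seed=2024)
    print(len(blocks), "blocks of", len(blocks[0]), "trials")
```

Shuffling the full list before cutting it into blocks matches the statement that trials were intermixed across the whole experiment, so individual blocks need not contain exactly 10 trials per cell.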

3 Results

Mean accuracy was 99.55% for the visual search task and 92.95% for the memory test. A repeated measures ANOVA on memory accuracy with experiment type and cue type as within-participants factors revealed that accuracy was higher in the working memory condition (94.14%) than in the mental imagery condition (91.72%), as confirmed by a main effect of experiment type, F(1, 17) = 7.28, p = .015, \( \eta_{p}^{2} = 0.300 \). Neither the main effect of cue type, F(1, 17) = 0.001, p = .978, nor the interaction between experiment type and cue type, F(1, 17) = 0.019, p = .891, was significant.

Search reaction times are illustrated in Fig. 3. A repeated measures ANOVA with experiment type and cue type as within-participants factors revealed that the main effect of experiment type was not significant, F(1, 17) = 0.04, p = .842; reaction times did not differ between the mental imagery condition and the working memory condition. However, the main effect of cue type was significant, F(1, 17) = 7.85, p = .012, \( \eta_{p}^{2} = 0.316 \), indicating faster reaction times in the valid cue condition than in the invalid cue condition. Importantly, there was a significant interaction between experiment type and cue type, F(1, 17) = 5.28, p = .035, \( \eta_{p}^{2} = 0.237 \). Post hoc analysis using the LSD test revealed that reaction times were faster in the valid cue condition than in the invalid cue condition for the mental imagery condition (p = .002), whereas in the working memory condition, reaction times were similar for valid and invalid cues (p = .621).
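The reported effect sizes can be checked against the F ratios using the standard identity \( \eta_{p}^{2} = F \cdot df_{\text{effect}} / (F \cdot df_{\text{effect}} + df_{\text{error}}) \). The short sketch below applies it to the three significant effects reported in this section; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    reported = {
        "memory accuracy, experiment type": 7.28,
        "search RT, cue type": 7.85,
        "search RT, interaction": 5.28,
    }
    for label, f_value in reported.items():
        print(f"{label}: {partial_eta_squared(f_value, 1, 17):.3f}")
```

With df = (1, 17), the three F values give 0.300, 0.316, and 0.237, which agree with the values reported in the text.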

Fig. 3. Search reaction time as a function of experiment type and cue type. Error bars represent 95% within-participants confidence intervals with Masson and Loftus’s method [31].
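For readers unfamiliar with within-participants confidence intervals, the sketch below shows one common variant of the approach cited for Fig. 3: the half-width is the critical t value multiplied by sqrt(MS_error / n), where MS_error is the participant × condition interaction mean square. The paper does not state which error term was used for the 2 × 2 design, so treating all conditions as levels of a single factor is an assumption here. The function name is hypothetical, and the critical t value is passed in by the caller rather than computed.

```python
import math


def within_participant_ci_halfwidth(data, t_crit):
    """Half-width of a within-participants confidence interval.

    data:   list of rows, one per participant, each holding one mean
            per condition (same number of conditions in every row).
    t_crit: critical t value for (n - 1) * (J - 1) degrees of freedom,
            supplied by the caller (e.g. from a t table).
    """
    n = len(data)           # participants
    j = len(data[0])        # conditions
    grand = sum(sum(row) for row in data) / (n * j)
    subj_means = [sum(row) / j for row in data]
    cond_means = [sum(row[k] for row in data) / n for k in range(j)]
    # Participant x condition interaction sum of squares.
    ss_inter = sum(
        (data[i][k] - subj_means[i] - cond_means[k] + grand) ** 2
        for i in range(n)
        for k in range(j)
    )
    df_inter = (n - 1) * (j - 1)
    ms_inter = ss_inter / df_inter
    return t_crit * math.sqrt(ms_inter / n)
```

Because between-participant differences in overall speed are removed from the error term, these intervals reflect the reliability of the condition differences rather than of the absolute means.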

4 Discussion and Conclusion

The results revealed that a mentally imagined shape can capture attention during visual search, indicating that mental imagery and attention may also share the same visual representation. Surprisingly, however, no memory-driven attentional capture was found.

This result is consistent with views proposed by other researchers. Thomas assumed that imagery differs from perception or attention only in degree, not in kind [32]. Pearson, Naselaris, Holmes and Kosslyn proposed that imagery is a weak form of perception [33]. To some extent, imagery can substitute for perception and play a similar role. A series of studies has provided evidence for these viewpoints. Using the mental scanning paradigm, Borst and Kosslyn found that participants scanned imagined and perceived content at the same rate, indicating that visual imagery and perception have similar representations at an early stage of information processing [34]. Using the binocular rivalry paradigm, other studies found that previously formed visual imagery could bias participants’ subsequent perception [35, 36]. In addition, the effect of imagery on binocular rivalry can be disrupted by a visual distractor presented while the image is being generated [37].

With respect to the relationship between mental imagery and attention, the perceptual anticipation theory proposes that the mechanisms used to generate mental imagery involve processes that are also used to anticipate perceiving stimuli [38, 39]. Imagery can lead to retinotopic activity in early visual areas, similar to that produced by perception. In the present study, once the mental image was generated, it provided a cue for the perceptual task. During the visual search task, the image could act as a top-down priming cue, and hence increased search efficiency when it was a valid cue.

However, no memory-driven attentional capture was found in this study, which contradicts a series of related studies [2,3,4,5]. Several factors play a crucial role in determining whether memory-driven attentional capture occurs. Studies have found that memory-driven attentional capture is reduced or absent when the memory item is easy to verbalize [4, 40, 41], when the interval between the onset of the memory item and the search display is long (longer than 3,500 ms) [40], and when the visual search task is perceptually easy [2]. It has been concluded that forming a sufficiently strong visual representation and sustaining it during retention is necessary for memory-driven attentional capture. In the present study, to keep the working memory condition and the mental imagery condition consistent, and to make sure participants generated a stable image, the interval between the onset of the memory item and the search display was 4,000 ms. Such a long interval made it difficult for participants to maintain a strong visual representation in the working memory condition, which may explain why memory-driven attentional capture was not observed.

It should be noted that a growing number of researchers have come to think that mental imagery, perception (or attention), and memory are integrated, interrelated, and intertwined, and may belong to the same cognitive system [42]. When the formation of a mental image relies on perceptual information, the image can guide attention to the representation of that object as a priming cue. When the formation of a mental image relies on information from memory, the image can act as a memory cue that leads people to recall imagery-related information, or it can disturb memory at the retrieval phase [44]. Thus Tong proposed that mental imagery is a dynamic element of visual working memory [44]. Mental imagery therefore connects perception and memory. It involves two different processes. One emphasizes the influence of external input on imagery and can be regarded as bottom-up processing. The other emphasizes the retrieval of past experience and can be regarded as top-down processing.

In conclusion, this study provides behavioral evidence that mental imagery captures attention during visual search. The result may have implications for imagery training in several settings. For example, imagery training on instrument displays and fault displays could help novice drivers adapt to and identify display problems in their later driving. Imagery training on vigilance could help air traffic controllers monitor arriving and departing aircraft more efficiently. Imagery training on different forms of dangerous goods could also help security personnel detect dangerous targets faster and more accurately.