Memory and learning for visual signals in time and space

Maharjan, Sujala; Gold, Jason M.; Sekuler, Robert

doi:10.3758/s13414-017-1277-x

Published: 09 February 2017

Volume 79, pages 1107–1122, (2017)

Attention, Perception, & Psychophysics

Abstract

Vision is often characterized as a spatial sense, but what does that characterization imply about the relative ease of processing visual information distributed over time rather than over space? Three experiments addressed this question, using stimuli comprising random luminances. For some stimuli, individual items were presented sequentially, at 8 Hz; for other stimuli, individual items were presented simultaneously, as horizontal spatial arrays. For temporal sequences, subjects judged whether each of the last four luminances matched the corresponding luminance in the first four; for spatial arrays, they judged whether each of the right-hand four luminances matched the corresponding left-hand luminance. Overall, performance was far better with spatial presentations, even when the entire spatial array was presented for just tens of milliseconds. Experiment 2 demonstrated that there was no gain in performance from combining spatial and temporal information within a single stimulus. In a final experiment, particular spatial arrays or temporal sequences were made to recur intermittently, interspersed among non-recurring stimuli. Performance improved steadily as particular stimulus exemplars recurred, with spatial and temporal stimuli being learned at equivalent rates. Logistic regression identified several shortcut strategies that subjects may have exploited while performing our task.

Introduction

Many cognitive functions build on an ability to register that a previously encountered stimulus has recurred. By manipulating the statistical characteristics within stimuli, researchers have uncovered some key principles of sensory processing and short-term memory. In those studies, detection of recurrence has been explored with auditory, temporal sequences (Julesz 1962; Julesz & Guttman 1963; Guttman & Julesz 1963; Pollack 1971, 1972, 1990), as well as with visual stimuli presented as spatial arrays (Pollack 1973). More recently, Agus et al. (2010) presented listeners with 1-s-long sequences of noise sampled at 44 kHz, and asked them to judge whether the samples in the stimulus’s last half replicated the samples in the first half. Overall, listeners performed this challenging task quite well, although success rates did vary among listeners. Subsequently, Gold et al. (2013) adapted Agus et al.’s task to the visual domain. Their subjects saw sequences of eight quasi-random luminances presented at 8 Hz to one region of a computer display. Subjects were instructed to judge whether the final four luminances matched the first four, that is, whether there was a pairwise match between luminances n₁ and n₅, n₂ and n₆, n₃ and n₇, and n₄ and n₈. Performance roughly paralleled what Agus et al. had found with auditory noise. Parallels included not only large individual differences but also evidence of learning with particular sequences that had been preserved (“frozen”) and then presented intermittently at random times throughout the experiment. The improved performance with frozen sequences implicated the formation of some trans-trial memory, whose development was incidental to the subjects’ explicit task, which was just to detect within-sequence recurrence (McGeoch & Irion, 1952).

Results from a novel change detection task (Noyce et al. 2016) delineated vision’s and audition’s distinct specializations, with vision excelling in spatial resolution, and audition excelling in temporal resolution. Additionally, another recent study showed that task demands dynamically recruit different modality-related frontal lobe regions: a visual task with rapid stimulus presentation activates cortical regions normally implicated in auditory attention, while an auditory task that demands spatial judgements activates regions normally implicated in visual attention (Michalka et al. 2015). These results encouraged us to contrast processing and learning for visual stimuli presented as a temporal sequence, as a spatial display, or, in one experiment, a combination of the two. Although many different tasks and stimuli could have served our purpose, we decided to adapt Gold et al.’s stimuli and task. This choice allowed us to build on what they had found, and also to address a question that their paper left unanswered. In the interest of equitable comparisons between spatial and temporal modes of presentation, our subjects made the same kind of judgment with all modes of stimulus presentation. Moreover, stimuli for all modes of presentation were constructed by drawing samples from the same pool of items (random luminances). Even with tasks and stimuli matched in this way, we expected performance with spatial presentations of visual stimuli to be substantially better than performance with temporal presentations of the same items.

Experiment 1

Using visual stimuli and a task like those in Gold et al., we evaluated the ease with which repetition of luminance subsets could be detected when delivered all at once (as a spatial array) or over time (as a temporal sequence). Of particular interest was the way that performance with spatial stimuli would vary with duration. We reasoned that brief presentations would undermine subjects’ ability to carry out the item-by-item comparisons implied by the task instructions, perhaps forcing subjects to fall back to some form of summary statistical representation (Ariely, 2001; Alvarez & Oliva, 2008; Haberman et al., 2009; Albrecht & Scholl, 2010; Piazza et al., 2013; Dubé & Sekuler, 2015). Additionally, we wanted to re-examine Gold et al.’s report that subjects could not detect mirror-image replication of items within temporal sequences of random luminances. Vision’s well-documented sensitivity to mirror symmetry made Gold et al.’s result surprising. Unlike the many different kinds of spatial displays in which mirror symmetry is readily detected, mirror symmetry in temporal sequences seemed to be virtually undetectable. However, the cause of that finding is uncertain. It could have resulted from the sequential presentation of stimulus components, from the use of random luminances as stimulus components, or from some combination of the two. So, this experiment included a condition designed to clarify the point. We hypothesized that differences between responses to mirror symmetry embedded in spatial displays and responses to mirror symmetry in temporal sequences reflect a difference between how spatial information and temporal information are processed, rather than some singularity of random luminances.

Methods

Subjects

Fourteen subjects, seven female, who ranged from 18 to 22 years of age, took part. In this and our other experiments, all subjects had normal or corrected-to-normal vision (measured with Snellen targets) and each was compensated ten dollars (U.S.). Subjects gave written consent to a protocol approved by Brandeis University’s Committee for the Protection of Human Subjects.

Apparatus

Stimuli were generated in Matlab (version 7.10) using the Psychophysics Toolbox extensions (Brainard 1997). An Apple iMac computer presented the stimuli on a cathode ray tube display (Dell M770) set at a screen resolution of 1024 × 768 pixels and a 75-Hz frame rate. A gray background on the display was fixed at 22 cd/m². A chin rest enforced a viewing distance of 57 cm. The room was darkened during the experiment. Unless otherwise specified, the conditions just described were maintained across all experiments.

Stimuli

Stimulus luminances were generated using an algorithm from Gold et al. For each temporal stimulus, eight luminances were presented in succession to the same small region of the display, with no break between items. Presentation of a sequence took 1 s. For spatial stimuli, luminances were presented simultaneously, cheek by jowl, as a horizontal array. For both types of stimuli, subjects performed the same task, namely, judging whether a subset of contiguous luminances was or was not replicated within the stimulus. For temporal stimuli, subjects were told to judge whether the last four luminances matched the first four; for spatial arrays, subjects were told to judge whether the rightmost four luminances matched the leftmost four.

Luminances were sampled from a Gaussian distribution whose mean, 22 cd/m², was equal to the display’s steady uniform background. The distribution’s standard deviation (8.66 cd/m²) was supplemented by upper and lower cutoffs that forced possible luminances to fall within the range from 2.33 to 41.67 cd/m² (for additional details, see Gold et al.). Note that the small variance among luminances made the items in a stimulus relatively homogeneous. This reduced the likelihood that any one luminance would stand out.
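The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration, not Gold et al.'s algorithm: the use of rejection sampling to enforce the cutoffs, the independent redraw of the second half on Non-Repeat trials, and the names `sample_luminance` and `make_stimulus` are all assumptions introduced here.

```python
import random

MEAN, SD = 22.0, 8.66      # cd/m^2, as stated in the text
LOW, HIGH = 2.33, 41.67    # cd/m^2, lower and upper cutoffs from the text


def sample_luminance(rng):
    """Draw one luminance from the Gaussian, redrawing until it lies within the cutoffs.

    Assumption: the cutoffs are enforced by rejection sampling; the original
    algorithm may enforce them differently.
    """
    while True:
        value = rng.gauss(MEAN, SD)
        if LOW <= value <= HIGH:
            return value


def make_stimulus(rng, repeat, n_items=8):
    """Return n_items luminances; on Repeat trials the second half copies the first.

    Assumption: on Non-Repeat trials the second half is drawn independently.
    """
    half = n_items // 2
    first = [sample_luminance(rng) for _ in range(half)]
    if repeat:
        second = list(first)
    else:
        second = [sample_luminance(rng) for _ in range(half)]
    return first + second
```

The same list of eight values could then be shown either one item at a time (Temporal) or side by side (Spatial), which is what makes the two presentation modes directly comparable.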

The experimental design entailed five conditions of stimulus presentation: one temporal (Temporal), and four different spatial (Spatial). Each Temporal stimulus comprised eight luminances, presented one after another at 8 Hz to the same square region at the center of the display. Each Spatial stimulus comprised a horizontal array of luminances presented simultaneously around the display’s center (see Fig. 1). Although the luminances and timing of Temporal stimuli were identical to those in Gold et al., an item in any stimulus sequence here was 1.25° square, roughly 4× smaller than in that study. This reduced size ensured that no item in a spatial display would lie more than 5° to the left or right of fixation.

Each Spatial stimulus was displayed for either 66, 133, or 253 ms. Hereafter, these stimuli are referred to as S_Shrt, S_Med, and S_Long stimuli, respectively. Note that variation in display timing made designations accurate to just ±1 ms. To the three conditions with Spatial stimuli of varying duration, we added a condition in which left and right halves of spatial arrays were mirror reflections of one another. This type of stimulus, which we call Spatial Mirrored (S_Mirr), was presented for 66 ms, the same duration that was used for S_Shrt. In order to prevent two items of identical luminance from lying adjacent to one another at the center of the array, which would have been a highly distinctive diagnostic feature, S_Mirr stimuli comprised just seven square regions instead of eight. Each Spatial array subtended 10° horizontally, while S_Mirr arrays were slightly smaller, 8.75°. For all Spatial displays, components were aligned horizontally with no gaps in between.
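As a quick arithmetic check on the display parameters, the sketch below relates the reported durations and array widths to the 75-Hz frame rate and the 1.25° item size. The frame counts (5, 10, and 19) are an inference made here from the reported durations, not values stated by the authors, and the function names are hypothetical.

```python
FRAME_RATE_HZ = 75   # display frame rate given in the Apparatus section
ITEM_DEG = 1.25      # width of one luminance patch, in degrees of visual angle


def duration_ms(n_frames):
    """Nominal duration of a stimulus shown for a whole number of video frames."""
    return 1000.0 * n_frames / FRAME_RATE_HZ


def array_width_deg(n_items):
    """Horizontal extent of an array of abutting patches."""
    return n_items * ITEM_DEG


# Assumed frame counts for the three Spatial durations
for frames in (5, 10, 19):
    print(frames, "frames ->", round(duration_ms(frames), 1), "ms")

print(array_width_deg(8), "deg for eight patches")
print(array_width_deg(7), "deg for seven patches")
```

Under these assumptions the nominal durations fall within 1 ms of the reported 66, 133, and 253 ms, and an eight-item array centered on fixation extends 5° to either side.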

Design

Each subject was tested in all conditions in an order that was counterbalanced across subjects. Subjects completed ten blocks of 110 trials, two blocks for each condition, for a total of 1100 trials per subject. Each block contained equal numbers of Non-Repeat and Repeat stimuli. The first ten trials in each block were treated as practice, and have been excluded from data analysis. The order of trials within each block was randomized anew for each subject.

Procedure

After subjects gave written informed consent, they were given a series of diagrams and verbal explanations meant to familiarize them with their task and the types of stimuli they would see. During the experiment, every stimulus was centered on the video display. After each stimulus, a message on the display prompted subjects to press one of two keyboard keys in order to signal whether they thought the stimulus had been Repeat (corresponding luminances matched) or Non-Repeat (corresponding luminances not matched). Immediately after a correct response, a distinctive tone provided feedback.

Results and discussion

We began by evaluating overall performance, expressed as d’, for the Temporal condition and for the four Spatial conditions, S_Shrt, S_Med, S_Long, and S_Mirr. For each block of trials and subject, d’ was calculated by subtracting z[pr(false positives)] for Non-Repeat trials from z[pr(hits)] on Repeat trials, where z denotes the inverse of the standard normal cumulative distribution. Hits were defined as responses of “Repeated” to Repeat stimuli; false positives were defined as responses of “Repeated” to Non-Repeat stimuli. Figure 2 shows the mean performance in each condition. An analysis of variance (ANOVA) revealed a difference among the five conditions (F(4,52) = 36.017, p<.001, η² = 0.73). Follow-up tests used Bonferroni-adjusted alpha levels of .016 per F-test (.05/3) and .025 per t-test (.05/2).
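The d’ computation described above can be sketched as follows. The clipping of extreme rates is an assumption added here so that perfect or empty cells do not yield infinite z-scores; the text does not say how, or whether, the authors handled such cases. The function name and argument names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, n_repeat, false_positives, n_nonrepeat):
    """d' = z(hit rate) - z(false-positive rate).

    hits: number of "Repeated" responses on Repeat trials
    false_positives: number of "Repeated" responses on Non-Repeat trials
    """
    hit_rate = hits / n_repeat
    fp_rate = false_positives / n_nonrepeat
    # Assumption: clip rates away from 0 and 1 by half a trial
    hit_rate = min(max(hit_rate, 0.5 / n_repeat), 1 - 0.5 / n_repeat)
    fp_rate = min(max(fp_rate, 0.5 / n_nonrepeat), 1 - 0.5 / n_nonrepeat)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fp_rate)
```

With 100 analyzed trials per block (50 Repeat and 50 Non-Repeat), a call such as `d_prime(40, 50, 15, 50)` would give the sensitivity for one block of one subject.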

Drilling down more deeply into the effect of presentation mode, we compared performance in the three Spatial conditions, S_Shrt, S_Med, and S_Long, against performance in the Temporal condition. Detection of repeated items within any of the three types of Spatial stimuli was significantly better than with Temporal stimuli. This was shown by a repeated-measures ANOVA in which the mean of the three Spatial conditions was contrasted against performance in the Temporal condition (F(1,13) = 54.337, p<.001, η² = 0.81). A follow-up t-test showed that even with S_Shrt, the briefest Spatial stimulus, performance was better than with Temporal stimuli (t(13) = 4.76, p<.001, d = 1.31). Remarkably, the spatial mode of stimulus presentation produced superior performance even when a spatial stimulus was presented for only about 1/15th the duration required to present a Temporal sequence (66 ms versus 1 s). Later, in the General Discussion, we explore possible explanations of this result.

Next, we isolated the effect of duration for Spatial stimuli. An analysis of variance limited to the three Spatial conditions confirmed what can be seen in Fig. 2, namely that performance differed significantly among Spatial conditions of varying duration (F(2,26) = 6.921, p<.01, η² = 0.35). Follow-up t-tests showed that the difference between the briefest and the longest durations, that is, S_Shrt and S_Long, was significant (t(13) = 4.06, p < 0.01), but the remaining two comparisons were not.

Figure 2 shows that performance was best with S_Mirr stimuli. This confirms and extends previous demonstrations that with spatial displays, mirrored (reflectional) symmetry is detected more readily than are other forms of repetition (Baylis and Driver, 1994; Bruce & Morgan, 1975; Corballis & Roldan, 1974; Barlow & Reeves, 1979; Machilsen et al., 2009; Palmer & Hemenway, 1978; Wagemans, 1997). Our linear arrays, which comprised just a few, relatively large, individual items, differ from displays with which mirror symmetry has been explored previously. As Levi and Saarinen (2004) noted in their study of mirror symmetry detection by amblyopes, “rapid and effortless symmetry perception can be based on the comparison of a small number of low-pass filtered clusters of elements.” Once extracted, these distinct clusters would be operated on by some longer-range mechanism that compares clusters located in corresponding positions within a display. The relatively large luminance patches in our displays would have made it easy for vision to isolate the clusters (regions of uniform luminance) that would then enter into a second-stage, comparison process. Because of their proximity, one might imagine that the third and fifth luminance patches (the ones that bookended the middle item) would be most easily compared by the kind of longer-range mechanism that Levi and Saarinen postulated. This led us to ask whether those third and fifth luminance patches made some special contribution, say via selective attention, to subjects’ superior performance with S_Mirr stimuli.

To test the possibility that with S_Mirr stimuli these two luminance patches were especially influential, we used logistic regression to model performance under two different assumptions. The first model assumed that subjects based their responses to S_Mirr stimuli solely on the difference between luminance patches n₃ & n₅; the second model assumed that subjects based their responses on comparisons between items in each corresponding pair of luminance patches, that is, n₁ & n₇, n₂ & n₆, and n₃ & n₅. Note that for all Repeat stimulus exemplars, these two models would always predict exactly the same, error-free performance. That is, no matter what the model, every stimulus would be correctly categorized as Repeat. Therefore, we decided to confine our analysis to Non-Repeat stimulus exemplars, for which the models’ predictions would diverge and therefore be more informative. Each logistic regression predicted pr(false positive responses) as a function of the difference between the luminances singled out by the model: either only the difference between luminances n₃ & n₅, in the first model, or every difference between corresponding luminance pairs, in the second model. We followed up the regressions with a χ² difference test on the two nested models. The result showed that the model in which responses to S_Mirr stimuli are based only on luminances n₃ & n₅ gave a significantly poorer fit (χ²(2) = 69.905, p<.0001). It seems unlikely, then, that superior performance with S_Mirr stimuli came from selective attention just to luminances n₃ & n₅. Rather, superior performance with S_Mirr stimuli might be more easily understood within the framework that has been proposed for perception of mirror symmetry in other kinds of displays. For example, pre-attentive processes, which have been implicated in mirror symmetry detection (e.g., Wagemans 1997), could be especially potent for displays, like ours, made up of just a few, large elements arranged around a vertical axis at the visual field’s center.
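The nested-model comparison described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the data are synthetic (a simulated observer, not the subjects' responses), the predictors are taken to be absolute luminance differences, the regressions are fitted with a hand-written Newton-Raphson routine instead of whatever software the authors used, and all function names are hypothetical. The only grounded parts are the structure of the two models and the two degrees of freedom in the χ² difference test.

```python
import math
import random


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=25):
    """Fit a logistic regression (intercept added) and return its log-likelihood."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    w = [0.0] * k
    for _ in range(iters):  # Newton-Raphson updates
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xr, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, xr))
            p = math.exp(z - log1pexp(z))  # stable logistic function
            for i in range(k):
                grad[i] += (t - p) * xr[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * xr[i] * xr[j]
        w = [wi + si for wi, si in zip(w, solve(hess, grad))]
    ll = 0.0
    for xr, t in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, xr))
        ll += t * z - log1pexp(z)
    return ll


def simulate_nonrepeat_trials(n_trials, rng):
    """SYNTHETIC Non-Repeat mirror trials from an observer who uses all three pairs."""
    diffs, responses = [], []
    for _ in range(n_trials):
        # Seven luminances; clipping to the cutoffs is a simplification
        lum = [min(41.67, max(2.33, rng.gauss(22.0, 8.66))) for _ in range(7)]
        d = [abs(lum[0] - lum[6]), abs(lum[1] - lum[5]), abs(lum[2] - lum[4])]
        z = 1.0 - 0.08 * sum(d)  # arbitrary illustrative coefficients
        p_false_positive = 1.0 / (1.0 + math.exp(-z))
        diffs.append(d)
        responses.append(1 if rng.random() < p_false_positive else 0)
    return diffs, responses


def compare_models(diffs, responses):
    """Chi-square difference test: n3-n5 only versus all three pairs (df = 2)."""
    ll_reduced = fit_logistic([[d[2]] for d in diffs], responses)
    ll_full = fit_logistic(diffs, responses)
    chi2 = max(0.0, 2.0 * (ll_full - ll_reduced))
    p_value = math.exp(-chi2 / 2.0)  # chi-square survival function, exact for df = 2
    return chi2, p_value
```

For example, `compare_models(*simulate_nonrepeat_trials(500, random.Random(0)))` returns the χ² difference statistic and its p-value for one synthetic data set. Because the simulated observer weighs all three pairs, the reduced model should fit noticeably worse, which is the pattern the authors report for their subjects.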

Any explanation of the ease with which our subjects detected mirror symmetry raises the question of why Gold et al. found mirror symmetry to be virtually undetectable. In their study, mirror symmetrical stimuli were generated by the same algorithm that we used, but those stimuli were presented sequentially (at 8 Hz), rather than as a spatial array. Before concluding that this difference in results arose from the difference between temporal and spatial presentation modes, we had to rule out the contribution of stimulus size. In particular, each item in our S_Mirr spatial stimuli was one-quarter the size of an item in Gold et al.’s stimuli. Recall that we shrank the stimuli for our experiments so that items in Spatial arrays would not fall too far out in the visual periphery.

To determine whether stimulus size mattered for our results, eight new subjects were each tested in four conditions: the temporal mirror and Temporal conditions with stimulus items the same size as those in Gold et al.’s study, and the same two conditions with stimulus items of the reduced size that we used in Experiment 1. Prior to analysis, two subjects’ data were discarded because of cell phone use during testing (one confirmed; the other strongly suspected based on his reaction times). Table 1 shows the mean d’ values and within-subject standard errors for each condition. For each stimulus size, performance in the Temporal condition was significantly better than in the temporal mirror condition, both for stimuli with smaller luminance patches (t(5) = 4.04, p<.05, d = 2.55) and for stimuli with larger patches (t(5) = 3.17, p<.05, d = 2.14). Thus, the poor performance Gold et al. found with mirror symmetrical temporal sequences reflected the mode of presentation, not the nature of the items comprising the sequences.

Table 1 Mean d’ and standard errors from the supplementary experiment

We next asked whether the S_Mirr condition’s superior performance came from the fact that, unlike other Spatial stimuli, each S_Mirr stimulus contained just seven luminance patches rather than eight. Perhaps having fewer luminances in a stimulus facilitated detection of a repetition within the stimulus. In order to assess this possibility, we tested three new subjects on S_Mirr and S_Shrt conditions. In both conditions, every stimulus comprised eight luminance patches. For S_Mirr stimuli, the two middle luminances, namely items 4 and 5, were replicates of one another. Table 2 gives the results from these control measures, along with the means and standard errors for the analogous conditions from Experiment 1.

For each subject, d’ was appreciably higher with S_Mirr than with S_Shrt stimuli. Moreover, for each subject the difference between stimuli comprising seven items and stimuli comprising eight items was close to the difference found in Experiment 1. This result indicates that the difference between the two conditions in Experiment 1 did not result from the difference in the number of luminance patches in the stimuli. Note that the overall d’ values for the three control subjects were lower than the corresponding mean values from Experiment 1’s subjects. The relatively small standard errors associated with Experiment 1’s results leave us at a loss to account for this discrepancy.

Table 2 d’ values: Control results and Experiment 1 results

Experiment 1 showed a large difference between performance when luminances were presented spatially and when the same luminances were presented temporally. Although the spatial and temporal dimensions of early vision are to some degree separable (Wilson 1980; Falzett and Lappin 1983), many psychophysical and physiological results suggest a link between the processing of spatial information and the processing of temporal information (e.g., Doherty, Rao, Mesulam, & Nobre, 2005; Rohenkohl, Gould, Pessoa, & Nobre 2014). These links include a suggestion that information from the two dimensions of processing converges at some site in the parietal lobe (e.g., Walsh 2003; Oliveri, Koch, & Caltagirone 2009). Such convergence might support a combination or coordination of temporal and spatial streams of information, as some psychophysical studies have shown (Goldberg et al. 2015; Keller and Sekuler 2015). Although the tasks in those studies differed from ours, their results do suggest the possibility that by making both sources of information available at the same time, concurrent spatial and temporal presentations could enhance performance over what would be produced by either source alone. Experiment 2 addressed this possibility, comparing the detection of repetition embedded in spatial, temporal, and spatio-temporal stimuli, in which the two dimensions were combined.

Experiment 2

Experiment 1 showed that detection of repetition within Spatial stimuli was considerably better than it was within Temporal stimuli. In the natural world, many events are characterized not by spatial or temporal information alone, but by some combination of the two. For example, stimulus variation over both space and time contributes importantly to event recognition and understanding (Cristini et al. 2007; Shipley and Zacks 2008). Experiment 2 examined whether the availability of spatial information and concurrent temporal information would facilitate detection of repetition within a stimulus. One detail of the experiment’s design was motivated by the directional bias seen previously, when subjects processed spatio-temporal stimuli (Sekuler 1976; Sekuler et al. 1973; Corballis 1996). Specifically, previous studies have shown either a left-to-right or a right-to-left order advantage in processing visual stimuli presented in rapid sequence. As there was no consensus on the direction of the bias, we generated spatio-temporal stimuli with both left-to-right and right-to-left orders of item presentation. So, in addition to assessing the impact of combining temporal and spatial information, we examined how direction of presentation influenced performance.