An animal’s environment is rich with affordances. Different possible actions are specified by visual information while competing for dominance over neural dynamics. Affordance competition models account for this in terms of winner-takes-all cross-inhibition dynamics. Multistable phenomena also reveal how the visual system deals with ambiguity. Their key property is spontaneous instability, in forms such as alternating dominance in binocular rivalry. Theoretical models of self-inhibition or self-organized instability posit that the instability is tied to some kind of neural adaptation and that its functional significance is to enable flexible perceptual transitions. We hypothesized that the two perspectives are interlinked. Spontaneous instability is an intrinsic property of perceptual systems, but it is revealed when they are stripped of the constraints of possibilities for action. To test this, we compared a multistable gestalt phenomenon against its embodied version and estimated the neural adaptation and competition parameters of an affordance transition dynamic model. Wertheimer’s (Zeitschrift für Psychologie 61, 161–265, 1912) optimal (β) and pure (φ) forms of apparent motion from a stroboscopic point-light display were endowed with action relevance by embedding the display in a visual object-tracking task. Thus, each mode was complemented by its action, because each perceptual mode uniquely enabled different ways of tracking the target. Perceptual judgment of the traditional apparent motion exhibited spontaneous instabilities, in the form of earlier switching when the frame rate was changed stepwise. In contrast, the embodied version exhibited hysteresis, consistent with affordance transition studies. Consistent with our predictions, the parameter for competition between modes in the affordance transition model increased, and the parameter for self-inhibition vanished.
In a natural environment, a functionally adaptive perceptual system has to guide behavior with respect to multiple and competing opportunities for action (Rietveld & Kiverstein, 2014; Saltzman & Caplan, 2015). William James (1904) defined this as the problem of the multiplicity of potential structure. James Gibson (1966) later extended this to the problem of how the same optic array from the same environment can specify multiple and sometimes mutually exclusive possibilities for action. The table in front affords climbing on top or crawling underneath. How do we quickly select actions from what the current environment affords and make our way through the world without wasting time to consider every possible path?
This problem has been studied extensively in the context of two largely unrelated classes of paradigms: unstable visual phenomena and switching between perceived affordances. The former class is exemplified by binocular rivalry (Blake & Logothetis, 2002) and the geometric displays of the Necker cube and other gestalt phenomena (Kruse & Stadler, 1995). Their characteristic dynamic property is lack of stability, expressed in the form of spontaneous alternation among possible perceptual configurations. The latter is exemplified by affordance boundary experiments that study how environmental variables, called control parameters, specify affordances (Warren, 1984). For instance, step height divided by leg length is a body-scaled control parameter that specifies the ability to step on a stair (Warren & Wang, 1987). The characteristic dynamic property of affordance boundary experiments is stability expressed in the form of hysteresis, whereby the switching point for perceived affordance in response to a gradually changing control parameter comes later for increasing relative to decreasing direction of change (for a review, Dotov, de Wit, & Nie, 2012).
Unstable visual phenomena can be understood as separate modes of neural dynamics characterized by self-inhibition (Ditzinger & Haken, 1989; Frank, Profeta, & Harrison, 2015) or neural adaptation (Kruse, Carmesin, Pahlke, Strüber, & Stadler, 1996; Laing & Chow, 2002). Self-inhibition explains why a dominant mode cannot be sustained indefinitely, leading to spontaneous switching. The functional significance of neural adaptation therefore is that it enables flexible switching. Such theoretical models are not complete, however, because they deal with a special case with no obvious mapping to perception in general (Braddick, 2018) and place the visual system in a different role from its default, which is to guide action (Warren, 2012). On the other hand, action-selection but not spontaneous switching has been addressed by the affordance competition hypothesis, an impactful theory about the neural integration of spatial and other contextual information about action (Cisek, 2007; Cisek & Kalaska, 2010). It posits that the dorsal stream activations for action alternatives are subject to a winner-takes-all dynamic but also that the competition is biased in a top-down manner by the action relevance, action planning, and reward associated with the stimulus and resolved by other areas such as the ventral stream and frontal areas.
We propose the notion of embodied gestalts to fuse these two parallel theoretical traditions, along with their respective experimental paradigms. Whereas the contemporary continuations of gestalt theory focus mainly on unstable visual phenomena and explain them in terms of intrinsic neural dynamics (Friston, Breakspear, & Deco, 2012; Kruse & Stadler, 1995), embodied gestalts involve nested dynamics of the brain, body, and environment. This has been formalized in a model that combines the two constraints, namely neural adaptation for self-inhibition and competition for action selection in a natural environment (Lopresti-Goodman, Turvey, & Frank, 2011, 2013). In particular, the past work predicts that the relative balance between the two constraints within the same model can account both for the spontaneous or early transitions characteristic of bistable visual phenomena and the late transitions or hysteresis characteristic of affordance transitions. To test this, we designed a task in which a rivalry visual phenomenon known to exhibit spontaneous instabilities was endowed with action relevance by being embedded in a perception–action task. We then compared the stability of perceptual modes with and without the action selection constraint. This approach was inspired by studies of the effects of sensory–motor congruency on unstable phenomena. Using hand movements coupled to the stimulus to report the direction of an ambiguously rotating sphere had a stabilizing effect, extending the dominance of the movement-congruent mode (Maruya, Yang, & Blake, 2007). Dimensional congruency produced a similar effect (Beets, Rösler, Henriques, Einhäuser, & Fiehler, 2010).
A suitable phenomenon is the bistable apparent motion seen with stroboscopically flashed luminant dots. Magni-φ is a more stable but also a more versatile version (Steinman, Pizlo, & Pizlo, 2000) of the classic apparent motion studied by Wertheimer (1912). Magni-φ involves a circular arrangement of an arbitrary number of dots in which a new dot is masked consecutively at each successive video frame (see Fig. 1A). The frame update rate acts as a control parameter and determines the stability of each perceptual mode. At low rates, an individual dot appears to make one step like in an animated movie (β), leaving an open space to which the following dot will move at the subsequent video frame. At high rates, a contour-less shadow (φ) appears to move on top of the ring of stationary dots. Rates in the middle range enable a rivalry phenomenon, because both β and φ are possible (roughly between 5 and 20 fps; see the Method sections and Appendix A), making this an instance of a bistable visual display (Shaw, Flascher, & Mace, 1996). The usual method for studying the stability of the β and φ modes is to have participants classify the stimulus over successive presentations, either verbally or with a button press. In the case of β–φ classification, early transitions are expected, due to the inherent instability of the phenomenon.
In our embedded φ–β version of the task, the same visual display served as a stimulus in a game of tracking a moving object. If a particular dot is indicated at the beginning of the trial and has to be identified by the participant at the end, this requires it to be tracked continuously, because all dots in the φ–β phenomena have the same appearance. Importantly, each of the two modes of apparent motion specifies a different trajectory for the individual dots. In what we refer to as φ-tracking, the target dot undergoes repetitive occlusions by the rotating shadow. In what we refer to as β-tracking, the target dot moves around the ring. In this way, φ-tracking is a different action from β-tracking. The perceptual mode is implicitly recorded through the participants’ responses at the end of each trial, because different final locations are specified by φ and β; see Fig. 1B.
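The divergence between the two tracking modes can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that ring positions are indexed 0 to 8 in the clockwise direction, that the mask advances one position per frame, and that under the β interpretation the dot at the newly masked position is seen as stepping into the gap the mask has just vacated. The function name and the indexing scheme are hypothetical and are not taken from the experimental script.

```python
def final_target_positions(target_start, mask_start, n_frames, n_positions=9):
    """Return (phi_position, beta_position) of the target after n_frames.

    Assumed conventions (illustrative only): positions are ring indices,
    the mask moves +1 per frame, and the target starts at a visible dot.
    """
    if target_start == mask_start:
        raise ValueError("the target must start at a visible position")

    # Phi interpretation: dots are stationary, the shadow passes over them.
    phi_position = target_start

    # Beta interpretation: the dot that becomes masked is seen as having
    # jumped into the position the mask occupied on the previous frame.
    beta_position = target_start
    mask = mask_start
    for _ in range(n_frames):
        new_mask = (mask + 1) % n_positions
        if beta_position == new_mask:
            beta_position = mask
        mask = new_mask

    return phi_position, beta_position
```

Under these assumptions the two interpretations agree until the mask first reaches the target and differ afterward, which is why the final click location can reveal the perceptual mode.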
The stability of perceptual modes against a changing stimulus was probed by delivering two sequences of trials, one with an ascending and one with a descending control parameter, in the form of an affordance boundary experiment, with β→φ transitions in ascending sequences and φ→β transitions in descending sequences. The same procedure was repeated once in the tracking condition and once in the classification condition. The same stimuli and sequences were seen in both conditions, but in classification the instruction was not to track a particular object but to observe and report which mode was seen—for simplicity, labeled as “shadow” and “beads” in relation to what is seen to move in φ and β, respectively (see Fig. 1A). To define the measure of stability more specifically, if αcrit is the critical value of the control parameter, then instability and early switching are associated with lower values in ascending relative to descending sequences (αcrit,ascending < αcrit,descending), and stability is associated with larger values in ascending relative to descending sequences (αcrit,ascending > αcrit,descending) (see Fig. 2A and B, respectively). Late switching was expected in the tracking condition, in agreement with affordance transition studies (e.g., Fitzpatrick, Carello, Schmidt, & Corey, 1994; Lopresti-Goodman et al., 2013; van der Kamp, Savelsbergh, & Davis, 1998). Conversely, early switching due to inherent instability was expected in the classification condition, in agreement with past studies with reduced action relevance of affordance classifications (Fitzpatrick et al., 1994; Pisarchik, Jaimes-Reátegui, Magallón-García, & Castillo-Morales, 2014; Tuller, Case, Ding, & Kelso, 1994; van Rooij, Bongers, & Haselager, 2002).
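The stability measure defined above reduces to the sign of the difference between the two critical values. A minimal Python sketch follows; the function name and the example values are hypothetical and serve only to illustrate the definition.

```python
def classify_hysteresis(alpha_crit_ascending, alpha_crit_descending):
    """Label a pair of critical values by the direction of hysteresis."""
    difference = alpha_crit_ascending - alpha_crit_descending
    if difference > 0:
        # Late switching in both directions: the current mode persists.
        return "positive hysteresis"
    if difference < 0:
        # Early switching in both directions: the current mode gives way.
        return "negative hysteresis"
    return "no hysteresis"


# Hypothetical critical values in frames per second.
print(classify_hysteresis(12.0, 6.0))
print(classify_hysteresis(7.0, 11.0))
```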
We also estimated parameters for the affordance transition model to test whether the same theoretical account can explain both dynamic regimes (Lopresti-Goodman et al., 2011, 2013). The affordance transition model combines a destabilizing, self-inhibition (neural adaptation) dynamic at a rate h and a competition, winner-takes-all dynamic at a rate g. The relative strength of the two determines the relative timing of transitions in ascending and descending trials (Ditzinger & Haken, 1989; Lopresti-Goodman et al., 2011, 2013). The model is solved such that values of the competition parameter (g) and the adaptation parameter (h) are calculated from the two transition points in an ascending and a descending trial sequence; see Appendix B.
A single trial, shown schematically in Fig. 1A and B, consisted of a brief stimulus presentation followed by a stationary period, during which the response was made. In visual classifying, the manner of responding was only symbolically related to the perceptual judgment, in that participants had to indicate one of two words, “beads (β)” or “shadow (φ),” by clicking with the mouse. In visual tracking, which bears resemblance to a multiple object tracking (MOT) task, a target dot was indicated at the beginning of the trial and then the response at the end of the trial consisted of clicking the target object at its final location. In an ascending sequence of trials, as the frame rate progressively grew from very slow to very fast, participants were expected to switch from β to φ, and vice versa in the descending sequence. In the tracking condition, the number of the last few frames was controlled so that the two possible target trajectories, implied by β-tracking (the target moves along the ring) and φ-tracking (the target is stationary but periodically occluded), would terminate within the opposite halves of the ring.
Thirteen students in an Introductory Psychology course at the University of Connecticut participated in partial fulfillment of a course requirement. All had normal or corrected-to-normal vision. The experiment had the approval of the University’s Institutional Review Board.
Materials and apparatus
A desktop computer, mouse, and a CRT screen with 120 Hz update rate were used to perform the task. A Matlab script employing the capabilities for real-time video frame control of the PsychoPhysics Toolbox (Brainard, 1997; Pelli, 1997; Pelli & Zhang, 1991) implemented all aspects of data collection and visual stimulus. A frame consisted of a black background with nine dots (0.3° of visual angle) each with a yellow center and white border arranged in a circle (1.5° visual angle), with the circle centered on the screen and a black mask covering one dot. In each frame the mask was displaced clockwise to the following dot location. The frame update rate α was incremented between trials in ascending or descending manner. The 16 update rates in a sequence of 16 trials were distributed according to an exponential function ($\alpha = e^{-0.4 + 0.28n}$, $n \in \{1, \dots, 16\}$, ranging from about 1.20 to 60 Hz), more densely distributed for higher frequencies (Kruse et al., 1996). The frame update rate acted as a control parameter for β- and φ-tracking, just as step height relative to leg length limits step-on-ability, because empirical considerations from visual persistence indicated that the limit for tracking a dot as it moves to adjacent positions in the β-tracking mode is about 20 fps, or 50 ms (see Appendix A).
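For readers who wish to reproduce the stimulus schedule, the exponential spacing of update rates can be sketched in Python as follows. The sketch assumes that the index n runs from 1 to 16 and that the descending sequence is the ascending one reversed; the function and variable names are hypothetical. The exact endpoints depend on the indexing convention assumed here, so they need not coincide with the approximate range quoted in the text.

```python
import math


def update_rates(n_trials=16, intercept=-0.4, slope=0.28):
    """Frame update rates alpha_n = exp(intercept + slope * n), n = 1..n_trials."""
    return [math.exp(intercept + slope * n) for n in range(1, n_trials + 1)]


ascending = update_rates()
descending = list(reversed(ascending))  # assumed: same values in reverse order

# Successive rates differ by a constant factor of exp(slope).
for n, alpha in enumerate(ascending, start=1):
    print(f"trial {n:2d}: {alpha:6.2f} Hz")
```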
The perceived apparent motion mode in the φ–β phenomenon specifies the target trajectory. There were two possible correct target locations, corresponding to φ- and β-tracking, respectively, and seven incorrect ones. Deviation in tracking was the absolute angular deviation between the click location and the closer of the two possible correct locations. Failure in tracking was defined as an absolute deviation larger than 40° (the distance between adjacent dots).
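The scoring rule described above can be sketched in Python as follows. The sketch assumes that click and target locations are expressed as angles in degrees around the ring; the function names are hypothetical, and the 40° threshold is the one stated in the text.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two directions on a circle, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def tracking_deviation(click_deg, phi_deg, beta_deg):
    """Deviation from the closer of the two possible correct locations."""
    return min(angular_difference(click_deg, phi_deg),
               angular_difference(click_deg, beta_deg))


def reported_mode(click_deg, phi_deg, beta_deg):
    """Mode implied by the click: whichever correct location is closer."""
    to_phi = angular_difference(click_deg, phi_deg)
    to_beta = angular_difference(click_deg, beta_deg)
    return "phi" if to_phi <= to_beta else "beta"


def is_tracking_failure(click_deg, phi_deg, beta_deg, threshold_deg=40.0):
    """A trial fails if the deviation exceeds the spacing between dots."""
    return tracking_deviation(click_deg, phi_deg, beta_deg) > threshold_deg
```

Taking the minimum over the two candidate locations means that a response counts as an error only when it matches neither perceptual mode.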
Ascending and descending α defined the independent variable sequence order. The independent variable of task had two levels: classifying and tracking. Participants performed four sequences, one in each of the four conditions in a two (sequence order) by two (task) repeated measures design. Conditions were randomized but blocked by task.
Introduction to the experiment was followed by an adjustment of the viewing distance from the monitor to 80 cm. Participants were requested not to lean forward. A sequence of demonstration and practice trials was then performed in both tracking and classifying conditions. The sequence included values from the high and low end of the frame update rate. Several participants required additional practice to perform β-tracking. The given instructions explained that each trial would gradually shift from the one end to the other end of frame update rate and, by implication, participants would likely switch from the one perceptual mode to the other in the course of the trial. The task in classifying was to indicate what was seen, whereas in tracking it was to visually track a target dot and click it at the end. No instructions were given about when and how to switch, where to focus on the screen, or how to move one’s eyes. Hand or finger movements during stimulus presentation were discouraged in both conditions. Four sequences of experimental trials were performed after preparation.
αcritical in a trial was defined as the value of α corresponding to the first mode transition in that sequence. Multiple transitions back and forth within the same sequence were rare. The model parameters g (competition among modes) and h (self-inhibition) were calculated from the pair of αcritical recorded in the ascending and descending sequences (see Appendix B).
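Extracting the critical value from a sequence of responses can be sketched in Python as follows. The sketch assumes that the critical value is read from the first presentation on which the reported mode differs from the preceding one; the function name and the example data are hypothetical.

```python
def alpha_critical(alphas, modes):
    """Return the alpha at the first mode transition, or None if there is none.

    alphas: update rates in presentation order.
    modes:  reported modes in the same order, e.g. "beta" or "phi".
    """
    if len(alphas) != len(modes):
        raise ValueError("alphas and modes must have the same length")
    for i in range(1, len(modes)):
        if modes[i] != modes[i - 1]:
            # Only the first transition counts; later reversals are ignored.
            return alphas[i]
    return None


# Hypothetical ascending sequence with a beta-to-phi transition.
rates = [2.0, 4.0, 8.0, 16.0, 32.0]
reports = ["beta", "beta", "beta", "phi", "phi"]
print(alpha_critical(rates, reports))
```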
A 7.21% error rate for tracking indicated the feasibility of the task and that participants understood the task requirements. All recorded trials resulted in transitions. Multiple transitions in the same trial were rare. As Fig. 3A suggests, the observed αcritical in the tracking task was higher for ascending sequences (M = 13.29, SD = 3.68) than for descending sequences (M = 6.25, SD = 2.63), indicating relative stability. The relation was reversed in classifying (Fig. 3A), with the mean αcritical lower in ascending sequences (M = 8.38, SD = 2.88) than in descending sequences (M = 8.94, SD = 4.00), indicating relative instability.
A 2×2 repeated measures analysis of variance (ANOVA) showed no main effect of task, F(1, 12) = 2.22, p = .16, ηp2 = .16. Order was significant, F(1, 12) = 58.66, p < .001, ηp2 = .83, such that αcritical was higher for ascending sequences than for descending sequences. An interaction between the two factors was found, F(1, 12) = 14.16, p < .01, ηp2 = .54. A Bayesian approach found the same pattern of results when testing against the null (intercept) hypothesis. The Bayes factor (BF) indicated no evidence for the model with task (BF < 1), strong evidence with order (BF = 25.69) and with task and order (BF = 11.94), and very strong evidence if the interaction of task and order was included (BF = 20,042.49). Pairwise comparisons, t tests with df = 12 and the Holm correction, showed a significant difference between the ascending and descending mean transition values in tracking, p < .001, indicating positive hysteresis, but not in classifying, p = .56, indicating no negative hysteresis. When taking the interaction into account, ascending sequences tended to switch earlier in classifying than in tracking, p < .01, but the opposite difference for the descending sequences was not statistically significant, p = .07.
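The reported effect sizes can be checked against the F ratios with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). A minimal Python sketch, with a hypothetical function name, applied to the three F ratios reported above:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios from the 2x2 repeated measures ANOVA, each with df = (1, 12).
for label, f_value in [("task", 2.22), ("order", 58.66), ("task x order", 14.16)]:
    print(label, round(partial_eta_squared(f_value, 1, 12), 2))
```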
The manipulation of action relevance proved effective in modulating the transition points. Perceptual instabilities leading to a perceptual mode transition occurred earlier in classifying sequences (no hysteresis) than in tracking sequences (positive hysteresis). Strong evidence for early switching (negative hysteresis) was not found.
The affordance transition model explains changes in the size of hysteresis in terms of the strength of self-induced destabilization h at the slower time scale. It is possible, however, that this instability is due to how long one stares at the stimulus, not to whether the perceptual process is bound to an action selection. To address this possibility, Experiment 2 employed a sequence of the control parameter with the same range as in Experiment 1 but shorter and more numerous trials. Additionally, the step size at the upper end of the frequency range of Experiment 2 was increased relative to Experiment 1 and decreased in the middle and lower end. If perceptual instability is mediated by duration of exposure to the stimulus, then transitions are expected to happen later with respect to the control parameter in the descending sequences in Experiment 2 than in Experiment 1.
All aspects of Experiment 1 were reproduced save for two details, N = 10 and the distribution of the stimulus update rates. A sequence consisted of 29 trials with a mean duration of 2.80 s. The update rates were defined piecewise. The first ten values beginning with 60 fps decayed faster than in Experiment 1 and then followed $\alpha = e^{-0.9 + 0.1n}$, $n \in \{11, 12, \dots, 29\}$.
A 6.55% error rate implied that the task was feasible and that participants understood the task requirements. All recorded trials resulted in transitions and multiple transitions in the same trial were rare.
Observed α critical
In tracking, the transition value was higher for ascending sequences (M = 12.19, SD = 3.66) than for descending sequences (M = 7.33, SD = 2.22). The relation was reversed in classifying, with the mean αcritical lower in ascending sequences (M = 7.49, SD = 2.19) than in descending sequences (M = 12.55, SD = 6.93), as is shown in Fig. 3B.
A 2×2 repeated measures ANOVA showed an interaction between task and sequence order, F(1, 9) = 13.90, p < .01, ηp2 = .62, but no main effect of either (Fs < 1). A Bayesian approach showed the same pattern of results when testing against the null (intercept) hypothesis. The Bayes factor indicated no evidence for the models with task (BF < 1), order (BF < 1), or task and order (BF < 1), but strong evidence if the interaction of task and order was included (BF = 9.04). Pairwise comparisons, t tests with df = 9 and Holm correction, showed a significant difference between the ascending and descending transition values in tracking, p < .01, and no effect in classifying, p = .09. Taking the interaction into account, ascending sequences switched earlier in classifying than in tracking, p < .01, and, similarly, descending sequences switched earlier in classifying than in tracking, p < .05.
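The Holm correction used for the pairwise comparisons is a step-down procedure: the raw p values are sorted, the smallest is multiplied by the number of tests, the next by one less, and so on, with monotonicity enforced. A minimal Python sketch follows; the function name is hypothetical, and the raw p values in the example are made up for illustration and are not values from this study.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values may not decrease as raw p values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for three pairwise tests.
print(holm_adjust([0.01, 0.04, 0.03]))
```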
α critical across Experiments 1 and 2
Given that Experiments 1 and 2 differed only in terms of the response schedule—how often participants had to respond within the trial—the two datasets can be combined in a design with response schedule serving as a between-subjects factor. The distributions of perceived φ and β modes per frame rate are summarized in Fig. 4, separately for sequence order, condition, and response schedule. A mixed-design ANOVA revealed no main effect of schedule, F(1, 21) < 1; a main effect of sequence order, F(1, 21) = 5.10, p < .05, ηp2 = .19; no interaction of schedule with task, F(1, 21) < 1; an interaction of schedule with sequence order, F(1, 21) = 5.43, p < .05, ηp2 = .22; and no three-way interactions, F(1, 21) < 1.
We found a significant interaction between the task and order (ascending vs. descending), F(1, 21) = 28.34, p < .001, ηp2 = .58. Pairwise comparisons, t tests with df = 22 and Holm correction, showed significantly larger transition values in ascending (M = 12.74, SD = 3.79) than in descending (M = 6.80, SD = 2.54) sequences in tracking, p < .001, and significantly lower transition values in ascending (M = 7.94, SD = 2.66) than descending (M = 10.70, SD = 5.68) sequences in classifying, p < .05. Ascending sequences switched earlier in classifying than in tracking, p < .001, and, similarly, descending sequences switched earlier in classifying than in tracking, p < .01.
Three participants were excluded because they made accidental response clicks, complicating the model-based parameter estimation. As is shown in Fig. 3C, the competition parameter g in Experiment 1 was larger in tracking (M = 2.97, SD = 1.35) than in classifying (M = 1.24, SD = .25). The pattern was the same in Experiment 2, with tracking (M = 1.81, SD = 1.34) being larger than classifying (M = 1.08, SD = .26). A Bayesian repeated measures ANOVA showed strong evidence for the effect of task in both experiments (BF = 30.08 and BF = 2938.86, respectively). A 2×2 mixed-design ANOVA revealed a significant effect of task, F(1, 18) = 13.19, p < .01, ηp2 = .42, and a significant effect of response schedule, F(1, 18) = 5.38, p < .05, ηp2 = .23. There was no task-by-schedule interaction, F(1, 18) = 2.14, p = .16.
As is shown in Fig. 3D, the self-inhibition rate parameter h in Experiment 1 was smaller in tracking (M = .004, SD = .007) than in classifying (M = .058, SD = .090). The pattern was the same in Experiment 2, with tracking (M = .002, SD = .006) smaller than classifying (M = .142, SD = .088). A Bayesian repeated measures ANOVA showed strong evidence for the effect of task in both experiments (BF = 9.12 and BF = 20.75, respectively). A 2×2 mixed-design ANOVA revealed a significant effect of task, F(1, 18) = 22.00, p < .001, ηp2 = .55, and no effect of schedule, F(1, 18) = 4.42, p = .050. The task-by-schedule interaction was not significant, F(1, 18) = 4.30, p = .053.
Experiment 2 reproduced the effect of the manipulation of action relevance: Perceptual instabilities happened earlier in classifying sequences, both ascending and descending. Moreover, combining the power of Experiments 1 and 2 revealed that the hysteresis in tracking was converted to early switching (negative hysteresis) in classifying. Under the boundary conditions of classifying, perceptual instabilities occur prematurely. Note, however, that early and late switching do not correspond categorically to the two classes of perceptual tasks. Only reduction of hysteresis size but not negative hysteresis was found in a judgment condition relative to a dynamical condition of performance (Hock, Bukowski, Nichols, Huisman, & Rivera, 2005; Hock & Ploeger, 2006).
The objective of the present study was to address the relative (in)stability of perceptual modes as a function of their action relevance, and to show that dynamic properties of visual rivalry phenomena and affordance competition are two manifestations of the same underlying process. An object affords an activity for an agent on a given occasion if and only if the object and the agent are mutually compatible on dimensions of relevance to the activity (Petrusz & Turvey, 2010). Here we manipulated the dimension of relevance (or functional distance) experimentally over and above compatibility in order to determine how it affects the dynamic stability of perception. We designed an experiment on the basis of a gestalt-like rivalry phenomenon that typically has minimal relevance to body activity but could be embedded in a perception–action task. The usual method for such stimuli is the perceptual judgment task in which participants observe the apparent motion for a few seconds and then indicate symbolically, with words or buttons, the perceptual state. We tested whether the dynamics of perceptual transitions would change if apparent motion was endowed with action relevance by linking the alternative perceptual modes to competing tracking modes, in which participants were engaged in perception–action rather than reflecting on what they were seeing (Heft, 1993). This distinguishes our study from stimulus–response compatibility paradigms in which relevance and intention are kept the same but the physical alignment between stimulus and action is altered.
Indeed, possibility for action was found to be a boundary condition stabilizing a perceptual system supplied with an otherwise unstable stimulus. Instability indicated by early switching, also called negative hysteresis, was observed in the perceptual judgment condition and greater stability, or hysteresis, in the action selection condition. That we could convert an unstable visual phenomenon into a stable one by embedding it in a task involving coupled movement and real consequences is only half of the story. Importantly, the same affordance transition model could account for both scenarios because the estimated parameters were in agreement with the theory. Stronger competition g was observed in the tracking condition in which action selection was necessary, and only in that condition did it exceed the value of one that implies no competition. Conversely, self-inhibition h was higher in the classification condition and it approached zero in tracking.
The affordance transition model states that dynamics becomes destabilized early, leading to negative hysteresis, if there is intrinsic decay of perceptual modes and no competitive, mutually inhibiting coupling. Alternatively, the perceptual modes remain stable for longer, leading to positive hysteresis, if the decay rate is diminished and/or the competition between modes is stronger. This implies that visual phenomena are unstable because they lack relevance to action and fail to activate the top-down processes that would otherwise break the symmetry at the lower level of visual processing; the absence of opportunities for action leads to a loss of constraints, which in turn leads to unstable visual phenomena.
Converging evidence indicates that affordances prioritize the pickup of relevant information by tuning the visual system (Thomas, 2015). The reduction in hemodynamic response with repeated presentations is reduced with 3-D objects that afford manipulation as opposed to images that do not (Snow et al., 2011). Similarly, passive exposure leads to contrast adaptation in the activity of single cells in the visual cortex but this is reversed with added behavioral context (Keller et al., 2017). Eye action as a process of information pickup in coordination with a rotating target is likely to be recruited differently with and without task constraints. The present study is limited by the lack of eyetracking but future work could address how the local subprocesses conform to the overall embodied gestalt.
The embedded φ–β task is formally similar to an affordance boundary setup for two reasons. First, the task has conditions of satisfaction: tracking the target in an array of identical objects can be successful or unsuccessful, contrary to perceptual judgments of visual phenomena that cannot be right or wrong. Second, two actions, or ways of focusing on and following the moving objects, are afforded and this possibility is constrained by a parameter of the stimulus. This setup also opens the opportunity to study how the control of eye movements as search for information is recruited by the different task constraints. Here we did not use eyetracking, however, and only sought to test how the involvement of purposeful action affects the transition times of perceptual modes.
Why are neural dynamics unstable while perception–action is stable?
Theoretical accounts of the perceptual response to a changing environment have to explain how perceptual systems resolve the conflict between stability and flexibility. In our everyday dealing with the world we can fluently and without notice switch among opportunities for action (Rietveld, 2008). Arguably, transient neural dynamics play an adaptive role in this context because they increase the capacity of the brain to deal with complicated environments (Friston, 2000). A range of proposed notions of self-destabilization converge on the same fundamental principle of neural organization to account for this flexibility (chaotic neural dynamics, Skarda & Freeman, 1987; habituation, self-organized instability, and autovitiation, Friston et al., 2012; Pastukhov et al., 2013).
If the neural substrate of perceptual systems is fraught with instabilities, then what gives it the stability that ensures reliable operation in our daily lives? Why do we not report spontaneous perceptual instabilities more often? The theoretical model suggests that the dynamics of neural systems are intrinsically unstable but become stable when their counterpart of mutual sensory and motor constraints is added. In any realistically complex environment there is an abundance of such constraints; hence, neural dynamics are typically to be found embedded in a stable perception and action loop (Cisek, 2007; Cisek & Kalaska, 2010). Input from the environment is to be understood as an enabling constraint in that it restricts the dynamic regime of otherwise unstable neural activity and hones it into a functional perception–action loop (Anderson, 2015; Raja & Anderson, 2019).
Our approach to model building can be described as out–in and constraint-driven. In particular, we seek to understand the external constraints such as what information is available for a given task, the layout of the task space, and the causal effects of the participant’s actions in this task space. Alternatively, an in–out approach would begin by listing the relevant mental processes such as attention and action selection and their corresponding neural substrates. For instance, it might be possible to account for the present results with a model that links action and attention to perceptual information by way of response activation (Welsh, Weeks, Chua, & Goodman, 2009). This accounts for competing perceptual categories in terms of neural processes that race for activation dominance, where action-based priming can give a head start to one process over another. In fact, this comes close to the way the embodied gestalt model operates (see Eq. B4).
The model proposed here has one important difference and leads to different predictions about neural dynamics. In the racing and other decision-making theories, the perceptual categories are incremental processes that grow positively from zero to a threshold in the style of evidence accumulators (Cisek & Kalaska, 2010). Yet, such models make no predictions about the stability of perceptual categories. Here we showed that a negative, self-inhibition process is also necessary to account more fully for perceptual and action selection dynamics. We also showed that factors such as availability for action can change this dynamic, leading to inhibition of inhibition. This implication is important when studying the neural basis of diseases associated with altered dynamics, be it excessively unstable (Bystritsky, Nierenberg, Feusner, & Rabinovich, 2012; Lerner et al., 2012; Rabinovich, Muezzinoglu, Strigo, & Bystritsky, 2010) or excessively stable (Tass et al., 2012). To give one example in which this becomes relevant, schizophrenia is associated with reduced affordance perception (Kim & Kim, 2017), unstable real-world perception (Rolls, Loh, Deco, & Winterer, 2008) and, paradoxically, unusually stable perception of apparent motion (Frank & Dotov, 2016).
In conclusion, we have argued for a unification of theories dealing with action selection and unstable visual phenomena. The spontaneous switching of gestalt-like phenomena and late switching in affordance selection are both seen as perceptual transitions, but they are positioned at opposite ends of the stability continuum, where stability is determined by the relative balance of competition and self-inhibition. In this sense, gestalt figures are a special, impoverished case of the embodied gestalts that constitute perception–action in natural environments. Arguably, self-inhibition driven by the intrinsic instability of neural dynamics explains visual instabilities and, importantly, is a mechanism for functional flexibility. Here we suggest that this principle from theoretical neuroscience can be even more powerful if it is complemented by another principle, that of externally specified opportunities for action. External constraints from an environment rich with affordances act as enabling constraints, because they break symmetry and stabilize otherwise unstable internal dynamics.
American philosopher William James conceptualized perception as the realization of a relational structure between knower and known in terms of active selection from a multiplicity of potential structure—“The one self-identical thing has so many relations to the rest of experience that you can take it in disparate systems of association, and treat it as belonging to opposite contexts” (James, 1904, p. 481)—and importantly, not only things but also relations between things are directly perceivable (Heft, 2001). J. J. Gibson, a student of James’s student, considered the same problem: “The world is often like a three-ring circus to a child—too many things happening too fast . . .” (1966, p. 309). How does a perceiver use subtle contextual information in a changing environment to select a particular opportunity for action out of the many (or indefinitely many) possible ones, or switch from one to another?
As a historical note, pure motion was Wertheimer’s (1912) important discovery, because motion was seen in the absence of any change in the location of the objects constituting the stimulus. This implies that the perception of motion is not necessarily an integration of successive object locations. The φ value showed that motion as such is a fundamental dimension of experience, independent of perceptions of successive locations. As King and Wertheimer (2005) underscored, Max Wertheimer viewed the whole not only as more than the sum of its parts but as prior to or entirely different from the sum of its parts, in that it determines the nature of its parts.
A less stable version of β exists, in which all dots are seen to be spinning grouped as a solid circular structure. Pilot trials found this mode to be difficult to observe with the present settings of the display parameters (i.e., visual angle among the elements).
Anderson, M. L. (2015). Beyond componential constitution in the brain: Starburst amacrine cells and enabling constraints. In T. Metzinger & J. M. Windt (Eds.), Open MIND (Article 1). Frankfurt am Main, Germany: MIND Group. https://doi.org/10.15502/9783958570429
Beets, I. A. M., Rösler, F., Henriques, D. Y. P., Einhäuser, W., & Fiehler, K. (2010). Online action-to-perception transfer: Only percept-dependent action affects perception. Vision Research, 50, 2633–2641.
Blake, R., & Logothetis, N. K. (2002). Visual competition. Nature Reviews Neuroscience, 3, 13–21. https://doi.org/10.1038/nrn701
Braddick, O. (2018). Illusion research: An infantile disorder? Perception, 47, 805–806. https://doi.org/10.1177/0301006618774658
Brainard, D. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436. https://doi.org/10.1163/156856897X00357
Bystritsky, A., Nierenberg, A. A., Feusner, J. D., & Rabinovich, M. (2012). Computational non-linear dynamical psychiatry: A new methodological paradigm for diagnosis and course of illness. Journal of Psychiatric Research, 46, 428–435.
Cisek, P. (2007). Cortical mechanisms of action selection: The affordance competition hypothesis. Philosophical Transactions of the Royal Society B, 362, 1585–1599. https://doi.org/10.1098/rstb.2007.2054
Cisek, P., & Kalaska, J. F. (2010). Neural mechanisms for interacting with a world full of action choices. Annual Review of Neuroscience, 33, 269–298. https://doi.org/10.1146/annurev.neuro.051508.135409
di Lollo, V. (1977). Temporal characteristics of iconic memory. Nature, 267, 241–243.
di Lollo, V. (1980). Temporal integration in visual memory. Journal of Experimental Psychology: General, 109, 75–97.
Ditzinger, T., & Haken, H. (1989). Oscillations in the perception of ambiguous patterns: A model based on synergetics. Biological Cybernetics, 61, 279–287.
Domínguez-Hüttinger, E., Christodoulides, P., Miyauchi, K., Irvine, A. D., Okada-Hatakeyama, M., Kubo, M., & Tanaka, R. J. (2017). Mathematical modeling of atopic dermatitis reveals “double switch” mechanisms underlying four common disease phenotypes. Journal of Allergy and Clinical Immunology, 139, 1861–1872.
Dotov, D. G., de Wit, M. M., & Nie, L. (2012). Understanding affordances: History and contemporary development of Gibson’s central concept. AVANT, 3, 28–39.
Farrell, J. E., Putnam, T., & Shepard, R. N. (1984). Pursuit-locked apparent motion. Bulletin of the Psychonomic Society, 22(4), 345–348. https://doi.org/10.3758/BF03333838
Fitzpatrick, P., Carello, C., Schmidt, R. C., & Corey, D. (1994). Haptic and visual perception of an affordance for upright posture. Ecological Psychology, 6, 265–287.
Frank, T. D., & Dotov, D. G. (2016). Coarse-grained order parameter dynamics of the synergetic computer and multistable perception in schizophrenia. In A. Pelster & G. Wunner (Eds.), Self-organization in complex systems: The past, present, and future of synergetics (pp. 247–262). Berlin: Springer.
Frank, T. D., Profeta, V., & Harrison, H. (2015). Interplay between order-parameter and system parameter dynamics: considerations on perceptual–cognitive–behavioral mode–mode transitions exhibiting positive and negative hysteresis and on response times. Journal of Biological Physics, 41, 257–292.
Friston, K. J. (2000). The labile brain: II. Transients, complexity and selection. Philosophical Transactions of the Royal Society B, 355, 237–252.
Friston, K., Breakspear, M., & Deco, G. (2012). Perception and self-organized instability. Frontiers in Computational Neuroscience, 6, 44. https://doi.org/10.3389/fncom.2012.00044
Gibson, J. J. (1954). The visual perception of objective motion and subjective movement. Psychological Review, 61, 304–314.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston, MA: Houghton Mifflin.
Haken, H. (1983). Synergetics, an introduction: Nonequilibrium phase transitions and self-organization in physics, chemistry, and biology (3rd ed.). New York, NY: Springer.
Haken, H. (1991). Synergetic computers and cognition. Berlin, Germany: Springer.
Haken, H. (1993). Advanced synergetics: Instability hierarchies of self-organizing systems and devices. New York, NY: Springer.
Heft, H. (1993). A methodological note on overestimates of reaching distance: Distinguishing between perceptual and analytical judgments. Ecological Psychology, 5, 255–271.
Heft, H. (2001). Ecological psychology in context. Mahwah, NJ: Erlbaum.
Hock, H. S., Bukowski, L., Nichols, D. F., Huisman, A., & Rivera, M. (2005). Dynamical vs. judgmental comparison: Hysteresis effects in motion perception. Spatial Vision, 18, 317–335.
Hock, H. S., & Ploeger, A. (2006). Linking dynamical perceptual decisions at different levels of description in motion pattern formation: Psychophysics. Perception & Psychophysics, 68, 505–514. https://doi.org/10.3758/BF03193693
James, W. (1904). Does “consciousness” exist? Journal of Philosophy, Psychology and Scientific Methods, 1, 477–491.
Keller, A. J., Houlton, R., Kampa, B. M., Lesica, N. A., Mrsic-Flogel, T. D., Keller, G. B., & Helmchen, F. (2017). Stimulus relevance modulates contrast adaptation in visual cortex. eLife, 6, 4–15. https://doi.org/10.7554/eLife.21589
Kim, N.-G., & Kim, H. (2017). Schizophrenia: An impairment in the capacity to perceive affordances. Frontiers in Psychology, 8, 1052. https://doi.org/10.3389/fpsyg.2017.01052
King, D. B., & Wertheimer, M. (2005). Max Wertheimer and Gestalt theory. Piscataway, NJ, US: Transaction.
Kruse, P., Carmesin, H., Pahlke, L., Strüber, D., & Stadler, M. (1996). Continuous phase transitions in the perception of multistable visual patterns. Biological Cybernetics, 75, 321–330.
Kruse, P., & Stadler, M. (1995). Ambiguity in mind and nature (Springer Series in Synergetics, Vol. 64). Berlin, Germany: Springer.
Laing, C. R., & Chow, C. C. (2002). A spiking neuron model for binocular rivalry. Journal of Computational Neuroscience, 12, 39–53.
Lerner, I., Bentin, S., Shriki, O., Andreason, N., Neely, J., Minzenberg, M., … Treves, A. (2012). Excessive attractor instability accounts for semantic priming in schizophrenia. PLoS ONE, 7, e40663. https://doi.org/10.1371/journal.pone.0040663
Lopresti-Goodman, S. M., Turvey, M. T., & Frank, T. D. (2011). Behavioral dynamics of the affordance “graspable”. Attention, Perception, & Psychophysics, 73, 1948–1965. https://doi.org/10.3758/s13414-011-0151-5
Lopresti-Goodman, S. M., Turvey, M. T., & Frank, T. D. (2013). Negative hysteresis in the behavioral dynamics of the affordance “graspable”. Attention, Perception, & Psychophysics, 75, 1075–1091.
Maruya, K., Yang, E., & Blake, R. (2007). Voluntary action influences visual competition. Psychological Science, 18, 1090–1098.
Pastukhov, A., García-Rodríguez, P. E., Haenicke, J., Guillamon, A., Deco, G., & Braun, J. (2013). Multistable perception balances stability and sensitivity. Frontiers in Computational Neuroscience, 7, 17. https://doi.org/10.3389/fncom.2013.00017
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442. https://doi.org/10.1163/156856897X00366
Pelli, D., & Zhang, L. (1991). Accurate control of contrast on microcomputer displays. Vision Research, 31, 1337–1350.
Petrusz, S. C., & Turvey, M. T. (2010). On the distinctive features of ecological laws. Ecological Psychology, 22, 44–68.
Pisarchik, A. N., Jaimes-Reátegui, R., Magallón-García, C. D., & Castillo-Morales, C. O. (2014). Critical slowing down and noise-induced intermittency in bistable perception: Bifurcation analysis. Biological Cybernetics, 108, 397–404.
Rabinovich, M. I., Muezzinoglu, M. K., Strigo, I., & Bystritsky, A. (2010). Dynamical principles of emotion–cognition interaction: Mathematical images of mental disorders. PLoS ONE, 5, e12547:1–10. https://doi.org/10.1371/journal.pone.0012547
Raja, V., & Anderson, M. L. (2019). Behavior as an enabling constraint. In M. Viola & F. Calzavarini (Eds.), New challenges in philosophy of neuroscience (Studies on Mind and Brain). Berlin, Germany: Springer.
Rietveld, E. (2008). The skillful body as a concernful system of possible actions: Phenomena and neurodynamics. Theory and Psychology, 18, 341–363.
Rietveld, E., & Kiverstein, J. (2014). A rich landscape of affordances. Ecological Psychology, 26, 325–352.
Rolls, E. T., Loh, M., Deco, G., & Winterer, G. (2008). Computational models of schizophrenia and dopamine modulation in the prefrontal cortex. Nature Reviews Neuroscience, 9, 696–709. https://doi.org/10.1038/nrn2462
Saltzman, E., & Caplan, D. (2015). A graph-dynamic perspective on coordinative structures, the role of affordance–effectivity relations in action selection, and the self-organization of complex activities. Ecological Psychology, 27, 300–309.
Shaw, R. E., Flascher, O. M., & Mace, W. M. (1996). Dimensions of event perception. In W. Prinz & B. Bridgeman (Eds.), Handbook of perception and action, Vol. 1 (pp. 345–395). London, UK: Academic Press.
Shioiri, S., & Cavanagh, P. (1992). Visual persistence of figures defined by relative motion. Vision Research, 32, 943–951.
Skarda, C. A., & Freeman, W. J. (1987). How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences, 10, 161–173.
Snow, J. C., Pettypiece, C. E., McAdam, T. D., McLean, A. D., Stroman, P. W., Goodale, M. A., & Culham, J. C. (2011). Bringing the real world into the fMRI scanner: Repetition effects for pictures versus real objects. Scientific Reports, 1, 130:1–10. https://doi.org/10.1038/srep00130
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74(11, Whole No 498), 1–29.
Steinman, R., Pizlo, Z., & Pizlo, F. (2000). Phi is not beta, and why Wertheimer’s discovery launched the Gestalt revolution. Vision Research, 40, 2257–2264.
Tass, P. A., Qin, L., Hauptmann, C., Dovero, S., Bezard, E., Boraud, T., & Meissner, W. G. (2012). Coordinated reset has sustained aftereffects in Parkinsonian monkeys. Annals of Neurology, 72, 816–820.
Thomas, L. E. (2015). Grasp posture alters visual processing biases near the hands. Psychological Science, 26, 625–632. https://doi.org/10.1177/0956797615571418
Tuller, B., Case, P., Ding, M., & Kelso, J. (1994). The nonlinear dynamics of speech categorization. Journal of Experimental Psychology: Human Perception and Performance, 20, 3–13.
van der Kamp, J., Savelsbergh, G. J., & Davis, W. E. (1998). Body-scaled ratio as a control parameter for prehension in 5- to 9-year-old children. Developmental Psychobiology, 33, 351–361.
van Rooij, I., Bongers, R. M., & Haselager, W. P. F. G. (2002). A non-representational approach to imagined action. Cognitive Science, 26, 345–375.
Warren, W. H. (1977). Visual information for object identity in apparent movement. Perception & Psychophysics, 21, 264–268. https://doi.org/10.3758/BF03214238
Warren, W. H. (1984). Perceiving affordances: Visual guidance of stair climbing. Journal of Experimental Psychology: Human Perception and Performance, 10, 683–703. https://doi.org/10.1037/0096-1523.10.5.683
Warren, W. H. (2006). The dynamics of perception and action. Psychological Review, 113, 358–389. https://doi.org/10.1037/0033-295X.113.2.358
Warren, W. H. (2012). Does this computational theory solve the right problem? Marr, Gibson, and the goal of vision. Perception, 41, 1053–1060. https://doi.org/10.1068/p7327
Warren, W. H., Jr., & Whang, S. (1987). Visual guidance of walking through apertures: Body-scaled information for affordances. Journal of Experimental Psychology: Human Perception and Performance, 13, 371–383. https://doi.org/10.1037/0096-1523.13.3.371
Welsh, T. N., Weeks, D. J., Chua, R., & Goodman, D. (2009). Perceptual–motor interaction: Some implications for human–computer interaction. In A. Sears & J. A. Jacko (Eds.), Human–computer interaction fundamentals. Boca Raton, FL: CRC Press.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie, 61, 161–265.
Open Practices Statement
The data and materials for all experiments are available upon request.
Appendix A: Determining the upper limit for β-tracking
Stroboscopic stimulation differs from so-called “real” stimulation only in being discontinuous when the latter is continuous. The relations of order are the same in both. Gibson (1954, p. 307)
Object identity in the visual displays used in Experiments 1 and 2 is specified solely and fully by the apparent motion events: occlusions in φ, and translations in β. Target tracking in the latter case is constrained by the ability to detect an opening in the array of objects. In a stroboscopic presentation, this ability is limited by visual persistence. Visual persistence for luminant dots has been estimated in the range between 40 and 120 ms, depending on the conditions of testing (di Lollo, 1977, 1980; Farrell, Putnam, & Shepard, 1984; Shioiri & Cavanagh, 1992; Sperling, 1960; Warren, 1977). The upper bound can be lower in nonideal circumstances (di Lollo, 1977, 1980). For this reason, 50 ms per frame was selected as the theoretical maximum rate for β-tracking, and this limit was used to convert the frame update rate α into an ability-scaled control parameter α*.
Appendix B: Theoretical formalism
Embodied gestalts are macroscopic patterns of perception–action defined over the microscopic dynamics of the brain, body, and its environment (Warren, 2006). Formally, these patterns are so-called order parameters, defined over the microscopic neural and bodily dynamics; transitions in perception or action selection are transitions between order parameters (Haken, 1983, 1991, 1993). The affordance transition model shows that it is necessary to combine two forms of dynamics to explain both hysteretic and early transition (negative hysteresis) behaviors in perception–action. The first is the winner-takes-all, cross-inhibiting dynamic used to explain neural competition and consistent with affordance competition theory. The second is a so-called self-inhibition, or neural adaptation, term, which posits that the amplitude of a mode in neural dynamics tends to decrease unless the mode is functionally integrated. The winner-takes-all part is as follows.
This states that the order parameters (or modes) ξj tend to grow to an asymptotic value and stability determined by λj, while also being subject to cross-inhibition with strength g, which is larger than unity (g > 1) for multistable systems (Haken, 1991). There are two sets of stable fixed points, depending on the parameters.
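As a reading aid, the description above can be written out for two modes. The following is a sketch that assumes the standard two-mode form of Haken's (1991) synergetic computer; only the symbols ξj, λj, and g come from the text, and the cubic saturation term and the stability conditions are derived under that assumption rather than quoted from the original equations.

```latex
% Assumed two-mode winner-takes-all dynamics (standard Haken form)
\begin{align}
  \dot{\xi}_1 &= \lambda_1 \xi_1 - g\,\xi_2^{2}\,\xi_1 - \xi_1^{3}, \\
  \dot{\xi}_2 &= \lambda_2 \xi_2 - g\,\xi_1^{2}\,\xi_2 - \xi_2^{3}.
\end{align}

% Single-mode fixed points and their linear stability (for \lambda_1, \lambda_2 > 0)
\begin{align}
  (\xi_1, \xi_2) &= (\sqrt{\lambda_1},\, 0)
     && \text{stable if } g\lambda_1 > \lambda_2, \\
  (\xi_1, \xi_2) &= (0,\, \sqrt{\lambda_2})
     && \text{stable if } g\lambda_2 > \lambda_1.
\end{align}
```

Under this assumed form, g > 1 means that both conditions hold at once whenever λ2/g < λ1 < gλ2, so either mode can remain dominant there; this is the bistable range in which system history decides the outcome.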
α is the control parameter and corresponds to the parametric manipulation of the environment, such as frame rate in the present study or step height in others. The constants Lj,0 determine a baseline relative strength of the order parameters with respect to a symmetric environmental input. Incrementing α from zero to one causes the system to transition from ξ1-domination to ξ2-domination. Correspondingly, decreasing α from one to zero causes the system to transition from ξ2-domination back to ξ1-domination. For example, body-scaled size of an object takes the role of control parameter α in grasping affordance studies (Lopresti-Goodman et al., 2011, 2013).
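The role of α can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes cubic winner-takes-all dynamics in the style of Haken (1991), assumes λ1 = L1,0 − α and λ2 = L2,0 + α, and uses the illustrative values g = 2, L1,0 = 1, and L2,0 = 0. The function names and the small seed `eps`, which lets a suppressed mode regrow, are hypothetical.

```python
def settle(xi1, xi2, lam1, lam2, g=2.0, dt=0.05, steps=4000):
    """Euler-integrate the assumed two-mode competition until it settles."""
    for _ in range(steps):
        d1 = lam1 * xi1 - g * xi2 ** 2 * xi1 - xi1 ** 3
        d2 = lam2 * xi2 - g * xi1 ** 2 * xi2 - xi2 ** 3
        xi1 += dt * d1
        xi2 += dt * d2
    return xi1, xi2


def switch_point(alphas, L10=1.0, L20=0.0, g=2.0, eps=1e-3):
    """Return the first alpha at which the dominant mode changes."""
    xi1 = xi2 = eps
    initial = None
    for a in alphas:
        # Carry the previous state over, but keep a small seed in each mode.
        xi1, xi2 = max(xi1, eps), max(xi2, eps)
        xi1, xi2 = settle(xi1, xi2, L10 - a, L20 + a, g)
        dominant = 1 if xi1 > xi2 else 2
        if initial is None:
            initial = dominant
        elif dominant != initial:
            return a
    return None


ascending = [k / 20 for k in range(21)]   # alpha = 0.00, 0.05, ..., 1.00
descending = ascending[::-1]

up = switch_point(ascending)      # mode 1 -> mode 2
down = switch_point(descending)   # mode 2 -> mode 1
print("ascending switch:", up, "descending switch:", down)
```

With these assumed values the analytic thresholds are α = g/(1 + g) = 2/3 for the ascending sequence and α = 1/(1 + g) = 1/3 for the descending sequence, so the switch comes late in both directions. This is ordinary hysteresis; fixed λj alone cannot produce early switching.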
The topological considerations portrayed in Fig. 5 imply that a second layer of dynamics is needed to account for early and spontaneous instabilities. This is the aforementioned second, self-inhibiting dynamic. The two-tiered affordance transition model (Lopresti-Goodman et al., 2011, 2013) adds this as a slower decay process (Ditzinger & Haken, 1989). Therefore, λj, which are fixed parameters at the faster Tier 1 time scale of action selection, become dynamical variables at the slower time scale of the sequence of trials.
(In accordance with the discrete character of paradigms in perceptual judgment and affordance transition, Eq. B4 is written as a map instead of continuous dynamics.) The Heaviside function is u(ξ) = 1 when ξ > 0 and u(ξ) = 0 otherwise. Equation B4 states that λj is a decaying variable that converges over successive trials to some baseline value plus or minus the control parameter. When the corresponding perceptual mode is negligible (ξj ≈ 0), convergence is to Lj,0 ± α, whereas when the mode is active (ξj > 0), convergence is to a lower value, Lj,0 ± α − h. The parameter h thus determines the rate of decay of active modes and accounts for the spontaneous decay or adaptation responsible for early switching. If h is zero, this intrinsic habituation stops, which corresponds to inhibition of inhibition. With the inclusion of the logical function u(ξ), Eqs. B1–B4 become a so-called hybrid system with continuous dynamics and discrete switch variables. Such dynamics might be a general feature of biological processes (Domínguez-Hüttinger et al., 2017).
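The effect of h can be sketched numerically. The code below is an illustration under stated assumptions, not a reproduction of Eq. B4: it assumes a trial-wise relaxation map, λj ← λj + rate · (Lj,0 ± α − h·u(ξj) − λj), whose fixed points match the convergence values described above. It also assumes cubic winner-takes-all dynamics within each trial and replaces the condition ξ > 0 with a small numerical threshold. The names `rate`, `eps`, and `active_thresh`, and all numerical values, are hypothetical.

```python
def settle(xi1, xi2, lam1, lam2, g, dt=0.05, steps=8000):
    """Euler-integrate the assumed two-mode competition within one trial."""
    for _ in range(steps):
        d1 = lam1 * xi1 - g * xi2 ** 2 * xi1 - xi1 ** 3
        d2 = lam2 * xi2 - g * xi1 ** 2 * xi2 - xi2 ** 3
        xi1 += dt * d1
        xi2 += dt * d2
    return xi1, xi2


def first_switch(alphas, h, g=2.0, L10=1.0, L20=0.0, rate=1.0,
                 eps=1e-3, active_thresh=0.05):
    """Return the first alpha at which the dominant mode changes."""
    xi1 = xi2 = eps
    lam1, lam2 = L10 - alphas[0], L20 + alphas[0]
    initial = None
    for a in alphas:
        # Numerical stand-in for the Heaviside function u(xi).
        u1 = 1.0 if xi1 > active_thresh else 0.0
        u2 = 1.0 if xi2 > active_thresh else 0.0
        # Slow tier: lambda relaxes toward L_j0 -/+ alpha, lowered by h if active.
        lam1 += rate * ((L10 - a - h * u1) - lam1)
        lam2 += rate * ((L20 + a - h * u2) - lam2)
        # Fast tier: mode competition, with a small seed so a mode can regrow.
        xi1, xi2 = settle(max(xi1, eps), max(xi2, eps), lam1, lam2, g)
        dominant = 1 if xi1 > xi2 else 2
        if initial is None:
            initial = dominant
        elif dominant != initial:
            return a
    return None


ascending = [k / 20 for k in range(21)]
descending = ascending[::-1]

up_h0, down_h0 = first_switch(ascending, h=0.0), first_switch(descending, h=0.0)
up_h, down_h = first_switch(ascending, h=0.5), first_switch(descending, h=0.5)
print("h = 0.0:", up_h0, down_h0)   # late switching (hysteresis)
print("h = 0.5:", up_h, down_h)     # early switching (negative hysteresis)
```

Under these assumptions the ascending threshold is g(1 − h)/(1 + g) and the descending threshold is (1 + gh)/(1 + g). With h = 0 the ascending switch occurs at a higher α than the descending one, whereas with h = 0.5 the order reverses. That reversal is the early switching the self-inhibition term is meant to capture.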
An important advantage of this framework is that it allows analytical solutions for the parameters. Model-based parameters are estimated from empirical data, namely the transition points in a pair of ascending and descending sequences and assuming L1,0 = 1 (Lopresti-Goodman et al., 2013). In particular, the relevant parameters are self-inhibition or decay, h, and cross-inhibition or competition, g, which are calculated from the transition points of the ascending and descending trial sequences.
The necessity for the multistable perception model Eq. B1 to be extended to Eq. B4 is proven by way of a topological argument from the stability domains of early transitions (negative hysteresis) and late or normal transitions, schematized in Fig. 5. The stability domain of two alternative perceptual modes is inferred separately for ascending and descending sequences and then superimposed. The topology found in ordinary hysteresis consists of two monostable domains in the extreme ends of the parameter and a bistable domain between them (Fig. 5C). It follows that system history must be the additional constraint that helps to determine the actualized mode in the bistable domain. In the case of negative hysteresis, however, the topology cannot be resolved solely on the basis of a control parameter and history because, as Fig. 5D shows, the multistable part of the parameter space would contain both modes in both stable and unstable regimes. How do perceptual systems resolve the ambiguity apparent in Fig. 5D? Mathematically, ambiguity can be thought of as a set of interpretation functions. This is equivalent to a parameterized interpretation function in which the parameter is not defined. When defined, the parameter resolves the ambiguity. Hence, an additional parameter is needed to account for the early switching observed in certain conditions.
Cite this article
Dotov, D.G., Turvey, M.T. & Frank, T.D. Embodied gestalts: Unstable visual phenomena become stable when they are stimuli for competitive action selection. Atten Percept Psychophys 81, 2330–2342 (2019). https://doi.org/10.3758/s13414-019-01868-4
- Action selection
- Multistable phenomena
- Perceptual rivalry
- Perceptual instabilities
- Transient neural dynamics