Introduction

Faces play an important role in human interactions, and humans have a remarkable ability to recognize faces across a wide variety of conditions. While face perception has been a focus of psychological research for several decades, what information is most useful for the discrimination and recognition of faces is still open to debate (for reviews, see Richler, Palmeri, & Gauthier, 2012, and Rossion, 2013). A popular account posits that face recognition relies more heavily on processing information derived from the relations between the features of the face than on information derived from individual features (e.g., eyes, nose, mouth) (Diamond & Carey, 1986; see reviews by Maurer et al., 2002; McKone & Robbins, 2012). Relational information is thought to be important for face recognition because faces are homogeneous and recognized at the individual level (Diamond & Carey, 1986). Different meanings and measures of relational information processing exist in the literature (reviewed by Richler et al., 2012). In this article, we adopt the term featural processing to refer to the analysis of individual features, configural to refer to the analysis of distances among individual features (e.g., Freire, Lee, & Symons, 2000; Leder & Bruce, 2000; Rossion, 2008, 2013; Taschereau-Dumouchel et al., 2010), and holistic to refer to the binding of the different face parts into an unparsed whole (e.g., Tanaka & Farah, 1993; Young, Hellawell, & Hay, 1987; see Richler et al., 2012, and Rossion, 2013, for alternative definitions). While configural information might be construed as referring to first-order relations—that is, categorizing a stimulus as a face because its features are arranged with two eyes above a nose, which is above a mouth (Maurer et al., 2002)—we use the term configural to refer exclusively to second-order relations.

Because eye location and covert attention are highly associated in complex tasks (Rayner, 2009), patterns of scanning can reveal the type of information that is critical for the viewer. Indeed, recognition performance is significantly impaired when participants are instructed to maintain fixation on a central point during encoding (between the eyes: Henderson, Williams, & Falk, 2005). One interpretation of this finding is that steady fixation impairs recognition performance because it prevents the deployment of transitional saccades between internal features. According to this interpretation, interfeatural saccades play a functional role in face processing by conveying information regarding interfeatural distances (Henderson et al., 2005). Expanding on this idea, Bombari et al. (2009) proposed that faces elicit functionally different scanning patterns depending on the type of information that is relevant for the task at hand. They measured eye movements during matching of faces that were scrambled or blurred in order to encourage featural and configural processing, respectively (e.g., Collishaw & Hole, 2000; Lobmaier & Mast, 2007). Discrimination of blurred faces elicited more interfeatural saccades than discrimination of scrambled faces. In contrast, discrimination of scrambled faces triggered more fixations on individual features than discrimination of blurred faces. The authors suggested that these scanning patterns reflect configural and featural processing, respectively. However, this interpretation is limited by the use of scrambled faces, in which the first-order relations (Maurer et al., 2002) between face features are disrupted. As such, the differences in eye movements reported might be attributable to differences in processing first-order versus second-order relations rather than to differences between featural and configural information per se.

In contrast to Bombari et al. (2009), a study by Xu and Tanaka (2013) suggests that featural and configural processing elicit comparable scanning patterns. In their study, eye movements were recorded while participants discriminated featural and configural modifications in both upright and inverted faces. Featural and configural information processing was measured via discrimination of faces that differed by small increments along a continuum for each dimension. Featural changes were made by changing the size of individual features (eyes or mouth), and configural changes were made by changing interocular distance or the position of the mouth. The authors did not find significant differences between the scanning patterns elicited during featural and configural discriminations, for either upright or inverted faces. While these results appear to contradict Bombari et al. (2009), the manner in which featural and configural information was manipulated might not have adequately distinguished these two processes. Indeed, manipulating the size of individual features also induced configural changes in Xu and Tanaka (2013). Furthermore, their results did not replicate the finding that inversion has a greater negative impact on performance for configural discriminations than for featural discriminations (Boutet, Collin, & Faubert, 2003; Freire et al., 2000; Leder & Bruce, 2000; but see Yovel & Kanwisher, 2004). Hence, it remains to be determined whether faces elicit functionally different scanning patterns depending on whether featural or configural information is task-relevant.

Since Yarbus's (1967) seminal work on eye movements to faces in the 1960s, studies have reported a consistent pattern of results whereby fixations are predominantly directed to the eyes, followed by the nose and the mouth (e.g., Althoff & Cohen, 1999; Luria & Strauss, 1978; Mertens, Siegmund, & Grüsser, 1993; Stacey, Walker, & Underwood, 2005). This viewing pattern might reflect an automatic preference for the eye region because it conveys information diagnostic for identifying individual faces (e.g., Barton et al., 2006). Alternatively, it might be fixation on the center of the face that provides diagnostic information for face identification. In Bombari et al. (2009), discrimination of intact faces elicited long fixations on the nose region, even though both featural and configural information were available. The authors proposed that this scanning pattern reflects holistic processing in that it facilitates the simultaneous capture of information across the entire face (Blais et al., 2008; Bombari et al., 2009; Miellet et al., 2011). Considering the importance of holistic information in face recognition (e.g., Boutet, Gentes-Hawn, & Chaudhuri, 2002; Richler & Gauthier, 2014; Todorov, Loehr, & Oosterhof, 2010), it is possible that faces tend to elicit fixations on the center of the stimulus irrespective of task demands.

In the present study, we investigated these hypotheses by directly manipulating the type of information relevant for the task at hand. Participants discriminated two sequentially presented faces that differed with respect to either featural information (by exchanging one feature with the corresponding feature of another face) or configural information (by moving the eyes, the nose, or the mouth of an original face) (Freire et al., 2000; Leder & Bruce, 2000). Featural and configural discriminations were made to faces presented in both upright and inverted orientations. Inverted faces are often included in studies on face processing because they offer a control condition wherein the images contain all of the same low-level information as upright faces while not being subject to the expertise that humans have developed for upright faces (see, e.g., Gauthier & Bukach, 2007, for a discussion of expertise in other object categories). Furthermore, inversion has a greater negative impact on performance for configural discriminations than featural ones (e.g., Boutet et al., 2003; Freire et al., 2000; Leder & Bruce, 2000). As such, using inverted faces can provide information on whether disrupting the mechanisms that are primarily engaged during processing of upright faces influences scanning patterns (e.g., Farah et al., 1998; Gauthier & Logothetis, 2000; Rossion, 2008; Sekuler et al., 2004).

The first hypothesis we examined is whether faces elicit functionally different scanning patterns depending on the information that is relevant for the task being performed. In keeping with Bombari et al. (2009), this hypothesis would be supported if configural discriminations elicited more interfeatural saccades and featural discriminations elicited longer gaze durations on individual features. Moreover, if featural and configural discriminations yield different scanning patterns, then one would expect inversion to affect them differently. Indeed, because inversion disrupts performance on configural discriminations to a greater extent than on featural discriminations, one would expect inversion to diminish the number of interfeatural saccades recorded during configural discriminations. In contrast, inversion should have little impact on eye movements recorded during featural discriminations.

An alternative hypothesis is that faces will elicit fixations on a specific region of the face irrespective of task demands. While most research to date points to the eye region, fixations on the nose might instead dominate to allow the extraction of holistic information (Blais et al., 2008; Bombari et al., 2009; Miellet et al., 2011). This hypothesis would be supported if participants spent more time fixating on either the eye or nose region than on other areas of the face irrespective of whether the task-relevant information is featural or configural.

In Experiment 1, featural and configural modifications were randomly presented, and participants were unaware of which type of modification was relevant on any given trial. In Experiment 2, featural and configural modifications were blocked, and participants were informed of the type of modification to be discriminated. Although participants performed a discrimination task based on either featural or configural modifications in both experiments, in Experiment 1 the processing strategy was driven primarily by the information present in the stimuli, whereas in Experiment 2 it was driven both by the information present in the stimuli and by prior knowledge of which modification was relevant. To our knowledge, this is the first study to investigate whether prior knowledge of task demands influences the eye movements elicited during configural and featural face processing tasks. In light of behavioral evidence suggesting that cognitive set can influence the manner in which faces are processed (Richler, Bukach, & Gauthier, 2009; Wegner & Ingvalson, 2002), we predicted that differences between the configural and featural conditions, if any, would be more pronounced in Experiment 2.

Experiment 1

In Experiment 1, eye movements were recorded while participants discriminated faces that were either the same or different with respect to their individual features or their configuration. The order of presentation of the modified faces was randomized such that participants were naive regarding the nature of the relevant modification.

Method

Participants

Twenty-two participants (6 male, 16 female), aged 18-30, were recruited from the University of Ottawa undergraduate subject pool and compensated with class credits for their participation. All participants were either Caucasian or had been residing in Canada for at least 10 years. All participants had normal or corrected-to-normal vision. The University of Ottawa’s Research Ethics Board approved the study.

Stimuli and materials

Figure 1 illustrates examples of the stimuli used in this experiment. Computer-generated faces taken from the Max Planck Institute for Biological Cybernetics face database (http://faces.kyb.tuebingen.mpg.de/) were converted to eight-bit grayscale images using MATLAB (Version 2010a; www.mathworks.com). The faces represent young Caucasian individuals. Face stimuli were standardized for mean luminance and root-mean-square (RMS) contrast. Stimuli were created in the following fashion using Adobe Photoshop. Ten faces were cropped so that the eyes, nose, and mouth were separated from the external features. The size of the external features was averaged across the faces such that all faces subtended a visual angle of 11° × 16.75° at a viewing distance of 57 cm. Using this set of external features, ten different baseline faces were created using internal features from different original faces, to ensure that the baseline faces and the faces that had undergone featural or configural modifications were equally realistic. Nine modified versions of each of the ten baseline faces were then created: six with interfeatural distance (configural) modifications (for each of the three features, one version with the feature moved up and one with it moved down) and three with featural modifications. Featural modifications were made by swapping the eyes, the nose, or the mouth of the baseline face with the corresponding feature of another face. Again, the resulting faces contained sets of features that all came from different original faces. Interfeatural distance modifications were made by moving the eyes, nose, or mouth up or down by 3 mm.
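The standardization step can be made concrete with a short sketch. The following is a minimal illustration of one conventional way to equate mean luminance and RMS contrast across images; it is not the authors' actual script, and the file name and target values are placeholders.

```python
# Minimal sketch of luminance/RMS-contrast standardization.
# Placeholder file name and target values; not the authors' script.
import numpy as np
from PIL import Image

TARGET_MEAN = 128.0  # assumed target mean luminance (8-bit scale)
TARGET_RMS = 40.0    # assumed target RMS contrast (SD of pixel values)

img = np.asarray(Image.open("face01.png").convert("L"), dtype=float)
z = (img - img.mean()) / img.std()  # zero mean, unit standard deviation
out = np.clip(z * TARGET_RMS + TARGET_MEAN, 0, 255).astype(np.uint8)
Image.fromarray(out).save("face01_standardized.png")
```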

Fig. 1

An example of a baseline face (left) and the configural (top row) and featural (bottom row) modifications applied to the eyes, nose, and mouth (left to right). For the configural modifications (top row), the eyes are moved down in the left panel, the nose is moved up in the middle panel, and the mouth is moved up in the right panel. Not all possible modifications are shown here, since each feature could be displaced either upward or downward

The experiment was programmed in Experiment Builder and run on a PC that formed part of an EyeLink 1000 video-based eye-tracking system (SR Research, Ottawa, ON; www.sr-research.com), which recorded participants’ eye movements at a frequency of 500 Hz with 0.5° spatial accuracy. Participants viewed the stimuli on a 21.5-in. LCD monitor. Viewing distance was maintained at 57 cm with a chin rest.

Procedure

A sequential two-alternative forced-choice (2AFC) discrimination paradigm was used. To initiate a trial, the participant pressed the spacebar. A fixation point then appeared on one side (left or right) of the screen and remained until the participant fixated it, after which the first face appeared and remained on screen for 3 s, followed by another fixation point on the same side as the previous one. The participant had to fixate this second fixation point in order for the second face to appear. Participants responded, as quickly and as accurately as possible, by pressing a key to indicate whether the two stimuli were “same” or “different”. The second face remained on the screen until a response was made. Both faces were presented in the middle of the screen. Participants were told that they could complete the experiment at their own pace.

Participants were tested on two counterbalanced blocks: one block of 120 trials in which the faces were upright and one block of 120 trials in which the faces were inverted. Each block comprised 12 randomly ordered trials for each of the ten baseline faces: six same trials and six different trials. The six same trials consisted of the sequential presentation of two identical faces. The face presented was one of six modified versions of the baseline face: three trials presented a featural modification (eyes, nose, or mouth) and three presented a configural modification (eyes, nose, or mouth moved either up or down, chosen at random). The six different trials consisted of the presentation of the baseline face followed by one of the six modified versions described for the same trials. A total of 240 trials were therefore shown in the experiment: 2 orientations × 10 faces × 12 trials per face.
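The trial structure described above can be summarized programmatically. The sketch below builds the full Experiment 1 trial list from the design as stated; the variable names are illustrative, not the authors' code.

```python
# Illustrative construction of the Experiment 1 trial list from the design
# described above (2 orientations x 10 faces x 12 trials = 240 trials).
import random

trials = []
for orientation in ["upright", "inverted"]:        # two counterbalanced blocks
    block = []
    for face in range(10):                         # ten baseline faces
        for mod_type in ["featural", "configural"]:
            for feature in ["eyes", "nose", "mouth"]:
                for same in (True, False):         # 6 same + 6 different per face
                    direction = (random.choice(["up", "down"])
                                 if mod_type == "configural" else None)
                    block.append(dict(orientation=orientation, face=face,
                                      mod_type=mod_type, feature=feature,
                                      same=same, direction=direction))
    random.shuffle(block)                          # random order within a block
    trials.extend(block)

assert len(trials) == 240
```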

A calibration procedure preceded the experiment, wherein the participant fixated targets at each edge and corner of the screen. Once the eye tracker was calibrated, the participant began the experiment. Participants were asked to keep their heads as still as possible, and the eye tracker was recalibrated during the experiment as required. Prior to the practice session, participants were instructed that they would be shown two faces, one after the other, and that they would have to determine whether the faces were the same or different. No information regarding the nature of the change was given. This was followed by a single practice session of eight trials (four with upright faces and four with inverted faces), after which the experimenter answered any questions the participant had. The 240 trials of the main experiment followed. The face used during practice was not presented in the main experiment. The entire testing session lasted about 90 min, including the calibration of the eye-tracking device.

Data analysis

Discrimination performance was measured using both d' (Macmillan & Creelman, 2004) and response times (RTs). Only the d' data are reported here because none of the effects involving RTs were significant. Data recorded on same trials were omitted from the analysis of the eye movement data because these trials do not require participants to make a judgment based on either featural or configural information; hence, eye movements recorded during same trials might not reflect potential differences between these two cognitive processes. Eye movements were analyzed according to three dependent variables: interfeatural saccades, gaze duration, and proportion of time spent on a given area of interest (AOI). The threshold velocity, distance, and amplitude parameters for defining saccades and fixations were based on the default parameters of the EyeLink 1000 software. An interfeatural saccade was defined as a ballistic movement between two consecutive fixations on different AOIs. Three AOIs were manually and individually defined for each face to reflect the regions of the face that had been modified: eyes (including both eyes and the nasion, or bridge of the nose), nose, and mouth. Interfeatural saccade counts were normalized by total exposure duration in seconds because trial duration varied with RT. Gaze duration was defined as the sum of the durations of consecutive fixations on the same AOI, that is, the time spent on an AOI before leaving it; these data were normalized by dividing by the total fixation time for a given trial. Proportion time was defined as the percentage of a trial’s duration spent on a given AOI, regardless of whether the fixations on that AOI were consecutive. It can be regarded as a measure of the relative amount of visual attention granted to each AOI.
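For concreteness, the sketch below shows how these three eye-movement measures could be computed from a single trial's fixation sequence. The data layout, function name, and AOI labels are illustrative assumptions, not the authors' pipeline. (d' itself follows the standard definition, d' = z(hit rate) − z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution function; Macmillan & Creelman, 2004.)

```python
# Illustrative sketch (assumed data layout; not the authors' pipeline) of the
# three eye-movement measures for one trial.
from itertools import groupby

def trial_measures(fix_aoi, fix_dur, trial_dur):
    """fix_aoi: AOI per fixation ('eyes', 'nose', 'mouth', or 'other');
    fix_dur: fixation durations (s); trial_dur: total trial duration (s)."""
    # Interfeatural saccades: transitions between consecutive fixations on
    # two different AOIs, normalized by exposure duration in seconds.
    n_inter = sum(1 for a, b in zip(fix_aoi, fix_aoi[1:])
                  if a != b and "other" not in (a, b))
    saccade_rate = n_inter / trial_dur

    # Gaze duration: summed duration of each run of consecutive fixations on
    # the same AOI, normalized by the trial's total fixation time.
    total_fix = sum(fix_dur)
    gaze = [(aoi, sum(d for _, d in run) / total_fix)
            for aoi, run in groupby(zip(fix_aoi, fix_dur), key=lambda t: t[0])]

    # Proportion time: share of the trial's duration spent on each AOI,
    # whether or not the fixations on it were consecutive.
    prop = {}
    for aoi, d in zip(fix_aoi, fix_dur):
        prop[aoi] = prop.get(aoi, 0.0) + d / trial_dur
    return saccade_rate, gaze, prop
```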

To reduce the effect of outliers, a procedure based on the single-pass method of Van Selst and Jolicoeur (1994) was applied to all dependent variables. Outliers were winsorized, that is, replaced by the nearest inlying value (Erceg-Hurn & Mirosevich, 2008). Altogether, fewer than 4% of cells were winsorized for each dependent variable.
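A minimal sketch of the replacement step, assuming the outliers have already been flagged (e.g., by the Van Selst and Jolicoeur moving-criterion procedure); the function name is a hypothetical illustration:

```python
# Illustrative winsorizing step: each flagged outlier is replaced by the most
# extreme value that remains within the inlying range.
import numpy as np

def winsorize_flagged(values, is_outlier):
    values = np.asarray(values, dtype=float)
    inliers = values[~np.asarray(is_outlier, dtype=bool)]
    # High outliers become the highest inlier; low outliers the lowest inlier.
    return np.clip(values, inliers.min(), inliers.max())
```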

We analyzed the behavioral and eye movement data using planned contrasts because we had defined two hypotheses to test at the outset of the experiment (Rosenthal & Rosnow, 1985). Prior to calculating contrasts, the data for the three eye movement dependent variables were analyzed using a 2 × 2 × 3 × 3 repeated-measures ANOVA with Orientation (upright, inverted), Modification Type (configural, featural), Modified Feature (eyes, nose, mouth), and AOI (eyes, nose, mouth) as variables. d' was analyzed using a 2 × 2 repeated-measures ANOVA with Orientation (upright, inverted) and Modification Type (configural, featural) as variables. The error term for the relevant interaction was then used for the contrast analyses. Considering the small number of contrasts conducted relative to the total number of possible comparisons, the alpha level was not adjusted, and α = 0.05 was used as the significance threshold (Rosenthal & Rosnow, 1985, p. 45). Only the results of the contrast analyses are reported here; ANOVA tables are provided in the “Appendix.” Effect sizes were measured as r (Rosenthal & Rosnow, 1985, p. 62).
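For a contrast with one degree of freedom in the numerator, the effect size r reported throughout can be recovered from the F statistic and its error degrees of freedom, assuming the standard formula for focused contrasts (Rosenthal & Rosnow, 1985):

```latex
r = \sqrt{\frac{F}{F + df_{\text{error}}}}
```

As a check, the inversion effect on configural discriminations reported below, F(1, 21) = 30.82, gives r = √(30.82 / 51.82) ≈ 0.77, matching the reported value.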

While there is no convention for calculating power with a factorial repeated-measures design, we nonetheless attempted to determine whether our sample size was appropriate by using the “Repeated-Measures” function in G*Power. The sample size calculated for one group, four measurements, a power of 0.8, and an effect size of 0.25 was 29. Considering that this estimate is likely to be too conservative, we feel that our sample size is adequate for the interpretation of a nonsignificant finding.

Results

Behavioral performance: d'

Figure 2a shows the mean d' values for discriminating configural and featural modifications in upright and inverted faces. As reported elsewhere (Freire et al., 2000; Leder & Bruce, 2000), inversion significantly impaired performance for both configural [F(1, 21) = 30.82, p < 0.01, r = 0.77] and featural discriminations [F(1, 21) = 8.42, p = 0.01, r = 0.54], but the effect was stronger in the configural condition.

Fig. 2

Performance (d') for configural and featural discriminations in upright and inverted faces for Experiment 1 (a) and Experiment 2 (b). Error bars represent ±1 SEM

Eye movements

Hypothesis 1: Do configural versus featural discriminations elicit different scanning patterns?

We tested whether featural discriminations elicited longer gaze durations on individual features than configural discriminations by contrasting the average gaze duration recorded during configural versus featural discriminations for upright and inverted faces separately (see Fig. 3). For both upright and inverted faces, featural discriminations did not elicit longer mean gaze durations than configural discriminations [upright faces: F(1, 21) = 1.1, p > 0.05, r = 0.23; inverted faces: F(1, 21) < 1, p > 0.05, r = 0.17].

Fig. 3

Experiment 1: a Mean gaze duration and b mean interfeatural saccades during featural and configural discriminations. Error bars represent ±1 SEM

We tested whether configural discriminations elicited more interfeatural saccades than featural discriminations by contrasting interfeatural saccades elicited during configural versus featural discriminations for upright and inverted faces separately. For both upright and inverted faces, configural discriminations did not elicit more interfeatural saccades than featural discriminations [upright faces: F(1, 21) < 1, r = 0.18; inverted faces: F(1, 21) < 1, r = 0.05].

Hypothesis 2: Do fixations on a specific region of the face dominate scanning patterns?

Figure 4 illustrates the mean proportion of time spent on each AOI. The figure shows that participants spent a greater proportion of time fixating on the eye region than on other areas of the face. Contrast analyses comparing the eyes with the mouth and with the nose were significant in all conditions, with the exception of the eyes vs. mouth comparison for inverted featural discriminations [upright configural: eyes vs. mouth, F(1, 21) = 47.94, p < 0.01, r = 0.83; eyes vs. nose, F(1, 21) = 9.49, p = 0.01, r = 0.56; upright featural: eyes vs. mouth, F(1, 21) = 104.95, p < 0.01, r = 0.91; eyes vs. nose, F(1, 21) = 43.73, p < 0.01, r = 0.82; inverted configural: eyes vs. mouth, F(1, 21) = 14.23, p < 0.01, r = 0.64; eyes vs. nose, F(1, 21) = 5.33, p = 0.03, r = 0.45; inverted featural: eyes vs. mouth, F(1, 21) = 3.02, p > 0.05, r = 0.35; eyes vs. nose, F(1, 21) = 33.34, p < 0.01, r = 0.78].

Fig. 4

Experiment 1: Mean proportion of time (%) spent on the eyes, nose, mouth, and other areas of upright and inverted test faces. Other is shown in the graph for informative purposes only; this AOI was not included in the analysis. Error bars represent ±1 SEM

Discussion

The behavioral data replicate previous findings with the featural/configural paradigm (Freire et al., 2000; Leder & Bruce, 2000), with inversion having a slightly greater impact on configural than on featural discriminations. This finding provides evidence that our behavioral manipulation was effective in triggering different processing strategies that are specific to upright faces. The eye movement data do not support the hypothesis that configural and featural processing elicit different scanning patterns: configural discriminations were not associated with more interfeatural saccades, nor were featural discriminations associated with longer mean gaze durations. This finding contradicts Bombari et al. (2009). However, in their study, featural processing was measured by scrambling face parts. Their results might therefore reflect differences between processing first-order relations and second-order relations rather than differences between featural and configural information as measured here. In contrast, our results are consistent with Xu and Tanaka (2013), who found similar scanning patterns irrespective of whether featural or configural information was task-relevant.

Our results support the hypothesis that fixations on one region of the face dominate scanning patterns. Participants looked at the eye region for longer periods of time than at the other regions in all four experimental conditions, even though modifications to the nose and mouth were relevant in two-thirds of the trials. Furthermore, spending most of the time on the eyes did not prevent participants from accurately making featural and configural discriminations regarding the nose and the mouth. This result supports the importance of the eyes in face scanning (e.g., Althoff & Cohen, 1999; Walker-Smith, Gale, & Findlay, 1977; Xu & Tanaka, 2013; Yarbus, 1967) and face recognition (e.g., Fraser, Craig, & Parker, 1990; Haig, 1985, 1986; McKelvie, 1976; Tanaka & Farah, 1993; Walker-Smith, 1978).

Finally, the scanning patterns we observed with upright faces were replicated with inverted faces. Whether inversion leads to qualitative or quantitative changes in face processing is controversial (e.g., Farah et al., 1998; Gauthier & Logothetis, 2000; Rossion, 2008; Sekuler et al., 2004), and studies that have examined this issue using eye movements have yielded inconsistent results (Barton et al., 2006; Hills, Sullivan, & Pake, 2012; Hills, Cooper, & Pake, 2013; Schwaninger, Lobmaier, & Fischer, 2005a, b; Williams & Henderson, 2007; Xu & Tanaka, 2013). We focus here on Xu and Tanaka (2013) because the manipulation they used is comparable to ours. Consistent with our findings, Xu and Tanaka (2013) found that inversion does not differentially affect eye movements made during featural versus configural discriminations. However, in their study inversion diminished the number of fixations on the eyes while increasing the number of fixations on the mouth and nose. Because the focus of the present study is not on inversion, we had not planned to test this pattern of results at the outset of the study. Nonetheless, we conducted a posteriori contrasts comparing the proportion of time spent on each AOI in upright versus inverted faces. None of these analyses reached, or approached, significance. Two differences between our study and that of Xu and Tanaka (2013) might explain this discrepancy. First, in Xu and Tanaka (2013), configural changes to the eyes were made by changing the distance between the eyes, whereas here configural changes were made by moving the eyes up or down. This difference may have led to processing differences, because focusing on the eyes alone would be sufficient for detecting a change in the distance between the eyes, but not for detecting a change in the position of the eyes relative to the rest of the features. Moreover, Xu and Tanaka (2013) did not replicate the predicted effect of inversion on configural versus featural discriminations (Freire et al., 2000; Leder & Bruce, 2000), suggesting that their manipulation may have differed in some important way from that used here and in these previous studies. In the current study, inversion had a greater negative impact on the discrimination of configural modifications than of featural modifications, yet we did not find differences in eye movements between the two conditions. This finding further supports the notion that featural and configural discriminations do not elicit distinct scanning patterns.

Experiment 2

In Experiment 1, participants were unaware of whether featural or configural information would be relevant to the task prior to presentation of the test stimulus. It is possible that this uncertainty drove participants to adopt a scanning strategy based on information that is diagnostic in natural viewing conditions, irrespective of the information relevant for the task at hand. To test this hypothesis, we replicated Experiment 1 while presenting featural and configural discriminations in separate blocks. Participants were informed of the nature of the task at the start of each block, removing uncertainty about which kind of information would be relevant.

Method

Participants

Twenty participants (6 male, 14 female), aged 18-30 were recruited from the University of Ottawa undergraduate subject pool and compensated with class credits for their participation. All participants were either Caucasian or had been residing in Canada for at least 10 years. All participants had normal or corrected-to-normal vision. The University of Ottawa’s Research Ethics Board approved the study.

Stimuli, materials, procedure, and data analysis

The same stimuli and materials as in Experiment 1 were used in Experiment 2, and the same sequential matching procedure was followed, with the following exceptions. In Experiment 2, featural and configural modifications were blocked: faces with a given modification type were presented together in a series of 120 trials, 60 with upright faces and 60 with inverted faces. As in Experiment 1, orientation was blocked and counterbalanced. Each set of 60 trials comprised three same trials and three different trials for each of the ten baseline faces. The experiment began with a set of verbal instructions in which participants were told that they would be shown two faces, one after the other, and that the faces might be the same or different. Prior to each block, participants completed 12 practice trials in which only the modification relevant to the following block (i.e., featural or configural) was shown. Participants were informed of the nature of the modification before the practice trials. After the practice trials, the experimenter answered any questions the participant had and reminded them of the relevant modification for the following experimental block.

Results

As in Experiment 1, we report the planned contrast analyses here; complete ANOVA tables are provided in the “Appendix.”

Behavioral performance: d'

As reported in Experiment 1 and elsewhere (Freire et al., 2000; Leder & Bruce, 2000), the results of Experiment 2 are consistent with what is typically observed during featural and configural discriminations. Inversion significantly impaired performance for both configural [F(1, 19) = 102.68, p < 0.01, r = 0.92] and featural modifications [F(1, 19) = 49.90, p < 0.01, r = 0.85], and the effect was stronger in the configural condition.

Eye movements

Hypothesis 1: Do configural versus featural discriminations elicit different scanning patterns?

Figure 5 illustrates mean gaze durations and mean interfeatural saccade counts for featural and configural discriminations. For both upright and inverted faces, featural discriminations did not elicit longer gaze durations than configural discriminations [upright faces: F(1, 19) < 1, p > 0.05, r = 0.01; inverted faces: F(1, 19) < 1, p > 0.05, r = 0.04]. Moreover, configural discriminations did not elicit more interfeatural saccades than featural discriminations for either upright or inverted faces [upright faces: F(1, 19) = 1.05, p > 0.05, r = 0.22; inverted faces: F(1, 19) < 1, p > 0.05, r = 0.17].

Fig. 5

Experiment 2: a Mean gaze duration and b mean interfeatural saccades during featural and configural discriminations. Error bars represent ±1 SEM

Hypothesis 2: Do fixations on a specific region of the face dominate scanning patterns?

Figure 6 illustrates the mean proportion of time spent on each AOI. Unlike in Experiment 1, the nose region dominated scanning patterns during configural discriminations. Contrast analyses revealed that more attention was devoted to the nose than to the other features during configural discriminations of both upright faces [nose vs. eyes, F(1, 19) = 5.67, p = 0.03, r = 0.48; nose vs. mouth, F(1, 19) = 6.03, p = 0.02, r = 0.49] and inverted faces [nose vs. eyes, F(1, 19) = 6.38, p = 0.02, r = 0.50; nose vs. mouth, F(1, 19) = 10.36, p < 0.01, r = 0.60]. For featural discriminations, the contrasts were not significant for either upright [nose vs. eyes, F(1, 19) < 1, r = 0.21] or inverted faces [nose vs. eyes, F(1, 19) = 1.51, p > 0.05, r = 0.27; nose vs. mouth, F(1, 19) = 1.26, p > 0.05, r = 0.25], with the exception that the nose received more attention than the mouth during upright featural discriminations [F(1, 19) = 6.43, p = 0.02, r = 0.50].

Fig. 6

Experiment 2: Mean proportion of time (%) spent on the eyes, nose, mouth, and other areas of upright and inverted test faces. Other is shown in the graph for informative purposes only; this AOI was not included in the analysis. Error bars represent ±1 SEM

Discussion

In Experiment 2, we again measured eye movements while participants performed configural and featural discriminations, but the two tasks were blocked to provide participants with prior knowledge of the information relevant to the task. Not surprisingly, performance in the blocked experiment was slightly superior to that in the unblocked experiment, illustrating the advantage conferred by prior knowledge of which information was task-relevant. The behavioral data yielded the expected pattern of results, with inversion having a greater impact on configural than on featural discriminations.

As in Experiment 1, configural and featural discriminations yielded comparable interfeatural saccades and gaze durations, even though there was no uncertainty regarding the type of information relevant to the task. This finding was consistent across upright and inverted faces. The results of Experiments 1 and 2 therefore suggest that eye movements are not functional in characterizing spatial relations between features, as has been suggested elsewhere (Bombari et al., 2009; Henderson et al., 2005). Nonetheless, our results do support the notion that configural and featural discriminations can yield different patterns of eye movements: in Experiment 2, fixations on the nose dominated scanning patterns for configural but not featural discriminations. Whether these different scanning patterns facilitate the processing of information diagnostic for face identification remains to be determined. One possibility is that focusing on the center of the face permits the extraction of holistic information, because such a scanning pattern would allow participants to rapidly capture all of the information present in the face (Blais et al., 2008; Bombari et al., 2009; Miellet et al., 2011). While judging variations in spatial relations is generally referred to as configural processing, holistic and configural processing are closely linked in the sense that both rely more heavily on information derived from the relations between the features of the face than on information derived from individual features (see Richler & Gauthier, 2014, for a review of the different mechanisms implicated in face processing; see Burton et al., 2015, for a criticism of configural processing). If fixations on the center of the face do indeed facilitate holistic processing, then our results suggest that a close link exists between these two types of information, which might explain some of the contradictions in the literature regarding these two mechanisms and their behavioral manifestations (e.g., Kimchi & Amishav, 2010; McKone & Yovel, 2016; Richler, Palmeri, & Gauthier, 2015). Alternatively, focusing on the center of the face might support the processing of other types of information implicated in face recognition, such as texture (e.g., Burton et al., 2015) or symmetry (Locher & Nodine, 1989). Additional studies are required to elucidate these mechanisms.

While some have suggested that holistic processing is automatic (Boutet et al., 2002; Richler & Gauthier, 2014; but see Palermo & Rhodes, 2002), our results suggest that focusing on the center of the face is not driven by the presence of a face per se, but rather is contingent on task demands. Indeed, fixations on the nose did not dominate configural discriminations in Experiment 1, where participants were unaware of whether featural or configural information was relevant; instead, fixations on the eyes dominated. Interestingly, fixations on the eyes also dominated scanning patterns during featural discriminations of inverted faces in Experiment 2, suggesting that when relational information is disrupted, eye movements are automatically drawn to the eyes. While the scanning patterns recorded in Experiment 1 are consistent with evidence that the eyes play a dominant role in face identification (e.g., Fraser, Craig, & Parker, 1990; Haig, 1985, 1986; McKelvie, 1976; Tanaka & Farah, 1993; Walker-Smith, 1978), those recorded in Experiment 2 indicate that this is not the only route by which faces can be efficiently processed. This finding is consistent with behavioral studies demonstrating that task requirements can influence the manner in which faces are analyzed (Gao et al., 2011; Meinhardt-Injac, Persike, & Meinhardt, 2014).

General discussion

Taken together, the results of our study suggest that faces do not elicit a single pattern of eye movements, but rather that different scanning strategies can be deployed depending on task demands. When the information relevant to the task was unknown (Experiment 1), attention to the eye region dominated scanning patterns. The importance of the eye region during face processing is well documented (Althoff & Cohen, 1999; Davies, Ellis, & Shepherd, 1977; Fraser et al., 1990; Haig, 1985, 1986; McKelvie, 1976; Tanaka & Farah, 1993; Walker-Smith, 1978; Xu & Tanaka, 2013; Yarbus, 1967), perhaps because this region provides diagnostic information for identification. However, the exact manner in which the eyes might facilitate recognition remains to be determined. One interpretation is that the eyes are essential for the capture of relational face information (Blais et al., 2008; Hills et al., 2013). Our findings cast doubt on this hypothesis, because fixations on the nose, rather than the eyes, were linked to relational information processing. It is interesting to note that the importance of the eye region is influenced by familiarity (Barton et al., 2006), viewpoint (Bindemann, Scheepers, & Burton, 2009; Royer et al., 2016), and culture (Blais et al., 2008; Brielmann, Bülthoff, & Armann, 2014; Rodger et al., 2010), suggesting that which information is diagnostic for recognition might depend on either the perceptual attributes of the face or internal tendencies.

When participants were aware of the nature of the task (Experiment 2), fixations on the nose dominated during configural, but not featural, discriminations. This strategy might be adaptive in that it allows the extraction of relational information via simultaneous sampling of all the features of the face (Blais et al., 2008; Bombari et al., 2009, 2013; Miellet et al., 2011). In agreement with this interpretation, Weber et al. (2000) showed that fewer saccades are made when participants attend to the global letter than to the local letters in Navon figures. The authors suggested that fixation induces an expansion of the focus of attention to the global properties of the figure. Whether this scanning pattern is associated with other measures of relational information processing remains to be determined. In the study by Bombari et al. (2009), configural processing was associated with interfeatural saccades, presumably deployed to extract precise information about the spatial distances between features. In the study by de Heering et al. (2008; see also Turati et al., 2010), eye movements were measured during delayed matching of composite faces, which trigger holistic processing. Holistic processing was not associated with a different pattern of eye movements than feature-based processing. However, eye movements were not analyzed as a function of the different regions of the face, making it impossible to determine whether holistic processing was linked to fixations on the nose area. In light of these contradictory findings, one cannot specify which scanning pattern, if any, affords the most effective strategy for processing relational information. Measuring eye movements while participants view faces that tap into other types of information, such as composite faces for holistic encoding (Richler & Gauthier, 2014) or faces that contain only information derived from texture (Hancock, Burton, & Bruce, 1996; Liu et al., 2005), could shed light on some of these issues.

One general conclusion that can be drawn from our findings is that task set must be taken into account when interpreting previous eye-tracking studies. As we have shown, providing participants with prior knowledge of which information is task-relevant by blocking trials can influence scanning patterns. It has long been known that the organization of trials can affect participants’ overt response behaviors in at least two related ways. First, as already alluded to, blocking allows participants to know what kind of information will be relevant on an upcoming trial, whereas intermixing creates uncertainty. Second, blocking can induce a cognitive set in participants that focuses their attention on a subset of the available information. To our knowledge, ours is the first study to find that these effects apply to eye movements during face processing. This finding implies that previous studies may have been examining different mechanisms depending on whether trials were blocked or not: those that used intermixed trials are likely to have measured more stimulus-driven, automatic processes, whereas those that used blocked trials are likely to have indexed more strategy-driven, voluntary processes.

One limitation of the present study rests in the use of changes in interfeatural distances to investigate relational processing. Some have questioned the ecological validity of this manipulation (Taschereau-Dumouchel et al., 2010), as well as the importance of configural information in real-life face recognition (Burton et al., 2015). However, we contend that modifying interfeatural distances provides the best method for examining the suggestion that eye movements are functional in determining distances among features (Bombari et al., 2009; Henderson et al., 2005).

To conclude, our findings suggest that eye movements elicited by faces are determined by task demands and by participants’ knowledge of those demands. In natural viewing conditions, when the information relevant to the task is unknown, face processing appears to be dominated by attention to the eyes. In conditions where viewers are aware that relational information is relevant, scanning is dominated by fixations on the center of the face. Future studies should investigate whether these different scanning strategies generalize to other types of relational information, to faces and viewers of other cultures, to faces in different viewpoints, and to faces with varying degrees of familiarity.