The role of spatial frequency in emotional face classification
Previous studies with emotional face stimuli have revealed that our ability to identify different emotional states is dependent on the faces’ spatial frequency content. However, these studies typically only tested a limited number of emotional states. In the present study, we measured the consistency with which 24 different emotional states are classified when the faces are unfiltered, high-, or low-pass filtered, using a novel rating method that simultaneously measures perceived arousal (high to low) and valence (pleasant to unpleasant). The data reveal that consistent ratings are made for every emotional state independent of spatial frequency content. We conclude that emotional faces possess both high- and low-frequency information that can be relied on to facilitate classification.
Keywords: Face emotion classification; Spatial frequency
The ability to recognize the emotions in others’ faces is of considerable importance (Darwin, 1872). A quick glance can reveal whether someone is sad, excited, or worried, and their facial emotion is often indicative of future behavior (Ekman, 1982; Izard, 1972). Moreover, the detectability and saliency of a face are dependent on its emotional content. For example, experiments using a dynamic flash suppression paradigm (Tsuchiya & Koch, 2005) have shown that faces exhibiting fearful expressions emerge from suppression faster than faces containing neutral or happy expressions (Yang et al., 2007). In addition, visual search times for faces are influenced by their emotional expressions (Frischen, Eastwood & Smilek, 2008); for example, angry faces are detected more quickly and more accurately than happy faces (Pitica et al., 2012). The conclusion from these studies is that ecologically relevant emotional expressions enjoy preferential treatment by the visual system, giving them a fast route into awareness. The rapid and accurate detection of a fearful or angry face may provide information about a potential local danger or threat, signaling the need for appropriate action to maintain survival.
The well-known contrast sensitivity function (CSF), which describes how sensitivity to contrast varies with spatial frequency, is inverse U-shaped with a peak around 2-6 cycles/deg under normal viewing conditions (Campbell & Robson, 1968; Graham & Nachmias, 1971). The CSF is believed to be the envelope of a number of underlying spatial frequency channels, each narrowly selective for a given range of spatial frequency (Sachs, Nachmias & Robson, 1971). Numerous studies have examined how the detection and perception of faces, including their emotional content, are influenced by spatial frequency content. The faces in these studies are typically high- and low-pass filtered. In general, high-spatial-frequency content provides the fine details of the face and low-spatial-frequency content its overall structure; however, this belies a more nuanced involvement of spatial frequency in the perception of facial emotion. Low spatial frequencies appear to be important for the detection of threats (Bar et al., 2006) and play a prominent role in identifying faces that express pain (Wang, Eccleston & Keogh, 2015), happiness (Kumar & Srinivasan, 2011), and fear (Holmes, Winston & Eimer, 2005), although this has been challenged for fearful faces (Morawetz et al., 2012). Conversely, high spatial frequencies have been shown to play a prominent role in identifying sad, happy, and, again, fearful emotional faces (Fiorentini, Maffei & Sandini, 1983; Wang, Eccleston & Keogh, 2015; Goren & Wilson, 2006).
The above studies typically employ only small numbers of emotional states, for example, sad vs. happy vs. neutral. Moreover, different studies have tested different emotions, used different faces, used different high- and low-spatial-frequency cutoff points, and used different methods, e.g., identification, detection in noise, and reaction times. All of these factors make cross-study comparisons problematic. In particular, the differences found in previous studies may have been due to the task employed; for example, a detection task may favor low-frequency faces.
Eighteen observers took part in the experiment (10 females), with a mean age of 22.3 ± 2.1 years. All observers had 6/6 vision, in some cases achieved through optical correction.
The experiment was performed using a MacBook Pro (Apple Inc.) with a 2.9-GHz i7 processor and 4 GB of DDR3 memory running OS X El Capitan (version 10.11.6). The gamma-corrected display had a resolution of 1,280 x 800 pixels and a frame rate of 60 Hz. Stimuli were presented using MatLab (MathWorks Inc.) with PsychToolbox V3 (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007), and observer responses were submitted via the built-in trackpad. During data collection, observers were positioned 60 cm from the display.
Stimuli, experimental procedure, and data analysis
Twenty-four emotional expressions were selected from the McGill Face Database (Schmidtmann et al., 2016): affectionate, alarmed, amused, baffled, comforting, contented, convinced, depressed, entertained, fantasizing, fearful, flirtatious, friendly, hateful, hostile, joking, panicked, playful, puzzled, reflective, relaxed, satisfied, terrified, and threatening. The high- and low-spatial-frequency stimuli were generated using custom-written MatLab software, in which the images were filtered with a bank of log-Gabor filters at four orientations (0, 45, 90, and 135°), with the DC component restored in the high-frequency stimuli. The frequency ranges were based on previous studies (Vuilleumier et al., 2003; Bannerman et al., 2012) and correspond to cutoff frequencies of ~20 cycles per face for the high-frequency condition and ~6 cycles per face for the low-frequency condition in the horizontal direction; this removed the spatial frequencies known to be important for normal face perception.
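The filtering step can be sketched as follows. This is a minimal Python illustration, not the authors’ MatLab code: an isotropic Gaussian frequency-domain filter stands in for the log-Gabor orientation bank, and the function name and exact filter shape are assumptions; only the ~6 and ~20 cycles-per-face cutoffs and the restored DC component come from the text.

```python
import numpy as np

def split_spatial_frequencies(image, low_cutoff, high_cutoff):
    """Split a grayscale image into low- and high-pass versions.

    Simplified sketch: a radial Gaussian frequency filter stands in for
    the log-Gabor orientation bank described in the text. Cutoffs are in
    cycles per image (i.e., cycles per face when the face fills the frame).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None] * h   # vertical frequency, cycles per image
    fx = np.fft.fftfreq(w)[None, :] * w   # horizontal frequency, cycles per image
    radius = np.sqrt(fx**2 + fy**2)       # radial spatial frequency

    spectrum = np.fft.fft2(image)
    low_mask = np.exp(-(radius / low_cutoff) ** 2)          # keep content below ~low_cutoff
    high_mask = 1.0 - np.exp(-(radius / high_cutoff) ** 2)  # keep content above ~high_cutoff
    high_mask[0, 0] = 1.0  # restore the DC component in the high-pass image

    low_img = np.real(np.fft.ifft2(spectrum * low_mask))
    high_img = np.real(np.fft.ifft2(spectrum * high_mask))
    return low_img, high_img

# e.g., low, high = split_spatial_frequencies(face, low_cutoff=6, high_cutoff=20)
```

Restoring the DC component keeps the high-pass image at its original mean luminance, which is why the high-frequency faces remain visible rather than collapsing to a zero-mean edge map.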
The different conditions were interleaved in random order in a novel “point-and-click” computer-based task. Each trial consisted of the presentation of a single image for 200 ms, which was then replaced by an image of an Arousal-Valence emotion space (Russell, 1980), as illustrated in Fig. 1b; the red labels in the figure give the reader the gist of the emotions represented by each region but were not shown during testing. The space defined two dimensions of emotion: (i) the arousal level; for example, a panicked or annoyed face would convey a high arousal level, whereas a contented or relaxed face would convey a low arousal level; and (ii) the valence, i.e., pleasant vs. unpleasant; for example, an unpleasant, horrified, or disgusted expression would be placed towards the left-hand side of the space (negative valence), and a pleasant expression, for example a happy or amused face, would be placed towards the right-hand side (positive valence). Faces perceived to have neutral emotions would be positioned towards the center of the space. Each face subtended ~5° horizontally, and the square response area subtended ~12.3 x 12.3° of visual angle.
The task for each observer was to position the on-screen cursor using the trackpad and click the location in the emotion space that corresponded to the emotion being expressed, i.e., the emotion of the face and not the emotion, if any, induced in the observer. Data for each observer were collapsed per emotion, and the mean emotion coordinates were calculated for each of the three spatial frequency conditions. To reveal any shift in the location of a given emotion resulting from the frequency-content manipulation, spatial statistics were performed and corrected p values were obtained.
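The mapping from a click to an arousal-valence coordinate pair can be sketched as below. This is a hypothetical helper, assuming the conventions of Russell’s (1980) space (valence on the horizontal axis, arousal on the vertical axis); the normalized [-1, 1] range, the function name, and the pixel-geometry parameters are illustrative assumptions, not the authors’ implementation.

```python
def click_to_arousal_valence(x_px, y_px, box_left, box_top, box_size):
    """Map a click inside the square response area to (valence, arousal).

    Hypothetical sketch: coordinates are normalized so the center of the
    space is (0, 0); valence runs left (-1, unpleasant) to right
    (+1, pleasant), and arousal runs bottom (-1, low) to top (+1, high).
    Screen y increases downward, hence the sign flip for arousal.
    """
    valence = 2.0 * (x_px - box_left) / box_size - 1.0
    arousal = -(2.0 * (y_px - box_top) / box_size - 1.0)
    return valence, arousal
```

A click at the center of the square would yield (0, 0), i.e., a neutral face, and a click in the top-left corner would yield (-1, +1), an unpleasant, highly arousing emotion.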
Results and discussion
Each emotion was analyzed in turn by performing multiple (Bonferroni-corrected) two-tailed t-tests for each pairing of the spatial frequency conditions (unfiltered, high frequency, and low frequency) to test for differences in classification location in both the arousal and valence directions. The statistics reveal no differences in the arousal direction for any combination of unfiltered, high-frequency, or low-frequency stimuli for a given emotion type. The lowest and highest p values in the arousal direction were for the differences in location between the unfiltered and high-frequency conditions for the fearful expression (t(17) = −3.10, p = 0.47) and between the unfiltered and high-frequency conditions for the relaxed expression (t(17) = 0.03, p = 1). In the valence direction, the statistics again revealed no differences in the locations between any combination of unfiltered, high-frequency, or low-frequency stimuli for each emotion type. The lowest and highest p values in the valence direction were for the differences in location between the unfiltered and high-frequency conditions for the depressed expression (t(17) = −1.76, p = 0.29), and between the low- and high-frequency conditions for the threatening expression (t(17) = −0.025, p = 1).
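The per-emotion comparison can be sketched in pure Python. This is a sketch under stated assumptions, not the authors’ analysis code: it computes the paired two-tailed t statistic (df = n − 1 = 17 for the 18 observers, matching the t(17) values reported above) and the Bonferroni-adjusted per-comparison alpha for the three condition pairings; the function names are illustrative.

```python
import math
from itertools import combinations

def paired_t_statistic(x, y):
    """Paired t statistic for two equal-length, per-observer samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n)

def bonferroni_alpha(alpha, conditions):
    """Per-comparison alpha after Bonferroni correction over all pairs."""
    n_tests = len(list(combinations(conditions, 2)))
    return alpha / n_tests

# e.g., for one emotion's arousal coordinates:
# t = paired_t_statistic(unfiltered_ratings, high_freq_ratings)
# threshold = bonferroni_alpha(0.05, ["unfiltered", "high", "low"])
```

Equivalently, one can multiply each raw p value by the number of tests (capping at 1), which is how corrected p values such as p = 1 above arise.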
The data obtained using this new psychophysical method for classifying perceived emotions demonstrate that emotional faces can be perceived, and hence classified, independently of high- or low-spatial-frequency content. This is the case for ratings in both the arousal and valence dimensions.
Some emotions can be recognized from just the mouth (Guarnera et al., 2015). However, by employing faces conveying a wide range of emotions, Schmidtmann et al. (2016) showed that accuracy for selecting an emotionally descriptive word in a 4AFC task was equal whether the entire face or a restricted stimulus showing just the eyes was employed. It is thus plausible that arousal-valence ratings are based on eye information only. Our classification data, however, cannot confirm this, as our stimuli were spatial-frequency filtered, and the face feature that is salient might be spatial-frequency dependent.
Our data form a U-shaped curve within the emotion space, leaving some regions of the space unused. This is partly to be expected, because it is unlikely that a face with a “neutral” valence could be perceived as highly arousing. The U-shaped distribution is similar to that obtained using the International Affective Picture System (Lang et al., 1988; Libkuman et al., 2007), where observers made ratings using just a coarse scale, a result that has since been replicated cross-culturally (Silva, 2011). Interestingly, the task employed by Libkuman et al. was for observers to indicate the intensity of the emotional response they experienced, whereas observers in the present study classified the emotion they perceived in a face; despite this difference, the aforementioned studies and the present study produced similar results.
Although different information is contained in images of emotional faces filtered to isolate either their high- or low-spatial-frequency content, human observers can utilize either to identify emotional states.
This work was funded by Canadian Institutes of Health Research grant #MOP 123349 awarded to FAAK.
- Banjanovic, E. S., & Osborn, J. W. (2016). Confidence intervals for effect sizes: Applying bootstrap resampling. Practical Assessment, Research & Evaluation, 21(5).
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Routledge.
- Ekman, P. (1982). Emotion in the human face (2nd ed.). Cambridge: Cambridge University Press.
- Izard, C. E. (1972). Patterns of emotion: A new analysis of anxiety and depression. New York: Academic Press.
- Kleiner, M., Brainard, D., & Pelli, D. (2007). What’s new in Psychtoolbox-3? Perception, 36 (ECVP Abstract Supplement).
- Lang, P. J., Öhman, A., & Vaitl, D. (1988). The International Affective Picture System [Photographic slides]. Gainesville: University of Florida, Center for Research in Psychophysiology.
- Schmidtmann, G., Sleiman, D., Pollack, J., & Gold, I. (2016). Reading the mind in the blink of an eye: A novel database for facial expressions. Perception, 45, 238–239.