The study of emotion in relation to cognition, behavior, and health has grown exponentially over the last several decades. Indeed, emotion responses and emotion regulatory strategies are increasingly recognized as central to many, if not most, psychological processes. As such, there is a growing reliance on laboratory paradigms employing emotional stimuli to induce, alter, or simulate emotional contexts for investigation across the social sciences and, most notably, in psychology. Although a variety of methods are currently used, including emotional images (e.g., International Affective Picture System: Lang, Bradley, & Cuthbert, 2008), music (Koelsch, 2010; Logeswaran & Bhattacharya, 2009), and personal recollection (e.g., Lench & Levine, 2005; Papa & Bonanno, 2008), there is an increasing reliance on emotional film clips. For example, a search for “emotion elicitation” and “film clips” on Google Scholar and on psychology-specific databases (e.g., APA) yielded over 1,000 results. The use of film clips for emotion elicitation has many advantages. Clips are easily standardized and therefore reliable as compared to idiographic methods (e.g., personal recollection; see Mills & D'Mello, 2014; Salas, Radovic, & Turnbull, 2012). Film clips readily engage participants for extended periods and allow for an ecologically valid induction, progression, and assessment of emotional responses (Kring & Gordon, 1998). Clips capture participant attention in a manner that is consistent with contemporary life and allow for the simulation of real-world conflicts or distress with relatively few ethical concerns (Rottenberg, Ray, & Gross, 2007). Moreover, clips can be transformed or adapted to fit specific needs, and film clips are accessible to varying populations. Due to these advantages, film clips will continue to be an integral part of emotion elicitation in research.

For the present investigation, we focused on testing the efficacy of clips in a contemporary young adult population, targeting emotions that are less easily differentiated, and using online administration of emotionally evocative film clips that were initially piloted in a laboratory setting (see Supplemental Material for details). There has been a tremendous increase in the application of emotion stimuli to online research with relatively little validation. In particular, there have only been two prior studies evaluating the efficacy of film clips to elicit emotions online (Aldao & Nolen-Hoeksema, 2013; Samson, Kreibig, Soderstrom, Wade, & Gross, 2016). Moreover, previous laboratory studies have noted difficulties encountered by participants in differentiating certain emotional experiences with word labels, particularly for positive emotions (Ellsworth & Smith, 1988; Herring, Burleson, Roberts, & Devine, 2011). Similarly, some negative emotions are frequently co-elicited and have been traditionally more challenging to reliably target, particularly online (Samson et al., 2016). For instance, anger is commonly recognized as especially difficult to elicit with standard stimuli, and there are few highly reliable film clips derived from contemporary films. This may be because anger often co-occurs with disgust, particularly when content is perceived as morally disgusting (Salerno & Peter-Hagene, 2013; Whitton, Henry, Rendell, & Grisham, 2014). As such, in this investigation, we focused on testing the utility of presenting film clips online to elicit specific, discrete emotions that are frequently co-elicited, specifically happiness and amusement, as well as anger and disgust. We selected film clips geared towards college students that we hypothesized would discretely elicit happiness, amusement, anger, or disgust. 
For these film clips, we evaluated elicitation of the target emotion as well as the paired, potentially co-elicited emotion in young adults, a highly common sample in emotion-related research across all of the behavioral sciences. However, in our efforts to identify useful evocative stimuli for these challenging applications, we recognized the lack of a contemporary catalog of film clips to aid emotion elicitation researchers in identifying validated stimuli for future investigations. Therefore, this investigation’s goal was two-fold: in conjunction with our own validation study on effective film stimuli for the discrete elicitation of frequently co-elicited emotions, we also provide here a database reflecting a compilation of validated, emotionally evocative film clips to facilitate future emotion elicitation research.

Online validation study

We conducted an online study (N = 784 young adults) testing the effectiveness of 15 film clips, identified through in-lab pilot research (see Supplemental Material). We employed a rigorous and conservative analytic strategy that explicitly tested the discreteness of each clip in eliciting the targeted emotion(s). Specifically, we tested the utility for a given clip in discretely eliciting each of two possible, frequently co-elicited, target emotions, as well as their degree of shared activation. Typically, prior research studies (indeed most of the studies in the catalog) have tested the utility of a clip in eliciting one target emotion (e.g., sadness in The Champ: Gross & Levenson, 1995) and thus the effectiveness of the clip is only analyzed in comparison to other clips targeting the same emotion. However, given our clip selection and motivation to identify clips particularly effective at eliciting emotions that are often harder to differentiate, we employed a slightly different strategy. We tested the utility of the clip in discretely eliciting both the target emotion as well as a likely secondary emotion, as in the case of disgust and anger or amusement and happiness. As such, most clips were analyzed twice and compared against a broader range of films. In addition, we generated a mixed-feelings score (Hemenover & Schimmack, 2007) to determine the shared intensity of emotions likely to be co-elicited (anger and disgust, happiness and amusement).

We based our film clip selection on pilot research in the laboratory with groups of college-aged participants (total pilot N = 393), enabling us to evaluate the effectiveness of film clips across study settings (i.e., in the laboratory and online). The pilot research is described in complete detail in the accompanying Supplemental Material. In brief, from an initial pool of approximately 40 clips, 13 clips were selected and evaluated by presenting them to groups of 2–15 college students. Two separate cohorts (Sample A, n = 91; Sample B, n = 302) of participants were employed for the pilot, and film clips were presented in a fixed order under supervision by laboratory personnel. Participants completed emotion ratings on the same index as in the Online Validation Study. For more information regarding the specifics of film clip presentation and emotion rating assessments, sample demographics, statistical evaluation, and global findings, please refer to the Supplemental Material. Importantly, the validated film clips from the pilot study were consistent with those validated in the Online Validation Study. However, limitations of the pilot study included the fixed-order presentation of film clips, the lack of a film clip eliciting disgust in the absence of any anger confounds, and the absence of a neutral film clip.

As stated above, the goals of this study were to investigate the effectiveness of a set of film clips in eliciting emotions through an online medium, replicating the in-lab pilot work. In addition, we sought to address limitations from our pilot study to strengthen the present investigation. First, to evaluate the clips’ elicitation of discrete emotions in the absence of potential order confounds, we took advantage of the online format, which also allowed for a considerably larger sample due to reduced lab personnel demand, and randomized the order of clip presentation. Second, we included two additional clips in the present study that improved our ability to contrast clip effectiveness. Specifically, we included a previously validated clip eliciting only physical disgust (not anger or moral disgust; see Footnote 1) as a comparator against clips that might co-elicit (moral) disgust when anger was the target emotion. Feelings of moral disgust are often reported when content that is intended to elicit anger is perceived as morally upsetting (Salerno & Peter-Hagene, 2013; Whitton et al., 2014). In addition, we included a novel clip for potential use as a neutral reference stimulus. These two clips were included with the 13 preliminarily evaluated in our pilot study (see Table S1).

Procedure

Participants aged 18–44 years were recruited (N = 784; age M = 19.98, SD = 3.09; 76% female) from the undergraduate subject pool of a large public university in the Midwest for a study on reactions to emotion films. This sample size reflects the removal of non-English-speaking participants (N = 8) and of participants who completed the survey twice (N = 5). All materials for the Online Validation Study were distributed via Qualtrics secure server to participants’ email addresses. Each participant viewed five emotion elicitation film clips (randomly selected out of a possible 15) in randomized order, followed by a sixth mood-lifting clip (to ensure no lasting mood effects). After each of the five emotional film clips, participants were asked to complete affect ratings and then answer an accuracy question about the film clip they had just viewed. Time between clips was approximately 2 min. Participants were compensated with course credit after completing the study. Refer to Supplemental Appendix 1 for Qualtrics programming details.
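The clip assignment described above can be illustrated with a minimal sketch. This is not the Qualtrics implementation used in the study (see Supplemental Appendix 1 for that); the function name, the seed, and the placeholder name for the mood-lifting clip are assumptions introduced only for illustration.

```python
import random

# The 15 clips evaluated in the Online Validation Study (names from the text).
CLIPS = [
    "Alive", "Between Two Ferns", "College Conspiracy", "Crash",
    "D2: The Mighty Ducks", "Fahrenheit 911: Bin Laden Family",
    "Fahrenheit 911: Recruitment", "Funny Cats", "The Office",
    "Police Brutality", "The Road to Guantanamo", "Whose Line is it Anyway",
    "Big Cat Diary", "The Champ", "Trainspotting",
]

# Hypothetical placeholder: the text does not name the mood-lifting clip.
MOOD_LIFTING_CLIP = "mood_lifting_clip"


def build_playlist(rng):
    """Draw five distinct clips in random order, then append the final clip."""
    # random.sample draws without replacement and returns the items in
    # selection order, so selection and ordering are randomized in one step.
    selected = rng.sample(CLIPS, 5)
    return selected + [MOOD_LIFTING_CLIP]


# A seeded generator is used here only to make the illustration repeatable.
playlist = build_playlist(random.Random(2024))
print(playlist)
```

Drawing without replacement ensures that no participant sees the same emotional clip twice, and the mood-lifting clip is always presented last.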

Film stimuli

Fifteen clips were assessed, garnered from commercially available films or publicly available documentaries, internet programs, and personal videos. Twelve new clips (Alive, Between Two Ferns, College Conspiracy, Crash, D2: The Mighty Ducks, Fahrenheit 911: Bin Laden Family, Fahrenheit 911: Recruitment, Funny Cats, The Office, Police Brutality, The Road to Guantanamo, Whose Line is it Anyway; see Table S1 for details) were utilized based on our pilot investigations to elicit specific emotional responses in college students including happiness, amusement, anger, disgust, and sadness (see Footnote 2). A clip from Big Cat Diary (BBC Earth, 2010) was investigated for potential use as a neutral reference film clip. Two of the 15 clips were included based on prior research: The Champ (MGM, 1979; cf. Gross & Levenson, 1995; also included in our pilot studies) to provide a contrasting negative emotion (sadness), and Trainspotting (Miramax, 1996), previously validated to elicit disgust without eliciting anger (see Schaefer, Nils, Sanchez, & Philippot, 2010). Each of the 15 film clips evaluated was approximately 5 min in length.

Emotion ratings

Immediately after each film clip, participants were instructed to rate how they were feeling using a Likert scale from 1 (none) to 7 (strong). The following emotions were rated: anger, fear, sadness, guilt, surprise, interest, happiness, amusement, affection, and disgust. These words were selected as they are most consistently used in this type of emotion elicitation research (e.g., Gross & Levenson, 1995; Rottenberg et al., 2007) and were restricted to a total of ten to limit demand on participants.

Verification of participant engagement with online content

Because this study was reliant on participants engaging with the film content on their own from their personal computer (as compared to in lab during the pilot study; see Supplemental Material), we included several additional features to also assess the degree of participant engagement in the task itself. For example, after every film clip was viewed, participants were asked to respond to an accuracy question. This accuracy question was the primary means for determining whether participants had indeed engaged in watching the film clips, as online presentation of the videos inherently lacks the authoritative oversight of experimenters as in the pilot study. Those participants that failed to answer the accuracy question correctly for a particular video, or who indicated that they experienced problems when viewing the film clip, were excluded from analysis for that video (see Table S8 in the Supplemental Material for details about accuracy questions and answers). For example, the accuracy question for the Trainspotting clip was “In what room did most of the action in this film clip take place?” with a fill-in-the-blank response format. This clip involves the main character entering a room with the word “TOILET” on the door and proceeding to reach and crawl into a porcelain toilet bowl. Responses such as “bathroom,” “toilet,” “restroom,” “in a run-down gross bathroom,” or “in a nasty bathroom” were considered correct, and data from those participants were included in analyses. Responses such as “family,” “dining,” “office,” “I forget,” “household room,” “dorm room,” or “the interrogation room” were incorrect, and data from these participants were excluded from analysis of this video. Those participants who responded by saying “I do not know because the video never showed up,” “video was not working,” or “my film didn’t appear” were also excluded. 
Such open-ended questions allowed for identification of participants who actually were engaged in watching the video to rigorously ensure reliability of affect ratings and eliminated potentially inaccurate data from subsequent analyses (e.g., for Trainspotting 218 participants viewed the film clip, but 51 participants (23.4%) were excluded due to inaccurate answers). Participants could also email a study coordinator (contact information was provided in the instructional email) if they experienced any issues with viewing film clips. See Table S9 in the Supplemental Materials for specifics on the number of participants excluded for each film clip in the Online Validation Study. Across all films, the mean exclusion was 32 individuals, range = 6–64.
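The per-clip exclusion rule described above can be sketched as follows. The field names (`answer_correct`, `reported_problem`) and the example rows are hypothetical, and the sketch assumes each open-ended answer has already been judged correct or incorrect; only the Trainspotting counts (218 viewers, 51 excluded) come from the text.

```python
def usable_rows(rows):
    """Keep participant-by-clip rows with a correct accuracy answer
    and no reported playback problem."""
    return [
        r for r in rows
        if r["answer_correct"] and not r["reported_problem"]
    ]


def exclusion_percent(n_viewed, n_excluded):
    """Percentage of viewers of one clip whose ratings were dropped."""
    return 100.0 * n_excluded / n_viewed


# Hypothetical rows for a single clip; exclusion is applied per clip,
# so a participant dropped here can still contribute to other clips.
rows = [
    {"participant": "p1", "clip": "Trainspotting",
     "answer_correct": True, "reported_problem": False},
    {"participant": "p2", "clip": "Trainspotting",
     "answer_correct": False, "reported_problem": False},
    {"participant": "p3", "clip": "Trainspotting",
     "answer_correct": False, "reported_problem": True},
]

kept = usable_rows(rows)
print([r["participant"] for r in kept])

# Counts reported in the text for Trainspotting: 51 of 218 viewers excluded.
print(round(exclusion_percent(218, 51), 1))  # 23.4
```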

In addition to the accuracy question for each film clip, we included several programming features to permit additional preventative and confirmative measures to ensure participant involvement (cf. Crump, McDonnell, & Gureckis, 2013; Ferrer, Grenen, & Taber, 2015; Gureckis et al., 2015; Mason & Suri, 2012; Woods, Velasco, Levitan, Wan, & Spence, 2015). These supplemental data accuracy checks included pre-experiment software download instructions, limiting the time for responses to ratings and accuracy questions, minimizing the delay between film clip presentation and affect reporting, and quantifying mouse clicks during the video presentation (see Table S10 in the Supplemental Material). These permitted analyses of behaviors indicating lack of engagement with the film clips (e.g., how many mouse clicks occurred during the film clip presentation). We also evaluated whether prior exposure to the film clips would correlate with reduced intensity of the targeted emotional responses. Overall, there were few meaningful associations (see Footnote 3). However, these techniques proved quite useful in reducing the sample to one that had the highest likelihood of having actually engaged with each film clip in question.

Results

Data analytic strategy

Mean ratings across all emotions for all film clips are presented in Table 1. Responses to clips were analyzed in three steps, employing first techniques developed by Gross and Levenson (1995), then following recent examples (e.g., Jenkins & Andrewes, 2012), and finally using tools specifically designed for evaluating co-elicitation (Hemenover & Schimmack, 2007). In particular, we strove to test the distinctness of each clip in eliciting the target emotion as well as a secondary emotion (or common confounds), particularly because clips were targeting emotions that are often hard for participants to reliably differentiate (e.g., anger from disgust). Specifically, following the example of Gross and Levenson (1995), we first identified the most successful film clips (i.e., the “success index”) using a combination of the standardized mean intensity of a given clip for the target emotion(s) (e.g., mean rating of sadness in The Champ) as well as a discreteness value (hit rate), which reflects the percentage of participants rating the target emotion(s) at least one point above all non-target emotions for a given clip. We categorized films by target emotions, and for each target emotion, the mean intensity and mean discreteness across the sample within each film clip were standardized as z-scores. The sum of these intensity and discreteness z-scores determined the success index of each film clip relative to the other film clips assessed for that same target emotion, taking into account the ability of the film clip to both intensely and discretely elicit the targeted emotion. Clips were compared against all others eliciting similar emotions for their success rate for each emotion. As such, all clips potentially eliciting anger or disgust were compared against each other for anger and for disgust, respectively. 
This permitted identification of the film clip most successful at eliciting the particular target emotion in relation to the other clips potentially eliciting that emotion. Definitive thresholds do not exist for a film clip to be considered successful in eliciting an emotion, particularly in instances when emotions are co-elicited, such as anger and disgust, or happiness and amusement. Traditionally, the highest success rate has been the recommended clip (see Gross & Levenson, 1995).
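As a minimal sketch of the success index described above, the following computes mean intensity, the hit rate (discreteness), and the sum of their z-scores across clips sharing a target emotion. The ratings are hypothetical toy data, the function names are illustrative, and the use of the population standard deviation for standardization is an assumption, since the text does not state which form was used.

```python
from statistics import mean, pstdev


def hit_rate(ratings, target):
    """Percent of participants rating the target emotion at least one point
    above every non-target emotion (the discreteness value)."""
    hits = 0
    for r in ratings:
        others = [v for k, v in r.items() if k != target]
        if r[target] >= max(others) + 1:
            hits += 1
    return 100.0 * hits / len(ratings)


def zscores(values):
    """Standardize values across clips (population SD is an assumption)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]  # no spread: all clips tie
    return [(v - m) / s for v in values]


def success_index(clips, target):
    """Sum of intensity and discreteness z-scores for each clip."""
    names = list(clips)
    intensity = [mean(r[target] for r in clips[n]) for n in names]
    discreteness = [hit_rate(clips[n], target) for n in names]
    z_int, z_disc = zscores(intensity), zscores(discreteness)
    return {n: z_int[i] + z_disc[i] for i, n in enumerate(names)}


# Hypothetical toy ratings (1-7 scale) for two clips targeting anger.
toy_clips = {
    "clip_a": [{"anger": 6, "disgust": 3, "sadness": 2},
               {"anger": 5, "disgust": 5, "sadness": 1}],
    "clip_b": [{"anger": 3, "disgust": 4, "sadness": 2},
               {"anger": 3, "disgust": 3, "sadness": 2}],
}
scores = success_index(toy_clips, "anger")
print(scores)
```

Because both components are standardized within the set of clips being compared, the index is relative: a clip's score changes if the comparison set changes, which is why no absolute threshold for success exists.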

Table 1 Online Validation Study: Film clips for eliciting discrete emotional states online

Next, we tested the efficacy of the clips in eliciting the target emotions by conducting within-subject comparisons of emotion rating by film (cf. Jenkins & Andrewes, 2012). Specifically, we conducted a ten-level within-subject ANOVA comparing the ten rated emotions within each clip, using a Bonferroni correction for multiple comparisons and a Greenhouse-Geisser correction (ε) because the sphericity assumption was violated (Gruber, Dutra, Eidelman, Johnson, & Harvey, 2011; Uhrig et al., 2016; von Leupoldt et al., 2007). The dependent variables were average intensity ratings, and pairwise comparisons were conducted between the target emotion and each non-target emotion using contrasts. The advantage of this approach is that it best accounts for individual differences in emotion ratings when testing for the effectiveness of a given clip. Finally, because we were specifically interested in identifying clips that elicit emotions, such as anger, that often co-activate other emotions, we employed a strategy to test the degree to which mixed emotions were present in participant ratings. Following Hemenover and Schimmack (2007), we calculated a mixed-feelings score by participant for each pair of target emotions (e.g., anger and disgust in The Road to Guantanamo), consisting of the minimum shared intensity of the two target emotions.
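The mixed-feelings score described above reduces to taking, for each participant, the smaller of the two target-emotion ratings. The sketch below uses hypothetical ratings; averaging the participant-level scores within a clip is an assumption about how a clip-level value would be summarized, and the raw 1–7 ratings are used without rescaling.

```python
from statistics import mean


def mixed_feelings(rating_a, rating_b):
    """Participant-level score: the intensity shared by both emotions,
    i.e. the minimum of the two ratings."""
    return min(rating_a, rating_b)


def clip_mixed_feelings(ratings, emotion_a, emotion_b):
    """Clip-level summary: mean of participant-level scores (assumed)."""
    return mean(mixed_feelings(r[emotion_a], r[emotion_b]) for r in ratings)


# Hypothetical ratings from two participants for one clip.
toy_ratings = [
    {"anger": 6, "disgust": 2},  # shared intensity 2
    {"anger": 3, "disgust": 5},  # shared intensity 3
]
print(clip_mixed_feelings(toy_ratings, "anger", "disgust"))  # 2.5
```

A participant who reports strong anger but little disgust contributes a low score, so a clip scores high only when both emotions are rated high by the same individuals.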

Each method of analysis offers certain statistical benefits, although there is some overlap. For example, the success index takes into account both how discretely the film clip elicited the target emotion, in addition to how intensely the target emotion was elicited, and allows comparisons across clips to identify the most effective one for the target emotion. The ten-level within-subject ANOVA compares the mean intensities of all measured emotions within a given clip, sharing an emphasis on intensity and discreteness with the success index analysis, but giving more specific information about that single clip regarding its emotion elicitation pattern and accounting for individual differences in reporting/rating. In contrast, the mixed-feelings score attempts to capture the shared intensity of targeted emotions, so that lower scores can suggest films that are highly discrete elicitors and higher scores suggest films that more strongly elicit mixed emotions. When interpreting the results, we extrapolated across analyses with the primary goal of identifying and best characterizing clips that consistently and reliably indexed the target emotion(s) above others.

Positive emotion films: success indices

We first evaluated the success index of the film clips by combining both the mean intensity for each film clip’s target emotion as well as the corresponding discreteness for that particular emotion for that film (the percentage of participants rating each target emotion at least one point above all others). Films were evaluated twice when there were two target emotions (happiness vs. amusement; anger vs. disgust) and compared against all other clips eliciting that target emotion (see Table 2 for a summary). Surprisingly, Funny Cats was found to have the highest success index for both amusement and happiness, with the latter being primarily attributable to the high mean intensity of happiness elicited. Between Two Ferns had the second highest success index for amusement, followed closely by The Office and Whose Line is it Anyway. Regarding happiness, Alive and D2: The Mighty Ducks had, respectively, the second and third highest success rates after Funny Cats.

Table 2 Online Validation Study: Mean intensity, discreteness, and success rates

Positive emotion films: within-subject ANOVAs

Within-subject ANOVAs revealed that several of the film clips did not have significant differences between interest and the target emotion, indicating high engagement of the participants with the film clip (see Table 3). Alive elicited happiness significantly more than all other emotions, F(4.91, 941.70) = 121.81, p < 0.001, ε = 0.545, and was not suited for evoking amusement. Between Two Ferns, D2: The Mighty Ducks, and Whose Line is it Anyway had happiness levels significantly different from all other emotions except interest (Between Two Ferns, F(2.83, 521.15) = 178.74, p < 0.001, ε = 0.315; D2: The Mighty Ducks, F(3.63, 555.12) = 217.64, p < 0.001, ε = 0.403; Whose Line is it Anyway, F(3.55, 639.37) = 311.69, p < 0.001, ε = 0.395). Mean intensities of amusement and interest were also not significantly different for D2: The Mighty Ducks and The Office, F(3.58, 648.06) = 175.76, p < 0.001, ε = 0.398, though all other emotions of interest were significantly lower than amusement (p < 0.001). Between Two Ferns had amusement levels significantly higher than all other emotions, including interest. Both happiness and amusement were significantly different from all other emotions, including each other, for the Funny Cats film clip, F(3.48, 588.60) = 378.03, p < 0.001, ε = 0.387.
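The fractional degrees of freedom in these F tests follow from the Greenhouse-Geisser correction: the uncorrected degrees of freedom of a one-way within-subject ANOVA are multiplied by ε. The sketch below shows the arithmetic only; the function name and the participant count in the example are assumptions, not values from the study.

```python
def gg_corrected_df(epsilon, n_levels, n_participants):
    """Greenhouse-Geisser corrected degrees of freedom for a one-way
    within-subject ANOVA with n_levels repeated measures."""
    lower_bound = 1.0 / (n_levels - 1)  # smallest possible epsilon
    if not lower_bound <= epsilon <= 1.0:
        raise ValueError("epsilon must lie between 1/(k - 1) and 1")
    df_effect = epsilon * (n_levels - 1)
    df_error = df_effect * (n_participants - 1)
    return df_effect, df_error


# Ten rated emotions give 9 uncorrected numerator degrees of freedom.
# With the epsilon reported for Alive (0.545), 0.545 * 9 is about 4.9,
# in line with the reported F(4.91, ...). The participant count of 200
# is hypothetical and used only to show the denominator calculation.
df_effect, df_error = gg_corrected_df(0.545, n_levels=10, n_participants=200)
print(round(df_effect, 3), round(df_error, 3))
```

Small discrepancies between values recomputed this way and the reported degrees of freedom are expected, because ε is reported to only three decimal places.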

Table 3 Within-subject ANOVAs, including pairwise comparisons of target to all non-target emotions, by film, for the Online Validation Study

Positive emotion films: mixed-feelings scores

Mixed-feelings scores (Table 4) were generated to indicate the minimum intensity rating shared by the two target emotions (amusement, happiness) that could be co-elicited. For film clips intended for elicitation of happiness or amusement, Alive had the lowest mixed-feelings score, followed by D2: The Mighty Ducks. The low mixed-feelings score of Alive may be attributable to the overall lower intensity of reported emotions elicited by this film. Funny Cats, which had the highest success index score for both amusement and happiness, also had the highest mixed-feelings score. Whose Line is it Anyway also had a high mixed-feelings score, while The Office and Between Two Ferns had moderate mixed-feelings scores.

Table 4 Online Validation Study: Mixed-feelings scores

Negative emotion films: success indices

Police Brutality had the highest success index (Table 2) for anger. Trainspotting, the film clip intended to evoke disgust without confounding anger elicitation, was found to have the highest success rate for disgust. College Conspiracy and Crash had the next highest success indices for anger and disgust, respectively. The latter finding suggests Crash may have elicited moral disgust, in comparison to the physical disgust evoked by Trainspotting. Surprisingly, The Road to Guantanamo had a lower success rate for anger but a moderately higher success index when evoking disgust. Both Fahrenheit 911 film clips had low success rates for anger and disgust. As with the pilot study, the majority of film clips were without sex effects on the mean intensities of the target emotions when evaluated by t-test; exceptions are noted in Table 1.

Consistent with our pilot study as well as other reports (cf. Gross & Levenson, 1995), The Champ elicited sadness effectively, with high intensity (M = 5.28, SD = 1.64) and discreteness (67.24%). A success index was not calculated for The Champ, as it was the only film clip evaluated for evoking sadness.

Negative emotion films: within-subject ANOVAs

For the negatively valenced film clips, Trainspotting was effective for evoking disgust higher than all other emotions, F(4.51, 721.03) = 225.56, p < 0.001, ε = 0.501, whereas anger was not significantly different from fear, sadness, happiness, or amusement (Table 3). This suggests that, as expected, this film did not carry an anger confound when eliciting physical disgust. Both Crash, F(5.87, 1350.16) = 243.09, p < 0.001, ε = 0.652, and Police Brutality, F(6.08, 1466.09) = 267.40, p < 0.001, ε = 0.676, had significantly higher levels of both anger and disgust than all other emotions, and these emotions were also significantly different from each other. College Conspiracy had mean intensities of anger and disgust that were not significantly different from interest nor from each other, but were higher than all other emotions, F(6.17, 1376.13) = 238.43, p < 0.001, ε = 0.686. Anger was not significantly different from interest or fear for The Road to Guantanamo, though disgust was significantly higher than all other emotions, F(5.70, 1356.49) = 223.02, p < 0.001, ε = 0.633. For the Fahrenheit 911: Bin Laden Family film clip, anger and disgust were not significantly different from surprise, and disgust was also not different from interest, F(5.27, 1154.56) = 112.94, p < 0.001, ε = 0.586. Furthermore, anger and disgust were not significantly different from sadness nor from each other for the Fahrenheit 911: Recruitment clip, and disgust was not significantly different from interest, F(6.16, 1410.22) = 117.06, p < 0.001, ε = 0.684.

The Champ elicited significantly greater sadness than all other emotions, F(6.72, 1552.04) = 219.09, p < 0.001, ε = 0.747, in agreement with previous reports.

Negative emotion films: mixed-feelings scores

Evaluation of mixed-feelings scores (Table 4) of anger and disgust for the negative emotion films revealed that Trainspotting had the lowest score, in agreement with its ability to elicit physical disgust without evoking anger. Police Brutality and Crash had the highest mixed-feelings scores, followed by College Conspiracy and The Road to Guantanamo. The two Fahrenheit 911 clips had low-to-moderate mixed-feelings scores, but this is likely attributable to the overall low intensity of emotional responses elicited in response to viewing these clips.

Neutral film clip

Big Cat Diary, the film clip included as a potential neutral reference, was actually found to elicit significantly (p < 0.01) greater interest than all other emotions except happiness, F(2.55, 542.29) = 221.48, p < 0.001, ε = 0.283. In turn, happiness was significantly greater (p < 0.001) than all other emotions except interest, amusement, and affection. Because this was the only neutral film clip evaluated, a success index was not calculated.

Discussion

In this Online Validation Study, we had participants view five randomly selected videos (out of 15 possible clips) in a randomized order to address one of the limitations we encountered in our pilot study (see Supplemental Material). These were evaluated using three complementary statistical approaches, allowing for a robust test of the utility of the candidate film clips and following established conventions (e.g., Jenkins & Andrewes, 2012). First, the success index took into consideration the relative ability of film clips, compared to other film clips being evaluated for eliciting the same emotion, to discretely and intensely elicit the target emotion. Next, the ten-level within-subject ANOVA assessed, within each film clip, the distinctive profile of emotions elicited. This permitted evaluation of the intensities of different emotions elicited by a single film clip by individual, providing an alternative way to examine the ability of a single clip to evoke a target emotion in comparison to the other emotions reported and most effectively accounting for individual differences in emotion reporting. Finally, the mixed-feelings score specifically indexed the shared intensity of co-elicited emotions (e.g., anger and disgust). Together, these three analytical methods contribute distinct yet complementary assessments of the film clips’ utility in discretely eliciting one of a pair of frequently co-elicited emotions, and all three are in agreement regarding the most effective clips out of those evaluated here.

Based on the success index, within-subject ANOVA, and mixed-feelings score analyses for the Online Validation Study, we were able to validate the use of The Office to elicit amusement, as this film clip predominantly evoked strong amusement intensities with considerable discreteness in the absence of the similarly strong happiness elicitation achieved by Funny Cats. Similarly, our data validate use of D2: The Mighty Ducks for the evocation of happiness without a corresponding elicitation of amusement, as occurs with Funny Cats. Between Two Ferns and Alive were moderately effective in discretely eliciting amusement and happiness, respectively. However, our data do demonstrate that the Funny Cats clip is particularly powerful (and Whose Line is it Anyway is moderately useful) for circumstances where strong evocation of both happiness and amusement is desired, such as dimensional or mood induction research.

Police Brutality was the most powerful in eliciting anger when compared across the six film clips expected to evoke anger. Consistent with prior research (Schaefer et al., 2010), we were able to verify Trainspotting for elicitation of physical disgust in the absence of an anger confound. Crash was also effective in its elicitation of disgust, despite some residual evocation of anger, and is recommended for elicitation of disgust in instances when moral, rather than physical, disgust is desired. Indeed, the co-elicitation of anger and moral disgust remains a significant challenge to emotion elicitation research (Ottaviani et al., 2013; Salerno & Peter-Hagene, 2013; Whitton et al., 2014). College Conspiracy was moderately effective in evoking anger. The Road to Guantanamo moderately evoked both anger and disgust in participants and, along with College Conspiracy, could be implemented to reliably elicit negative mood states. We also verified the ability of The Champ to elicit sadness.

Presentation of a film clip from Big Cat Diary was intended as a neutral film clip, but based on a previously used criterion for neutral clips (i.e., a mean intensity <2.5 on a scale of 7 for all emotions of interest; Gross & Levenson, 1995; Hewig et al., 2005) it did not qualify as neutral because mean intensities for interest, happiness, amusement, and affection were >3. Still, the mean intensity of happiness for Big Cat Diary was less than all other positively valenced films, and the mean amusement intensity was similarly lower than the comparatively positive films except for Alive. Thus, Big Cat Diary could be useful as an engaging yet mildly positive film clip, particularly in light of the drawbacks of film clips that are neutral to the point of eliciting boredom or other ambiguous emotional responses (Gross & Levenson, 1995; Hewig et al., 2005; Samson et al., 2016).
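The neutrality criterion cited above (a mean intensity below 2.5 on the 7-point scale for every rated emotion) can be written as a simple check. The function name and both rating profiles below are hypothetical illustrations; they are not the values reported in Table 1.

```python
def is_neutral(mean_ratings, threshold=2.5):
    """True only if every emotion's mean intensity falls below the threshold."""
    return all(m < threshold for m in mean_ratings.values())


# Hypothetical mean-rating profiles (1-7 scale), for illustration only.
flat_profile = {"interest": 2.1, "happiness": 1.8, "amusement": 1.5,
                "affection": 1.4, "anger": 1.1, "disgust": 1.0}
mildly_positive_profile = {"interest": 3.6, "happiness": 3.2, "amusement": 3.1,
                           "affection": 3.1, "anger": 1.1, "disgust": 1.0}

print(is_neutral(flat_profile))             # True
print(is_neutral(mildly_positive_profile))  # False
```

Under this rule a single emotion at or above the threshold is enough to disqualify a clip, which is why a clip with several positive emotions rated above 3 cannot be treated as neutral.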

We therefore recommend that future investigators seeking to preferentially elicit one emotion from these pairs of frequently co-elicited emotions use the following clips: The Office for amusement, D2: The Mighty Ducks for happiness, Police Brutality for anger, Trainspotting for physical disgust, and Crash for moral disgust. Dimensional researchers may wish to validate Funny Cats for eliciting positive or pleasant affective states, and The Road to Guantanamo or College Conspiracy (depending upon target population) for negative or unpleasant emotional responses. More broadly, the present findings permit evaluation of film clips that were first piloted in a laboratory setting to elicit emotions via an online medium. To date, only two studies have validated film clips for elicitation of emotion using the internet (Aldao & Nolen-Hoeksema, 2013; Samson et al., 2016), and only one has evaluated the same film clips both online and in the laboratory setting as we do here (Samson et al., 2016).

Assembling a database of emotionally evocative film clips

In 1995, Gross and Levenson published what is often considered the first major set of emotion-inducing film clips, although film stimuli had certainly been used before in research (e.g., Lazarus, Speisman, Mordkoff, & Davison, 1962) and previous emotion film sets existed (e.g., Philippot, 1993). Since that time several other sets have been put forth, most emphasizing the elicitation of discrete emotional states (e.g., Gabert-Quillen et al., 2015; Hewig et al., 2005; Rottenberg et al., 2007; Schaefer et al., 2010), others emphasizing broader dimensions of affective experience (e.g., Carvalho, Leite, Galdo-Álvarez, & Gonçalves, 2012; Samson et al., 2016) and the influence of key demographic factors in the effectiveness of the stimuli (e.g., age; Hazer et al., 2015; Jenkins & Andrewes, 2012). The advantages of each film set are clear. However, to date, there has been no attempt to integrate all of these stimuli sets into one large database of emotion-eliciting film clips in order to facilitate more effective and accessible use of this growing body of work. Indeed, the valuable advantage of such a database would be to provide an initial source permitting rapid identification of highly valid clips suited precisely for each researcher’s individual needs. This is particularly useful given the variety of journals that film clip validation studies have been published in, some of which are not as prominent as others. Therefore, the complementary goal of this paper is to address an identified deficiency in the literature that we have recognized during our own research investigations. We sought to assemble the first integrated database of emotionally evocative film clips validated in prior research, and have included clips validated in our research presented here. In doing so, we provide critical data to users on the number and type of studies validating a given clip as well as improve identification of clips useful in eliciting a wide range of emotional responses. 
Based on the film clip validation information we provide here, researchers can rapidly identify the film clip(s) they wish to pilot for their own investigations, identify the corresponding article, and contact the respective author(s) for more details on their methodology and access to otherwise publicly unavailable film clips.

Emotion theory and the organization of emotion film stimuli

Over the last several decades there has been a lively theoretical debate as to the underlying nature of emotional responses in humans (e.g., LeDoux, 2012; Panksepp, 2007). On one end of the discussion are discrete or basic emotion theorists (e.g., Ekman, 1992; Tooby & Cosmides, 2008) who posit that emotions are relatively brief and discrete episodes of loosely coordinated responses on multiple dimensions (e.g., autonomic, behavioral, experiential) that evolved to facilitate adaptation to specific environmental demands. For example, anger is associated with loosely coordinated responses that evolved to manage instances of goal blockage (Carver & Harmon-Jones, 2009), whereas sadness has a distinct set of loosely coordinated responses to manage instances of loss (Bonanno, Goorin, & Coifman, 2008). On the other end are dimensional or psychological constructionist theorists (e.g., Barrett, 2006) who argue instead that specific emotional responses can be understood as overlying two neurobiologically based dimensions of response to environmental demands. For example, some argue that the hedonic valence (e.g., the pleasantness vs. unpleasantness) and arousal (e.g., low vs. high) of the provoking stimuli determine an individual’s response on multiple dimensions (e.g., behavioral, autonomic). As such, specific emotion labels (e.g., anger vs. fear vs. sadness) are socially constructed, resulting from individual differences in the appraisal or conceptualization of the provocation (e.g., Barrett & Bliss-Moreau, 2009).

Although a detailed review of the literature is beyond the scope of this paper, we present these seemingly opposing theories of discrete vs. dimensional emotions to justify the inclusion and organization of a wide array of film clip stimuli that would facilitate the work of researchers who are operating from these different theoretical positions and/or have different research agendas. For example, researchers interested in experimentally manipulating mood in order to examine influences on executive attention might most benefit from employing clips validated using a dimensional framework. Conversely, researchers interested in examining the influence of fear on visual search and memory would be better served employing clips validated using a discrete emotions framework. As such, we compiled two classes of film clip stimuli researched over the last four decades into our catalog. The first includes stimuli demonstrated to elicit specific or discrete emotional responses, and the second includes stimuli demonstrated to support dimensional or psychological constructionist views. In addition, we include data from our own research that largely employs a discrete emotions framework while taking into consideration the complexities of co-elicitation of similar emotions.

Literature review and development of film stimuli catalogues

To identify prior research providing evidence of effective film clip stimuli, we first conducted keyword searches (e.g., “emotion elicitation,” “film clips”) on PsycINFO and Google Scholar, which yielded over 1,000 results, and then examined the reference sections of relevant studies. From the resulting literature, validated film stimuli were included if they met the following four criteria: (1) validation data supporting the utility of each clip for emotion elicitation were provided (either as published or submitted by the author via personal communication); (2) the stimuli were validated on a healthy population; (3) the researchers generally followed established conventions (i.e., methodological, statistical) for the evaluation of the stimuli (e.g., Gross & Levenson, 1995); and (4) the data were reported in the last four decades. The total yield was 24 studies. In some cases, multiple investigations provided data in support of almost identical film clips (e.g., The Champ, MGM 1979), using the same or different indices of emotional responses (i.e., affective report, autonomic responses, facial behavior). In those cases, we cite all relevant studies supporting a given clip and indicate which indices were used in each investigation.
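As a rough illustration of how the four inclusion criteria above could be applied as a screening step, the following Python sketch encodes each criterion as a field on a candidate record. The class, the field names, the example studies, and the reference year of 2020 are all hypothetical assumptions introduced for this sketch; criterion (3) in particular was a qualitative judgment in our review and is reduced here to a simple flag.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CandidateStudy:
    """Hypothetical record for one screened study; all fields are illustrative."""
    label: str
    has_validation_data: bool   # criterion 1: validation data provided
    healthy_sample: bool        # criterion 2: validated on a healthy population
    follows_conventions: bool   # criterion 3: established conventions followed
    year: int                   # criterion 4: year the data were reported


def meets_inclusion_criteria(study: CandidateStudy,
                             reference_year: int,
                             window_years: int = 40) -> bool:
    """Return True only if all four inclusion criteria are satisfied."""
    recent_enough = reference_year - window_years <= study.year <= reference_year
    return (study.has_validation_data
            and study.healthy_sample
            and study.follows_conventions
            and recent_enough)


def screen(studies: Iterable[CandidateStudy], reference_year: int) -> List[str]:
    """Return the labels of the studies that pass screening, in input order."""
    return [s.label for s in studies if meets_inclusion_criteria(s, reference_year)]


# Hypothetical candidates for illustration only.
candidates = [
    CandidateStudy("Study A", True, True, True, 1995),
    CandidateStudy("Study B", True, True, True, 1962),   # outside the time window
    CandidateStudy("Study C", True, False, True, 2010),  # clinical sample
]

print(screen(candidates, reference_year=2020))  # ['Study A']
```

Passing the reference year explicitly keeps the "last four decades" window tied to a stated date rather than to whenever the code happens to run.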

Validated film stimuli from the literature (including the clips validated in our Online Validation Study) are presented in two comprehensive tables. Table 5 lists all clips researched for discrete emotion elicitation. Clips are listed by emotion, and basic information is included on the type of supporting data (i.e., which emotion response indices) provided by the respective investigations (these are also listed in the reference section and denoted by an *). We do not include the specific details of the construction/location of the clip (with the exception of those for which we provide data) because these can be found in the referenced article(s), which should be consulted for greater detail on the methodology employed. Table 6 lists all clips researched for the elicitation of dimensions of emotional valence, arousal, and/or dominance. Clips are listed by dimension, and information on the type of supporting data and references is included. In some cases, clips were evaluated for the elicitation of both discrete and dimensional emotion responses. In these cases, they are listed only in Table 5 but given a special designation (**). Finally, we specify in both Tables 5 and 6 the language of the clip (other than English) or if there was no language spoken, the type of adult sample the clip was validated in (community versus student), and the length of the clip, in order to facilitate easier identification of candidate film clips for future investigations.

Table 5 Comprehensive list of film clips to elicit discrete emotional responses
Table 6 Comprehensive list of film clips to elicit dimensional emotion responses
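
For researchers who wish to work with the catalog programmatically, the organization of Tables 5 and 6 described above could be mirrored in a simple record structure. The following Python sketch is one possible arrangement, not a distributed tool: the field names, the `find_clips` helper, and the three example entries ("Clip A" to "Clip C", with placeholder references) are hypothetical and do not reproduce any row of the tables.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ClipRecord:
    """Hypothetical catalog entry mirroring the columns described for Tables 5 and 6."""
    title: str
    framework: str               # "discrete" (Table 5) or "dimensional" (Table 6)
    target: str                  # emotion or dimension the clip was validated for
    indices: Tuple[str, ...]     # e.g., self-report, autonomic, facial behavior
    language: str                # spoken language, or "none"
    sample: str                  # "student" or "community"
    length_s: int                # clip length in seconds
    references: Tuple[str, ...]  # supporting studies


def find_clips(catalog: Iterable[ClipRecord],
               framework: str,
               target: str,
               sample: Optional[str] = None,
               max_length_s: Optional[int] = None) -> List[ClipRecord]:
    """Filter the catalog, listing clips with more supporting studies first."""
    matches = [
        clip for clip in catalog
        if clip.framework == framework
        and clip.target == target
        and (sample is None or clip.sample == sample)
        and (max_length_s is None or clip.length_s <= max_length_s)
    ]
    return sorted(matches, key=lambda clip: len(clip.references), reverse=True)


# Hypothetical entries for illustration only (not rows from Tables 5 or 6).
catalog = [
    ClipRecord("Clip B", "discrete", "sadness", ("self-report",),
               "French", "community", 240, ("Ref 3",)),
    ClipRecord("Clip A", "discrete", "sadness", ("self-report", "autonomic"),
               "English", "student", 170, ("Ref 1", "Ref 2")),
    ClipRecord("Clip C", "dimensional", "negative valence", ("self-report",),
               "none", "student", 90, ("Ref 4",)),
]

for clip in find_clips(catalog, "discrete", "sadness"):
    print(clip.title, len(clip.references))
```

Sorting by the number of supporting references is only one reasonable default; a researcher might equally prioritize clip length, sample type, or the response indices used in validation.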

General discussion

Our goals for this investigation were two-fold. First, we validated several clips for discrete elicitation of particularly challenging, frequently co-elicited emotions using online presentation of film clips in randomized order. Our investigations here provide clear evidence supporting the use of the following film clips for elicitation of these discrete emotional responses: happiness – D2: The Mighty Ducks; amusement – The Office; anger – Police Brutality; disgust – Trainspotting and Crash; sadness – The Champ. The inclusion of two clips for disgust reflects inherent differences in elicitation of physical disgust versus moral disgust (respectively). Evaluation of these film clips in an online population after first being piloted in a laboratory setting strengthens the laboratory findings by suggesting that the ability of these particular film clips to elicit emotional responses is generally consistent despite presentation in very different environments. Other clips demonstrated moderately consistent elicitation of anger and/or disgust or happiness and/or amusement (The Road to Guantanamo and College Conspiracy for anger and disgust; Alive for happiness; Between Two Ferns for amusement; Funny Cats and Whose Line is it Anyway for happiness and amusement). These clips can be considered effective stimuli for research in which their content better serves the investigators' specific purposes or in which the design does not depend upon eliciting one specific emotion. The second goal of this investigation was compilation of an extensive catalog of film clip stimuli, integrating evidence demonstrating the validation of films for elicitation of both discrete and dimensional emotion responses from research published over the past four decades.

After incorporating the six clips from the Online Validation Study here with the existing literature, we have assimilated 295 film clips from 24 articles to generate a current, unified point of reference for emotionally evocative videos. The variety within this catalog is considerable, including: clips in many languages (e.g., English, French, German, and Italian); clips with and without audio tracks; clips differing in length; clips of both professional quality and those taken of real-life occurrences; and of course, clips designed to elicit a wide array of emotional responses. This diversity is indicative of widespread interest in research on emotion and reflects the complexity of this particular methodology. Our goal was to make this resource a relatively simple yet useful reference tool in order to help support the rapidly growing field of emotion research.

Most film clips employed in the study presented here did not result in significant differences in reported intensities of emotions between the sexes. Though evaluating sex differences in responding to emotionally evocative film clips was not a primary goal of this paper, we nonetheless recognized the unique strength in our data set that would allow us to detect consistent or inconsistent differences in reported target emotion intensity between the sexes. Only one film consistently exhibited a sex difference across both the pilot study and the Online Validation Study – The Champ resulted in significantly higher reported sadness intensities in females versus males. Other clips suggesting sex differences in the present study are indicated in Tables 1, S2, and S3.

Over the course of our research we recognized some key limitations to investigations validating film clips, both in the articles we reviewed and in our own studies. Most important is that the vast majority of the data supporting the use of these clips were garnered using affective self-report only. Although the utility of affective or experiential report of emotion is debated (e.g., Robinson & Clore, 2002), most would agree that there are limitations associated with relying on only one index of emotional responses. Indeed, some of our own research has yielded instances where affective report is discrepant from other response modes (including autonomic activity: Coifman, Bonanno, Ray, & Gross, 2007). However, there are a number of benefits associated with validation using self-report, the most critical of which is efficiency. Indeed, studies in which no a priori expectations or demands are suggested to participants and the samples are large are likely to be the strongest in terms of validity. Moreover, researchers are encouraged to employ these stimuli with the understanding that self-reported emotion will be the most predictable response and that reactions observed through other measures (e.g., behavioral) may or may not be consistent with self-reported responses.

An additional issue resulting from the use of self-reported affect ratings is that this tool carries inherent challenges. There are many potential emotion words that can be used for participant ratings, and here we selected ten of the most consistently used words in this type of research that are also consistent with dominant models of discrete emotion (Ekman, 1992). Had we used a larger number of words we may have increased our ability to detect emotions that were elicited, but this comes at the cost of time and attentional demands on the participant. Moreover, there is growing evidence that the capacity to differentiate emotion words may vary considerably between individuals, particularly for negative emotions (Kashdan, Barrett, & McKnight, 2015), and as such, more words may not necessarily have been more comprehensive. Further, participants can experience difficulties in disentangling similar emotional experiences, be inaccurately aware of, or improperly attend to, their emotional states, or have problems with the fluid and malleable definitions attributed to emotion words (Ellsworth & Smith, 1988; Gross & Levenson, 1995; Herring et al., 2011; Jerritta, Murugappan, Wan, & Yaacob, 2014; Maffei et al., 2014; Salerno & Peter-Hagene, 2013; Whitton et al., 2014). These challenges are particularly prevalent for emotions with positive valence (Ellsworth & Smith, 1988; Herring et al., 2011). Also challenging is the frequent co-elicitation of anger with moral disgust (Gross & Levenson, 1995; Jerritta et al., 2014; Salerno & Peter-Hagene, 2013; Whitton et al., 2014), but not with physical disgust.

It is for these reasons that we adopted a more encompassing and rigorous evaluation of the film clips in our Online Validation Study. Indeed, because we anticipated certain film clips would be more advantageous in eliciting one particular emotion over another, we sought to more objectively evaluate the data by taking into consideration these obstacles that participants face when self-reporting emotions. Consequently, we assessed emotions that are known to be evaluated and reported similarly (i.e., amusement and happiness; anger and disgust) so that we could best categorize the evocative capacity of film clips in the context of self-reported emotions. This approach permitted identification of film clips that were effective in eliciting emotional responses that may not have otherwise been evaluated (e.g., recognition of the ability of Crash to elicit disgust, though it was originally expected to elicit anger). This approach thereby facilitates the use of these film clips by future investigators that have a myriad of different experimental needs, yet ultimately are seeking to evoke a particular self-reported emotional experience in participants.

A second limitation is that the vast majority of literature validating these clips utilized undergraduate samples. This is a limitation that is evident in our own research as well. There are relatively few investigations examining the impact of age on responses to emotional film stimuli, but some data (e.g., Hazer et al., 2015; Jenkins & Andrewes, 2012) have accumulated to suggest that important differences can emerge. Indeed, gender, culture, age, and a variety of other factors should be considered when selecting clips for use with varying participant populations (see Alghowinem et al., 2014; Fernández, Pascual, Soler, & Fernández-Abascal, 2011; Gabert-Quillen et al., 2015; Liang, Hsieh, Weng, & Sun, 2013). Moreover, utilization of community, rather than college, samples through crowd-sourced venues such as Amazon Mechanical Turk might reveal different elicitation responses or altered patterns of co-elicitation, given the relative heterogeneity (Peterson, 2001) and distinctive motivations for study participation (Paolacci & Chandler, 2014) that exist in community populations. Aside from demographics, potential confounds in the form of survey satisficing, or financial or academic motivation (e.g., extra credit for study participation), should also be given substantial consideration when selecting film stimuli and sample populations (Hamby & Taylor, 2016). Despite the variation in college attributes (e.g., size, geographical location, international diversity, student culture), college students as a population could exhibit characteristically different emotional responses to film stimuli than community members. These myriad considerations further emphasize the importance of both selecting and piloting film stimuli, an endeavor we sought to facilitate here both through our own validations and the assembly of the larger validation literature.

For the current investigation, we specifically included recent film clips that we hypothesized would be particularly salient to a college-aged population (e.g., Between Two Ferns, College Conspiracy, Fahrenheit 911: Recruitment). Our research, like much across the behavioral sciences, is highly dependent on college samples. Indeed, given the prevalence of university students in emotion elicitation research using film clips (71% of studies in Tables 5 and 6 used student samples), we chose to execute our investigation in this very population to maximize the utility of this work in future studies. However, we recognize and acknowledge the value of validating film clips in other populations, and have clear plans to do so in our future work. Moreover, we encourage researchers to apply these clips to more diverse populations with the goal of better understanding the impact of age, culture, gender, and ethnicity in emotion processing and emotion research methodology. Finally, there is increasing use of film clips validated on healthy populations for use with clinical samples. Although intuitively this makes sense and we have chosen to include only clips examined in healthy samples, there is evidence (e.g., Aldao, Mennin, & McLaughlin, 2013; Ellard, Farchione, & Barlow, 2012) suggesting that psychopathology and other illnesses evident in a given sample should also be considered as additional factors when selecting clips to elicit a given response.

A particular strength of this study is the evaluation of film clips (which were first piloted in small, in-person participant groups) in eliciting emotion when presented online to individuals. Use of online emotion elicitation methods is burgeoning in psychology (Ferrer et al., 2015), and offers advantages in sample sizes, population accessibility, and relative speed of data generation. To date, only two other studies have validated film clips for the elicitation of emotional responses in an online format (Aldao & Nolen-Hoeksema, 2013; Samson et al., 2016). Of course, disadvantages include less control over environmental conditions and computer capabilities, as well as reliance on the accuracy of participants’ reports of demographic values. The presence of authoritative experimenters, as in the pilot study (see Supplemental Material), helps to curb drifting attention and prevent technical difficulties, minimizing or even preventing the need to exclude participants. In-person administration is, of course, labor-intensive and precludes the ability to individually randomize film presentation order to participants as was done in the present study. The strengths and disadvantages of web-based experimentation in psychology are discussed in recent articles (Crump et al., 2013; Ferrer et al., 2015; Gureckis et al., 2015; Mason & Suri, 2012; Woods et al., 2015), and for our Online Validation Study we employed many of the recommended checks to ensure data accuracy.

Conclusion

The purpose of this research was to fill some notable gaps in the literature regarding film clips effective in discretely eliciting frequently co-elicited, and therefore more challenging, emotions (e.g., anger) using an online medium. In our pursuit, we also developed an integrative catalog of film clip stimuli for use in future emotion elicitation research. Over the last several decades the presence of validated film sets as well as research employing them has increased exponentially. Indeed, the utility of a resource cataloging the wide array of film stimuli is evident in the rapidly increasing amount of emotion research broadly across disciplines including medicine, economics, sociology, and psychology. Though there are limitations to relying on the piloting/validation of other researchers examining responses in other samples, we believe that this catalog will serve as a useful resource for future investigations and help support the ongoing growth of emotion research.