Significance

Although psychologists admit that events can span essentially any length of time, most research in event perception and cognition has had participants segment events of a few minutes in duration or less. Moreover, current views suggest that events are nested meronomically; that is, smaller events exist within larger events and, more particularly, the larger events share boundaries with the first and last of the smaller events within their scope. The study of popular movie structure allows three queries to test event theory. (1) There is ample psychological evidence for movie scenes as events. For example, in the spirit of Aristotle they can be shown to have beginnings, middles, and ends. These scenes can last from a few seconds to several minutes. Can one find psychological evidence for a movie unit larger than scenes? (2) If so, do they share endpoints with the bordering scenes they subsume? (3) Event theory suggests that smaller units are driven by physical information in the stimuli, and the evidence from movies is consistent with this. But the event theory also suggests that large events are discerned more through cognitive means. Is this true for larger-scale events in movies as well? Results of the segmentation of seven popular movies released between 1927 and 2011 are consistent with the notions that there are larger-scale events in movies between about 20 and 35 min in length, that they share boundaries with the scenes/events they subsume, and that cognitive skills are necessary to segment them.

Background

Our experience is filled with units of different sizes. We may surf the internet within the task of writing an email letter, within the span of using our laptop, within a bus trip, within a vacation, and beyond to within the span of a particular employment, to within a career. Each of these can be called a unit of experience. Life is not only “one thing after another” (Keillor, 2013, 8:6; Radvansky, 2012, p. 269) but it is also a layered set of sequential “things” one after another, the larger encompassing the smaller.

Many of these things are called events, and the criterion for an event is that it must be perceived or thought of as a unit, one that typically—following Aristotle—has a beginning, a middle, and an end (Cutting, 1981; Zacks & Tversky, 2001). Importantly in this conception, the boundary of a large event is one generally shared with a smaller event, and perhaps one still smaller. That is, a smaller event does not typically lie astride the boundary of two superordinate events. Moreover, in contrast to a hierarchy, this structure is called a meronomy; Zacks and Tversky (2001) called it a partonomy. That is, lower-order units are parts of, not types of, higher-order units.

Most temporal art forms are also meronomical, constructed with layers of units of different sizes. A symphony often has four parts (a common structure is sonata, adagio, scherzo, and allegro), each part with a number of sections. The sonata form typically has an exposition, a development, a recapitulation, and an optional coda. And the exposition will often have several main melodic themes that later repeat. Similarly, in reading a novel one will likely encounter chapters, and paragraphs, and sentences. A play will likely have several acts, and scenes within them, and spoken parts and actions within those. A dance, whether choreographed or improvised, will typically have parts within parts. Moreover, the layered structure of movies is roughly the same. Indeed, most of us are accustomed to thinking of movies as having shots within scenes and sequences within the movie. But is there another unit larger than the scene or sequence and smaller than a movie? Indeed, Gibson (1979), p. 391 suggested that a “film is composed of events and superordinate events.” What might these superordinate events be?

Among movie units, shots may seem to be events but they are typically not (Magliano & Zacks, 2011). When asked to detect cuts in a scene, viewers often miss them (Smith & Henderson, 2008) and the later visual areas of the brain take little notice of cuts (Baldassano et al., 2017; Zacks, Speer, Swallow, & Maley, 2010). Thus, although some shots are full scenes, most are not, and they have little status as events in the larger flow of a movie. Scenes are the first units of movies that meet the various criteria for events—they have a normatively scalloped shape in terms of shot duration (longer, then shorter, then longer) and shot scale (wider angle, then perhaps a closeup, then often backing off; Cutting, Brunick, & Candan, 2012) and they have a characteristic brain response, registering the unpredictability of action at a scene boundary but less so within a scene (Zacks, Speer, Swallow, Braver, & Reynolds, 2007). Similarly, sequences are events—made up of smaller scene-like units that have their boundaries disguised and typically oscillate between two characters, two places, or two time frames (Cutting, 2019a).

One candidate for Gibson’s “superordinate event” in movies is analogous to the act in a play. Indeed, Field (2005) has called them acts. However, Bordwell (2006) and Thompson (1999) have simply called them large-scale parts, and Bellour (1976) called them major parts, for the obvious reason that the term act can have misleading implications. The end of an act in a play may completely halt the action (often with a curtain lowered) whereas no analogous thing occurs in most movies.

However, insofar as we know, no psychological evidence supports a larger act-like unit in movies. To be sure, there are ample theoretical and pragmatic statements about such structures, but without psychological support such theories are, well, just theories. Since events are psychological units they should have psychological evidence in their support. The same should be true of “superordinate events”. The goal of this article is to assess whether there is psychological evidence from young, nonprofessional viewers in support of the larger units. Movies are a good venue for this since, at least in contemporary films, the boundaries between these units are not typically obvious on the basis of their surface form. However, older movies may have fades and dissolves that can assist the viewer in segmentation.

There are two basic approaches to large-scale events in cinema, and both have shared attributes with the works of German novelist and playwright Gustav Freytag (MacEwen, 1900), who proposed a five-part structure for plays, and more importantly with the mythologist Joseph Campbell (1949) and his “hero’s journey”. Campbell proposed a three-part structure having a departure, an initiation, and a return. One approach to the narrative structure of movies comes from screenwriting manuals. Perhaps allied with Campbell, it is most associated with Field (2005) and is known as the three-act structure.

The first quarter of a movie is aptly named the setup. In it the main characters are typically introduced and we learn about their goals. There is likely a turning point (also called an inciting incident) typically about halfway through it, which raises a dramatic question (e.g., will the protagonist outwit the antagonist?) that will be answered at the end of the movie. The second act, the confrontation, typically contains the middle half of the movie, in which the protagonist attempts to solve the problem created by the inciting incident. The confrontation has a midpoint, where the protagonist may enlist the help of other characters. The third act is the resolution or climax where the story and its subplots are resolved and the dramatic question is answered.

The second approach comes from film theory and is not much different. For Bordwell (2006, 2007) and Thompson (1999, 2008) the setup and the climax are pretty much as Field describes, although the climax may have an epilog in which the diegetic social order is restored. One way in which this approach differs from Field’s is that it divides his second act in half to create two other large-scale parts of roughly the same length as the setup and climax. Its first half (and second large part of the movie) is the complicating action in which the protagonist’s goals are sharpened but still not met, and the second half (and third large part) is the development, which can create new goals for the protagonist, deepen characterization, enlist minor characters, or simply sustain the situation.

Two important features accompany this approach. First, the parts are generally, but not necessarily, the same length; and second, the number of parts is not limited to four. Instead, there is a rough time limit of 20 to 30 min (Thompson, 1999) or 25 to 35 min (Bordwell, 2007; Thompson, 2008) that will tend to dictate the number of larger parts. Thus, a movie of 90 min or less is likely to have only three parts and no development section, whereas a movie of near 150 min or more is likely to have five parts and two development sections. Thompson (1999), pp. 43–44 believed that balanced-length parts:

cater to the attention span of the spectator. The studios need not have pinpointed exact timings consciously, but careful attention to the minute-by-minute reactions of preview audiences (used since the 1920s) may have given practitioners an instinctive sense of when to change the direction of action. Time and again scriptwriters have described this instinctive feel for structure … These generalizations about the large-scale parts of narratives [however] do not offer a detailed or definitive explanation as to why they exist. Such an explanation could lie in the realm of cognitive psychology.

The purpose of this paper is not to investigate the reason for these 20–35 min parts. Instead, it seems prudent first to determine whether or not these large-scale units in the Thompson/Bordwell scheme are actually psychological events for nonprofessional viewers.

Experiment 1: Segmentations by average viewers

Method

As part of a seminar course undergraduate students viewed seven movies as an ensemble, each movie in a single sitting. Rather than being a convenience sample, this is a sample target audience for most popular movies. None of the students had taken a film course, most were computer science or psychology majors, and most reported normally seeing about one movie per week. Thirteen to seventeen viewed each movie around a seminar table in a darkened classroom. Movies were viewed in chronological order by release year, and on average about 10 days apart. There were ten females and seven males in the class.

We chose these movies because they were unlikely to have been seen before (which proved true, no student had seen any of them) and because they had widely varying narrative structures and narrational techniques—single or multiple narrative threads; linear and nonlinear plot lines; considerable differences in shot durations, shot scales, luminances, and motion; their uses (or not) of dissolves and fades; and the presence (or not) of flashbacks. This research was exempted from full review by the Cornell Institutional Review Board and informed consent was obtained.

The movies

Several attributes of the movies, and of the results, are given in Table 1. Here, we describe each movie in more detail, with information about its general reception and historical context.

Table 1 Films and segmentation information

Wings (Wellman et al., 1927) is a drama/romance/war movie in black and white. It is also a “silent” movie, the only one in this sample. That is, it is accompanied by music but has no voice track. It won Academy Awards for Best Picture and for its technical accomplishments in the portrayals of air battles. Its Internet Movie Database (IMDb) rating is 7.7 and its Rotten Tomatoes ratings are 95% by critics and 78% by general audiences.Footnote 1 It has 2061 shots (1797 live action shots with 264 intertitles), and average shot durations of 3.9 s without intertitles and 5.0 s with them. Silent movies typically have two types of intertitles, conversational and expositional. Expositional intertitles act typically like extended scene transitions and are inserted by the filmmakers to form boundaries among scenes of other narrative units. Among the non-cut transitions between shots in Wings are one intermission, 36 dissolves, 37 fades out and in (sometimes in pairs with title cards in between), and two wipes. It also has a linear narrative style with temporal gaps as it progresses. There is only one brief flashback.

Grand Hotel (Goulding et al., 1932) is a black-and-white drama, and an early “talkie” (with indigenous sound and dialog). It is perhaps the earliest example of a network narrative in popular cinema (Bordwell, 2015). That is, each of the five major characters has his or her own narrative thread, and these threads crosscut throughout the movie, usually paired with at least one other character. Grand Hotel also won the Academy Award for Best Picture; its IMDb rating is 7.5 and its Rotten Tomatoes ratings are 86% from critics and 77% from general audiences. It has only 380 shots, yielding a very long average shot duration of 17.6 s. Among its non-cut transitions are five dissolves, seven fades out and in, and three wipes. It has no flashbacks.

Passage to Marseille (Curtiz et al., 1944) is an adventure/drama/war movie in black-and-white. It is a follow-up to Casablanca (Curtiz et al., 1942) with much the same cast and again focused on the French resistance in World War II. It has a complex plot. Most notably, it has a flashback within a flashback within a flashback within the dominant diegetic story. Its IMDb rating is 6.9 and has no Rotten Tomatoes critics’ rating (it received almost no reviews at the time in part because it was released when it was already apparent that the Allies would win the war), and an audience rating of only 56%. It has 1039 shots, with average shot duration of 6.2 s. Among its shot transitions are 78 dissolves, five fades out and in, and eight wipes.

Rope (Hitchcock et al., 1948) is an experimental drama/suspense movie. Hitchcock regarded it as a stunt and a failure (Truffaut & Scott, 1983, pp. 179–184). It was also his first color movie. It has an IMDb rating of 8.0 and Rotten Tomatoes ratings of 97% from critics and 90% from general audiences. It has only 11 shots (average shot duration = 434 s) and was designed to appear to have (almost) no edits. Edits were included for pragmatic reasons of film-reel length in shooting, and during theatrical presentations to cue the projectionist when to start the next reel. It has five straight cuts and five dissolves (typically across the backs of male characters with dark jackets) and no standard fades or wipes. The story takes place in real time in one room.

All About Eve (Mankiewicz et al., 1950) is a black-and-white drama. It won the Academy Award for Best Picture and several other awards. It has an IMDb rating of 8.3, Rotten Tomatoes ratings of 100% from critics and 94% from general viewers, and the American Film Institute has it currently ranked as the 16th best movie of all time. It has 784 shots with average shot duration of 12.9 s. Among its transitions are 16 dissolves, four fades out and in, and no wipes. The bulk of the movie (87%) is embedded in one flashback.

Ordinary People (Redford et al., 1980) is a color drama, and also won the Academy Award for Best Picture. Its IMDb rating is 7.8, with Rotten Tomatoes ratings of 90% from critics and 88% from general audiences. It has 1182 shots and average shot duration of 6.1 s. Among its transitions it has nine dissolves in its opening montage, but otherwise no fades, wipes, or other dissolves. It has 42 short flashbacks (average duration of 3.34 s, and a median duration of 2.17 s). Moreover, much of the movie is edited in a parataxic style; that is, there is often little apparent relation or lead in from one scene to the next. Rather, the simple juxtaposition of scenes forces the viewer to figure out some of the narrative threads and connections as the movie progresses.

Source Code (Jones et al., 2011) is a color action/science fiction/puzzle film. Puzzle films (see Buckland, 2009) are designed to be complex and break the boundaries of classical plot structure. Source Code has an IMDb rating of 7.5 and Rotten Tomatoes ratings of 92% from critics and 82% from audiences. It has 1478 shots, yielding average shot duration of 4.36 s. It has many complex, compound transitions (swirls), but no standard dissolves, fades, or wipes. It also has 27 changes of venue—cycling back and forth between two locales.

Procedure

Each movie was projected in a classroom with institutional LCD and sound equipment from a laptop computer with mp4 files downloaded from commercial DVDs. The movies differed in aspect ratios (image width/height)—the first five at 1.37 (Academy ratio), the sixth at 1.85 (widescreen), and the last at 2.35 (Cinemascope)—but the width of the projected images on the screen was constant. Its lateral subtense varied according to the position of the viewer in the room from about 45° (comparable to the view from the middle of a standard movie theater) to about 20° (comparable to viewing a movie on a laptop).

Numbered, sequential content summaries (henceforth called scenarios) for each movie were purpose written for these studies. Each entry was concise with a mean description length across movies of seven to twelve words. The number of itemized entries for each movie is given in Table 1. Scenarios were handed out immediately prior to viewing.Footnote 2 As the room was quite dark, only a few looked at these at the time. Across movies, an average of 55% of the entries in these summaries indicated a scene boundary (here, a change in location and/or time; but see Cutting et al., 2012; Polking, 1990); the rest elaborated content continuing within a scene. These data are shown in Table 2 and are discussed later.

Table 2 Viewer segmentations at scene and non-scene boundaries

No classroom discussion occurred after the screenings. Given constraints of course scheduling there was no time to do so. Instead, students were encouraged to think independently about the movie over the next days and then, with the aid of the scenario, segment the movie into two to six parts according to their own interpretations of the narrative by placing lines between the particular entry numbers that marked the transition between one narrative part and the next. This range appeared not to constrain results. Across the seven movies the percentages for values of two to six segments were 2, 9, 24, 50, and 15%, respectively. Table 1 gives more specifics for each movie.

Viewers then wrote a three- to five-page essay justifying their divisions. They were explicitly told that there was no correct answer, only justifiable ones. At the time of their viewings, they had not been instructed about various theories of large narrative parts in movies (Field, 2005; Thompson, 1999), although classroom discussion revealed that several were aware of the general notion of a three-act structure (Field, 2005). Given the number of large segments reported for each movie in Table 1, there is clearly more going on in viewers’ responses than simply trying to impose three acts. Essays and segmentations were gathered at the next class; only the segmentations are discussed here. Although we were unable to control possible collaborations, for six of the seven movies none of the segmentation data across all possible pairs of viewers correlated perfectly (0 of a total of 800 comparisons). We discuss the seventh film below.

Each viewer’s segmentations were recorded for each movie on a spreadsheet. Boundaries were entered as corresponding to the numbered entry on the scenario that began a new segment. These values were averaged across entries for all viewers, becoming y axis values. Finally, for graphical purposes, the sequential increments along this vector were adjusted to reflect the movie duration corresponding to each entry; thus, becoming x axis values. The result was then plotted as boundary agreement (the proportion of viewers indicating each entry as the onset of a new narrative segment) by running time through the movie.Footnote 3 For four of the seven movies these vectors were compared to professional reports: two by Thompson (1999) and two by Bordwell, one from his blog (Bordwell, 2011) and one from personal correspondence. For the other three, we segmented the movies according to the published guidelines and descriptions of Bordwell (2007) and Thompson (1999). We will call all of these film-theoretic segmentations.

We will consider the movie narratives and viewers’ corresponding results in the chronological order of the movies’ release dates. In what follows, the duration of each movie (given in Table 1) was normalized to 1.0. Detailed synopses are given in the Appendix. There, segmentation boundaries selected by at least two observers are indicated as a function of the proportional runtime through the movie, and expressed as a proportion of the number of viewers of the movie.

Results and preliminary discussion

Wings

The results for Wings are shown in the left panel of Fig. 1. Again, on the ordinate is the segmentation mean (proportion of agreement among viewers that a boundary has occurred), and on the abscissa is the proportion of runtime through the movie. The thin blue lines are the combined viewers’ data and the red, thicker-lined spikes are segmentation points derived from Thompson’s (1999), p. 357 analysis of this movie. We determined these by matching her published segment durations with our time stamps.

Fig. 1
figure 1

The correspondence between the aggregated viewer responses (proportional agreement that a narrative boundary has occurred) and theoretical considerations in segmenting three early films. The thin blue spikes are the viewers’ data; the thicker red spikes in the left and middle panels derive from Thompson (1999), and the thicker green spikes derive from descriptions by her and by Bordwell (2006). The width of the base of the spikes is due to two factors. The left flank represents the duration of the last scene before the transition, and the right flank the duration of the scene following it. The right panel also has a series of numbers that represent the beginnings and ends of the nested flashbacks

The three main divisions are marked—separating the setup (33 min in length) from the complicating action (29 min; here and elsewhere simply called the complication) from the development (27 min)Footnote 4 from the climax (37 min). These breaks are given values akin to 30% agreement so that they can be more easily compared across movies and panels in Figs. 1, 2, and 3. Thompson also designated an epilog for Wings. Since she proposed that epilogs are part of the climax, the red spike for the epilog is given a smaller value, here at 15%. Values of zero are given for all other entries. Segmentation points within the narrative and with their particular agreement across viewers are noted in the synopsis in the Appendix.

Fig. 2
figure 2

The correspondence between viewer segmentations (agreements that a narrative boundary had occurred; thin blue spikes) and those that that follow the rubrics of Thompson (1999) and Bordwell (2006) as thicker green spikes, and those provided by Bordwell (personal communication) as thicker red spikes. The numbers in the right panel denote the time period of the flashback

Fig. 3
figure 3

The correspondence between viewer segmentations (thin blue spikes) and segmentations for two more recent films that follow the rubrics of Thompson (1999) and Bordwell (2006) in the left panel, and that correspond to Bordwell’s (2011) segmentations in the right panel

The mean correlation among viewers’ response patterns was substantial, and is given in Table 1.Footnote 5 Analogous to leave-one-out cross-validation (e.g., Shao, 1993), we correlated each viewer’s segmentation pattern with the average segmentation pattern of all the other viewers across the 82 scenario entries. We then took the mean of those leave-one-out correlations (mean r = 0.58, t (80) = 6.3, p < 0.0001, d = 1.4). Moreover, correlating the aggregated viewer data with the Thompson predictions of four large narrative parts plus an epilog shows that the agreement is also quite impressive (r = 0.75, t (80) = 10.1 p < 0.0001, d = 2.26). We take this result to be an endorsement of the Thompson/Bordwell theory that traditional popular movies have four large narrative parts with the last (the climax) having an optional epilog.

Yet there are many fades in the movie bracketing an intertitle and these might help viewers segment the narrative. Indeed, we created a 2×2 table for each viewer, with fade transitions that were chosen as boundaries (hits) and non-fade transitions chosen as boundaries (false alarms), versus fades not chosen (misses) and non-fades that were not chosen (correct rejections). We then calculated signal-detection indices corresponding to viewers’ boundaries—mean d’ = 0.90 (t (11) = 5.18, p = 0.0003). Thus, and unsurprisingly, the particular transitional information used by the filmmakers could have been used by viewers while encoding the narrative. However, no indication of the transition type (fade, dissolve, cut, or even intertitle content) was given on the scenarios. Thus, by the time the viewers set out to segment the movie, this particular information would almost certainly not be remembered.

Grand Hotel

The results for Grand Hotel are shown in the middle panel of Fig. 1. Thompson’s (1999), p. 357 large parts are 24, 32, 27, and 27 min in duration. The viewers’ data show that three of the five largest data peaks occur at locations given by Thompson for this movie, but the results appear a bit less impressive than those for Wings. Again, all the peaks in the figure are noted in the synopsis in the Appendix.

The mean leave-one-out correlation of response patterns in the 42 scenario entries across viewers was more modest than that for Wings, but still reasonable (mean r = 0.37, t (40) = 2.42, p = 0.02, d = 0.77). Overall and again, the pattern of correspondence between the aggregated data and the predictions by Thompson is quite strong (r = 0.67, t (40) = 5.7, t (40) = 5.7, p < 0.0001, d = 1.8), with three substantial peaks in the data at the boundaries that Thompson assigned.

However and again, there are many fades in the movie and this low-level visual information could have helped some viewers in their segmentations. Signal detection analysis (fades as segment boundaries and not, against non-fades as boundaries and not) again yielded results roughly consistent with their use (mean viewer d’ = 0.42, t (15) = 2.09, p = 0.054). These, of course, were designed by the filmmakers to do exactly that. But again, there was no indication of transition type on the scenario, so this information if relevant could only have been used in storing mental models of the narrative at the time of watching the movie (e.g., Swallow, Zacks, & Abrams, 2009).

Passage to Marseille

The right panel of Fig. 1 shows the data, the large parts, and the numbered flashback locations for Passage to Marseille. Shown as green spikes and prior to gathering the data, we segmented the major parts of the movie at points 0.33, 0.70, and 0.88, with a turning point at 0.12 and an epilog at 0.98, yielding large narrative parts of 36, 38, 20, and 13 min. Overall segmentation data are again given with the synopsis in the Appendix.

The leave-one-out correlations among viewer segmentations were very substantial (mean r = 0.82, t (59) = 14.7, p < 0.0001, d = 3.8), as were aggregate viewer results to the film-theoretic divisions (r = 0.85, t (59) = 12.2, p < 0.0001, d = 3.2). However, it should be no surprise that these results are driven by the many flashback patterns of the movie, noted numerically in the right panel of Fig. 1. Indeed, likely because of these and unlike any of the other movie data in this sample, some of the pairwise segmentation comparisons across viewers yielded perfect correlations (18 of 105).Footnote 6

Clearly, the filmmakers used flashbacks (and their accompanying fades and dissolves) to aid segmentation. Using the 2×2 table of boundaries with and without fades by non-boundaries with and without fades, this information could have been influential in segmentation for individual viewers at the time of encoding (mean d’ = 0.53; (t (15) = 4.16, p = 0.0008). Perhaps because flashbacks dominate viewer segmentations, filmmakers after the 1940s and 1950s decided that the block construction (Bordwell, 2017) of the narrative generated by long flashbacks constrain storytelling too much and stopped using them as often.

Rope

Because it is a short feature film (80 min), Rope is a candidate for having only three major sections (Thompson, 1999). The left panel of Fig. 2 shows the response data. Our prior-to-viewing divisions, based on Bordwell’s (2006) and Thompson’s (1999) general descriptions yielded segments of 27, 29, and 23 min in duration. Again, mean segmentations are discussed in the Appendix.

There was considerable correspondence among viewer responses (mean r = 0.47, t (66) = 4.3, p < 0.0001, d = 1.06) and between their pooled responses and film-theoretic segmentations (r = 0.47, t (66) = 4.3, p < 0.0001, d = 1.06). The importance of both results in this context is that Rope has no cuts, fades, or flashbacks that helped with segmentation, and only one dissolve associated with a response peak (which was not a major boundary). Thus, these results must be solely driven by viewers’ cognitive inferences while encoding the story.

All About Eve

All About Eve is a long and complex movie as outlined in the Appendix. Fortunately, we were able to enlist David Bordwell, who provided us with an authoritative set of divisions.Footnote 7 He suggested that it was appropriate to divide it into five parts with durations 27, 36, 21, 31, and 21 min. The addition is a second development stage. The correspondences between the viewers’ and his segmentations are shown in the right panel of Fig. 2.

There was adequate correspondence among viewer segmentation patterns (mean r = 0.30, t (57) = 2.3, p = 0.023, d = 0.61). And, despite some discrepancies and the dominance of the flashback, the relation between the viewers’ segmentations and Bordwell’s was solid (r = 0.58, t (57) = 5.38, p < 0.0001, d = 1.42). However and again, the patterns of fades and dissolves may have played a part in viewers’ responses (mean d’ = 0.46, t (14) = 3.97, p = 0.0014).

Ordinary People

The left panel of Fig. 3 shows the data and our segmentations for Ordinary People, which created large segments of 32, 36, 26, and 25 min. The correspondence among viewer responses patterns is substantial (mean r = 0.36, t (71) = 3.2, p = 0.002, d = 0.76) as is that between viewers’ collective data and our segmentations (r = 0.46, t (71) = 4.4, p < .001, d = 1.04). Moreover, as with Rope, there were no fades or dissolves to help viewers along the way while watching the movie.

Source Code

Like Rope, Source Code is a relatively short feature. Thus, Bordwell (2011) segmented it into three parts (33, 34, and 18 min) with a short epilog, as shown with red spikes in the right panel of Fig. 3. As expected for a puzzle film and as indicated in the Appendix, Source Code has a complex, nonstandard story. Because there are at least 27 changes back and forth between two diegetic locations—a Chicago commuter train and a Nevada Army laboratory—12 segments of which are more than a few minutes long, many other segmentations were possible. Nonetheless, there is satisfactory correspondence among viewers (mean r = 0.25, t (106) = 2.63, p = 0.01, d = 0.51), and a solid correspondence between the mean responses by the viewers and by Bordwell (r = 0.44, t (106) = 5.04, p < 0.0001, d = 0.98). Finally, the presence of swirling distortions as transitions between the two venues (train and lab) provided no aid in viewers’ segmentations (mean d’ = − 0.05).

Aggregated results

Turning points

The correspondence of the viewers’ data with the larger narrative segments proposed by Thompson and Bordwell is quite robust across all seven movies. However, in six of the movies there is reasonable evidence that the viewers also thought that a turning point within the setup also marked an important boundary. Indeed, it was the most prominent boundary in Ordinary People and Source Code, the second most prominent in Rope, and quite substantial in Wings, Passage to Marseille, and All About Eve.

Field (2005) placed emphasis on the concept of an inciting incident (also called a turning point) in the setup; Bordwell (2016) noted that screenwriting manuals typically promote an early inciting incident; and Thompson (2008) has suggested that an inciting incident is one of many turning points. Be that as it may, viewers in this context were clearly influenced by an early turn in the narrative independent of the larger parts that followed. From a screenwriting and filmmaking perspective this is a well-known design feature. Indeed, Ebert (1985) suggested that an inciting incident early in the movie is necessary because of what he called Brotman’s law (named after a Chicago movie exhibitor): “If nothing has happened after the first reel [10 to 12 minutes], nothing is going to happen.” In other words, film practice necessitates some early event that hooks the viewer into the narrative. Waiting until the setup/complication boundary may be too late.

Accumulated movie segmentations

We next combined the results from all seven movies by placing viewers’ responses into nine categories: (1) averaging all segment boundaries placed in all entries before the setup, (2) those at the setup/complication boundary, (3) those between that boundary and the complication/development boundary, (4) those at the complication/development boundary, (5) those between that boundary and the development/climax boundary, (6) those at the development/climax boundary, (7) those between that boundary at the beginning of the epilog, if any, (8) those at the beginning of the epilog, if any, (9) and those after the beginning of the epilog. Mean observer agreement and 95% confidence intervals are shown in Fig. 4. The omnibus effect across these nine categories (with films entered as a nominal variable) was robust (F (8,43) = 13.4, p < 0.0001, η2 = 0.67), and contrasting the boundary and within-large-segment results was equally so (t (56) = 8.72, p < 0.0001, d = 2.33). Interestingly, there was no substantial difference among the movies (F (6,43) = 1.4, p = 0.23).

Fig. 4
figure 4

Gray bars show the mean interobserver correlations in segmentation performance at nine sections pooled across the seven movies: (1) at all scenario entry boundaries before the end of the setup, (2) at the setup/complication scenario boundary, (3) at all scenario boundaries within the complication, (4) at the complication/development boundary, (5) at all boundaries within the development, (6) at the development/climax boundary, (7) at all boundaries within the climax before the epilog, (8) at the epilog boundary, and (9) at all scenario boundaries after the beginning of the epilog. Black ribbons indicate the 95% confidence intervals

In addition, since this research is a within-subjects design, we looked at some overall results in the correlations between the viewers’ segmentations on the one hand and the film-theoretic segmentations on the other across the seven movies. Perhaps somewhat surprisingly, there was little evidence for individual differences (F (17,87) = 1.32, p = 0.20). Our surprise is based on the widespread notion that there is considerable differential appreciation for all forms of entertainment. Indeed, movies in particular are often touted as one of the arenas where people vary greatly (e.g., Chamorro-Premuzic, Kallias, & Hsu, 2013; Rentfrow, Goldberg, & Zilca, 2011). However, one should remember that the ability to segment a movie is not the same thing as liking a movie. Segmentation is a strong correlate of understanding (Sargent et al., 2013), not of affinity.

Boundary sharing across event sizes

In addition, and in keeping with a meronomical approach to event cognition, we compared viewers’ responses at points within each film that were scene boundaries (changes in location and/or time) on the scenarios with those that were not. We accumulated boundary judgments across observers at each scene break and non-scene break and compared the two distributions. As shown in Table 2, for the six movies that had scene boundaries (Rope does not) there were more segmentations in each movie at scene boundaries than within scenes, and together the aggregate revealed a strong effect (t (418) = 7.12, p < 0.0001, d = 0.70). This result is consistent with the idea that larger events share boundaries with the smaller events that they subsume, which is the criterion for meronomy. An example of a scenario fragment is given in Table 3. It has six non-scene breaks and eleven scene breaks, two of which is a major break in the narrative. Also shown are the location changes, time changes, and the number of segmentations offered by the viewers.

Table 3 A fragment of the scenario for All About Eve (1950)

One might worry about the data of Experiment 1. That is, one might suspect that individuals not having seen a movie could appropriately segment its narrative simply on the basis of studying the written scenarios. After all, there the narrative is laid out in plain view and with some effort its structure ought to be reasonably discernible. We certainly believed this to be true, but we thought it possible that the responses of non-viewers might not conform to the film-theoretic segmentations as well as those of viewers who had recently seen the movie. Thus, it seemed appropriate to assess the possible differences between segmentations of viewers and non-viewers. Non-viewers were recruited online from Mechanical Turk.

Methods

Four of the seven movies were selected—Wings, Rope, All About Eve, and Ordinary People. In a Qualtrics survey, instructions from Experiment 1 were repeated as closely as possible. To encourage thorough reading the scenarios were modified so that four numbered entries appeared on the display screen at a time, and the workers had to page through the listed entries. Following the last page of entries, they were provided with the same numbered list, but this time the entire list of entries was on one page, and each entry had an empty check box to the left of it. Workers were instructed to segment the film into two to six parts by checking the boxes next to the entries that they believed began a new segment of the film. To ensure that they thoughtfully completed the task, they were also instructed to provide a brief narrative summary of each segment in text boxes provided below the complete list of entries.

Each participant was randomly assigned to segment one of the four movies. A total of 201 workers were enlisted, but 77 were then eliminated because they didn’t follow instructions (they segmented the narrative into too many units, they didn’t distribute segmentations throughout the movie, or they didn’t complete the narrative summaries). An additional 38 were eliminated because we believed they rushed through the task (we retained only those who took at least 10 min to complete the segmentations). This left us with 86 usable response sets—23 for Wings, 22 for Rope, 18 for All About Eve, and 23 for Ordinary People. The reported average age of these non-viewers was 35 years with a range from 18 to 69, and with 42 reporting to be male, 39 female, and five not reporting. While 11 of our 86 participants reported having seen their movie before (two for Wings and three each for the others), we included their data due to the fact that they were unlikely to have seen the movie recently, and were therefore unlikely to have retained a thorough memory of the narrative. Those viewers are noted in Fig. 5.

Fig. 5
figure 5

A comparison of the individual subjects’ data (dots and squares) expressed as correlations of their segmentations against film-theoretic segmentations. Viewers are the students in Experiment 1 who watched the movies and segmented them on written scenarios; non-viewers were the online workers of Experiment 2 who did not see the films but who read and responded only to the scenarios. The 11 non-viewers indicated by small squares claimed to have seen their film before. Faint, gray vertical lines indicate means and gray bands indicate standard errors of the means

Results

The non-viewers offered slightly fewer segmentations (4.28) than did the viewers (4.74, t (147) = 3.15, p = 0.002), although this difference seems relatively small and with unclear impact. More importantly, shown in Fig. 5 are comparisons of the correlations for non-viewers’ and viewers’ responses with the film-theoretic segmentations. The viewers’ responses matched the film-theoretic segmentations only slightly better than those of non-viewers for Wings (t (34) = 1.48, p = 0.15) and for All About Eve (t (33) = 0.65, p = 0.52), but matched them better for Rope (t (36) = 2.74, p = 0.01) and for Ordinary People (t (38) = 4.53, p < 0.0001). An aggregated mixed-model analysis of correlations with film-theoretic divisions across the four films within viewers and between films for non-viewers revealed a difference between groups (t (38.2) = 3.65, p = 0.0008, d = 1.18). In a post-hoc analysis after inspecting Fig. 5, we also found it impressive that 27 of the 86 non-viewer/film-theoretic segmentation correlations hovered around zero (including 7 of the 11 non-viewers who claimed to have seen their movie before), whereas only 6 of the 57 viewer/film-theoretic correlations did so.

It should also be noted that the non-viewers’ responses, as we suspected, were not random, but correlated with the film-theoretic segmentations for three of the movies. One-sample tests revealed reasonable segmentation ability for Wings (t (22) = 5.8, p < 0.0001), Rope (t (21) = 3.17, p = 0.0046), and All About Eve (t (17) = 3.84, p = 0.0013). Only the non-viewer response patterns for Ordinary People (t (22) = 1.83, p = 0.08) were generally unstructured.

It occurred to us that some of this “above chance” segmentation performance of non-viewers might be due to surface linguistic cues—particularly words that signal time or location changes. However, we found little evidence of this. For example, “meanwhile” never appears in the scenarios, “later” occurs at a major boundary once out of the seven times it appears, “next” occurs at a boundary one out of six times, and “then” once out of 17 times. We also tracked overt changes in locations (going from inside a room to outdoors, a new city mentioned, going from one activity to another when those can only be done in different locations, etc.). We found ten of these corresponded to major boundaries, but there were 79 other such location changes in the scenarios overall. Our best guess, then, is that many of the non-viewers really did understand aspects of the structure of the narrative from reading the scenarios; they just didn’t generally perform as well as the viewers.

Conclusions and a speculation

We can hardly claim that these seven movies are representative of popular cinema, of English-language cinema, or even of Hollywood cinema. However, they do represent a swath of narrative formats and time periods—a silent movie; a contemporary movie; a network movie; movies with flashbacks; movies with three, four, and five large parts; and movies across a selection of genres—dramas, action films, a suspense film, and a science fiction/puzzle film. And this small sample is helped by the fact that Bordwell (2006) and Thompson (1999) have claimed that there has been a consistency in the overall narrative structure of popular movies regardless of genre or changes across a century of popular filmmaking. Our results endorse that view.

Thus, although one might rightly complain about (1) our bobtailed selection of movies, (2) the fixed order in which they were viewed, (3) the fact that viewers did not segment the movie while watching it, as is typically done in other event segmentation experiments, and (4) the fact that viewers had access to the scenarios while viewing each movie (in the dark), the results are fairly robust and fivefold.

First and most importantly, viewers were reasonably consistent in segmenting movies into large parts, and with no substantial individual differences. This provides evidence for large-scale events in movies.

Second, those segmentations comport well with film-theoretic segmentations by professionals in four cases (Wings, Grand Hotel, All About Eve, and Source Code), with those we provided in advance following film-theoretic guidelines for three others (Passage to Marseille, Rope, and Ordinary People), and with no noticeable differences between the two sets. To be sure, the overall results are far from showing uniform congruence among viewers, but across participants these segmentation findings are roughly of the same strength as those for segmenting smaller events (e.g., Zacks, 2004).

Third, the segmentations of viewers in Experiment 1 match the film-theoretic segmentations generally better than non-viewers in Experiment 2 who had access only to the scenarios.

Fourth, as shown in Table 2, the boundaries of these large-scale events are firmly related to the boundaries of scenes, the next smaller scale events. Thus, the relation between the two seems meronomical.

Fifth, there is no strong evidence that viewers used anything but cognitive resources to do their segmentations. To be sure, five movies (Wings, Grand Hotel, Passage to Marseille, All About Eve, and Source Code) had surface cues to some segmentations that were provided by the filmmakers in terms of intertitles, fades, dissolves, or other salient transitions. But in one case (Source Code) viewer segmentations did not reliably correlate with those cues, in another (Grand Hotel) the correlation was weak, and in the other two movies (Rope and Ordinary People) there was no surface information whatsoever that could help.

The idea that cognition alone would be the basis of these segmentations fits with the general schema of event segmentation theory (Zacks & Swallow, 2007). That is, more fine-grained segmentation is typically done bottom-up on the basis of perceptual information (like motion), but more coarse-grained segmentation is done top-down on the basis of conceptual features (like an agent’s goals, and here perhaps the filmmakers’ goals). The event segments researched here are much larger than those typically discussed in the event processing literature. On the basis of these results it makes sense that these large parts in movies can be segmented without needing lower-level perceptual information, and done on the basis of the organization of the many event models (or situation models; Zwaan & Radvansky, 1998) built up over the course of watching a movie.

Finally, and still missing, is any firm understanding for why these larger units should be roughly 20 to 35 min in duration. In this sample they range from 20 to 38 min except for two shorter climaxes, one in Passage to Marseille (13 min) and the other in Source Code (18 min). Bordwell (2006) and Thompson (1999) both noted that climaxes are often the shortest segments in movies. But, as we quoted earlier (Thompson, 1999, pp. 43–44), is this half-hour span a cognitive constraint?

Perhaps the only cognitive domain relevant here is that of vigilance, the measure of sustained attention over prolonged periods of time (see Parasuraman, 1986), typically in the context of detecting rare targets. Individual differences in this domain are large, and many experiments have measured vigilance during tasks of high stress, such as those for radar operators, air-traffic controllers, and TSA inspectors. Nonetheless, performance falloff (called vigilance decrement) occurs in all situations, including low-load tasks (Frankmann & Adams, 1962), particularly after about 15 to 30 min (Grier et al., 2003; Parasuraman, 1986).

Popular movies, particularly contemporary ones, tend to have relatively complex narratives and narrational structures, not rare targets, but sustained attention is nonetheless a prerequisite. Understanding movies is not always easy, particularly when a brief moment of dialog can quickly change the direction of the plot. Difficulty in sustained attention to a narrative may account for some of the decline in movie watching by older adults (Armstrong & Cutting, 2017).

So here is our conjecture: narrative change, such as that which occurs at a boundary between larger-scale narrative events studied here, may sufficiently freshen the task of understanding so that the viewer can better sustain attention for a new period of 25 min or more. The idea here is that when the narrative goes in a different direction, the viewer has to work harder. Perhaps it is not a coincidence that, in a vigilance task, Thomson, Smilek, and Besner (2015) showed that making a task harder reduces the vigilance decrement—that is, people sustain their attention better. Borrowing an idea from Berlyne and Parham (1968) in experimental aesthetics—a domain likely closer to movies than to vigilance—Thomson et al. (2015), p. 387 suggested:

It may therefore be the case that by increasing the perceptual variability of critical targets in a vigilance task, one might arrive at a situation in which the novelty of the task persists for longer, thus holding attention and reducing performance decrements.

Perhaps it is the variability in the movie narrative, particularly at boundaries between large-scale parts, that keeps viewers interested. Indeed, in terms of cross-cutting scenes in narratives, there is evidence that movies have become more complex over the last 70 years (Cutting, 2019b). Moreover, the greater motion, the greater change in luminance, and the greater range of shot durations that typically occur in the climax (Cutting, 2016)—in addition to bringing the story to a close – may help to sustain attention at the end of a nearly two-hour movie experience.

Summary evaluation

We began this article with an overview of the meronomical organization of many of the arts. We briefly discussed music, literature, theater—for which the evidence seems quite obvious—but also dance, and even aspects of movies. These all have units within units sharing boundaries. Thus, the results here may not seem to be much of a surprise. Essentially, we’ve added psychological evidence to the theoretical notion of a unit intermediate in size between that of scenes and sequences on the one hand and whole movies on the other. This can rightfully be seen as an incremental addition, but not one foundational importance.

We would agree, but we wish to push back somewhat on this simple evaluation. For over a century, filmmakers of popular movies have perfected what is called continuity editing. This is a film style that deliberately attempts to elide across, if not completely hide, boundaries among all narrative units. The typical goal of popular filmmakers is to make the progression of events in a movie as seamless as suits the narrative. Thus, cuts between shots are often hard for viewers to detect because of matching on action (Smith & Santacreu, 2016), for example, with leftward motion at the end of one shot followed by leftward motion in the next, within or across scenes; audio coverage (Shimamura, Cohn-Sheehy, Pogue, & Shimamura, 2015) and sound overlaps (called split-edits; which include J-cuts, where the audio track of the first shot of the next scene begins before the last shot of the previous scene is finished, and L-cuts, where the audio lags from the last shot in the previous scene into the first shot of the next scene); and various kinds of hooks (Bordwell, 2008), including sound bridges (where the sound of the last shot in one scene is mimicked by the sound of the first shot in the next), graphic matches (where the aspects of the geometric layout of the ending image in one scene is mimicked by that of the first shot of the next), audio to visual juxtapositions (mention in a dialog about a particular book, immediately followed by an image of that book), and vice versa. All of these devices are used to knit scenes together, hiding boundaries by perceptual and cognitive tricks. Thus, psychological evidence that movie viewers can retain larger-scale narrative information in the face of continuity editing is welcome and not a foregone conclusion.

In addition, there are several film-theoretic concerns. Not all film theorists believe that there really should be any psychological reality to concept of an act in a movie. For example, one film glossary suggests: “Since screenplays never show act breaks, an ‘act’ is really a theoretical concept” (August, n.d.). Our data contravene this idea. In addition, our data inform the discussions about two film-theoretic controversies: three (Field, 2005) versus four (Thompson, 1999) large-scale narrative units, and the variable number of such units depending on the length of the movie (Thompson, 1999). Pretty clearly our data support the notion of four such units for average-length movies, three for shorter ones, and five for longer ones. But finally, our viewers and non-viewers alike seemed quite attuned to an early turning point within the first large-scale unit, the setup. We have no evidence that they treated this boundary any differently than the others.