Introduction

Biological motion (BM) refers to the movement of animate entities (e.g., walking, jumping, or waving by humans; Johansson, 1973; Troje, 2013). It is among the most important and sophisticated stimuli encountered in our daily lives. For instance, BM processing is critical for prosocial behavior and nonverbal communication (Blake & Shiffrar, 2007; Pavlova, 2012), and the capability to process BM is suggested to be a hallmark of social cognition (Gao, Ye, Shen, & Perry, 2016; Pavlova, 2012). Unsurprisingly, research on BM has been among the most important and fruitful fields in visual/social cognition over the past decade (for reviews, see Blake & Shiffrar, 2007; Pavlova, 2012; Puce & Perrett, 2003; Steel, Ellem, & Baxter, 2015; Troje, 2013). Researchers from areas as varied as psychology, neuroscience, clinical sciences, and robotics have investigated BM using behavioral, neuroimaging (ERP/EEG, fMRI, MEG, fNIRS, and TMS), and modeling methods (e.g., Bardi, Regolin, & Simion, 2011; Blakemore, 2008; Jaywant, Shiffrar, Roy, & Cronin-Golomb, 2016; Karg et al., 2013; Kawai, Asada, & Nagai, 2014; Kim, Doop, Blake, & Park, 2005; Koldewyn, Whitney, & Rivera, 2011; Loula, Prasad, Harber, & Shiffrar, 2005; Mather, Battaglini, & Campana, 2016; Miller & Saygin, 2013; Puce & Perrett, 2003; Rizzolatti, Fogassi, & Gallese, 2001; Shen, Gao, Ding, Zhou, & Huang, 2014; J. Thompson & Parasuraman, 2012; Urgen, Plank, Ishiguro, Poizner, & Saygin, 2013; van Kemenade, Muggleton, Walsh, & Saygin, 2012).

To explore the mechanisms of BM, it is important to use a set of stimuli that can efficiently and elegantly convey the movements of animate entities. Given that a considerable amount of non-BM-related information is contained in a scene (e.g., a person’s skin color, hair, and clothing), researchers have attempted to extract pure BM information by removing irrelevant data. Johansson (1973) successfully solved this issue by developing a point-light display (PLD) technique, which depicts human movements through a set of light points (e.g., 12 points) designated at distinct joints of the moving human body. Although highly impoverished—for instance, they do not provide information pertaining to skin color, hair, or clothing—once in motion, PLDs can rapidly be recognized as showing coherent and meaningful movements. Numerous studies have shown that important social information, such as gender, emotion, intention, and direction of motion, can be extracted from PLDs (Atkinson, Dittrich, Gemmell, & Young, 2004; Blakemore, 2008, 2012; Loula et al., 2005; Pollick, Kay, Heim, & Stringer, 2005; Pollick, Lestou, Ryu, & Cho, 2002; Rizzolatti et al., 2001; B. Thompson, Hansen, Hess, & Troje, 2007; Troje & Westhoff, 2006; for reviews, see Blake & Shiffrar, 2007; Pavlova, 2012; Puce & Perrett, 2003; Troje, 2013), even when they are passively observed (e.g., Perry, Bentin, et al., 2010; Perry, Troje, & Bentin, 2010), or are embedded in a set of dynamic noise dots (e.g., Aaen-Stockdale, Thompson, Hess, & Troje, 2008; Bertenthal & Pinto, 1994; Cutting, Moore, & Morrison, 1988; Ikeda, Blake, & Watanabe, 2005; Neri, Morrone, & Burr, 1998; Thurman & Grossman, 2008). 
By using PLD-based BM for stimuli of interest, studies involving infants, young and older adults, patients with lesions, and animals (e.g., chickens and pigeons) have revealed converging evidence that our brain has evolved specialized neural circuitry to process BM information, and that this involves the superior temporal sulcus and ventral premotor cortex (e.g., Blake & Shiffrar, 2007; Gao, Bentin, & Shen, 2014; Gilaie-Dotan, Kanai, Bahrami, Rees, & Saygin, 2013; Grossman, Battelli, & Pascual-Leone, 2005; Lu et al., 2016; Puce & Perrett, 2003; Troje & Aust, 2013; Vallortigara, Regolin, & Marconato, 2005).

PLD stimuli are at present the most popular stimuli in BM-related research and have significantly advanced our understanding of BM mechanisms. How to produce PLD stimuli effectively and efficiently is therefore an important issue deserving of attention. Three main techniques have thus far emerged in the literature for creating PLD stimuli: video recording, artificial synthesis, and motion capture. We specify the characteristics of each technique below.

Video recording

Johansson (1973) created PLDs by recording videos. He either attached low-voltage bulbs to the major joints of human agents wearing black clothes and filmed their motion in a dark room, or attached reflex patches to their main joints and illuminated them to obtain reflections from the patches. The recorded stimuli can vividly reflect the natural movements of agents. However, this method has several disadvantages: (1) The exact positions of the joints in 3-D space cannot be obtained; they are only approximated, because the bulbs or reflex patches are fixed to the surface of the body. (2) Preparing for a recording takes a long time, involving the agents putting on special tights, fixing the markers, and so forth. (3) The method provides only a 2-D version of the PLD, which constrains its flexible use.

Artificial synthesis

Cutting (1978) created an algorithm to simulate point-light walkers. The algorithm can occlude joints, fluctuate bodies, and adjust the stride or contrast of the PLD. The resulting stimuli were thus more controllable than those created by video recording. Verfaillie, De Troy, and Van Rensbergen (1994) later improved Cutting's algorithm to allow the presentation of stimuli in different in-depth orientations; Hodgins, Wooten, Brogan, and O'Brien (1995) further updated the algorithms to correct joint velocity and joint position and obtain natural BM at multiple walking speeds. At the same time, artificial synthesis has at least two disadvantages: (1) Each new BM requires a corresponding new algorithm. (2) Obtaining natural BM is, to some extent, more difficult via artificial synthesis than via video recording. Previous studies suggested that PLD stimuli generated via artificial synthesis diverge in certain respects from the natural BM of human beings. For instance, Runeson (1994) pointed out that synthesized PLD stimuli may be missing information about dynamics, such as "mass, elasticity, energetic processing, or neural mechanisms" (p. 392; see also Saunders, Suchan, & Troje, 2009). Indeed, Troje and Westhoff (2006) showed that PLD stimuli generated via the Cutting (1978) algorithm lacked a visual invariant inherent in the local motion of a natural walker's feet: Observers could successfully retrieve the walking direction of a real walker from the local motion of the feet, yet could not retrieve this information from the synthesized PLD. Such divergences must be addressed to obtain natural BM.

Motion capture

With developments in technology, powerful motion capture systems have enabled us to collect motion trajectories with high precision. Among these, outside-in motion capture systems (e.g., the Falcon Analog Optical Motion Capture System, the Qualisys MacReflex Motion Capture System, and ShapeWrap III), which consist of external sensors (e.g., high-speed cameras) and on-body sources (e.g., reflex patches), are commonly used to construct PLD stimuli. With the aid of computers, researchers can easily obtain natural BM while conveniently adjusting its parameters (e.g., orientation). The motion capture method therefore combines the advantages of video recording and artificial synthesis.

Nowadays, state-of-the-art motion capture systems are abundantly used to record and investigate BM. For instance, the currently used BM databases (Ma, Paterson, & Pollick, 2006; Manera, Schouten, Becchio, Bara, & Verfaillie, 2010; Vanrie & Verfaillie, 2004; Zaini, Fawcett, White, & Newman, 2013) were all constructed with motion capture systems and have significantly promoted BM research. However, these motion capture systems are expensive and bulky, and constructing PLD-based BM with them is time-consuming even for a well-trained technician. Therefore, in the absence of a motion capture system, or when a public BM database does not contain the necessary stimuli, exploration of BM mechanisms is impeded. For this reason, we believe that using low-cost sensors to generate PLD stimuli has practical and theoretical significance.

The Microsoft Kinect sensor is a popular, low-cost, and markerless motion capture system first developed in 2010 for the gaming industry. With an infrared emitter and a depth sensor, the Kinect can detect the contours of a human body and identify 25 joints (for Kinect 2.0) in 3-D space with high precision (Xu & McGorry, 2015). It costs only USD 140 (according to www.microsoft.com, 2016), yet offers fairly fast data processing. Moreover, the Kinect is easy to use and does not require complex calibration processes; users can typically master it in 15 min. Therefore, researchers have attempted to use the Kinect to replace traditional, expensive devices for tracking human movement and measuring postural load, and have found it to be a fast and reliable motion capture system for practical use (e.g., Bonnechere et al., 2013; Bonnechere et al., 2014; Clark, Bower, Mentiplay, Paterson, & Pua, 2013; Clark, Pua, Bryant, & Hunt, 2013; Dutta, 2012; Mousavi Hondori & Khademi, 2014; van Diest et al., 2014; Xu & McGorry, 2015). For instance, comparing the Kinect with the state-of-the-art Vicon motion capture system (Oxford Metrics, UK), Clark, Pua, et al. (2013) revealed that the positions of anatomical landmarks from Kinect-generated point clouds can be measured with high test–retest reliability, with differences in the intraclass correlation coefficient between the Kinect and Vicon of ≤0.16; van Diest et al. (2014) further showed that both systems can effectively capture >90% of the variance in full-body segment movements during exergaming.

Therefore, using a Kinect to generate PLD stimuli is technically feasible. Indeed, Andre Gouws at the York Neuroimaging Center at the University of York successfully built a MATLAB-based prototype toolbox to generate PLD stimuli using a Kinect 1.0. However, Gouws's toolbox is not publicly accessible at present. Moreover, it is important to examine whether PLD stimuli generated via the Kinect accurately and sensitively convey the BM characteristics of human beings, which has not been examined before. This issue is critical, particularly considering that PLD stimuli generated by artificial synthesis diverge from the natural BM of human beings (see the second disadvantage of artificial synthesis, noted above), and that the Kinect locates human joints by using a machine-learning algorithm (Shotton et al., 2013).

In the present study, we introduce a free Kinect-based biological motion capture (KBC) toolbox with a GUI, written in C++, which can be freely accessed. In three experiments, we show that the PLD stimuli it generates represent the BM characteristics of humans, thus establishing the effectiveness of the proposed KBC toolbox.

Kinect-based biological motion capture (KBC) toolbox

The KBC toolbox has been developed on the basis of the Kinect Sensor 2.0, with the help of the Kinect adapter for Windows PC (Microsoft, 2014) and the Kinect for Windows Software Development Kit (SDK) 2.0 (Microsoft, 2014). With a user-friendly graphical user interface (GUI; see Fig. 1a), KBC offers three core functions: recording the BM of humans (Recording box), editing the recorded BM (Movie Editing box), and playing the recorded BM in the playing window (Movie Playing box). Furthermore, KBC enables users to freely select their preferred parameters by defining target skeleton points (see Fig. 1b), view options, number of agents in motion, and whether hand gestures are recorded.

Fig. 1 Graphical user interface and selectable skeletal joints of the KBC

KBC generates PLD stimuli in three steps. First, it stores the tracked data, including depth and body-frame data (skeletal joints, hand states, etc.), at a sampling rate of 30 Hz. Second, a joint-filtering method (specifically, a median-value average filter and a limit-breadth filter) is employed to smooth the recorded trajectories. Third, the selected joints (see Fig. 1b) are shown as point-light stimuli in the playing window (see Fig. 1a), and the coordinates of the joints are written to .txt files according to predefined parameters (see the next section for details).
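The filtering step can be illustrated with a short sketch. The Python code below (the KBC itself is written in C++, so this is not the toolbox's source; the window size and jump limit are illustrative assumptions) shows one common formulation of a limit-breadth filter followed by a median-value average filter, applied to a single joint coordinate stream:

```python
# Illustrative sketch of the two filters named above, applied to one
# coordinate of one joint sampled at 30 Hz. Parameter values are assumptions.

def limit_breadth_filter(samples, max_jump=0.05):
    """Reject any sample that jumps more than max_jump (meters) from the
    previously accepted value, keeping the previous value instead."""
    out = [samples[0]]
    for s in samples[1:]:
        out.append(s if abs(s - out[-1]) <= max_jump else out[-1])
    return out

def median_value_average_filter(samples, window=5):
    """Slide a window over the stream; in each window, discard the minimum
    and maximum and average the rest (median-value average filtering)."""
    out = []
    for i in range(len(samples)):
        w = samples[max(0, i - window + 1): i + 1]
        if len(w) > 2:
            w = sorted(w)[1:-1]  # discard one min and one max
        out.append(sum(w) / len(w))
    return out

# Example: a noisy x-coordinate stream containing one spurious spike
xs = [0.10, 0.11, 0.95, 0.12, 0.13, 0.12, 0.14]
smoothed = median_value_average_filter(limit_breadth_filter(xs))
```

In this sketch the limit-breadth filter removes the transient tracking spike (0.95), and the median-value average filter smooths the remaining jitter.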

The recommended system requirements for smooth execution of KBC are shown in Table 1. Below, we introduce the detailed functions and parameters for generating PLD.

Table 1 Recommended system requirements

Parameters and functions of KBC

Parameters to define PLD

Pointer selection

Kinect Sensor 2.0 can track up to 25 skeletal joints with fairly high precision. On the "Pointer Selection" panel, users can select joints of interest by checking the corresponding boxes. Once a joint is selected, a check mark is shown on the relevant box (see Figs. 1a and b).

View options

In the “View Options” panel of KBC, four viewing options for the recorded PLD are provided: The “front view” displays mirror images of the agents being tracked, the “side view” shows a sagittal view from the agents’ left, the “vertical view” displays an overhead view of the agents, and the “depth view” shows a depth view of the recorded scene in black and white.
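The three point-light views can be thought of as different 2-D projections of the same 3-D joint coordinates. The sketch below is an assumption-laden illustration (the axis conventions and the mirror flip for the front view are ours, not taken from the KBC source), using Kinect's camera-space convention of x for left–right, y for up–down, and z for depth:

```python
# Hypothetical sketch of deriving the point-light views from 3-D joints.
# Axis choices and the front-view mirror flip are illustrative assumptions.

def project(joint, view):
    x, y, z = joint
    if view == "front":     # mirror image of the tracked agent
        return (-x, y)
    if view == "side":      # sagittal view from the agent's left
        return (z, y)
    if view == "vertical":  # overhead view
        return (x, z)
    raise ValueError(view)

head = (0.2, 1.7, 2.5)  # a hypothetical joint position in meters
front = project(head, "front")
```

The depth view, by contrast, is rendered from the sensor's depth frame rather than from the skeletal joints, so it is not covered by this projection sketch.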

Agents in play

KBC can track at most six agents at a time. Users select the corresponding number of agents in play by ticking the check box. Moreover, KBC offers six RGB colors (eight-bit) to distinguish the agents: royal blue (65, 105, 225), green (0, 255, 0), magenta (255, 0, 255), brown2 (238, 59, 59), goldenrod (139, 37, 0), and white (255, 255, 255).

Hand gesture

Kinect 2.0 can track three hand gestures: open, closed, and lasso. KBC provides this function as well; users tick "on" to activate it. The default setting is "off."

Function

Recording

By clicking the "Run" button, a live preview of the PLD-based BM is shown in the "Playing Window," and the "Start Recording" button is activated. Once pressed, the "Run" button changes to a "Stop" button. When users are ready to record, they click "Start Recording," which then transforms into a "Stop Recording" button. When the movement is complete, the user clicks "Stop Recording," and a dialog box pops up requesting a filename for the BM. The system then stores the recording in the defined path.

Movie editing

This function allows users to select sections of interest from a recorded PLD. Users first click the "Open" button on the "Movie Editing" panel to select an already recorded PLD. They can then preview the PLD stimulus frame by frame using the progress bar on the panel. Once users define both the start and end frames and click the "Cut" button, KBC clips the PLD from the start frame to the end frame, inclusive. A dialog box then pops up to remind them to save the newly edited PLD stimulus.

Movie playing

This function enables users to watch the recorded BM. Users click the “Play” button on the “Movie Playing” panel, and select the target file. If the users intend to play the file more than once, they need to click the “Replay” button.

Evaluating the KBC-generated PLD stimuli

In this section, we examine whether the KBC-generated PLD stimuli can convey the BM characteristics of humans in high fidelity. According to previous studies, the processing of PLD-based BM stimuli has three typical hallmarks. In particular, (1) our vision system processes BM in a global manner; therefore, inverting the BM drastically impairs BM perception, although low-level features (e.g., absolute movement of individual dots, the local relations in the display, etc.) are kept constant between upright and inverted BM (e.g., Barclay, Cutting, & Kozlowski, 1978; Bertenthal & Pinto, 1994; Cutting, 1978; Dittrich, 1993; Ikeda et al., 2005; Pavlova & Sokolov, 2000; Shipley, 2003; Sumi, 1984; Troje, 2003); (2) BM cues can trigger reflexive attentional orientation (Ding, Yin, Shui, Zhou, & Shen, 2016; Doi & Shinohara, 2012; Pilz, Vuong, Bülthoff, & Thornton, 2011; Saunders et al., 2009; Shi, Weng, He, & Jiang, 2010; Zhao et al., 2014), even for local movement of the feet (Chang & Troje, 2009; Troje & Westhoff, 2006; Wang, Yang, Shi, & Jiang, 2014); (3) various types of emotion can be extracted from BM (Alaerts, Nackaerts, Meyns, Swinnen, & Wenderoth, 2011; Atkinson et al., 2004; Atkinson, Heberlein, & Adolphs, 2007; Clarke, Bradshaw, Field, Hampson, & Rose, 2005; Dittrich, Troscianko, Lea, & Morgan, 1996; Heberlein, Adolphs, Tranel, & Damasio, 2004; Hubert et al., 2007; Johnson, McKay, & Pollick, 2011; Karg et al., 2013; Nackaerts et al., 2012; Spencer, Sekuler, Bennett, Giese, & Pilz, 2010; Walk & Homan, 1984). If the KBC-generated BM stimuli genuinely reflect the movements of humans, they should exhibit at least these three processing characteristics. To this end, we examined these three hallmarks in turn.

Experiment 1: Global processing of BM

The global processing characteristics of BM are manifested by embedding a PLD-based BM in a set of randomly moving dots. Even though the PLD stimuli and the random dots were constructed using the same elements, observers could quickly discriminate BM from noise (e.g., Bertenthal & Pinto, 1994; Pinto & Shiffrar, 1999; Shiffrar, Lichtey, & Chatterjee, 1997). Moreover, akin to face perception (e.g., Yin, 1969), inverting BM significantly impaired observers’ performance (e.g., Bertenthal & Pinto, 1994; Dittrich, 1993; Pavlova & Sokolov, 2000; Shipley, 2003; Troje, 2003). In Experiment 1, we adopted the design proposed by Bertenthal and Pinto (1994), in which the aforementioned global processing of BM was systematically tested while using the front-view PLD stimuli generated by KBC.

Participants

Eight participants (four male, four female; mean age 19.4 ± 0.9 years) participated in the experiment. They were all undergraduates at Zhejiang University and provided signed informed consent. All participants had normal color vision and normal or corrected-to-normal visual acuity.

Stimuli and apparatus

Five front-view BM stimuli were generated via KBC: chopping, jumping, spading, walking, and waving. A male undergraduate volunteered as the actor to perform the five movements. Each BM consisted of 13 white, round points (0.09° in radius each), covering an area of 3.34° (width) × 8.15° (height) and presented at a rate of 30 fps. The randomly moving noise elements were identical to the BM points in size, color, luminance, shape, and motion trajectories, but differed in phase and spatial location. The display always contained 78 moving elements, occupying an area of 14.00° (width) × 16.13° (height). On trials in which a BM stimulus was presented, 13 points belonged to the BM and 65 to the noise; when no BM was displayed, 13 additional random elements were shown within the area the BM would have occupied.
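The scrambled-mask construction described above can be sketched as follows. This is an illustration of the general scrambling scheme, not the study's actual stimulus code (which used MATLAB and Psychophysics Toolbox); each noise dot reuses the trajectory of a randomly chosen BM point but starts at a random phase and a random spatial location:

```python
# Sketch of building a scrambled noise mask from BM point trajectories.
# Each noise trajectory copies a BM point's motion with a random phase
# shift and a random spatial offset within the display area.
import random

def make_noise_dots(bm_trajectories, n_noise, width, height,
                    rng=random.Random(0)):
    """bm_trajectories: one list of (x, y) positions per BM point, one
    entry per frame. Returns n_noise scrambled trajectories."""
    n_frames = len(bm_trajectories[0])
    noise = []
    for _ in range(n_noise):
        src = rng.choice(bm_trajectories)
        phase = rng.randrange(n_frames)                        # random phase
        ox, oy = rng.uniform(0, width), rng.uniform(0, height)  # random location
        x0, y0 = src[phase]
        traj = [(x - x0 + ox, y - y0 + oy)
                for x, y in src[phase:] + src[:phase]]  # phase-shifted copy
        noise.append(traj)
    return noise

# 13 BM points and 65 noise dots, as in Experiment 1 (toy trajectories)
bm = [[(i * 0.1, f * 0.05) for f in range(30)] for i in range(13)]
mask = make_noise_dots(bm, 65, 14.0, 16.13)
```

Because each noise dot inherits a genuine BM trajectory, the mask matches the target in local motion while destroying the global configuration, which is what forces global processing in the detection task.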

The stimuli were presented using MATLAB with Psychophysics Toolbox 3 (Kleiner et al., 2007) and displayed against a black (0, 0, 0 RGB) background on a 17-in. CRT monitor (1,024 × 768, 60-Hz refresh rate). The participants sat in a dark room and watched the stimuli at a distance of 50 cm from the screen center.

Design and procedure

Figure 2 illustrates the procedure used in the experiment. A white fixation (0.89° × 0.89°) was first presented for 1 s, to inform the participants that a trial was about to begin. A set of moving dots was then presented at the center of the screen for 1 s. Finally, an instruction appeared (in Chinese) requiring the participants to judge whether a BM stimulus had appeared. A BM stimulus was presented in 50% of the trials. The intertrial blank interval was 6 s.

Fig. 2 Schematic illustration of a single trial in Experiment 1, in which a biological-motion stimulus appears. In the formal experiment, the response instruction was given in Chinese

A one-factor (BM Orientation: upright vs. inverted) within-subjects design was adopted. The experiment was divided into two blocks according to whether the BM stimuli were upright or inverted. Each block consisted of 100 trials, for a total of 200 trials; each movement was presented 20 times per block. Four participants (two male) completed the upright block first, and the other four completed the inverted block first.

Before the experiment, the participants were shown six to eight distinct BM stimuli that were not used in the formal experiment, to familiarize them with PLD-based BM. They were then given ten practice trials with feedback to familiarize themselves with the procedure. The entire experiment lasted approximately 40 min, and no feedback was given during the formal experiment.

Results and discussion

Following Bertenthal and Pinto (1994), we analyzed the data (see Fig. 3) via signal detection theory. The d′ for upright PLDs was significantly higher than chance level, d′ = 1.53, t(7) = 5.57, p < .01, Cohen’s d = 2.78. Moreover, the d′ for upright PLDs was also considerably higher than that for inverted PLDs (d′ = 0.49), t(7) = 5.15, p < .01, Cohen’s d = 1.76. These results replicated findings by Bertenthal and Pinto (1994), suggesting that the PLD-based BM generated via KBC was processed in a global manner.
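The signal-detection analysis can be sketched briefly. The following Python code is an illustration of the standard d′ computation from hit and false-alarm counts; the log-linear correction for extreme rates is an assumption on our part, since the original analysis does not specify how such rates were handled:

```python
# Sketch of computing d' from a yes/no detection task: z(hit rate) minus
# z(false-alarm rate), with a log-linear correction so that rates of
# exactly 0 or 1 do not produce infinite z-scores.
from statistics import NormalDist

def d_prime(hits, misses, fas, crs):
    hr = (hits + 0.5) / (hits + misses + 1)   # corrected hit rate
    far = (fas + 0.5) / (fas + crs + 1)       # corrected false-alarm rate
    z = NormalDist().inv_cdf
    return z(hr) - z(far)

# e.g., 45 hits / 5 misses on BM-present trials and
# 12 false alarms / 38 correct rejections on BM-absent trials
d = d_prime(45, 5, 12, 38)
```

A d′ of 0 corresponds to chance performance, so the one-sample tests above compare the observed d′ values against 0.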

Fig. 3 Results of Experiment 1. Error bars indicate standard errors (SEs)

Experiment 2: Reflexive attentional orientation due to local BM cue

Jiang and colleagues recently revealed another hallmark of BM processing: Visual attention is reflexively shifted to the walking direction of a BM stimulus (Shi et al., 2010). Specifically, participants were briefly presented with a central point-light walker walking toward either the left or the right, and then judged the orientation of a Gabor patch. Although participants were explicitly told that the walking direction of the BM did not predict the Gabor's location, their performance was considerably better for targets appearing in the walking direction than for those in the opposite direction. Moreover, Jiang and colleagues revealed that not only full-body BM but also the local BM of feet movement can lead to reflexive attentional orientation. Wang et al. (2014) demonstrated this effect by first showing participants the BM of the feet using two points; after a 100-ms blank interval, a Gabor patch was presented congruently or incongruently with the direction of the feet movement. Participants responded significantly more quickly to the Gabor in the congruent condition than in the incongruent condition, whereas a reversed pattern was found once the BM was inverted. They explained these results by suggesting that motion acceleration due to muscle activity and gravity in the swing phase of walking (the biological characteristic of feet motion signals) plays a key role in the upright condition (see also Chang & Troje, 2009). Inverting the BM, however, disrupted the function of motion acceleration, and the intact horizontal, translatory motion in the stance phase of walking began to play an important role. In other words, local BM of the feet also contains rich information, including motion acceleration and horizontal, translatory motion, and can modulate our attention.

To test whether the local BM recorded via KBC also contains motion acceleration and horizontal, translatory motion information, in Experiment 2 we adopted the design of Wang et al. (2014) to examine whether local BM of the feet can reflexively trigger attentional orientation. This question is particularly important, considering that Troje and Westhoff (2006) found that walking direction could not be retrieved from PLD stimuli generated via the Cutting (1978) algorithm.

Participants, apparatus, and stimuli

Ten participants (five male, five female; mean age 20.2 ± 1.2 years) took part in the experiment. The participants sat in a dark room, 50 cm from the screen center. All stimuli were presented against a gray (128, 128, 128 RGB) background. Gabor patches with a diameter of 2.15 cm (2.5° × 2.5° of visual angle) were used.

To generate clear local BM of the feet, we recorded, using the front-view option, a walking male whose side faced the Kinect sensor. Moreover, to precisely record the trajectory of the feet movement, we placed the Kinect sensor on the floor, at the same level as the feet. The final feet BM consisted of two white points, subtending a visual angle of 3.3° × 0.7°. It contained 30 frames and was presented cyclically at a rate of 30 fps, with the initial frame selected at random on each presentation. All other aspects were the same as in Experiment 1.
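The cyclic presentation with a random starting frame can be sketched as follows. This is an illustrative Python fragment, not the actual presentation code (which used Psychophysics Toolbox); the function name and the 500-ms cue duration mapping to 15 frames at 30 fps follow the description above:

```python
# Sketch of cycling a 30-frame feet-motion clip at 30 fps, starting from a
# randomly chosen frame on each trial (function name is hypothetical).
import random

def frame_sequence(n_frames=30, duration_frames=15, rng=random.Random()):
    """Return the frame indices shown during a 500-ms cue
    (15 frames at 30 fps), starting at a random phase."""
    start = rng.randrange(n_frames)
    return [(start + i) % n_frames for i in range(duration_frames)]

seq = frame_sequence()
```

Wrapping the index with the modulo operator lets the clip loop seamlessly regardless of which frame the cycle starts on.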

Design and procedure

Figure 4 shows the procedure used in Experiment 2. Each trial started with a white cross (0.6° × 0.6°) presented for 1 s; the BM cue was then presented for 500 ms. After a 100-ms interstimulus interval (ISI), a Gabor patch was presented for 100 ms on the left or right side of the cross, at a distance of 4.5°. The probe then disappeared, and participants pressed one of two buttons ("F" for left, "J" for right) to indicate the location of the Gabor relative to fixation as quickly as possible while minimizing errors.

Fig. 4 Schematic illustration of a single trial in Experiment 2

A 2 (Congruency between the direction of motion and the Gabor position: congruent vs. incongruent) × 2 (BM Orientation: upright vs. inverted) within-subjects design was adopted. Each combined condition consisted of 40 trials, resulting in a total of 160 randomized trials. Before the formal experiment, the participants were given at least 20 practice trials. Since participants were required to respond as quickly as possible while minimizing errors, we set a criterion for passing the practice of at least 95% accuracy; participants scoring below 95% had to redo the practice. All participants passed the practice on the first round, so no participant required extra practice. Feedback was given on practice trials but not on experimental trials.

Results and discussion

Only trials with correct responses were analyzed for response times. Figure 5 shows the response times in the four conditions. We conducted a two-way repeated-measures analysis of variance with Congruency and BM Orientation as within-subjects factors. The main effect of neither congruency, F(1, 36) = 0.018, p = .893, ηp² = .001, nor BM orientation, F(1, 36) = 0.103, p = .750, ηp² = .003, was significant. Critically, the Congruency × BM Orientation interaction reached significance, F(1, 36) = 4.792, p = .035, ηp² = .117. Further simple-effect analysis showed that when the BM cue was presented upright, response times were significantly shorter in the congruent condition (212 ms) than in the incongruent condition (231 ms), t(9) = 3.227, p < .01, Cohen's d = 0.67. A reversed pattern (228 vs. 211 ms) was found when the BM cue was inverted, t(9) = 4.655, p < .01, Cohen's d = 0.74.

Fig. 5 Results of Experiment 2. Error bars indicate standard errors (SEs)

These results replicated the findings by Wang et al. (2014), suggesting that the PLD-based BM generated via KBC contained detailed local information, such as motion acceleration and horizontal, translatory motion, which reflexively triggered attentional orientation.

Experiment 3: BM conveying emotions

It is well known that BM conveys rich social information (see Pavlova, 2012, for a review). One critical aspect is emotion. Researchers have shown that observers can recognize basic emotions, such as happiness, fear, surprise, anger, and sadness, from PLD-based BM stimuli (e.g., Atkinson et al., 2004; Dittrich et al., 1996; Walk & Homan, 1984). Moreover, observers recognize emotions more accurately from PLD-based BM than from static PLD frames (Atkinson et al., 2004), underscoring the contribution of motion to emotion perception. To promote the exploration of BM-related emotion, Ma et al. (2006) combined four standardized actions (walking, knocking, lifting, and throwing) with four emotion types (happiness, anger, sadness, and neutral) to create an emotional PLD database.

In Experiment 3, we collected a set of emotional PLDs and examined whether participants could extract the emotions embedded in them.

Participants and apparatus

Thirty participants (16 male, 14 female; mean age 20.3 ± 1.4 years) participated in the experiment. Each BM (11.28° × 11.28°) was presented at the center of the screen against a black (0, 0, 0 RGB) background, at a viewing distance of 60 cm. The other aspects were the same as in Experiment 1.

Generating emotional BM stimuli

To test the effectiveness of KBC in generating BM, we considered it important to create a set of BM stimuli that relied less on static body postures to convey emotion than those used in previous studies (Atkinson et al., 2004; Dittrich et al., 1996), so that participants had to rely more on body movements to make their judgments. To this end, we followed Ma et al. (2006) and constructed a set of emotional BM by requiring two actors to perform the actions of walking, knocking, lifting, and throwing for each of four emotion types (happiness, anger, sadness, and neutral).

The BM stimuli were acquired in a 4.5 m × 3.5 m room, with the Kinect 2.0 placed on a holder 1.05 m above the floor. Two actors (21 years old; one male, one female) with experience in drama were paid to perform the actions. One of the authors (the fourth) was in charge of the BM acquisition. Before recording, the actors were given instructions on the actions and the corresponding emotions and were allowed to rehearse the actions freely. The orders of action and emotion were randomly chosen for each actor. The detailed description of each action followed Ma et al. (2006). Once an actor was ready, he or she informed the experimenter and performed the actions within a 2 m × 2 m space, standing two meters in front of the Kinect. Each action was repeated five times (except walking, which required a 30-s nonstop walk), after which the actor was given a rest. This resulted in 32 BM records (4 Actions × 4 Emotions × 2 Actors). Using KBC, we then cut each record into five clips according to the number of repetitions; for walking, each record was cut evenly into five clips. This process generated a total of 160 BM clips (32 BM × 5 Repetitions), which were further converted into .avi files via Psychophysics Toolbox 3.

Finally, we recruited ten volunteers (eight male, two female; mean age 21.1 ± 0.7 years) to evaluate the clips and select the best of the five options for each action–emotion combination. The clips with the highest scores were retained, yielding 32 clips (4 Actions × 4 Emotions × 2 Actors) for the formal experiment, with durations ranging from 2 to 5 s.

Design and procedure

Figure 6 shows the interface used in Experiment 3. The participants performed an emotion discrimination task: On each trial, they were shown a PLD and made an unspeeded response by choosing one of the four emotions. The participants could then move to the next trial by clicking the "Next" button. Notably, the participants had no prior knowledge of the identities of the actors or the action types of the PLDs.

Fig. 6 The graphical user interface of Experiment 3

A one-factor within-subjects design was used, with emotion type (neutral, angry, happy, and sad) as the independent variable. Each emotion was represented by eight BM clips (4 Actions × 2 Actors). The experiment consisted of two practice trials and 32 randomly presented formal trials. No feedback was provided in the experiment, which lasted approximately 10 min.

Results and discussion

Figure 7 shows the emotion discrimination performance. The participants were able to discriminate the correct emotion, and discrimination performance was significantly above chance (25%): t(29) = 31.8, p < .01, Cohen's d = 3.71, for angry (67.5%); t(29) = 23.9, p < .01, Cohen's d = 2.62, for happy (61.25%); t(29) = 17.0, p < .01, Cohen's d = 1.86, for sad (61.25%); and t(29) = 28.0, p < .01, Cohen's d = 3.07, for neutral (61.25%). Moreover, we found a tendency for the participants to interpret sad stimuli as neutral (34.17%), which was also significantly above chance: t(29) = 8.9, p < .01, Cohen's d = 0.44. We attribute this not to KBC itself, but to the difficulty of expressing sadness through actions; indeed, while we were preparing the stimuli, the two actors needed more time to express sadness than to express the other emotions. Overall, we think that PLD-based BM generated via KBC can effectively convey the emotional information embedded in BM.
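The chance-level comparison reported above can be sketched as a standard one-sample t test, here with Cohen's d taken as (mean − chance) / SD (one common definition; the paper does not state which it used). The accuracy vector below is simulated for illustration, not the experimental data.

```python
# Sketch of a one-sample t test of per-participant accuracy against
# chance (25%). The accuracy values are simulated, not the real data.
import math

def t_vs_chance(acc, chance=0.25):
    """Return (t, d) for a one-sample t test of acc against chance,
    with Cohen's d defined as (mean - chance) / sample SD."""
    n = len(acc)
    mean = sum(acc) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in acc) / (n - 1))
    t = (mean - chance) / (sd / math.sqrt(n))
    d = (mean - chance) / sd
    return t, d

acc = [0.6, 0.7, 0.65, 0.55, 0.75, 0.7, 0.6, 0.65]  # simulated accuracies
t, d = t_vs_chance(acc)
```

Under this definition, d equals t divided by the square root of the sample size, which provides a quick sanity check on such analyses.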

Fig. 7

Mean discrimination rates for four types of emotion in Experiment 3. Error bars indicate standard errors (SEs)

General discussion

In this study, we introduced a C++-based toolbox for the Kinect 2.0 that permits the acquisition of PLD-based BM in an easy, low-cost, and user-friendly way. We conducted three experiments to examine whether KBC-generated BM genuinely reflects the processing characteristics of BM (Atkinson et al., 2004; Bertenthal & Pinto, 1994; Wang et al., 2014), and obtained positive results. In particular, we showed that (1) KBC-generated BM was processed in a global manner: Participants could perceive BM embedded in dynamic noise dots, and this ability was significantly impaired when the stimuli were inverted; (2) KBC-generated local BM for the feet effectively retained the detailed local information of BM, such as motion acceleration and horizontal translatory motion, which can trigger reflexive attentional orienting; and (3) KBC-generated BM conveyed emotional information: Participants could effectively distinguish four basic emotions expressed through the same set of actions. Therefore, we think that the KBC toolbox will be useful for generating BM in future studies.

In addition to the advantages of the KBC toolbox mentioned above, three further advantages are worth noting. First, BM generation is fast: The KBC toolbox requires no markers attached to the joints and offers a filter to smooth the motion trajectories, so BM can be created within 5 min once the agents are available. Second, owing to the nature of the Kinect sensor, KBC offers a portable method of generating BM. Creating BM can therefore be extended from the well-equipped laboratory to real-world settings, such as classrooms or offices, allowing researchers to obtain more natural and richer BM, particularly since the Kinect 2.0 can track up to six agents simultaneously. Third, the 3-D positions of the joint points of BM obtained via the KBC toolbox are close to the joints' actual positions in the human body (Bonnechère et al., 2014; Xu & McGorry, 2015); by contrast, most motion-capture systems use marker positions at the skin surface to represent the positions of the joints (Shotton et al., 2013).

Finally, we should note that the KBC toolbox also has a few constraints in practical use, predominantly due to technical limitations of the Kinect sensor. First, KBC cannot accurately track and locate the joints when skeleton joints overlap: for instance, when the agent is viewed in profile, adopts an irregular posture (such as a squat), or passes in front of another person so that their limbs overlap. Future KBC users should therefore avoid these situations. Second, KBC does not work well when agents are too close to or too far from the Kinect sensor; we recommend that agents perform actions about 1.5 m away from the Kinect. Third, KBC can currently track only three hand gestures, far fewer than occur in real life. We are now considering integrating Leap Motion, which provides powerful hand-gesture tracking, into the KBC toolbox.