Using a Kinect sensor to acquire biological motion: Toolbox and evaluation
Biological motion (BM) is the movement of animate entities, which conveys rich social information. To obtain pure BM, researchers nowadays predominantly use point-light displays (PLDs), which depict BM through a set of light points (e.g., 12 points) placed at distinct joints of a moving human body. Most prevalent BM stimuli are created with state-of-the-art motion capture systems. Although these stimuli are highly precise, such systems are expensive and bulky, and their process of constructing a PLD-based BM is time-consuming and complex. These factors impede the investigation of BM mechanisms. In this study, we propose a free Kinect-based biological motion capture (KBC) toolbox, written in C++ and built on the Kinect Sensor 2.0. The KBC toolbox aims to help researchers acquire PLD-based BM in an easy, low-cost, and user-friendly way. We conducted three experiments to examine whether KBC-generated BM can genuinely reflect the processing characteristics of BM: (1) Is BM from this source processed globally in vision? (2) Does locally presented BM (e.g., feet movement) retain detailed local information? and (3) Does the BM convey emotional information? We obtained positive results for all three questions. We therefore believe that the KBC toolbox can be useful for generating BM in future research.
Keywords: Biological motion · Kinect · Toolbox
Biological motion (BM) refers to the movement of animate entities (e.g., walking, jumping, or waving by humans; Johansson, 1973; Troje, 2013). It is one of the most important and sophisticated stimuli encountered in our daily lives. For instance, BM processing is critical for prosocial behavior and nonverbal communication (Blake & Shiffrar, 2007; Pavlova, 2012). The capability to process BM is currently suggested to be a hallmark of social cognition (Gao, Ye, Shen, & Perry, 2016; Pavlova, 2012). Unsurprisingly, research on BM has been among the most important and fruitful fields in visual/social cognition over the past decade (for reviews, see Blake & Shiffrar, 2007; Pavlova, 2012; Puce & Perrett, 2003; Steel, Ellem, & Baxter, 2015; Troje, 2013). Researchers from such varied areas as psychology, neuroscience, clinical sciences, and robotics have investigated BM using behavioral, neuroimaging (ERP/EEG, fMRI, MEG, fNIRS, and TMS), and modeling methods (e.g., Bardi, Regolin, & Simion, 2011; Blakemore, 2008; Jaywant, Shiffrar, Roy, & Cronin-Golomb, 2016; Karg et al., 2013; Kawai, Asada, & Nagai, 2014; Kim, Doop, Blake, & Park, 2005; Koldewyn, Whitney, & Rivera, 2011; Loula, Prasad, Harber, & Shiffrar, 2005; Mather, Battaglini, & Campana, 2016; Miller & Saygin, 2013; Puce & Perrett, 2003; Rizzolatti, Fogassi, & Gallese, 2001; Shen, Gao, Ding, Zhou, & Huang, 2014; J. Thompson & Parasuraman, 2012; Urgen, Plank, Ishiguro, Poizner, & Saygin, 2013; van Kemenade, Muggleton, Walsh, & Saygin, 2012).
To explore the mechanisms of BM, it is important to use a set of stimuli that can efficiently and elegantly convey the movements of animate entities. Given that a considerable amount of non-BM-related information is contained in a scene (e.g., a person’s skin color, hair, and clothing), researchers have attempted to extract pure BM information by removing irrelevant data. Johansson (1973) successfully solved this issue by developing a point-light display (PLD) technique, which depicts human movements through a set of light points (e.g., 12 points) designated at distinct joints of the moving human body. Although highly impoverished—for instance, they do not provide information pertaining to skin color, hair, or clothing—once in motion, PLDs can rapidly be recognized as showing coherent and meaningful movements. Numerous studies have shown that important social information, such as gender, emotion, intention, and direction of motion, can be extracted from PLDs (Atkinson, Dittrich, Gemmell, & Young, 2004; Blakemore, 2008, 2012; Loula et al., 2005; Pollick, Kay, Heim, & Stringer, 2005; Pollick, Lestou, Ryu, & Cho, 2002; Rizzolatti et al., 2001; B. Thompson, Hansen, Hess, & Troje, 2007; Troje & Westhoff, 2006; for reviews, see Blake & Shiffrar, 2007; Pavlova, 2012; Puce & Perrett, 2003; Troje, 2013), even when they are passively observed (e.g., Perry, Bentin, et al., 2010; Perry, Troje, & Bentin, 2010), or are embedded in a set of dynamic noise dots (e.g., Aaen-Stockdale, Thompson, Hess, & Troje, 2008; Bertenthal & Pinto, 1994; Cutting, Moore, & Morrison, 1988; Ikeda, Blake, & Watanabe, 2005; Neri, Morrone, & Burr, 1998; Thurman & Grossman, 2008). 
By using PLD-based BM for stimuli of interest, studies involving infants, young and older adults, patients with lesions, and animals (e.g., chickens and pigeons) have revealed converging evidence that our brain has evolved specialized neural circuitry to process BM information, and that this involves the superior temporal sulcus and ventral premotor cortex (e.g., Blake & Shiffrar, 2007; Gao, Bentin, & Shen, 2014; Gilaie-Dotan, Kanai, Bahrami, Rees, & Saygin, 2013; Grossman, Battelli, & Pascual-Leone, 2005; Lu et al., 2016; Puce & Perrett, 2003; Troje & Aust, 2013; Vallortigara, Regolin, & Marconato, 2005).
PLD stimuli are at present the most popular ones used in BM-related research, significantly promoting our understanding of BM mechanisms. Therefore, how to effectively and efficiently produce PLD stimuli becomes an important issue deserving of attention. Three main techniques have thus far emerged in the literature for creating PLD stimuli: video recording, artificial synthesis, and motion capture. We specify the characteristics of each technique below.
Johansson (1973) created PLDs by recording videos. He either attached low-voltage bulbs to the major joints of human agents wearing black clothes and filmed their motion in a dark room, or attached reflex patches to their main joints and illuminated them to obtain reflection from the patches. The recorded stimuli can vividly reflect the natural movements of agents. However, this method has several disadvantages: (1) The accurate position of joints in 3-D space cannot be obtained; instead, this is estimated by fixing the bulbs or reflex patches on the surface of the body. (2) It takes a long time to prepare for the recording, which involves the agents putting on special tights, fixing markers, and so forth. (3) This method can only provide a 2-D version of PLD, which constrains its flexible use.
Cutting (1978) created an algorithm to simulate point-light walkers. The algorithm can occlude joints, fluctuate bodies, and adjust the stride or contrast of the PLD. The stimuli thereby generated were more controllable than those created by video recording. Verfaillie, De Troy, and Van Rensbergen (1994) later improved Cutting’s algorithm by allowing for the presentation of stimuli in different in-depth orientations; Hodgins, Wooten, Brogan, and O’Brien (1995) further updated the algorithms to correct joint velocity and joint position and obtain natural BM at multiple walking speeds. However, artificial synthesis has at least two disadvantages: (1) Each new BM needs a corresponding new algorithm. (2) Obtaining natural BM is, to some extent, more difficult via artificial synthesis than via video recording. Previous studies have suggested that PLD stimuli generated via artificial synthesis diverge in certain respects from the natural BM of human beings. For instance, Runeson (1994) pointed out that synthesized PLD stimuli may be missing information about dynamics, such as “mass, elasticity, energetic processing, or neural mechanisms” (p. 392; see also Saunders, Suchan, & Troje, 2009). Indeed, Troje and Westhoff (2006) showed that PLD stimuli generated via the Cutting (1978) algorithm lacked a visual invariant inherent to the local motion of a natural walker’s feet: Observers could successfully retrieve the walking direction of a real walker from the local motion of the feet, yet could not retrieve this information from the synthesized PLD. Such divergences must be addressed to obtain natural BM.
With developments in technology, powerful motion capture systems have enabled us to collect trajectories with sufficient precision. Of these, the outside-in system of motion capture (e.g., the Falcon Analog Optical Motion Capture System, the Qualisys MacReflex Motion Capture System, and ShapeWrap III) consists of external sensors (e.g., high-speed camera) and on-body sources (e.g., reflex patches), and is commonly used to construct PLD stimuli. With the aid of computers, researchers can obtain natural BM easily while conveniently adjusting the BM parameters (e.g., orientation). Therefore, the motion capture method combines the advantages of video recording and artificial synthesis.
Nowadays, state-of-the-art motion capture systems are abundantly used to record and investigate BM. For instance, currently used BM databases (Ma, Paterson, & Pollick, 2006; Manera, Schouten, Becchio, Bara, & Verfaillie, 2010; Vanrie & Verfaillie, 2004; Zaini, Fawcett, White, & Newman, 2013) were all constructed by motion capture systems, which significantly promote BM investigation. However, these motion capture systems are expensive and bulky, and the process of constructing PLD-based BM is time consuming even for a well-trained technician. Therefore, in the absence of a motion capture system, or in a scenario in which a public BM database does not contain necessary stimuli, the exploration of the BM mechanism is impeded. To this end, we think that the use of low-cost sensors to generate PLD stimuli has practical and theoretical significance.
The Microsoft Kinect sensor is a popular, low-cost, and markerless motion capture system first developed in 2010 for the gaming industry. With an infrared emitter and a depth sensor, the Kinect sensor can detect the contours of a human body and identify 25 joints (for Kinect 2.0) in 3-D space with high precision (Xu & McGorry, 2015). It costs only USD 140 (according to www.microsoft.com, 2016), yet offers fairly fast data processing. Moreover, the Kinect is easy to use and does not require complex calibration processes; users can typically master it in 15 min. Therefore, researchers have attempted to use the Kinect to replace traditional, expensive devices in tracking human movement and measuring postural load, and have found that the Kinect is a fast and reliable motion capture system for practical use (e.g., Bonnechere et al., 2013; Bonnechere et al., 2014; Clark, Bower, Mentiplay, Paterson, & Pua, 2013; Clark, Pua, Bryant, & Hunt, 2013; Dutta, 2012; Mousavi Hondori & Khademi, 2014; van Diest et al., 2014; Xu & McGorry, 2015). For instance, in comparing the Kinect with the state-of-the-art Vicon motion capture system (Oxford Metrics, UK),1 Clark, Pua, et al. revealed that the positions of anatomical landmarks from Kinect-generated point clouds can be measured with high test–retest reliability, with differences in the intraclass correlation coefficient between the Kinect and Vicon of ≤0.16; van Diest et al. (2014) further showed that both systems can effectively capture >90% of the variance in full-body segment movements during exergaming.
Therefore, using a Kinect to generate PLD stimuli is technically feasible. Indeed, Andre Gouws at the York Neuroimaging Centre at the University of York successfully built a MATLAB-based prototype toolbox to generate PLD stimuli using a Kinect 1.0. However, Gouws’s toolbox is not publicly accessible at present. Moreover, it is important to examine whether PLD stimuli generated via the Kinect accurately and sensitively convey the BM characteristics of human beings, which had not been examined before. This issue is critical, particularly considering that PLD stimuli generated by artificial synthesis diverge from the natural BM of human beings (see the second disadvantage of artificial synthesis mentioned above), and that the Kinect locates human joints by using a machine-learning algorithm (Shotton et al., 2013).
In the present study, we introduce a free Kinect-based biological motion capture (KBC) toolbox with a GUI written in C++, which can be freely accessed.2 Using three experiments, we show that the PLD stimuli generated can represent the BM characteristics of humans, thus establishing the effectiveness of the proposed KBC.
Kinect-based biological motion capture (KBC) toolbox
KBC generates PLD stimuli in three steps. First, it stores the tracked data, including depth and body frame data (skeletal joints, hand states, etc.), at a sampling rate of 30 Hz. Second, a joint-filtering method (specifically, a median-value average filter and a limit-breadth filter) is employed to smooth the recorded trajectories. Third, the selected joints (see Fig. 1b) are shown as point-light stimuli in the playing window (see Fig. 1a), and the coordinates of the joints are written to .txt files according to predefined parameters (see the next section for details).
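As an illustration of the second step, the two named filters can be sketched as below. This is a minimal sketch, not the KBC source code: the window size, the jump threshold, and the way the two filters are chained are our assumptions, since the paper only names the filter types.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Median-value average filter: within a sliding window, discard the
// minimum and maximum samples and average the remainder.
double medianValueAverage(std::vector<double> window) {
    std::sort(window.begin(), window.end());
    double sum = 0.0;
    for (std::size_t i = 1; i + 1 < window.size(); ++i) sum += window[i];
    return sum / static_cast<double>(window.size() - 2);
}

// Limit-breadth filter: if the new sample jumps farther from the last
// accepted value than maxStep, keep the previous value instead.
double limitBreadth(double previous, double current, double maxStep) {
    return (std::fabs(current - previous) > maxStep) ? previous : current;
}

// Smooth one coordinate track (e.g., the x-coordinate of one joint
// sampled at 30 Hz) by applying both filters in sequence. The window
// size and maxStep values are left to the caller; suitable values
// would have to be tuned and are not given in the paper.
std::vector<double> smoothTrack(const std::vector<double>& raw,
                                std::size_t window, double maxStep) {
    std::vector<double> out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        std::size_t lo = (i + 1 >= window) ? i + 1 - window : 0;
        std::vector<double> win(raw.begin() + lo, raw.begin() + i + 1);
        double value = (win.size() >= 3) ? medianValueAverage(win) : raw[i];
        if (!out.empty()) value = limitBreadth(out.back(), value, maxStep);
        out.push_back(value);
    }
    return out;
}
```

The median-value average step rejects isolated tracking glitches, while the limit-breadth step caps implausibly large frame-to-frame jumps; combined, they yield a smoothed joint trajectory at the cost of a small lag.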
Recommended system requirements
64-bit (x64) processor, physical dual-core 3.1 GHz (2 logical cores per physical) or faster processor
Windows 8, Windows 8.1, Windows 10
Supporting DirectX 11.0
USB 3.0 port controller dedicated to the Kinect for Windows v2 sensor
Software development kit
Kinect for Windows SDK 2.0
Parameters and functions of KBC
Parameters to define PLD
Kinect Sensor 2.0 can track up to 25 skeletal joints with fairly high precision. On the “Pointer Selection” panel, users can select joints of interest to them by checking corresponding check boxes. Once a joint is selected, a check mark is shown on the relevant box (see Figs. 1a and b).
In the “View Options” panel of KBC, four viewing options for the recorded PLD are provided: The “front view” displays mirror images of the agents being tracked, the “side view” shows a sagittal view from the agents’ left, the “vertical view” displays an overhead view of the agents, and the “depth view” shows a depth view of the recorded scene in black and white.
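The first three viewing options amount to projecting each 3-D joint onto a different plane (the depth view instead shows the raw sensor depth image and is omitted here). The sketch below illustrates this under assumed Kinect camera-space axis conventions (x to the right, y up, z away from the sensor); the actual KBC rendering code may differ.

```cpp
struct Joint3D { double x, y, z; };  // meters, assumed camera space
struct Point2D { double u, v; };     // screen-plane coordinates

// Front view: camera-space x drawn directly already yields a mirror
// image of the agent on a front-facing display.
Point2D frontView(const Joint3D& j)    { return { j.x, j.y }; }

// Side view from the agent's left: depth becomes the horizontal axis.
Point2D sideView(const Joint3D& j)     { return { j.z, j.y }; }

// Vertical (overhead) view: looking down the y-axis.
Point2D verticalView(const Joint3D& j) { return { j.x, j.z }; }
```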
Agents in play
KBC can track at most six agents at a time. Users select the corresponding number of agents in play by ticking the check box. Moreover, KBC offers six RGB colors (eight-bit) to distinguish the agents: royal blue (65, 105, 225), green (0, 255, 0), magenta (255, 0, 255), brown2 (238, 59, 59), goldenrod (139, 37, 0), and white (255, 255, 255).
Hand gesture tracking

Kinect 2.0 can track three hand gestures: open, closed, and lasso. KBC provides this function as well; users tick “on” to activate it. The default setting is “off.”
Recording BM

By clicking the “Run” button, a live preview of the PLD-based BM is shown in the “Playing Window.” Meanwhile, the “Start Recording” button is activated. Once the “Run” button is pressed, it changes to a “Stop” button. When the users are ready to record the BM, the BM trajectory of interest is recorded by clicking the “Start Recording” button, which then transforms into a “Stop Recording” button. When a BM is completed, “Stop Recording” is clicked, and a dialog box pops up in which the user is required to enter a filename for the BM. The system then stores it in the defined path.
Movie editing

This function allows users to select sections of interest in the recorded PLD. Users first click the “Open” button on the “Movie Editing” panel to select an already recorded PLD. They can then preview the PLD stimulus frame-by-frame through the progress bar on the panel. Once the users define both the start and the end frame and click the “Cut” button, KBC clips the PLD from the start frame to the end frame, with both boundary frames included. A dialog box then pops up, prompting the user to save the newly edited PLD stimulus.
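The cutting behavior described above (both boundary frames included) can be sketched as follows; the 0-based frame indexing and the Frame representation are our assumptions, not details taken from the KBC implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One PLD frame: the stored coordinates of every selected joint.
using Frame = std::vector<double>;

// Clip a recorded PLD from startFrame to endFrame, inclusive of both,
// mirroring the "Cut" behavior. Frame indices are 0-based here.
std::vector<Frame> cutClip(const std::vector<Frame>& movie,
                           std::size_t startFrame, std::size_t endFrame) {
    if (startFrame > endFrame || endFrame >= movie.size())
        throw std::out_of_range("invalid frame range");
    return std::vector<Frame>(movie.begin() + startFrame,
                              movie.begin() + endFrame + 1);
}
```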
Movie playing

This function enables users to watch the recorded BM. Users click the “Play” button on the “Movie Playing” panel and select the target file. If the users intend to play the file more than once, they need to click the “Replay” button.
Evaluating the KBC-generated PLD stimuli
In this section, we examine whether the KBC-generated PLD stimuli can convey the BM characteristics of humans in high fidelity. According to previous studies, the processing of PLD-based BM stimuli has three typical hallmarks. In particular, (1) our vision system processes BM in a global manner; therefore, inverting the BM drastically impairs BM perception, although low-level features (e.g., absolute movement of individual dots, the local relations in the display, etc.) are kept constant between upright and inverted BM (e.g., Barclay, Cutting, & Kozlowski, 1978; Bertenthal & Pinto, 1994; Cutting, 1978; Dittrich, 1993; Ikeda et al., 2005; Pavlova & Sokolov, 2000; Shipley, 2003; Sumi, 1984; Troje, 2003); (2) BM cues can trigger reflexive attentional orientation (Ding, Yin, Shui, Zhou, & Shen, 2016; Doi & Shinohara, 2012; Pilz, Vuong, Bülthoff, & Thornton, 2011; Saunders et al., 2009; Shi, Weng, He, & Jiang, 2010; Zhao et al., 2014), even for local movement of the feet (Chang & Troje, 2009; Troje & Westhoff, 2006; Wang, Yang, Shi, & Jiang, 2014); (3) various types of emotion can be extracted from BM (Alaerts, Nackaerts, Meyns, Swinnen, & Wenderoth, 2011; Atkinson et al., 2004; Atkinson, Heberlein, & Adolphs, 2007; Clarke, Bradshaw, Field, Hampson, & Rose, 2005; Dittrich, Troscianko, Lea, & Morgan, 1996; Heberlein, Adolphs, Tranel, & Damasio, 2004; Hubert et al., 2007; Johnson, McKay, & Pollick, 2011; Karg et al., 2013; Nackaerts et al., 2012; Spencer, Sekuler, Bennett, Giese, & Pilz, 2010; Walk & Homan, 1984). If the KBC-generated BM stimuli genuinely reflect the movements of humans, they should exhibit at least these three processing characteristics. To this end, we examined these three hallmarks in turn.
Experiment 1: Global processing of BM
The global processing characteristics of BM are manifested by embedding a PLD-based BM in a set of randomly moving dots. Even though the PLD stimuli and the random dots were constructed using the same elements, observers could quickly discriminate BM from noise (e.g., Bertenthal & Pinto, 1994; Pinto & Shiffrar, 1999; Shiffrar, Lichtey, & Chatterjee, 1997). Moreover, akin to face perception (e.g., Yin, 1969), inverting BM significantly impaired observers’ performance (e.g., Bertenthal & Pinto, 1994; Dittrich, 1993; Pavlova & Sokolov, 2000; Shipley, 2003; Troje, 2003). In Experiment 1, we adopted the design proposed by Bertenthal and Pinto (1994), in which the aforementioned global processing of BM was systematically tested, using the front-view PLD stimuli generated by KBC.
Eight participants (four male, four female; mean age 19.4 ± 0.9 years) took part in the experiment. They were all undergraduates at Zhejiang University and gave written informed consent. All participants had normal color vision and normal or corrected-to-normal visual acuity.
Stimuli and apparatus
Five front-view BM stimuli were generated via KBC: chopping, jumping, spading, walking, and waving. A male undergraduate volunteered as the actor to perform the five movements. Each BM consisted of 13 white, round points (0.09° in radius each), covering an area of 3.34° (width) × 8.15° (height). The stimuli were presented at a rate of 30 fps. The randomly moving elements were identical to the points in the BM in terms of size, color, luminance, shape, and motion trajectories, but differed in phase and spatial location. The presentation always contained 78 moving elements on the screen, occupying an area of 14.00° (width) × 16.13° (height). When a BM stimulus was presented, 13 points belonged to the BM and 65 points to the noise; when no BM was displayed, 13 additional random elements were shown within the area that the BM would have occupied.
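A common way to build such noise elements is to reuse the walker’s own dot trajectories with a random phase and a random spatial offset (a scrambled-walker mask in the spirit of Bertenthal & Pinto, 1994). The sketch below is illustrative only, not the actual stimulus code (which ran in MATLAB/Psychtoolbox); the field size and offset ranges are placeholders.

```cpp
#include <cstddef>
#include <random>
#include <vector>

struct Dot { double x, y; };
// One dot's trajectory: its position at each frame of the BM cycle.
using Trajectory = std::vector<Dot>;

// Build one noise dot from a source trajectory: same motion path,
// but with a random phase offset and a random spatial relocation,
// so it matches the BM points in size-independent motion properties
// while carrying no global form.
Trajectory makeNoiseDot(const Trajectory& source, std::mt19937& rng,
                        double fieldW, double fieldH) {
    std::uniform_int_distribution<std::size_t> phase(0, source.size() - 1);
    std::uniform_real_distribution<double> dx(0.0, fieldW), dy(0.0, fieldH);
    const std::size_t p = phase(rng);          // random phase
    const double ox = dx(rng), oy = dy(rng);   // random location
    Trajectory noise(source.size());
    for (std::size_t f = 0; f < source.size(); ++f) {
        const Dot& s = source[(f + p) % source.size()];
        noise[f] = { s.x - source[0].x + ox, s.y - source[0].y + oy };
    }
    return noise;
}
```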
The stimuli were generated in MATLAB with Psychophysics Toolbox 3 (Kleiner et al., 2007) and displayed against a black (0, 0, 0 RGB) background on a 17-in. CRT monitor (1,024 × 768, 60-Hz refresh rate). The participants sat in a dark room and watched the stimuli at a distance of 50 cm from the screen center.
Design and procedure
A one-factor (BM Orientation: upright vs. inverted) within-subjects design was adopted. According to whether the BM stimuli were upright or inverted, the experiment was divided into two blocks. Each block consisted of 100 trials, for a total of 200 trials. Each movement was presented 20 times in each block. Four participants (two male) first took part in the upright block and the other four in the inverted block.
Before the experiment, the participants were first shown six to eight distinct BM stimuli that were not used in the formal experiment, to familiarize them with PLD-based BM. They were then given ten practice trials with feedback to familiarize themselves with the procedure. The entire experiment lasted approximately 40 min, and no feedback was presented during the formal trials.
Results and discussion
Experiment 2: Reflexive attentional orientation due to local BM cue
Jiang and colleagues recently revealed another hallmark of BM processing: Visual attention can be reflexively shifted to the walking direction of a BM stimulus (Shi et al., 2010). In particular, participants were briefly presented with a central point-light walker walking toward either the left or the right, and then judged the orientation of a Gabor patch. Although the participants were explicitly told that the walking direction of the BM did not predict the Gabor’s location, their performance was considerably better for targets appearing in the walking direction than for those in the opposite direction. Moreover, Jiang and colleagues revealed that not only full-body BM but also the local BM of feet movement can lead to reflexive attentional orientation. Wang et al. (2014) demonstrated this effect by first showing participants the BM of the feet using two points; after a 100-ms blank interval, a Gabor patch was presented congruently or incongruently with the direction of the movement of the feet. They found that participants responded significantly more quickly to the Gabor in the congruent condition than in the incongruent condition. However, a reversed pattern was found once the BM was inverted. They explained these results by suggesting that motion acceleration due to muscle activity and gravity in the swing phase of walking—the biological characteristic of feet motion signals—plays a key role in the upright condition (see also Chang & Troje, 2009). Inverting the BM, however, disrupted the function of motion acceleration, and the intact horizontal, translatory motion in the stance phase of walking began to play an important role. In other words, local BM of the feet also contains rich information, including motion acceleration and horizontal, translatory motion, and can modulate our attention.
To test whether the local BM collected via KBC also contains motion acceleration and horizontal, translatory motion information, in Experiment 2 we adopted the design of Wang et al. (2014) to examine whether local BM of the feet can reflexively trigger attentional orientation. This question is particularly important, considering that Troje and Westhoff (2006) found that walking direction cannot be retrieved from PLD stimuli generated via the Cutting (1978) algorithm.
Participants, apparatus, and stimuli
Ten participants (five male, five female; mean age 20.2 ± 1.2 years) took part in the experiment. The participants were seated 50 cm from the screen center in a dark room. All stimuli were presented against a gray (128, 128, 128 RGB) background. Gabor patches with a diameter of 2.15 cm (2.5° × 2.5° of visual angle) were used.
To generate clear local BM of the feet, we recorded a walking male using KBC’s front view while his profile faced the Kinect sensor. Moreover, to precisely record the trajectory of the movement of the feet, we placed the Kinect sensor on the floor such that it was at the same level as the feet. The final feet BM consisted of two white points, subtending a visual angle of 3.3° × 0.7°. It contained 30 frames and was played in a loop at a rate of 30 fps, with the initial frame selected at random on each presentation. All other aspects were the same as in Experiment 1.
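Looping a 30-frame clip at 30 fps with a random initial frame reduces to simple modular indexing. The helper below is a hypothetical illustration of that bookkeeping, not code from the experiment.

```cpp
#include <cstddef>

// Index of the clip frame to draw on screen refresh t, when a clip of
// nFrames loops at the display rate starting from a random startFrame
// (as in Experiment 2: 30 frames cycled at 30 fps).
std::size_t frameAt(std::size_t t, std::size_t startFrame,
                    std::size_t nFrames) {
    return (startFrame + t) % nFrames;
}
```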
Design and procedure
A 2 (Congruency between the direction of motion and the Gabor position: congruent vs. incongruent) × 2 (BM Orientation: upright vs. inverted) within-subjects design was adopted. Each combined condition consisted of 40 trials, resulting in a total of 160 randomly ordered trials. Before the formal experiment, the participants were given at least 20 practice trials. Since participants were required to respond as quickly as possible while minimizing errors, we set an accuracy criterion of no lower than 95% for passing the practice; a participant whose accuracy fell below 95% would have had to redo the practice. However, all participants passed the practice on the first round, so no participant required extra practice. Feedback was provided in the practice trials, but not in the experimental trials.
Results and discussion
These results replicated the findings by Wang et al. (2014), suggesting that the PLD-based BM generated via KBC contained detailed local information, such as motion acceleration and horizontal, translatory motion, which reflexively triggered attentional orientation.
Experiment 3: BM conveying emotions
It is well known that BM conveys rich social information (see Pavlova, 2012, for a review). One critical aspect is emotion. Researchers have shown that observers can recognize basic emotions, such as happiness, fear, surprise, anger, and sadness, from PLD-based BM stimuli (e.g., Atkinson et al., 2004; Dittrich et al., 1996; Walk & Homan, 1984). Moreover, observers showed higher accuracy in emotion perception tasks when watching PLD-based BM than when watching static PLD frames (Atkinson et al., 2004), which underscores the contribution of BM to emotion perception. To promote the exploration of BM-related emotion, Ma et al. (2006) combined four standardized actions (walking, knocking, lifting, and throwing) and four types of emotions (happiness, anger, sadness, and neutral) to create an emotional PLD database.
In Experiment 3, we collected a set of emotional PLD and examined whether participants could extract the embedded emotion in it.
Participants and apparatus
Thirty participants (16 male, 14 female; mean age 20.3 ± 1.4) participated in the experiment. Each BM (11.28° × 11.28°) was presented at the center of the screen at a viewing distance of 60 cm from the participants, against a black (0, 0, 0, RGB) background. The other aspects were the same as in Experiment 1.
Generating emotional BM stimuli
To test the effectiveness of KBC in generating BM, we considered it important to create a set of BM stimuli that rely less on body postures to convey emotion than those used in previous studies (Atkinson et al., 2004; Dittrich et al., 1996), so that participants had to rely more on body movements to make their judgments. To this end, we followed Ma et al. (2006) and constructed a set of emotional BM by requiring two actors to perform the actions of walking, knocking, lifting, and throwing for each of four emotion types (happiness, anger, sadness, and neutral).
The BM stimuli were acquired in a 4.5 m × 3.5 m room, with the Kinect 2.0 placed on a holder 1.05 m above the floor. Two actors (both 21 years old; one male, one female) with experience in drama were paid to perform the actions. One of the authors (the fourth) was in charge of conducting the BM acquisition. Before recording, the actors were given instructions on the actions and the corresponding emotions. They were allowed to rehearse the actions freely. The orders of action and emotion were randomly chosen for each actor. The detailed description of each action was set according to Ma et al. (2006). Once an actor was ready, he or she informed the experimenter and performed the actions within a 2 m × 2 m space, standing two meters in front of the Kinect. Each action was repeated five times (except walking, which required 30 s of nonstop walking), after which the actor was given a rest. This resulted in 32 BM records (4 Actions × 4 Emotions × 2 Actors). Using KBC, we then cut each record into five clips according to the number of repetitions; for walking, each record was cut evenly into five clips. This process generated a total of 160 BM clips (32 BM × 5 Repetitions), which were further converted into .avi files via Psychophysics Toolbox 3.
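Cutting a walking record evenly into five clips can be sketched with a helper like the following. How KBC distributes any remainder frames is not stated in the paper, so the scheme below (earlier clips absorb the extra frames) is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Inclusive [start, end] frame range of clip i when a record of
// nFrames is split evenly into k clips; remainder frames are given
// to the earliest clips (illustrative choice).
std::pair<std::size_t, std::size_t> clipRange(std::size_t nFrames,
                                              std::size_t k,
                                              std::size_t i) {
    std::size_t base = nFrames / k, rem = nFrames % k;
    std::size_t start = i * base + std::min(i, rem);
    std::size_t len = base + (i < rem ? 1 : 0);
    return { start, start + len - 1 };
}
```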
Finally, we recruited ten volunteers (eight male, two female; mean age 21.1 ± 0.7 years) for a BM evaluation process, in which the best clip was selected from the five options for each action–emotion combination. The clips with the highest scores were retained. Eventually, 32 clips (4 Actions × 4 Emotions × 2 Actors) were used in the formal experiment, with durations ranging from 2 to 5 s.
Design and procedure
A one-factor within-subjects design was used, with emotion type (neutral, angry, happy, and sad) as the independent variable. Each emotion was represented by eight BM clips (4 Actions × 2 Actors). The experiment consisted of two practice trials and 32 randomly presented formal trials. No feedback was provided in the experiment, which lasted approximately 10 min.
Results and discussion
In this study, we introduced a C++-based toolbox for use with the Kinect 2.0 to permit the acquisition of PLD-based BM in an easy, low-cost, and user-friendly way. We conducted three experiments to examine whether KBC-generated BM can genuinely reflect the processing characteristics of BM (Atkinson et al., 2004; Bertenthal & Pinto, 1994; Wang et al., 2014), and obtained positive results. In particular, we showed that (1) KBC-generated BM was processed in a global manner: Participants could perceive BM within a set of dynamic noise dots, and this ability was significantly impaired after inverting the stimuli. (2) KBC-generated local BM of the feet effectively retained the detailed local information of BM, such as motion acceleration and horizontal, translatory motion, which triggered reflexive attentional orientation. (3) KBC-generated BM conveyed emotional information: Participants could effectively distinguish among four basic types of emotion from the same set of actions. Therefore, we think that the KBC toolbox can be useful for generating BM in future studies.
In addition to the advantages of the KBC toolbox mentioned above, three others are worth noting. First, the generation of BM is fairly quick: The KBC toolbox needs no markers attached to the human joints and offers a filter to smooth the motion trajectory, so a BM stimulus can be created within 5 min once the agents are available. Second, the KBC offers a portable method of generating BM, owing to the nature of the Kinect sensor. Creating BM can therefore be extended from the well-equipped laboratory to real-world settings—for instance, classrooms or offices—which allows us to obtain more natural and richer BM, particularly considering that the Kinect Sensor 2.0 can simultaneously track up to six agents. Third, the 3-D positions of the joint points of BM acquired via the KBC toolbox are close to their actual positions in the human body (Bonnechère et al., 2014; Xu & McGorry, 2015); by contrast, most motion capture systems use marker positions at the skin surface to represent the positions of joints (Shotton et al., 2013).
Finally, we should note that the KBC toolbox also has a few constraints in practical use, predominantly due to the technical limitations of the Kinect sensor. First, KBC cannot accurately track and locate joints when the skeletal joints overlap—for instance, when an agent is seen in profile or adopts an irregular posture (such as a squat), or when one person passes in front of another, leading to an overlap of limbs. Future KBC users will therefore need to avoid these cases. Second, KBC does not work well when participants are too close to or too far from the Kinect sensor; it is recommended that an agent perform actions 1.5 m away from the Kinect. Third, KBC at present can only track three hand gestures, far fewer than are used in real life. We are now considering the possibility of integrating Leap Motion, which has powerful hand-gesture tracking, with the KBC toolbox.
This system uses infrared cameras to detect light reflected by markers placed on the surface of an agent and estimates the 3-D position of each joint. It is well known for its high precision and accuracy in measurement (Windolf, Gotzen, & Morlock, 2008) and is widely used in the biomechanical and clinical fields.
The script and GUI Toolbox can be freely accessed by visiting http://person.zju.edu.cn/en/zaifengg/702583.html.
We gratefully acknowledge the efforts and advice from Yanzhe Li and Wenmin Li in preparing this project. This project was supported by the National Natural Science Foundation of China (31271089 and 61431015), a project grant from the Ministry of Science and Technology of the People’s Republic of China (2016YFE0130400), and the Science and Technology Innovation Project for Undergraduates, Zhejiang Province.
- Bonnechere, B., Jansen, B., Salvia, P., Bouzahouene, H., Omelina, L., Moiseev, F., & Van Sint Jan, S. (2013). Validity and reliability of the Kinect within functional assessment activities: Comparison with standard stereophotogrammetry. Gait & Posture, 39, 593–598. doi: 10.1016/j.gaitpost.2013.09.018
- Bonnechere, B., Jansen, B., Salvia, P., Bouzahouene, H., Sholukha, V., Cornelis, J., & Van Sint Jan, S. (2014). Determination of the precision and accuracy of morphological measurements using the Kinect sensor: Comparison with standard stereophotogrammetry. Ergonomics, 57, 622–631. doi: 10.1080/00140139.2014.884246
- Hodgins, J. K., Wooten, W. L., Brogan, D. C., & O’Brien, J. F. (1995, August). Animating human athletics. Paper presented at the Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA.
- Hubert, B., Wicker, B., Moore, D. G., Monfardini, E., Duverger, H., Fonséca, D. D., & Deruelle, C. (2007). Brief report: Recognition of emotional and non-emotional biological motion in individuals with autistic spectrum disorders. Journal of Autism and Developmental Disorders, 37, 1386–1392. doi: 10.1007/s10803-006-0275-y
- Kawai, Y., Asada, M., & Nagai, Y. (2014, October). A model for biological motion detection based on motor prediction in the dorsal premotor area. Paper presented at the 4th International Conference on Development and Learning and on Epigenetic Robotics, Frankfurt, Germany.
- Kleiner, M., Brainard, D., & Pelli, D. (2007). What’s new in Psychtoolbox-3? Perception, 36(ECVP Abstract Suppl.), 14.
- Runeson, S. (1994). Perception of biological motion: The KSD-principle and the implications of a distal versus proximal approach. In G. Jansson, W. Epstein, & S. S. Bergström (Eds.), Perceiving events and objects (pp. 383–405). Hillsdale, NJ: Erlbaum.