Utilizing Unsupervised Crowdsourcing to Develop a Machine Learning Model for Virtual Human Animation Prediction
One type of experiential learning in the medical domain is a chat interaction with a virtual human. These virtual humans play the role of a patient, allowing students to practice skills such as communication and empathy in a safe but realistic sandbox. These interactions last 10–15 minutes, and a typical virtual human has approximately 200 responses. Part of the realism of a virtual human's response is the associated animation, but these animations can be time-consuming to create and to associate with each response.
We turned to crowdsourcing to address this problem. We decomposed the process of creating basic animations into a simple task that nonexpert workers can complete. Workers were given a set of predefined basic animations: six focused on head animation and nine focused on body animation. These animations could be mixed and matched for each question/response pair. We then used this unsupervised process to train two machine learning models for animation prediction: one for head animation and one for body animation. Multiple model types were evaluated and their performance compared.
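The core idea — learning to map a virtual human's response text to crowd-chosen animation labels — can be sketched as a simple text classifier. The responses, labels, and nearest-neighbor approach below are illustrative assumptions, not the paper's actual model or data:

```python
# Hypothetical sketch: predict a head-animation label for a virtual-human
# response from crowdsourced (text, label) examples. The data and the
# bag-of-words nearest-neighbor classifier are illustrative only.
from collections import Counter
import math

# Crowdsourced training data: (response text, head-animation label).
TRAIN = [
    ("It hurts right here in my chest", "nod"),
    ("No, I have not had surgery before", "shake"),
    ("Yes, the pain gets worse after meals", "nod"),
    ("I don't smoke or drink", "shake"),
]

def bow(text):
    """Bag-of-words term counts for a response."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(text):
    """Return the label of the most similar training response."""
    query = bow(text)
    best = max(TRAIN, key=lambda ex: cosine(query, bow(ex[0])))
    return best[1]

print(predict("Yes, it hurts after I eat"))  # prints "nod"
```

In the paper's setup, a second model of the same shape would predict the body-animation label, and the two predictions together select the mixed-and-matched animation for a response.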
In an experiment, we evaluated participant perception of multiple versions of a virtual human suffering from dyspepsia (heartburn-like symptoms). For the version that used our machine learning approach, participants rated the character's animation on par with that of a commercial expert. Head animation in particular was rated as more natural and more in line with expectations than in the other versions. Additionally, analysis of time and cost shows the machine learning approach to be quicker and cheaper than the expert alternative.
Keywords: Crowdsourcing · Machine learning · Virtual human · Animation pipeline