1 Introduction

Robots are increasingly involved in human life. They have been applied in various industries and bring convenience and efficiency to people’s daily lives. Previous studies have shown that social robots can serve as museum guides (Gehle et al. 2017), information providers at shopping malls (Kanda et al. 2009), teaching assistants (Ferrarelli et al. 2018), mental-care providers for elderly people (Broadbent et al. 2014), and aids in autism therapy (Rudovic et al. 2017). Users’ interactive experience is an important issue whenever people interact with technological artifacts, and emotional experience has recently attracted wide attention in the field of human-robot interaction (Breazeal 2003; Jokinen 2015).

Karaoke is one of the most popular recreational activities in China. According to the 2017 Music Industry Development Report, introduced by Music Industry Promotion Industry, the total revenue of the karaoke industry in China was estimated at 86.9 billion RMB. However, the number of traditional karaoke venues has decreased with the development of technology. Many people are no longer satisfied with meeting only the physiological needs in Maslow’s Hierarchy of Needs; they seek love, belonging and other higher needs instead. Previous research shows that karaoke singing reflects human beings’ psycho-social needs (Ruismäki et al. 2013). Traditional karaoke cannot meet these kinds of needs and has gradually been replaced by mini KTV, a new type of karaoke. Some mini KTVs adopt new technologies, such as augmented reality (AR) and virtual reality (VR), as novel entertainment devices in order to attract more consumers. Besides, as mini KTVs pop up on the street, people can access karaoke more easily. According to the 2017 China Mini KTV Market Research Report, released by AskCI Consulting, the scale of mini KTV in China is expected to exceed 200,000 in 2022 and the market size will reach 31 billion RMB. Based on this trend, we explored the application of robots in the karaoke environment and designed an Emotional Karaoke Robot (EKR) system in this study. The purpose of the EKR system was to improve users’ satisfaction and emotional experience in KTV. Four conditions (simple interaction task, diverse neutral emotional interaction task, diverse positive emotional interaction task, and diverse negative emotional interaction task) were designed to explore how participants interact with, perceive and react toward the Emotional Karaoke Robot.

2 Literature Review

2.1 Emotion in Human-Robot Interaction

Emotions play a significant role in human-robot interaction, and emotional behaviors are commonly used in the design of social robots. Previous studies have explored design factors that cause human empathy toward robots (Kwak et al. 2013). Kim et al. (2009) designed a robot that expressed its emotional state through bruising and complexion color. The robot could recognize humans’ emotions in a conversation, and a blue bruise emerged when the robot perceived a negative emotional state in the human. Choi et al. (2014) tested the effect of robot types on emotional communication and indicated that people felt more embarrassed when they interacted with tele-operated robots than with autonomous robots. Wei et al. (2016) examined the effects of robots’ emotional motion patterns on the user’s perception of the robot. In addition, an emotion recognition system was established by Devillers et al. (2015), and the system could further drive the expressive behavior of the robot.

Affect encompasses emotions, moods, and attitudes, among which emotion is the most intense. Emotion is a dynamic process between organism and environment (Lazarus 1982). In psychology research, two primary approaches are used to describe emotions. The first is the “dimensional approach” (continuous), which maps the full and continuous spectrum of human emotions onto three independent, bipolar dimensions: Pleasure (whether something is perceived positively or negatively), Arousal (one’s sense of energy, ranging from sleepy to excited) and Dominance (one’s sense of control), together abbreviated PAD (Mehrabian and Russell 1974). The Self-Assessment Manikin (SAM) is a widely adopted visual self-report tool for measuring subjective feelings along these dimensions. The second is the “discrete emotion approach” (categorical), which describes the spectrum of human emotions as a mixture of a limited number of basic emotions. Those basic emotions are considered to be universal and to possess distinct adaptive values (Ekman 1992).

2.2 Human-Social Robot Interaction

It has become increasingly apparent that social robots play a crucial role in the world, collaborating with humans and engaging in people’s lives. A social robot differs from an industrial robot: its core element is its interactivity with humans. Breazeal (2004) defined a sociable robot as “socially intelligent in a human-like way, interacting with it is like interacting with another person”. That is, a sociable robot is designed to interact with humans as if it were a human being. Humanoid social robots have been divided into utilitarian humanoid social robots and affective humanoid social robots (Zhao 2006). The former are designed for instrumental purposes, whereas the latter are used to interact with humans on an emotional level.

The term “socially interactive robot” was proposed by Fong et al. (2003) and focuses on peer-to-peer human-robot interaction. The concept differs substantially from the “master-slave” command style of conventional human-robot interaction. A socially interactive robot therefore exhibits “human social” characteristics, such as the ability to express and perceive emotion, learn or recognize models of other agents, communicate with high-level dialogue, establish and maintain social relationships, use natural cues (gaze, gestures, etc.), exhibit distinctive personality and character, and learn or develop social competencies (p. 145).

In addition, previous research indicated that usefulness, adaptability, enjoyment, sociability, companionship and perceived behavioral control are key variables for the acceptance of social robots (De Graaf and Allouch 2013). Successful human-social robot interaction requires the robot to engage properly. The most common interactive modalities in current human-robot interaction can be summarized as follows:

  1. Appearance: Mori (1970) proposed the well-known uncanny valley hypothesis, which describes the relationship between an object’s degree of human likeness and the emotional response it evokes. The more humanlike an object is, the more likable people perceive it to be. However, an object that closely approaches a real human appearance elicits an uncanny feeling and makes people feel revulsion. Nevertheless, Goetz et al. (2003) suggested that a robot’s appearance and behavior should match the role people expect the robot to play in a given situation. Hence, participants in their experiment did not always choose the more humanlike robot but preferred the robot whose appearance matched the sociability required.

  2. Gestures: Sidner et al. (2005) indicated that robots’ engagement gestures directly attract humans’ attention. Riek et al. (2010) showed that the gesture type and gesture style of a robot significantly affected overall reaction and cooperation time when interacting with a robot.

  3. Speech: Conversation is a basic way of social interaction. Kanda et al. (2002) indicated that subjects interacted with the robot in a way similar to how they communicate with humans. In the experiment, they observed that subjects performed interpersonal behaviors such as responding to the robot and voluntarily speaking to it. Hence, a conversational robot may be designed to communicate with humans seamlessly and effectively through a human voice.

  4. Emotion: Emotional interaction is a key issue in HRI. An emotionally interactive social robot can benefit from natural, humanlike interaction. Emotion is expressed through multimodal communication channels. For instance, some studies focused on face-based emotion expressions (Kirby et al. 2010; Kishi et al. 2012), gesturing (Wei et al. 2016), and emotional speech (Devillers et al. 2015). Hence, in the design of an emotional social robot, emotional expression, including facial expression, body movement and gesture, and speech and vocalization, should be taken into consideration.

Hypotheses

We expect that interaction with the Emotional Karaoke Robot will improve participants’ psychological well-being in all four experimental conditions. Because the robot’s level of emotional enhancement and fulfillment is manipulated, we predict that each task will have a different effect. Based on the overview above, we draw the following hypotheses:

  • Participants’ emotional reactions will be enhanced during the experiment.

  • Participants under the positive/negative emotional interaction conditions will have stronger emotional reactions than those under the neutral condition.

  • Participants under the positive/negative emotional interaction conditions will give higher user satisfaction scores than those under the neutral condition.

3 Concept Design

An overview of the system architecture is shown in Fig. 1. Three main parts of the EKR system create the human-robot interaction: sensor, dialogue, and motion. Speech and visual input signals are detected by sensors embedded in the NAO robot. The sensors can detect the presence of a person, and NAO can detect human faces and track users. The face detection module enables better non-verbal human-robot interaction. Moreover, NAO has four directional microphones and loudspeakers. The speech dialogue system comprises reception of the user’s input speech and delivery of the speech response. The robot identifies the user and makes eye contact to make the interaction more natural. The speech recognition module lets NAO recognize users’ voices and provide adequate feedback. The robot simultaneously generates movement corresponding to the speech content.

Fig. 1. Concept of the EKR system design
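The flow from sensor to dialogue to motion described above can be sketched in a few lines. This is a minimal illustration only, not the authors’ implementation: it does not use the NAO software interface, and the names SensorInput, RobotResponse, RESPONSES and respond, as well as the example utterances and replies, are hypothetical. Only the three motions (waving, holding the microphone, sitting) come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorInput:
    face_detected: bool        # result of the face detection module
    utterance: Optional[str]   # recognized user speech, if any


@dataclass
class RobotResponse:
    speech: str   # what the robot says
    motion: str   # movement generated together with the speech


# Hypothetical dialogue table; the paper does not list the dialogue rules.
RESPONSES = {
    "hello": RobotResponse("Hello, nice to meet you!", "wave"),
    "sing": RobotResponse("It is my turn to sing.", "hold_microphone"),
}


def respond(inp: SensorInput) -> Optional[RobotResponse]:
    """One pass through the three parts: sensor, dialogue, motion."""
    if not inp.face_detected or inp.utterance is None:
        return None  # nobody is present, or nothing was said
    key = inp.utterance.strip().lower()
    # Unknown utterances get a neutral reply while the robot stays seated.
    return RESPONSES.get(key, RobotResponse("Sorry, could you repeat that?", "sit"))
```

Pairing each spoken reply with a motion in a single record mirrors the statement that movement is generated together with the speech content.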

4 Method

4.1 Participants

Sixteen subjects (8 men and 8 women; mean age M = 23) were recruited from Tsinghua University. All participants had been to karaoke in the past 6 months (mean number of visits M = 3). Written informed consent was obtained from all subjects prior to the experiment. At the end of the experiment, each participant was paid 20 RMB for participation.

4.2 Apparatus

We employed the NAO robot in our experiment for three reasons. Firstly, NAO is an interactive, medium-sized humanoid robot. Previous research showed that a human-like appearance facilitates an intuitive style and familiarity (Oztop et al. 2005). Secondly, the appearance of NAO is lifelike. Thirdly, with 25 degrees of freedom and a suite of joint sensors, NAO can move smoothly and flexibly (SoftBank Robotics 2017).

4.3 Scenarios

The NAO robot was pre-programmed to interact with the participants, and each participant interacted alone with NAO in a room. A session lasted 40 to 50 min on average. The procedure for the simple interaction task session was as follows:

  1. NAO greeted the participant with a hand wave and gave a brief self-introduction.

  2. NAO sang the song “Actor” written by Xue Zhiqian.

  3. The participant sang a song.

  4. NAO sang the song “She Says” written by Lin Junjie.

  5. The participant sang a song.

  6. The participant sang a song.

  7. The session was finished by the robot saying, “Nice meeting you and thanks for having a great karaoke experience.”
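The seven steps above can be written down as an ordered script, which is one plausible way to pre-program such a session. The sketch below is an illustration under that assumption; the names Step, SIMPLE_SESSION and robot_steps are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    actor: str   # "NAO" or "participant"
    action: str  # short description of what happens in this step


# The simple interaction session, in the order listed above.
SIMPLE_SESSION: List[Step] = [
    Step("NAO", "greet with a hand wave and give a brief self-introduction"),
    Step("NAO", 'sing "Actor"'),
    Step("participant", "sing a song"),
    Step("NAO", 'sing "She Says"'),
    Step("participant", "sing a song"),
    Step("participant", "sing a song"),
    Step("NAO", "say goodbye and thank the participant"),
]


def robot_steps(session: List[Step]) -> List[str]:
    """Return only the actions the robot has to perform."""
    return [step.action for step in session if step.actor == "NAO"]
```

Keeping the script as data makes it easy to see that the robot acts in four of the seven steps and the participant sings three times.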

In our experiment, we programmed three main motions for different situations: waving, holding a microphone, and sitting. Table 1 shows how NAO performed them in the real scenes. In addition, the experiment set-up is shown in Fig. 2.

Table 1. NAO’s three main motions

Fig. 2. Experiment set-up: participants sang songs with the robot

4.4 Task and Procedure

We designed four conditions, each associated with a certain feeling that could be triggered in participants. A within-subject design was adopted, and we explored important factors that affect users’ satisfaction with the Emotional Karaoke Robot. The details of the four tasks are described as follows:

Task 1 is the baseline set, called the simple interaction task. The NAO robot and the participant took turns singing songs. Human-robot interaction was kept to a minimum, and the task was not intended to provoke negative or positive feelings in the participant.

Task 2 is the diverse neutral emotional interaction task, in which the robot had more interactions with the human, such as imitating another singer and performing a dance.

Task 3 is the diverse positive emotional interaction task, designed to elicit positive emotion. NAO tended to give the participant positive feedback, and lucky money was sent as a reward via WeChat after his/her singing. In addition, participants also sent lucky money to the robot, with the amount depending on the robot’s performance. This served as an alternative way to grade the robot’s singing performance.

Task 4 is the diverse negative emotional interaction task, designed to elicit negative emotion. NAO interrupted the participant directly during the singing and gave the participant negative feedback.
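The four tasks differ in a small number of manipulated features, which can be summarized as a configuration table. The sketch below is only an illustration: the field names are hypothetical, and marking Tasks 3 and 4 as “diverse” is an assumption based on the task names rather than an explicit statement in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Condition:
    name: str
    diverse: bool             # extra interactions beyond turn-taking (assumed for Tasks 3 and 4)
    feedback: Optional[str]   # "positive", "negative", or None for no valenced feedback
    lucky_money: bool         # lucky money exchanged via WeChat
    interrupts: bool          # robot interrupts the participant's singing


CONDITIONS: Dict[int, Condition] = {
    1: Condition("simple interaction", False, None, False, False),
    2: Condition("diverse neutral emotional interaction", True, None, False, False),
    3: Condition("diverse positive emotional interaction", True, "positive", True, False),
    4: Condition("diverse negative emotional interaction", True, "negative", False, True),
}


def emotional_tasks() -> List[int]:
    """Task numbers in which the robot gives valenced feedback."""
    return [number for number, cond in CONDITIONS.items() if cond.feedback is not None]
```

Laid out this way, Task 2 serves as the neutral comparison for Tasks 3 and 4, which is the contrast used in the hypotheses.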

As for the experimental procedure, participants were given a brief description of the procedure and the consent form after arriving at the laboratory. They then completed a pre-test questionnaire about their perception of robots. After listening to the music “Weightless”, participants assessed their affective state with the SAM (Self-Assessment Manikin) scales and DES (Differential Emotion Scale).

During the experiment, they interacted with NAO and followed NAO’s instructions. At the end of the task, participants filled out the post-test questionnaire, including the SAM scales, DES, hedonic value and satisfaction. Afterwards, each participant gave a brief interview and was paid a 20 RMB participation fee.

4.5 Measures and Analysis

To assess participants’ affective state, the Self-Assessment Manikin (SAM) and Differential Emotion Scale (DES) (5-point Likert scale) were used. The four-item hedonic value questionnaire (5-point Likert scale) from Wu et al. (2015) was also adopted (e.g., ‘I had a good time singing here because I felt a sense of freedom’, ‘I enjoyed being immersed in singing songs here’, ‘Compared to other things, the experience was truly enjoyable’). Furthermore, we evaluated the design of the Emotional Karaoke Robot system via the fourteen-item satisfaction questionnaire proposed by Cook (1991).
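A multi-item Likert questionnaire such as the hedonic value scale is commonly scored by averaging the item responses. The paper does not state how item responses were combined, so the averaging rule below is an assumption, and the function name scale_score is hypothetical.

```python
from statistics import mean
from typing import Sequence


def scale_score(responses: Sequence[int], low: int = 1, high: int = 5) -> float:
    """Average the item responses of one participant on a Likert scale.

    Averaging is an assumed scoring rule; the default range matches the
    5-point scales mentioned in the text.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    for value in responses:
        if not low <= value <= high:
            raise ValueError(f"response {value} is outside {low}..{high}")
    return float(mean(responses))
```

For example, a participant who answers 5, 4, 5 and 4 on the four hedonic value items would receive a score of 4.5 under this rule.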

5 Results and Discussion

5.1 Quantitative Data (Questionnaire Results)

How do the participants perceive and feel toward the Emotional Karaoke Robot in the four different conditions?

The results of participants’ emotions toward robot interaction from the Self-Assessment Manikin (SAM) questionnaire are shown in Figs. 3 and 4. Participants in task 1, task 3 and task 4 had the same average arousal level (M = 4.25), which was higher than that in task 2 (M = 4.00) (Fig. 3). An analysis of the valence ratings indicated that participants in task 3 had the highest average score (M = 4.75) (Fig. 4), followed by task 4 (M = 4.5), task 1 (M = 4.25), and task 2 (M = 4.00). Across all four tasks, the average valence rating lay between 4 and 4.75 on the five-point scale. Therefore, the results showed that participants had positive attitudes toward the Emotional Karaoke Robot.

Fig. 3. Average ratings of the robot’s arousal of all tasks from questionnaire

Fig. 4. Average ratings of the robot’s valence of all tasks from questionnaire

For hedonic value, the mean score of task 3 was the highest (4.63), followed by task 1 (4.31), task 4 (4.06), and task 2 (3.56) (Fig. 5). Thus, the results suggested that participants in task 1, task 3, and task 4 had great hedonic experiences.

Fig. 5. Average ratings of the robot’s hedonic value of all tasks from questionnaire

For satisfaction, the mean score was 4.64 in the task 1 group, 4.52 in the task 2 group, 5.3 in the task 3 group, and 5.38 in the task 4 group (Fig. 6). Hence, participants in the diverse positive/negative emotional interaction tasks gave higher satisfaction scores than those in the diverse neutral emotional interaction task.

Fig. 6. Average ratings of the robot’s satisfaction of all tasks from questionnaire
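The per-task means reported above for arousal, valence, hedonic value and satisfaction can be collected in one place and compared against the neutral task. The sketch below uses only the means stated in the text; the names REPORTED_MEANS, exceeds_neutral and ranking are hypothetical. It is a descriptive comparison of means and says nothing about statistical significance, which would require the individual responses.

```python
from typing import Dict, List, Tuple

# Mean scores per task (1 to 4) as reported in the text.
REPORTED_MEANS: Dict[str, Dict[int, float]] = {
    "arousal": {1: 4.25, 2: 4.00, 3: 4.25, 4: 4.25},
    "valence": {1: 4.25, 2: 4.00, 3: 4.75, 4: 4.50},
    "hedonic value": {1: 4.31, 2: 3.56, 3: 4.63, 4: 4.06},
    "satisfaction": {1: 4.64, 2: 4.52, 3: 5.30, 4: 5.38},
}


def exceeds_neutral(means: Dict[int, float],
                    emotional: Tuple[int, ...] = (3, 4),
                    neutral: int = 2) -> bool:
    """True if every emotional task has a higher mean than the neutral task."""
    return all(means[task] > means[neutral] for task in emotional)


def ranking(means: Dict[int, float]) -> List[int]:
    """Task numbers ordered from highest to lowest mean."""
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    for measure, means in REPORTED_MEANS.items():
        print(measure, ranking(means), exceeds_neutral(means))
```

On these reported means, tasks 3 and 4 exceed task 2 on all four measures, in line with the comparisons drawn in the text.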

According to the analysis of pre- and post-test emotion, participants in every task were aroused and had positive emotional reactions during the experiment (Figs. 7 and 8).

Fig. 7. Average ratings of the robot’s arousal of all tasks from pre-post questionnaire

Fig. 8. Average ratings of the robot’s valence of all tasks from pre-post questionnaire

Figure 9 shows the overall Differential Emotion Scale (DES) scores of participants before and after the experiment under the four tasks. Based on the DES analysis, participants felt more interested, joyful and surprised after the experiment.

Fig. 9. Average ratings of the robot’s differential emotion of all tasks from pre-post questionnaire

5.2 Qualitative Data (Observations of Interactions with EKR)

How do the participants interact with the Emotional Karaoke Robot?

We observed participants’ interaction from three perspectives during the experiment: singing with the robot, movement, and responding to the robot (Fig. 10). Regarding singing, 11 of the 16 participants sang along with the robot. As for the robot’s movement, participants felt surprised and amazed while watching the robot dance. Furthermore, most participants (8 out of 12) gave full marks to the robot’s performance. In the interview, many participants mentioned that they were very impressed when they saw the robot performing the Gangnam Style dance. As for responding to the robot, only 5 of the 16 participants greeted the robot with a hand wave, but more than half (10 out of 16) responded or talked to it. For instance, one participant asked the robot, “Can you sing Cantonese song?”

Fig. 10. Interact with Emotional Karaoke Robot
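The observation counts above are easier to compare as percentages. The short sketch below converts the counts stated in the text; the names OBSERVATIONS and percent are hypothetical.

```python
from typing import Dict, Tuple

# (count, total) pairs taken from the observations reported in the text.
OBSERVATIONS: Dict[str, Tuple[int, int]] = {
    "sang along with the robot": (11, 16),
    "gave the robot full marks": (8, 12),
    "greeted the robot with a hand wave": (5, 16),
    "responded or talked to the robot": (10, 16),
}


def percent(count: int, total: int) -> float:
    """Share of participants as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


if __name__ == "__main__":
    for behavior, (count, total) in OBSERVATIONS.items():
        print(f"{behavior}: {count}/{total} = {percent(count, total)}%")
```

Greeting with a hand wave was the least common behavior (31.25%), while singing along was the most common (68.75%).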

Moreover, those who took part in task 3, the diverse positive emotional interaction task, further expressed that it was interesting and surprising to unexpectedly receive lucky money from the robot via WeChat. They viewed this behavior as a kind of encouragement for their performance, and it also motivated them to send lucky money back as a reward for the robot’s singing performance. Participants’ attitude toward sending lucky money was also positive (Table 2).

Table 2. Interact scenes

6 Conclusion

This study applied a robot to karaoke and explored its design and implementation through experiments. The results showed that participants were successfully aroused and affected by the Emotional Karaoke Robot during the experiment. Participants felt interested, joyful and surprised while interacting with the robot. Hence, designers can refer to the design of the Emotional Karaoke Robot system. The average levels of arousal, hedonic value, and user satisfaction under the diverse positive/negative emotional interaction conditions were greater than under the neutral one. Moreover, participants in the diverse positive/negative emotional interaction tasks felt more positive than those in the diverse neutral emotional interaction task.

However, this study has several limitations. First, the sample size was small. Only 16 participants were recruited and only 4 participants were tested in each condition. Second, participants had positive attitudes toward the Emotional Karaoke Robot on the whole, but singing with a robot was new to all of them. Therefore, long-term interaction should be taken into consideration. Finally, participants’ personality and attitude toward robots would influence their behavior. According to previous studies, differences in gender, age, cultural background, and region affect people’s acceptance of social robots (Kaplan 2004; Rau et al. 2009). As a result, these variables should be taken into account when designing social robots.

In conclusion, people’s attitude toward the application of robots in karaoke is positive. Appropriate feedback leads to a more hedonic and satisfying experience, which tends to arouse people’s emotions and enliven the karaoke environment. Combining the robot with social media, for example lucky money in WeChat, is also a good way to enhance people’s motivation to use a robot as their karaoke partner.