
1 Introduction

The number of older adults in the US is increasing dramatically, growing from 35 million in 2000 to an estimated 74 million in 2030 [1]. Many older adults suffer from Alzheimer’s disease or other dementias, which affect their memory and/or other cognitive skills such as communication, the ability to focus, and reasoning. An additional 15–20% of older adults have mild cognitive impairment (MCI) and are at high risk of developing dementia [2]. Beyond their prevalence, Alzheimer’s and many other dementias are progressive and costly. In 2017, dementia cost the nation $259 billion [2]. To date, there is no cure for Alzheimer’s disease and most other dementias, and no treatment to slow or stop their progression. Activity-oriented therapies, including regular physical exercise, cognitive stimulation, and social engagement, have been found to be beneficial for the physical and mental health of older adults with and without cognitive impairment (CI) and may reduce the risk of developing Alzheimer’s disease and other dementias [1,2,3,4].

To mitigate the substantial emotional, financial, and physical burdens on caregivers, researchers have been investigating applications of sensor-based technologies, virtual avatars/environments, and robotics to support the care of older adults [5]. Many systems were designed to assist older adults in aging in place by monitoring their behaviors and providing alarms and reminders through networks of sensors [6, 7]. Animal robots, such as PARO, and telepresence robots, such as Giraff, were developed to provide older adults with social support and reduce their stress [6, 8]. More recently, intelligent systems have been developed to administer activity-oriented therapies. McColl et al. developed a socially assistive robotic (SAR) system, Brian, to encourage older adults with and without CI during a meal-eating activity and a cognitively stimulating activity [9]. Fasola et al. designed a SAR system, Bandit, to administer physical exercise sessions with older adults [10]. Young et al. developed a platform consisting of a Wii balance board and a virtual environment for training older adults’ balance function through interaction with virtual tasks [11]. Anderson-Hanley et al. compared the effect of stationary cycling with and without virtual reality tours on older adults’ cognition and found that cycling with virtual reality tours had greater potential for preventing cognitive decline [12]. Although these systems show promise in engaging older adults in activity-oriented therapies and potentially benefiting their cognitive function, they focus solely on one-on-one interaction and thus do not support social engagement by involving multiple older adults simultaneously. Without social interaction with other humans, many older adults feel socially isolated and can suffer from apathy [13].

Research on computer/SAR systems interacting with multiple older adults is still at an early stage. Matsusaka et al. developed a SAR system, TAIZO, to lead physical exercise sessions in front of a group of older adults [14]. TAIZO is an open-loop system and cannot analyze older adults’ performance during the interaction. Louie et al. developed a SAR system, Tangy, which can play a bingo game with a group of older adults in a closed-loop fashion [15]. The system can interact with the group as a whole or with each individual; however, it cannot capture social interaction among the older adults. In our previous work, we developed a SAR system, RAMU, that could engage two older adults simultaneously in a physically and cognitively stimulating activity [16]. Although social communication was observed during the interaction, the task was not designed to promote social engagement.

In this work, we designed and developed a novel collaborative virtual environment (CVE) that, through human-machine interaction (HMI), actively supports activity and social engagement for older adults with and without CI. In this CVE, two older adults interact with a virtual environment through physical movements. Collaborative components are embedded within the CVE design to encourage human-human interaction (HHI) in addition to HMI. The CVE continuously evaluates older adults’ activity compliance and collaboration status in order to provide feedback that keeps older adults engaged in both HMI and HHI. We believe that a CVE system with the ability to support social engagement will be more beneficial in enhancing the overall health of older adults than systems focusing solely on older adults’ functional ability. In this paper, we present the development of the CVE and preliminary user study results on system validation and on older adults’ tolerance and acceptance of the system. The rest of the paper is organized as follows. Section 2 describes the overall system framework and details the design and development of the motion-based CVE application. Section 3 presents the experimental setup and procedure as well as the participants’ information. Section 4 provides the results on system usability and participants’ interaction, including performance, interaction frequency, and conversation duration. Finally, we conclude the paper in Sect. 5 with a discussion of the current results, the limitations, and future directions.

2 System Design

2.1 Overview

The system has two main components: a motion-based CVE application and a robotic facilitator. Figure 1 illustrates the overall system framework. Two users interact with the CVE through a motion-based user interface (UI) using the Kinect sensor [17]. The Data Management module is responsible for recording users’ real-time interaction data. Users’ interactions, together with changes in the game state, trigger audio-visual feedback from the CVE application to support HMI and HHI. In addition, the CVE application sends events to a physically embodied robot through socket communication in order to provide additional feedback and facilitate HMI and HHI. A humanoid NAO robot [18] serves as an artificially intelligent (AI) player, helps users with motion-based cursor control, and encourages collaboration. In this paper, we focus on the design and development of the motion-based CVE application. The CVE application is based on a sorting activity in which older adults sort books of different colors into color-matched collection bins. The Unity game engine [19] was used to develop the 3D book-sorting task shown in Fig. 2. Each user controls one hand cursor in the CVE through upper body movement and hand manipulation. The subsections that follow elaborate on the design of the motion-based UI, the models of computation used to develop the CVE task, and the data management module.
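As a concrete illustration, the sketch below shows how the CVE application could send a game event to the robot facilitator over a socket. The actual message format, host, and port used by our system are not given here; newline-delimited JSON over TCP and the address shown are assumptions for illustration only.

```python
import json
import socket

# Hypothetical address of the robot-side event listener (assumed for illustration).
HOST, PORT = "127.0.0.1", 9100

def send_event(event_type, payload):
    """Send one game event (e.g., a collaborative move) to the robotic facilitator."""
    message = json.dumps({"event": event_type, "data": payload}) + "\n"
    with socket.create_connection((HOST, PORT)) as conn:
        conn.sendall(message.encode("utf-8"))

# Example: notify the robot that user 1 made a collaborative move.
# send_event("collaborative_move", {"user": 1, "book": "red_07"})
```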

Fig. 1. System framework overview.

Fig. 2. Two users interacting with the CVE. (Color figure online)

2.2 Motion-Based UI

We implemented a motion-based UI to introduce physical activity into the CVE task and to remove the need for a keyboard and mouse, which are not user-friendly for older adults. The motion-based UI is built on Kinect’s skeletal tracking and hand tracking functions. The first step is to map each user’s hand joint position to a cursor position in the CVE. This is realized by defining an interaction box for each hand (Fig. 2) and mapping the relative position of the hand joint with respect to the interaction box to the cursor position in the CVE. The positions and sizes of the interaction boxes are defined by each user’s left shoulder (SL), right shoulder (SR), left hip (HL), right hip (HR), and spine base (SB) joints and are listed in Table 1. When users move their hands to the left, the hand cursors move to the left in the CVE; when they move their hands to the right, the cursors move to the right. Similarly, when users move their hands down, the hand cursors move down, and when they move their hands up, the cursors move up. The hand cursor does not move in the third dimension unless the user is holding a book. Users move books closer to them by moving their hands toward their chest and move books away by moving their hands away from their chest.

Table 1. Definition of interaction box.
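The fragment below sketches this mapping: the hand joint position is normalized within its interaction box and scaled to screen coordinates. The exact box geometry from Table 1 is abstracted into left/right/top/bottom edges, which is an assumption for illustration.

```python
def to_cursor(hand_x, hand_y, box, screen_w, screen_h):
    """Map a hand joint position (camera space) to a 2D cursor position in the CVE.

    box: dict with the 'left', 'right', 'top', and 'bottom' edges of the user's
    interaction box (derived from the SL, SR, HL, HR, and SB joints in Table 1).
    """
    # Normalized position of the hand inside the interaction box, clamped to [0, 1].
    u = min(max((hand_x - box["left"]) / (box["right"] - box["left"]), 0.0), 1.0)
    v = min(max((box["top"] - hand_y) / (box["top"] - box["bottom"]), 0.0), 1.0)
    # Horizontal hand motion moves the cursor left/right; vertical motion moves it
    # up/down (screen y grows downward, so a lower hand gives a larger v).
    return u * screen_w, v * screen_h
```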

The second step is to determine which hand is currently controlling the cursor and to allow book manipulation through simple hand gestures. We designed a hierarchical state machine (HSM) to handle hand switching and hand manipulation (Fig. 3). The current control hand is determined by the relative positions of the user’s left and right hands and their interaction boxes. The left hand is interacting (LHI) if its position is within or around the left-hand interaction box, and similarly for the right hand. When there is no hand cursor and an LHI event occurs, the left hand becomes the current control hand that moves the cursor. If only the right hand is interacting (RHI), the right hand is set as the current control hand. In the case that both LHI and RHI events occur, the hand that was already interacting with the system remains the current control hand. Kinect’s hand state detection algorithm returns five possible hand states: closed, lasso, not tracked, open, and unknown. Initially, the left or right hand cursor is in the release state. If a closed or lasso hand state event is detected, the cursor transitions to the grip state. If an open hand state is detected, the cursor returns to the release state. Cursor movements together with hand manipulations enable users to grip, move, and release virtual objects in the CVE through physical movements.
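The following sketch captures the essence of this hand-switching and grip/release logic; the class and variable names are illustrative, and the actual HSM in Fig. 3 contains additional states.

```python
class CursorHSM:
    """Simplified sketch of the hand-switching and hand-manipulation logic."""

    def __init__(self):
        self.control_hand = None       # 'left', 'right', or None
        self.cursor_state = "release"  # 'release' or 'grip'

    def update(self, lhi, rhi, hand_state):
        """lhi/rhi: whether the left/right hand is within or around its interaction box;
        hand_state: Kinect hand state of the current control hand."""
        # Hand switching: if both hands interact, the hand that was already
        # interacting keeps control; otherwise control follows the interacting hand.
        if lhi and not rhi:
            self.control_hand = "left"
        elif rhi and not lhi:
            self.control_hand = "right"
        elif not lhi and not rhi:
            self.control_hand = None

        # Hand manipulation: closed/lasso grips, open releases; other states are ignored.
        if hand_state in ("closed", "lasso"):
            self.cursor_state = "grip"
        elif hand_state == "open":
            self.cursor_state = "release"
```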

Fig. 3. User interface model.

2.3 CVE Task Design

Main Task.

For the purpose of supporting social engagement, we embedded collaborative components in the task so that users have to communicate with each other by verbally exchanging information or physically moving books. The virtual space is divided into two interaction areas, marked by the red and green vertical lines (Fig. 2). The two users’ left and right interaction boxes are mapped to different interaction areas. The red cursor can move freely to the left of the green vertical line, whereas the green cursor can move freely to the right of the red vertical line. As a result, the virtual space between the red and green vertical lines is accessible by both users. The red cursor cannot move books inside the green collection bin, and the green cursor cannot move books inside the red collection bin. We refer to the virtual space that is accessible only by the red cursor as the red only area (ROA). The green only area (GOA) is defined in a similar way. Notice that some red books are in the GOA and some green books are in the ROA. These books are designated as ‘team bonus’ books. Implicit rules for collaboration are attached to these books by varying their scores. If the users collaborate, the score of the book increases; otherwise, the score decreases or remains low. When the green cursor moves a red team bonus book from the GOA inside the red square, it becomes easy for the red cursor to collect the book. Such a move is called a collaborative move. When the red cursor moves a green book away from the green cursor’s interaction area, i.e., inside the ROA, the green cursor is unable to collect the book, and the move is called a competitive move. The restricted interaction areas together with the team bonus books form the collaborative components of the CVE task.
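A minimal sketch of this implicit scoring rule follows. The area labels and the treatment of the red/green squares as a shared region are simplifications; the 5- and 10-point values follow the task description.

```python
def update_bonus_score(book, mover, drop_area):
    """Update a team bonus book's score after it is dropped.

    mover:     'red' or 'green' (the cursor that moved the book)
    drop_area: 'ROA', 'GOA', or 'shared' (red-only, green-only, or shared area)
    """
    mover_only_area = "ROA" if mover == "red" else "GOA"
    if mover != book.color:
        if drop_area == "shared":
            book.score = 10   # collaborative move: the book's owner can now collect it
        elif drop_area == mover_only_area:
            book.score = 5    # competitive move: the owner can no longer reach the book
    return book.score
```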

The CVE was modeled with timed automata and HSMs to support continuous and discrete events, state hierarchy, and concurrency. There are models for displaying audio-visual feedback, for online analysis of users’ interactions, for the movement and score of books, for determining when a book is selected and by which user, and for socket communication. It is not possible to present all the models here; instead, we focus on the HSM model for the books. Figure 4 illustrates the top-level model that describes how users change book positions and scores. The Book Manager state keeps track of the number of red and green books in the CVE, spawns new books, and removes collected books from the bins. The Book Position Adjustment state gradually shifts books back inside the camera view of the CVE if users drop them at the boundaries, to guarantee sufficient visibility of the books. Each book has its own concurrent state machines: one controlling the position and movement of the book, and one controlling its score. Initially, books are spawned at four locations in the CVE. Users grab a book by moving their hand cursor onto it and closing their hand (grip cursor state). A selected book is highlighted and moves in the environment following the user’s hand cursor. Another user cannot grab or move a book that is currently highlighted. When the book is in the move state and a release event is detected, the selected book drops onto the virtual floor under gravity. If the book drops into a color-matched bin, it is collected and removed from the scene (collect state). Otherwise, it stays on the floor (stay state). The release event in Fig. 4 occurs if the user opens his/her hand (release cursor state) or the distance between the cursor and the book center exceeds a threshold (200 pixels). Although a moving book always follows the position of the cursor, when the user tries to move the book below the virtual floor, a release event is triggered and the book returns to the stay state. The initial score for each book is 5 points. For team bonus books, a collaborative move event increases the score to 10 points and a competitive move event decreases the score back to 5 points. The countdown timer state records the remaining interaction time and ends the task after 6 min. There are 6 normal books and 10 team bonus books. Users can achieve a maximum score of 80 points without collaboration and a maximum score of 130 points with collaboration. To win the game, they need to earn at least 100 points.
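The release and drop behavior described above can be summarized by the short sketch below, assuming cursor and book positions in screen pixels; the floor and bin checks are simplified, and the helper arguments are illustrative.

```python
import math

RELEASE_DISTANCE = 200  # pixels, as specified above

def release_event(hand_open, cursor_pos, book_center, below_floor):
    """A moving book is released when the hand opens, the cursor moves more than
    200 px away from the book center, or the user pushes the book below the floor."""
    dist = math.hypot(cursor_pos[0] - book_center[0], cursor_pos[1] - book_center[1])
    return hand_open or dist > RELEASE_DISTANCE or below_floor

def on_drop(book, bin_color_at_drop):
    """After release the book falls; it is collected only in a color-matched bin."""
    if bin_color_at_drop == book.color:
        book.state = "collect"   # removed from the scene by the Book Manager
    else:
        book.state = "stay"      # remains on the virtual floor
    return book.state
```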

Fig. 4. Hierarchical state machine (HSM) describing movement and score of books.

Post-test Task.

The post-test (Fig. 5) is designed to explore users’ behaviors when they perform a similar book-sorting task with unknown information. Users see yellow books and a yellow bin but are not aware of the new collaborative rule: they have to move the same book simultaneously in the same direction. We were interested in whether users would communicate with each other to figure out the unknown piece of the task. If they have not moved any yellow books halfway through the interaction (3 min in total), the robotic facilitator gives them a hint by asking them to try moving a book together. No score is associated with the yellow books; instead, we record how far users move the yellow books together and how many books are collected.
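A sketch of this hidden rule is given below; the same-direction test (a positive dot product of the two users’ displacement vectors) and the averaging of displacements are assumptions for illustration rather than the exact mechanics of the system.

```python
def joint_move(grip_red, grip_green, delta_red, delta_green):
    """Return the displacement applied to a yellow book in one frame.

    delta_red/delta_green: per-frame cursor displacements (dx, dy) of the two users.
    """
    if not (grip_red and grip_green):
        return (0.0, 0.0)   # one user alone cannot move a yellow book
    dot = delta_red[0] * delta_green[0] + delta_red[1] * delta_green[1]
    if dot <= 0:
        return (0.0, 0.0)   # conflicting directions cancel the move
    # Move the book by the average of the two displacements.
    return ((delta_red[0] + delta_green[0]) / 2.0,
            (delta_red[1] + delta_green[1]) / 2.0)
```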

Fig. 5. Post-test task. (Color figure online)

2.4 Data Management

The data management module records users’ interactions with the CVE application and with each other. These data are stored in CSV format in real time and are indexed by timestamps. The performance data file logs the number of books collected for each book type and by each user, the number of collaborative and competitive moves by each user, the time to finish the task, and the total score. The user interaction file logs each user’s motion-based cursor control, including the interaction hand (left or right), the hand state, the position and type of the selected book, and the screen positions of the hand cursors. In addition, we record users’ conversations as audio files, which are later transcribed to analyze the content of the conversation. All the data are stored in data buffers and written locally when the buffers are full or the task ends.
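A minimal sketch of the buffered, timestamp-indexed logging is shown below; the file name, field layout, and buffer size are illustrative rather than the exact format used by the system.

```python
import csv
import time

BUFFER_SIZE = 100  # illustrative buffer size

class InteractionLogger:
    """Buffers interaction rows and writes them locally when the buffer fills or the task ends."""

    def __init__(self, path="user_interaction.csv"):
        self.path = path
        self.buffer = []

    def log(self, user_id, hand, hand_state, cursor_x, cursor_y, selected_book=None):
        self.buffer.append([time.time(), user_id, hand, hand_state,
                            cursor_x, cursor_y, selected_book])
        if len(self.buffer) >= BUFFER_SIZE:
            self.flush()

    def flush(self):
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerows(self.buffer)
        self.buffer.clear()
```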

3 Experimental Design

A small user study was conducted with two pairs of older adults. The study was approved by the Vanderbilt Institutional Review Board. Before the experiment, participants completed the Montreal Cognitive Assessment (MoCA© Version 7.1) [20] to evaluate their cognition. Participants’ information is shown in Table 2. In the experiment room, there were two chairs, two web cameras, two microphones, a Kinect, a 32-in. HD monitor, and a NAO robot on the table. An experimenter operated the CVE system and observed older adults’ interaction through a one-way mirror in the observation room (Fig. 6). Participants sat approximately two meters away from the monitor and at a 30° angle toward each other. When a single participant played the game with the robot, one chair was positioned directly in front of the table. The experimental procedure had five components, or games (Fig. 6). Each participant first interacted independently with the system, and then pairs of participants played with each other. In the Practice game, the robot taught participants how to interact with the system through arm movement and hand manipulation. The length of the Practice depended on how long it took participants to become familiar with the motion-based UI and collect their first book. Participants then played the main task alone with the robot as the second player. After two older adults completed the single-user games, they were paired to play the main task together. They first took turns playing the game and then played simultaneously. Lastly, they completed the post-test to finish the whole session. The first pair of participants finished the session in one visit; the second pair finished the single-user games in the first visit and the paired games in the second visit.

Table 2. Participant data.
Fig. 6. Experimental setup and procedure.

4 Results

4.1 System Usability

The system worked as designed. For the main task, every collaborative move triggered a rewarding sound and every competitive move triggered an unpleasant sound. Books were spawned correctly, and the movement and score of books followed the HSM model described in Fig. 4. For the post-test task, the yellow books moved in the CVE according to the design. When only one participant grabbed a book or the two participants tried to move a book in different directions, the yellow books did not move. All data files were recorded correctly. Unity became unresponsive at the end of one game; for that game the data files were recorded up to the point when the system froze. In the Practice game, the NAO robot taught participants how to interact with the system step by step, following the action order in Table 3. When participants successfully performed one action, NAO proceeded to teach the next action. We computed the time it took participants to learn each motion-based action. Participants were able to learn most actions quickly, within one minute, and took about a minute to collect one book successfully. P01 took longer to move a book up and down and to collect a book. The rest of the participants took longer to move a book forward. In the audio files, participants asked questions such as “What am I moving up and down?”, “Is backward this way and is forward this way?”, and “This way or that way? When you say forward.” This indicates that participants were confused by the instructions related to depth in the virtual environment. There was no need for operator assistance once participants were familiar with the motion-based UI. There were times when participants struggled to master the move-book-forward action. Depth in the CVE is relatively hard for older adults to perceive correctly. Because normal aging or Alzheimer’s disease may affect older adults’ sensitivity to depth [21, 22], motion-based actions related to depth in a virtual environment can be challenging for older adults.

Table 3. Time taken for older adults to learn motion-based actions.

4.2 Interaction Data

Table 4 gives the results of participants’ main task performance and their interaction frequency as quantified by hand and book movements in the CVE. Collect and collaboration are the number of books collected and the number of books whose score increased due to collaboration, respectively. Potential collaboration is the number of times participants tried to collaborate but failed to move books into the red or green squares. The main reason was that they failed to perform the move-book-forward action correctly; this action is relatively hard, as shown in Table 3. Hand movement is the normalized accumulated cursor movement per minute. Average book distance is the mean travel distance of books moved by participants. Book movement is the accumulated book movement per minute. These three metrics were calculated as indicators of participants’ interaction frequency. If participants were not engaged with the CVE task, they were less likely to interact with the system and all the indicators would have low values. Due to the small sample size, we are unable to draw any conclusions by comparing the results in Table 4. On average, participants’ collaboration decreased when playing together; however, potential collaboration made up the difference. In fact, it is easier to play with the robot than with another human: when playing with another human, participants need to attend to their own part of the task as well as their peer’s performance. The interaction frequencies are similar for all the main task games, whether single or dyad. Note that in the take-turns game, each participant interacted for about half of the total interaction time; therefore, the hand movement and book movement indicators for take turns are relatively low compared to those of the other games. P03 and P04 performed poorly during the take-turns game, which was the first task of their second visit to the lab. P03 did not perform well initially, and since the turn switches after one user successfully collects a book or makes a collaborative move, P04 had very little interaction with the CVE and therefore also collected fewer books.
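The three interaction-frequency indicators can be computed as sketched below; the variable names are illustrative, and per-minute normalization over the game duration is assumed.

```python
import math

def path_length(points):
    """Total travel distance along a sequence of (x, y) samples."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def interaction_metrics(cursor_samples, book_paths, duration_min):
    """cursor_samples: cursor positions over one game; book_paths: one position list per moved book."""
    hand_movement = path_length(cursor_samples) / duration_min     # cursor travel per minute
    distances = [path_length(p) for p in book_paths]
    avg_book_distance = sum(distances) / len(distances)            # mean travel distance per book
    book_movement = sum(distances) / duration_min                  # book travel per minute
    return hand_movement, avg_book_distance, book_movement
```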

Table 4. Participants’ performance, interaction frequency, and conversation duration.

In the post-test task, P01 and P02 collected one yellow book without help from the robotic facilitator. They moved two yellow books, one with a travel distance of 4.0 and the other 5.3. P03 and P04 received a hint from the robot to move a book together and were then able to collect one yellow book. They moved three books in total, with travel distances of 6.2, 1.4, and 6.9, respectively.

Participants talked with each other frequently. Their conversation mostly focused on helping each other with how to move and collect books and how to increase the score of a book, and on reminding their peers which hand cursor they were controlling. We quantified the amount of HHI by measuring how long they talked directly to each other. The results are listed in Table 4.

5 Discussion and Conclusion

We developed a CVE system for the purpose of supporting activity and social engagement for older adults with and without CI. For activity engagement, we designed a motion-based UI using Kinect to involve older adults in physical movement and developed a book-sorting task to involve them in cognitive activity. For social engagement, we designed collaborative rules to encourage social communication between older adults: they have to collaborate with each other in order to win the game. The system records quantitative data regarding participants’ individual and collaborative performance and their interaction frequency with the CVE system, and it logs audio data for offline analysis of their social interaction in the form of conversation.

A preliminary user study was conducted with two pairs of older adults. The sample for the current study is clearly too small to draw any conclusions about the ability of the system to benefit older adults. However, the current results provide insights into the usability of the system and older adults’ acceptance of the motion-based UI and the CVE task. The results indicate the difficulty of depth perception and control in a virtual environment for older adults. Older adults enjoyed the collaborative virtual game, and some indicated a preference for playing with another human rather than with the robot. The results also demonstrate the ability of the CVE system to collect the quantitative data needed to assess older adults’ performance, interaction frequency, and social communication. The audio data analysis and participants’ post-test performance further show promising results: participants were capable of collaborating without knowing the rules and talked with each other during game play.

In the future, we intend to conduct more experiments and collect other modalities of data, such as gaze, in order to systematically evaluate HMI and HHI. We are also interested in the content of the conversations, which could help guide the design of better collaborative components as well as system feedback. In addition, we plan to design a 2D task instead of a 3D task to remove depth from the collaborative game. The new task will have different difficulty levels to accommodate older adults with different cognition levels.