
1 Introduction

Low back pain is a leading cause of disability, particularly among the elderly, whose proportion in European societies keeps rising, raising growing concern about healthcare. Between 50 and 80% of the world population suffers from back pain at some point, which makes it one of the most frequent health problems [1]. To tackle chronic low back pain, regular physical rehabilitation exercise is considered the most effective approach [10].

In this perspective, solutions based on assistive technology, and particularly robotics, are being developed [5, 6, 9], in which humanoid robots demonstrate rehabilitation exercises to patients. These robots have previously learned the exercises from a physiotherapist. However, due to the different morphologies of humans and robots, and to the possible physical limitations of patients, human motion may be difficult for a robot to understand. In this work, we address these issues by training a common low-dimensional latent space shared between the therapist, the robot coach and the patients, as illustrated in Fig. 1 (left). This model allows us to learn an ideal rehabilitation exercise from physiotherapist demonstrations, which can be difficult using human data alone. Moreover, this ideal motion representation is easily interpreted by the robot coach, which can then reproduce the correct exercise for the patient. Finally, the model is also employed to adapt the robot's understanding and analysis to the possible physical limitations of patients attending the rehabilitation session.

Fig. 1. (Left) Overview of the approach. (Right) Schema of the different GP-LVM models.

2 Related Work

In the literature, the challenges of robot imitation and motion assessment by robot coaches are usually addressed separately.

In the context of robot imitation, several vision-based approaches have been proposed. Riley et al. [16] proposed an approach for real-time control of a humanoid by imitation. The imitation relies on a stereo vision system that records human trajectories by tracking color markers attached to the demonstrator's upper body. The authors then apply inverse kinematics (IK) to estimate the human's joint angles and map them to the robot. Dariush et al. [4] presented an online, task-space, control-theoretic retargeting formulation that generates robot joint motions adhering to the robot's joint limit, joint velocity and self-collision constraints. The inputs to their method are low-dimensional normalized human motion descriptors, detected and tracked using a vision-based key-point detection and tracking algorithm. Koenemann et al. [11] presented a system that enables humanoid robots to imitate complex whole-body human motions in real time. The system uses a compact human model and considers the positions of the end effectors and the center of mass as the most important aspects to imitate. Stanton et al. [19] trained neural networks to map sensor data to joint space. However, these last two approaches rely on a human motion capture system rather than vision features to capture human motion, which makes them unsuitable for real-world scenarios such as physical rehabilitation.

Only a few approaches have addressed the challenge of physical rehabilitation through robot coaching systems. While several studies showed the potential of virtual agents [2, 20] and physical robots [3] to enhance engagement and learning in health, physical activity or social contexts, Fasola et al. [7] showed that elderly subjects rated a physical robot coach more favorably than virtual systems. Robots for coaching physical exercises have recently been presented [8, 15, 17]. These approaches employ robots with few degrees of freedom, which facilitates the imitation process. However, such robots do not allow realistic movements. Moreover, Obo et al. [15] did not provide any feedback or active guidance to the patient.

In this paper, we employ Poppy [12], a humanoid robot with many degrees of freedom, and capture human motion using a Kinect sensor with a skeleton tracking algorithm operating on depth images. We propose a method that simultaneously addresses robot imitation and human motion assessment in a physical rehabilitation context.

3 Proposed Approach

3.1 Shared Gaussian Process Latent Variable Model

Our goal is to learn a latent space in which we can represent and compare both human and robot poses. Human upper-body poses are characterized by skeletons captured with a Kinect sensor, which provides the 3D position \(p_j\) of a set of \(J=12\) joints. A human pose \(y\in \mathcal {H}\) is thus defined as \(y = [p_1 \ p_2 \dots p_J]\), where \(\mathcal {H}\) denotes the human space. Robot poses are characterized by the motor angles \(a_m\) of the Poppy robot, which has \(M=13\) motors. Hence, a robot pose \(z \in \mathcal {R}\) is defined as \(z = [a_1 \ a_2 \dots a_M]\), where \(\mathcal {R}\) denotes the robot space. To learn such a shared space, we employ the shared Gaussian Process Latent Variable Model (shared GP-LVM) [18].
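
As a minimal sketch of these pose representations, assuming joint positions arrive as a \(J \times 3\) array and motor angles as a length-\(M\) array (the array layout and the function names are our assumptions, not the authors' code), a human pose is flattened into a \(3J = 36\)-dimensional vector and paired frames are stacked row-wise into the training matrices \(Y\) and \(Z\):

```python
import numpy as np

J = 12  # number of tracked upper-body joints (Kinect skeleton)
M = 13  # number of Poppy motors

def human_pose_vector(joint_positions):
    """Flatten a (J, 3) array of 3D joint positions into y = [p_1 p_2 ... p_J]."""
    p = np.asarray(joint_positions, dtype=float)
    if p.shape != (J, 3):
        raise ValueError(f"expected shape ({J}, 3), got {p.shape}")
    return p.reshape(-1)  # row-major order x1, y1, z1, x2, ... -> length 3J = 36

def robot_pose_vector(motor_angles):
    """Pack the M motor angles into z = [a_1 a_2 ... a_M]."""
    a = np.asarray(motor_angles, dtype=float)
    if a.shape != (M,):
        raise ValueError(f"expected shape ({M},), got {a.shape}")
    return a

def build_training_set(human_frames, robot_frames):
    """Stack N paired frames into Y (N x 3J) and Z (N x M)."""
    Y = np.stack([human_pose_vector(f) for f in human_frames])
    Z = np.stack([robot_pose_vector(f) for f in robot_frames])
    if len(Y) != len(Z):
        raise ValueError("human and robot sequences must be paired frame by frame")
    return Y, Z
```

Each row of \(Y\) and the same row of \(Z\) then describe the same instant, which is the pairing the shared model relies on.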

GP-LVM [13] (see Fig. 1 (right)) is a probabilistic model that maps a low-dimensional latent space to high-dimensional observed data using a Gaussian process with zero mean and a covariance function given by a kernel \(k\): \(f(x) \sim \mathcal {GP}(0,k(x,x'))\). For the kernel, we adopt the popular radial basis function (RBF). The shared GP-LVM extends GP-LVM to multiple observation spaces that share a common latent space. In our work, we have two observation spaces, the human space \(\mathcal {H}\) and the robot space \(\mathcal {R}\). Given a training set of N human poses \(Y = \{y_n\}_{n=1}^N \in \mathcal {H}\) and corresponding robot poses \(Z = \{z_n\}_{n=1}^N \in \mathcal {R}\), two mapping functions from the latent space X to the observed spaces are defined:

$$\begin{aligned} f_Y \mid X \sim \mathcal {GP}(0,K_Y(X,X')) \quad \text {and} \quad f_Z \mid X \sim \mathcal {GP}(0,K_Z(X,X')) \end{aligned} \tag{1}$$

where \(K_Y\) and \(K_Z\) are RBF kernel matrices with hyperparameters \(\varPhi _Y\) and \(\varPhi _Z\). In the shared GP-LVM, the optimal latent locations \(X^{*}\) are unknown and must be learned together with the mapping hyperparameters \(\varPhi _Y^{*}\) and \(\varPhi _Z^{*}\). This is done by maximizing the joint marginal likelihood \(p(Y,Z|X,\varPhi _Y,\varPhi _Z) = p(Y|X,\varPhi _Y) \ p(Z|X,\varPhi _Z)\). We are interested in mapping data from the human space to the robot space through the latent space, which requires an inverse mapping from the human space to the latent space. For that purpose, back constraints are introduced [14]. They define the latent locations as a function of the observed data, \(X = h(Y;W)\), where h is an RBF function parameterized by weights W. These weights are learned during the optimization process instead of the latent locations:

$$\begin{aligned} \{W^{*}, \varPhi _Y^{*}, \varPhi _Z^{*}\} = \underset{W,\varPhi _Y,\varPhi _Z}{\arg \max } \ p(Y,Z|W,\varPhi _Y,\varPhi _Z) \end{aligned} \tag{2}$$
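
The objective maximized in Eq. (2) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the back constraint \(h\) is taken as a linear combination of RBF features of the human poses centred on a set of reference poses, each mapping uses an RBF kernel whose variance, lengthscale and noise level stand in for \(\varPhi_Y\) and \(\varPhi_Z\), all output dimensions share one kernel, and the data are assumed mean-centred. A practical implementation would hand this quantity to a gradient-based optimizer over \(W\), \(\varPhi_Y\) and \(\varPhi_Z\).

```python
import numpy as np

def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """RBF covariance k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def back_constraint(Y, W, centers, bc_lengthscale=1.0):
    """X = h(Y; W): RBF features of the poses (centred on `centers`) times weights W."""
    return rbf_kernel(Y, centers, 1.0, bc_lengthscale) @ W  # (N, C) @ (C, q) -> (N, q)

def gp_log_likelihood(X, D, variance, lengthscale, noise):
    """log p(D | X, Phi) for a zero-mean GP shared by all columns of D (D assumed centred)."""
    N, d = D.shape
    K = rbf_kernel(X, X, variance, lengthscale) + noise * np.eye(N)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, D))  # K^{-1} D
    return (-0.5 * np.sum(D * alpha)             # -1/2 tr(D^T K^{-1} D)
            - d * np.sum(np.log(np.diag(L)))     # -d/2 log|K|
            - 0.5 * N * d * np.log(2.0 * np.pi))

def shared_objective(W, Y, Z, centers, phi_Y, phi_Z):
    """log p(Y, Z | W, Phi_Y, Phi_Z) = log p(Y | X, Phi_Y) + log p(Z | X, Phi_Z), X = h(Y; W)."""
    X = back_constraint(Y, W, centers)
    return gp_log_likelihood(X, Y, *phi_Y) + gp_log_likelihood(X, Z, *phi_Z)
```

Because both terms are evaluated at the same \(X = h(Y;W)\), the weights \(W\) are pulled towards latent locations that explain the human and the robot poses at once, which is what makes the latent space shared.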

As body parts can move concurrently and independently, we learn a separate shared latent space for each body part. Consequently, our approach can be extended to exercises involving the lower body simply by adding latent spaces for the left and right legs. We use three 2D latent spaces: one for each arm and one for the spine (Fig. 2).

Fig. 2. (Left) Three rehabilitation exercises represented in the 2D latent space of the left arm. (Right) Corresponding human and robot poses at locations A, B, C and D.

3.2 Gaussian Mixture Model on the Latent Space

Once the shared latent space is trained, we learn a Gaussian Mixture Model (GMM) on this low-dimensional space. This allows us to learn an ideal movement from therapist demonstrations projected onto the shared space. The ideal movement can then be used for robot imitation by projecting it back to the robot space. From N therapist demonstrations \(Y^n = [y_1 \ y_2 \dots \ y_T]\), the GMM on the latent space is defined as \(p(x) = \sum _{k=1}^{K} \phi _k \mathcal {N}(x | \mu _k, \varSigma _k)\), where x encodes the human pose \(y_t\) projected onto the shared latent space, K is the number of Gaussians, \(\phi _k\) is the weight of the k-th Gaussian, and \(\mu _k\) and \(\varSigma _k\) are the mean and covariance matrix of the k-th Gaussian. The parameters \(\phi _k\), \(\mu _k\) and \(\varSigma _k\) are learned using Expectation-Maximization (EM). Once a model is learned for each exercise, we generate an optimal sequence using Gaussian Mixture Regression (GMR), which approximates the conditional distribution at each time step t by a single Gaussian: \(p(\hat{x}|t) \approx \mathcal {N}(\hat{\mu }, \hat{\varSigma })\). This optimal sequence is then projected to the robot space so that the robot imitates the expert and demonstrates the exercise to the patient.
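
The regression step can be illustrated as follows. As is standard for GMR, we assume that each latent point is augmented with its time index, so that the mixture is fitted on vectors \([t, x]\); the EM fit itself could be done with, for example, scikit-learn's `GaussianMixture(covariance_type="full")`, whose `weights_`, `means_` and `covariances_` attributes provide \(\phi_k\), \(\mu_k\) and \(\varSigma_k\). The sketch below, with hypothetical function names, only implements the conditioning that turns the mixture into a single Gaussian \(\mathcal{N}(\hat{\mu}, \hat{\varSigma})\) per time step.

```python
import numpy as np

def gaussian_pdf_1d(t, mean, var):
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gmr(t, weights, means, covs):
    """Condition a GMM over [t, x] on time t and return (mu_hat, sigma_hat).

    weights: (K,), means: (K, 1 + q), covs: (K, 1 + q, 1 + q); index 0 is time.
    """
    K = len(weights)
    # Responsibilities h_k(t) proportional to phi_k * N(t | mu_t^k, Sigma_tt^k)
    h = np.array([weights[k] * gaussian_pdf_1d(t, means[k, 0], covs[k, 0, 0]) for k in range(K)])
    h /= h.sum()
    cond_means, cond_covs = [], []
    for k in range(K):
        mu_t, mu_x = means[k, 0], means[k, 1:]
        s_tt, s_xt, s_xx = covs[k, 0, 0], covs[k, 1:, 0], covs[k, 1:, 1:]
        cond_means.append(mu_x + s_xt * (t - mu_t) / s_tt)    # E[x | t, k]
        cond_covs.append(s_xx - np.outer(s_xt, s_xt) / s_tt)  # Cov[x | t, k]
    cond_means = np.array(cond_means)
    mu_hat = h @ cond_means
    # Moment matching: collapse the mixture of conditionals into a single Gaussian
    sigma_hat = sum(h[k] * (cond_covs[k] + np.outer(cond_means[k], cond_means[k]))
                    for k in range(K)) - np.outer(mu_hat, mu_hat)
    return mu_hat, sigma_hat

def gmr_trajectory(times, weights, means, covs):
    """Ideal latent trajectory: the GMR mean at every requested time step."""
    return np.array([gmr(t, weights, means, covs)[0] for t in times])
```

The resulting trajectory of \(\hat{\mu}\) values lives in the 2D latent space and would then be passed through the latent-to-robot mapping to obtain motor angles.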

3.3 Transferring Knowledge from Therapist to Patient

In our rehabilitation scenario, the robot coach needs to evaluate the patient's movement, captured with a Kinect sensor in the same way as the therapist's movement. However, patients needing rehabilitation are often constrained by physical limitations or pain while performing exercises. This may result in an incorrect performance even when they do their best to perform the exercise correctly. A robust and effective robot coach system must take such limitations into account. We propose to extend the learned shared GP-LVM (see Fig. 1 (right)) by considering two distinct human pose spaces \(\mathcal {H}_T\) and \(\mathcal {H}_P\) for the therapist and the patient, respectively. \(\mathcal {H}_T\) is equivalent to \(\mathcal {H}\) described above, whereas \(\mathcal {H}_P\) differs from \(\mathcal {H}_T\) in the inverse mapping function to the latent space. Specifically, a therapist pose \(y_T \in \mathcal {H}_T\) and the corresponding patient pose with physical limitations \(y_P \in \mathcal {H}_P\) must be represented by the same point x in the latent space. To this end, the weight matrix \(W_P\) of the inverse mapping is updated for each patient. Let \(Y_p\) be a patient's performance of an exercise and \(X^{*}\) the corresponding ideal demonstration of the same exercise projected onto the latent space. The optimization becomes:

$$\begin{aligned} \{W_P^{*}\} = \underset{W_P}{\arg \max } \ p(Y_p|X^{*},\varPhi _Y) \end{aligned} \tag{3}$$

The patient-specific weight matrix is optimized by gradient descent. Figure 3 shows a patient's sequence in the latent space before (red) and after (green) the update, compared with the ideal therapist sequence (blue).
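
A minimal sketch of this adaptation step follows. It does not reproduce Eq. (3) exactly: as a simplifying surrogate (our assumption), it keeps the RBF back-constraint form, assumes the patient sequence \(Y_p\) and the ideal latent sequence \(X^{*}\) are aligned frame by frame, and runs plain gradient descent on the mean squared distance between \(h(Y_p; W_P)\) and \(X^{*}\), starting from the therapist weights. The function names, the fixed step size and the number of iterations are illustrative.

```python
import numpy as np

def rbf_features(Y, centers, lengthscale=1.0):
    """Phi[n, c] = exp(-||y_n - center_c||^2 / (2 * lengthscale^2))."""
    sq = np.sum(Y**2, 1)[:, None] + np.sum(centers**2, 1)[None, :] - 2.0 * Y @ centers.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def adapt_patient_weights(Y_p, X_star, W_T, centers, lengthscale=1.0, n_iter=200):
    """Gradient descent on L(W_P) = ||Phi(Y_p) W_P - X*||^2 / T, initialised at W_T.

    Y_p: (T, 3J) patient poses, X_star: (T, q) ideal latent points, aligned in time.
    """
    Phi = rbf_features(Y_p, centers, lengthscale)     # (T, C)
    T = len(Y_p)
    # Step 1/L, where L = 2 * sigma_max(Phi)^2 / T is the Lipschitz constant of the gradient
    step = T / (2.0 * np.linalg.norm(Phi, 2) ** 2)
    W_P = np.array(W_T, dtype=float)                  # copy: therapist weights stay untouched
    losses = []
    for _ in range(n_iter):
        residual = Phi @ W_P - X_star                 # projected patient poses minus ideal points
        losses.append(float(np.sum(residual**2) / T))
        W_P -= step * (2.0 * Phi.T @ residual / T)    # gradient of L with respect to W_P
    return W_P, losses
```

Only \(W_P\) changes, so the latent-to-robot mapping learned from the therapist is reused as is; this mirrors the idea that patient and therapist differ only in the inverse mapping.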

Fig. 3. (Left) An incorrect exercise in the latent space before (red) and after (green) the model update. (Right) Corresponding human and robot poses at points A, B and C.

4 Experimental Results

We evaluate our method on three rehabilitation exercises selected in cooperation with physiotherapists. Each exercise is performed three times by two subjects playing the role of the physiotherapist and the patient, respectively. In addition, the subjects perform incorrect versions of the exercises by simulating errors. In the first exercise, the arms are not raised high enough. In the second exercise, the subject does not tilt the arm and keeps it straight. In the third exercise, the arms are not raised high enough.

To obtain ideal robot movements, a physiotherapist manipulates the robot to perform each desired rehabilitation movement while we record the motor angle positions along the motion. We record one ideal movement per exercise. In addition, simulated movements containing the errors described above are also recorded. These robot movements are used for training the shared GP-LVM and as ground truth during evaluation.

4.1 Imitation Evaluation

We first evaluate the ability of the approach to perform robot imitation. As described in Sect. 3.2, an ideal motion is generated by applying GMR to the GMM learned on the latent space from expert demonstrations. This ideal motion is then transferred back to the robot space and compared to the ground truth. We compute the average RMSE of the motor angles between the sampled sequence and the ground truth. We also normalize the RMSE by the standard deviation of the motor angles for each exercise, in order to relate the error to the variability of the robot's motion. Results are reported for each exercise in Table 1.

Table 1. Robot imitation results.

We obtain a mean RMSE of 6.7\(^\circ \), corresponding to \(4.1\%\) of the total range of the Poppy motor angles. In addition, the normalized RMSE of 0.28 shows that the error is much lower than the standard deviation of the rehabilitation movements, which reflects the noise and variations within the exercise. This validates the ability of the proposed model to imitate therapist demonstrations accurately enough to be clearly understood by the patient.
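
The two error measures used in this evaluation can be sketched as follows, assuming the generated and ground-truth sequences are already time-aligned arrays of shape \(T \times M\) in degrees. The exact averaging used for Table 1 (over motors, frames or both) and the per-motor normalization are our assumptions; motors whose ground-truth angle never varies are left out of the normalized score to avoid dividing by zero.

```python
import numpy as np

def imitation_errors(generated, ground_truth, eps=1e-8):
    """Return (mean RMSE over motors, mean RMSE normalised by each motor's std).

    generated, ground_truth: (T, M) arrays of motor angles, aligned in time.
    """
    generated = np.asarray(generated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    rmse_per_motor = np.sqrt(np.mean((generated - ground_truth) ** 2, axis=0))  # (M,)
    std_per_motor = np.std(ground_truth, axis=0)                                 # (M,)
    moving = std_per_motor > eps  # motors that actually move during the exercise
    mean_rmse = float(rmse_per_motor.mean())
    normalised = float((rmse_per_motor[moving] / std_per_motor[moving]).mean())
    return mean_rmse, normalised
```

A normalized value well below 1, such as the 0.28 reported above, means the imitation error is small compared with how much the motors move during the exercise.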

4.2 Therapist-Patient Transfer Evaluation

We then evaluate the ability of our model to transfer knowledge between a therapist and a patient with physical limitations. We first project the incorrect sequence onto the shared latent space. We then project the sequence back to the robot space, before and after applying the weight update described in Sect. 3.3. To show the robustness of the approach, we sample ten random sequences from the latent-to-robot Gaussian process mapping and compute their RMSE with respect to the ground truth. The average RMSE and the standard deviation over the ten sampled sequences are computed. For comparison, we also compute these RMSE values for correct sequences of the patient. Results are reported in Table 2.

Table 2. Therapist-Patient transfer results.

We first observe that, as expected, RMSE values are much higher for incorrect exercises than for correct ones. However, if we consider that these errors are due to physical limitations of the patient and apply our updating method, the RMSE values become close to those of the correct exercises. This means that the robot interprets the incorrect exercises in the same way as the correct ones. In addition, we deepen the analysis of the third exercise by evaluating a different kind of error (arms not outstretched enough) with the previously trained model. We obtain RMSE values of \(13.4\pm 0.89\) and \(14.4\pm 1.08\) before and after the update, respectively. These similar values show that updating the model for one kind of error does not affect other types of errors, as required in our rehabilitation scenario.
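
The sampling-based evaluation of this section can be sketched with standard GP regression: given latent training points and their robot poses, we compute the posterior of the latent-to-robot mapping at the latent points of a test sequence, draw ten samples, and report the mean and standard deviation of their RMSE against the ground truth. The single RBF kernel with fixed hyperparameters, the covariance shared across motors and the function names are our assumptions, not details given in the paper.

```python
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X_train, Z_train, X_test, variance=1.0, lengthscale=1.0, noise=1e-2):
    """Posterior mean (T, M) and covariance (T, T) of the latent-to-robot GP, shared by all motors."""
    K = rbf(X_train, X_train, variance, lengthscale) + noise * np.eye(len(X_train))
    K_s = rbf(X_test, X_train, variance, lengthscale)
    K_ss = rbf(X_test, X_test, variance, lengthscale)
    mean = K_s @ np.linalg.solve(K, Z_train)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, 0.5 * (cov + cov.T)  # symmetrise against round-off

def sampled_rmse(X_train, Z_train, X_test, Z_truth, n_samples=10, seed=0, **kernel):
    """Mean and std of the RMSE of n_samples posterior draws against the ground truth."""
    mean, cov = gp_posterior(X_train, Z_train, X_test, **kernel)
    L = np.linalg.cholesky(cov + 1e-8 * np.eye(len(cov)))  # small jitter for stability
    rng = np.random.default_rng(seed)
    rmses = []
    for _ in range(n_samples):
        # Each motor column gets an independent draw with the shared posterior covariance
        sample = mean + L @ rng.standard_normal(mean.shape)
        rmses.append(np.sqrt(np.mean((sample - Z_truth) ** 2)))
    return float(np.mean(rmses)), float(np.std(rmses))
```

Reporting the spread over several posterior draws, rather than only the posterior mean, shows whether the gap between correct and incorrect exercises exceeds the uncertainty of the mapping itself.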

5 Conclusions

We have proposed a method based on Gaussian Process Latent Variable Models for a robot coach system in physical rehabilitation. The method learns a shared space between the therapist and the robot to facilitate robot learning and imitation. The model is then extended to account for patients' physical limitations. This allows the robot to understand and assess patients independently of their physical limitations. Experimental evaluation demonstrates the effectiveness of our approach for both robot imitation and model adaptation.

In the future, we plan to extend our experimental evaluation with more data acquired in real-world environments. Moreover, we would like to investigate the use of key poses instead of full motion sequences during model training, which would be better suited to real-world rehabilitation scenarios.