1 Introduction

Multi-factor authentication schemes are adopting behavioral biometrics (or behaviometrics) [3] to continuously verify the identity of users in the background by leveraging information about the user’s device [21, 22], context, or behavior [5, 11] within that context. These trends are often referred to as Active Authentication, also known as Context-aware [9], Continuous [19] or Implicit [20] Authentication. The key challenges that these authentication schemes aim to address are (1) conveniently and reliably authenticating the identity of a user, and (2) continuously assessing the confidence in the user’s identity.

One well-known behaviometric is gait recognition [10, 17], which uses accelerometer data to analyze motion patterns. While this technique is hardly new, several challenges remain from a practical feasibility and security point of view: (1) there has been little research into the practical resilience of such schemes against sensor displacement; (2) reported high recognition rates were only achieved in a controlled setup where the test subjects are known to walk, making it difficult to ascertain the accuracy, or even the misauthentication resistance, under other conditions and motion activities; and (3) the feasibility and effectiveness of zero-effort and non-zero-effort attacks against gait analysis remain largely unexplored.

To the best of our knowledge, we are the first to evaluate the effectiveness of using multiple accelerometers to collectively further improve this type of authentication scheme. Additionally, in this work, we enhance the resilience against the above threats with the following contributions:

  • We investigate the effectiveness of accelerometers on 9 different places on the body, and analyze the impact of different human activities on the EER.

  • We research whether multiple accelerometers can enhance misauthentication resistance, report on the use of different machine learning algorithms, and discuss which combination of on-body positions is the most effective.

  • We evaluate a solution that relies on a common set of features, rather than a unique set for each type of activity, to improve classification robustness under diverse circumstances and motion activities.

  • We evaluate our authentication scheme against zero-effort and non-zero effort attacks, and compare the results against single accelerometer schemes.

We evaluate our research on the public REALDISP benchmark dataset that was previously collected to evaluate sensor displacement in activity recognition [2, 4]. Based on a study with data from 12 individuals, our results show a recognition improvement reducing the equal error rate (EER) from 5% down to 3%, with an increased resilience against observation and spoofing attacks.

The remainder of this paper is structured as follows. In Sect. 2, we discuss related work on accelerometer-based gait authentication. Section 3 identifies the challenges with gait authentication schemes that motivate our multi-sensor approach. In Sect. 4, we describe the experiments and results. We conclude in Sect. 5, summarizing our main insights and discussing future work.

2 Related Work

This section reviews relevant research on gait authentication schemes, summarizing the EER and recognition accuracy results in Table 1.

Table 1. Comparing the EER and recognition rate of gait authentication schemes

Mantyjarvi et al. [13] investigated the feasibility of using gait signals for identification using correlation, frequency domain and distribution statistics. For 36 subjects wearing the accelerometer on 2 different days, correlation proved to be the best method, obtaining a 7% EER and an 88% recognition rate (RR). Similar work by Gafurov et al. [7] compared absolute distance, correlation, histogram, and higher order moments to evaluate the performance of the system in both authentication and identification modes. Their analysis on 50 subjects showed that the distance metric had the best performance, with an EER of 7.3% and a recognition rate of 86.3%. Annadhorai et al. [1] identified subjects from gait cycles using k-Nearest Neighbor classification. Features were extracted for each gait cycle from accelerometer (3D), pitch and roll data. A subject was identified with an accuracy of 84%. However, these results were obtained on a relatively small data set, with only 2 walks from 4 different subjects. Derawi et al. [6] tested the feasibility of gait as a behaviometric by using the accelerometer in a smartphone. During an enrollment phase the average gait cycle is determined, and two gait cycles are compared using Dynamic Time Warping (DTW). An EER of 20% was achieved on a dataset containing 51 subjects, with two walks per subject. Contrary to previous works, Nickel et al. [16] did not rely on extracting gait cycles to calculate feature vectors, but used Hidden Markov Models to classify the gait patterns of 48 subjects. They reported a False Reject Rate (FRR) of 10.42% at a False Acceptance Rate (FAR) of 10.29% (or an EER of \(\approx \)10.3%). A large scale experiment was conducted by Ngo et al. [15] with 744 subjects between 2 and 78 years old, walking under different ground slope conditions. They evaluated four different gait-based authentication methods and conclude that the maturity of the subject’s walking ability and the slope greatly influence the performance of gait-based user authentication. Lu et al. [12] describe a gait verification system based on the Gaussian Mixture Model - Universal Background Model (GMM-UBM) framework. The design objective was to adapt the gait model for mobile phones such that it can account for different body placements and for variance in the user’s gait pattern over time. The UBM was trained using data from 47 different subjects, and the user gait model was tested for 12 subjects. The reported EER was 14%.

3 Challenges with Gait Authentication Schemes

This section identifies the challenges and the gap that we aim to bridge when using accelerometer-based gait recognition as a behaviometric in real-life scenarios.

3.1 Different Body Positions and Sensor Displacement

Most people own at least one mobile device, with different types of sensors. In the future even more sensors will be attached to our body, in the form of smart watches, activity trackers, smart shoes or even smart clothes. Therefore, there is an opportunity to research what positions on the body are the most characterizing and effective for authentication purposes.

However, most of these devices are not fixed to a certain place on our body at all times. They do have an area where they are normally located, but their exact placement varies from time to time, e.g. moving your smartphone from your left to your right pocket, or wearing pants with entirely different pockets. These subtle sensor displacements in the real world will have an impact on the classification accuracy, and hence on the effectiveness of accelerometer-based gait authentication schemes.

3.2 Misauthentication Resistance Under Different Motion Activities

Walking is not the only predominant activity in human life; we also sit, run, cycle, climb stairs, etc. A behaviometric should be able to deal with different types of activities. The related work showed that most techniques (1) assume that people are walking and do not consider other activities, and (2) explicitly exploit gait cycles to extract features: their first step is always to detect the gait cycle and extract it from the data sample. While the first assumption can be dealt with by recognizing the activity first (Wilson et al. [24] achieved an activity classification accuracy of 95%), this is not the case for the latter. It is useless to extract gait cycles for sitting, and not straightforward to find patterns similar to gait cycles for activities like rowing, going to the gym or cycling. Moreover, the related work seemed to struggle when the walking conditions changed slightly (e.g. the walking speed, the type of shoes worn, the amount of weight being carried, the type of surface or the inclination of the ground). While it might be possible to classify whether the wearer of the accelerometer is running or walking, and to maintain different models for both cases, it certainly is not practical to repeat this for a range of different speeds.

We therefore investigate the feasibility of a common feature set − rather than special features fitted to every particular activity − and the added value of using multiple accelerometers for behaviometric-based authentication. We will use data where the activity is known beforehand. This is reasonable because of high accuracies achieved for activity recognition in other work [24]. Based on our previous research in the field of activity recognition [18], our hypothesis is that we can obtain even higher accuracies for our use case and setting, because our solution does not rely on a fine-grained distinction between activities, as discussed in Sect. 4.4.

3.3 Security Threats and Attacker Model

To evaluate the effectiveness of the proposed scheme, we consider the impact of two different types of attacks:

  • Zero-effort attack: the adversary is simply another subject in the database that acts as a casual impostor

  • Non-zero effort attack: the adversary actively masquerades as someone else by mimicking and spoofing the gait pattern of the claimed identity

In the zero-effort attack, we use the data of the other subjects as negative examples for a given user to get insights into the probability of misauthentication.

A non-zero effort attack would occur when the attacker tries to obtain activity patterns of the subject (i.e. observation and spoofing). The attacker attempts to act like the subject by walking at the same pace or mimicking the characteristic activity, as investigated in [8, 14], or he can try to sneak an accelerometer into the coat of the subject. To make these attacks harder to perform, we combine multiple sensors on different places on the body. This way, we collect more data to learn a subject’s movement patterns, with an opportunity to further decrease the EER.

4 Evaluation

This section reports on the experiments conducted to test the concerns expressed in Sect. 3. We use the public REALDISP benchmark dataset [2, 4] to enable the reproducibility of our research results. It contains 17 subjects, all performing 33 different actions, including walking, jogging, running, cycling and rowing. All subjects wore 9 sensors on different positions of the body. They performed the set of exercises twice: once with the sensors adjusted carefully by the makers of the dataset, and once with the subjects placing the sensors themselves. The data collected consists of 3D accelerometer, 3D gyroscope and 3D magnetic field measurements, and an estimation of the orientation using quaternions. The sampling rate is 50 Hz.

4.1 Activity-Agnostic Behaviometrics

We do not make any assumptions on a particular motion pattern (e.g. presence of gait cycles) so that our behaviometrics can be used for different types of activities.

The REALDISP dataset contains 33 different activities. For each of them we extracted features using the same approach: the data was split into intervals of 128 samples (which is \(\approx \)2.5 s). For each interval we calculated straightforward features in both the time and frequency domain, among which: mean, standard deviation, kurtosis, mean absolute deviation, energy in the signal, and average resultant vector. This led to a feature vector of length 224. Only the activities walking, jogging, running and cycling had a meaningful number of samples (\(\approx \)23) per subject. Only 12 subjects had walking, running and jogging data; of these, only 9 had cycling data. Each subject had performed all actions twice, once with self sensor placement and once with ideal sensor placement.
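To make this concrete, the following sketch shows how such a windowed feature extraction could be implemented. It is a minimal illustration under assumptions of our own: the array layout (one NumPy array per subject, activity and body position), the exact feature list and its ordering are not taken from the original experiments.

```python
# Sketch of the windowing and feature extraction described above.
# Assumes a NumPy array `signal` of shape (n_samples, n_channels) holding the
# raw sensor stream for one subject, one activity and one body position.
import numpy as np
from scipy.stats import kurtosis

WINDOW = 128  # ~2.5 s at 50 Hz

def window_features(win):
    """Compute simple time- and frequency-domain features for one window."""
    feats = []
    # Time domain, per channel.
    feats.extend(win.mean(axis=0))
    feats.extend(win.std(axis=0))
    feats.extend(kurtosis(win, axis=0))
    feats.extend(np.mean(np.abs(win - win.mean(axis=0)), axis=0))  # mean absolute deviation
    # Frequency domain: spectral energy per channel.
    spectrum = np.abs(np.fft.rfft(win, axis=0))
    feats.extend((spectrum ** 2).sum(axis=0) / WINDOW)
    # Average resultant vector over the 3D accelerometer channels (assumed to be the first 3).
    feats.append(np.mean(np.linalg.norm(win[:, :3], axis=1)))
    return np.asarray(feats)

def extract_features(signal):
    """Split the stream into non-overlapping 128-sample windows and featurize each."""
    n_windows = len(signal) // WINDOW
    return np.vstack([window_features(signal[i * WINDOW:(i + 1) * WINDOW])
                      for i in range(n_windows)])
```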

With authentication in mind, we trained a model for each subject and each activity. We constructed a set consisting of 50% samples belonging to the subject and 50% samples from other subjects, with the negative samples drawn evenly across the other subjects. For each subject, temporally adjacent samples were taken. This set was split into a training and a test set using n-fold cross-validation. This process splits the set into n temporally adjacent chunks in a stratified manner, taking into account to which user the samples belong. A model is trained n times, each time leaving a different chunk out for testing; the others are used for training. The numbers of false positives (fp), false negatives (fn), true positives (tp) and true negatives (tn) are accumulated over the different iterations. This process is repeated for every subject in the dataset. We compared different classification algorithms by calculating the average EER over all body positions for the walking activity (see Table 2). Support Vector Machines produced poor results due to the small number of training samples compared to the dimension of the feature space. Ensemble methods like AdaBoost, Random Forests, Bagging and Gradient-Boosted Trees performed considerably better. Because of its robustness to outliers in other machine learning experiments and its ability to handle heterogeneous features, we decided upon Gradient-Boosted Trees. We use this model throughout the following experiments.
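The per-subject evaluation loop can be summarized with the sketch below, which uses scikit-learn's GradientBoostingClassifier as a stand-in for the gradient-boosted trees; the fold construction and the default hyperparameters are illustrative assumptions, not the exact setup behind Table 2.

```python
# Sketch of the per-subject evaluation, assuming `X_pos` holds the feature
# vectors of the target subject and `X_neg` an equally sized set drawn evenly
# from the other subjects (both in temporal order).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def evaluate_subject(X_pos, X_neg, n_folds=8):
    """Accumulate confusion counts over n temporally adjacent, stratified folds."""
    tp = fp = tn = fn = 0
    # Split each class into n temporally adjacent chunks (stratified per class).
    pos_chunks = np.array_split(X_pos, n_folds)
    neg_chunks = np.array_split(X_neg, n_folds)
    for k in range(n_folds):
        X_test = np.vstack([pos_chunks[k], neg_chunks[k]])
        y_test = np.r_[np.ones(len(pos_chunks[k])), np.zeros(len(neg_chunks[k]))]
        X_train = np.vstack([c for i, c in enumerate(pos_chunks) if i != k] +
                            [c for i, c in enumerate(neg_chunks) if i != k])
        y_train = np.r_[np.ones(len(X_pos) - len(pos_chunks[k])),
                        np.zeros(len(X_neg) - len(neg_chunks[k]))]
        clf = GradientBoostingClassifier()  # default hyperparameters for brevity
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        tp += int(np.sum((pred == 1) & (y_test == 1)))
        fp += int(np.sum((pred == 1) & (y_test == 0)))
        tn += int(np.sum((pred == 0) & (y_test == 0)))
        fn += int(np.sum((pred == 0) & (y_test == 1)))
    return tp, fp, tn, fn
```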

Table 2. Comparison of EER with different machine learning classifiers
Table 3. EERs of ideally placed sensors

4.2 Optimal Sensor Positions on the Body

The REALDISP dataset contains data from sensors placed on different positions, which allowed us to evaluate which positions are the most relevant for authentication. We used the approach described above to train a model for each subject. The FAR and FRR can be tuned by demanding a minimal certainty before accepting a sample as genuine. In a first experiment, we used the data collected during walking under ideal sensor placement. The results were evaluated using 8-fold cross-validation. 9 body positions were considered: the back (BACK), the left (LUA) and right upper arm (RUA), the left (LLA) and right lower arm (RLA), the left (LC) and right calf (RC), and the left (LT) and right thigh (RT). First, we note that the results are very promising, with low EERs: \(\approx \)8% in the worst case, going down to \(\approx \)2% in the best case. Second, the lower body seems to be more relevant than the upper body.
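For completeness, the sketch below shows one way to derive the EER by sweeping the acceptance threshold over the classifier's genuine-class scores; it is an assumed, simplified computation of the FAR/FRR trade-off, not necessarily the exact procedure used for the tables.

```python
# Simplified EER computation: `scores` are genuine-class probabilities
# (e.g. clf.predict_proba(X_test)[:, 1]) and `labels` the ground truth (1 = genuine).
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate the EER by finding the threshold where FAR and FRR meet."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    best_gap, eer = 1.0, 1.0
    for t in np.unique(scores):
        accepted = scores >= t
        far = np.mean(accepted[labels == 0])    # impostor samples accepted
        frr = np.mean(~accepted[labels == 1])   # genuine samples rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```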

We repeated the same experiment for the other activities: jogging, running and cycling. The EERs are shown in Table 3. The conclusion that the lower body is more informative than the upper body remains valid for the activities considered. This observation holds in all subsequent experiments. Due to the limited amount of data, we cannot make more fine-grained conclusions.

Table 4. EERs of slightly displaced sensors (training on both self-placement and ideal placement data) for each body position and different activities
Table 5. EERs when training on data from both sides of the body
Table 6. EERs training on different activities
Fig. 1. EERs for walking. The left bar corresponds to Table 4 and the right to Table 3.

4.3 Impact of Sensor Displacements

In real life scenarios, a sensor will never be worn on the exact same position. Therefore we investigated the effect of small sensor displacements.

In a first experiment, the model was trained with walking samples where the sensors were placed by a professional, while testing was done with walking data where the sensors were self-placed, and vice versa. The results deteriorated drastically, with best-case EERs of \({\approx }45\%\) and worst-case EERs up to \({\approx }50\%\). This can be explained by the absence of walks with displaced sensors in the training set.

A second experiment uses data from both (ideal and self placement) walks as training data. The best EER, for the RC sensor, is \({\approx }5\%\). The worst EER is \({\approx }10\%\).

The same experiment was repeated for the other activities as well. The results are shown in Table 4. Our earlier conclusion that the upper body seems to be less suited for authentication than the lower body still holds. In addition, the lower arm consistently has worse EERs than the upper arm. The increase in the amount of training data makes our results more consistent.

Table 7. EERs when combining multiple accelerometers

Figure 1 illustrates our conclusions. It shows the results w.r.t. EER for the walking activity of the previous experiment (left bar) and the experiment described in Sect. 4.2 corresponding with Table 3 (right bar). It can clearly be seen that in both cases the upper body is less suited for authentication than the lower body and the back. Furthermore, when data of both ideal and self sensor placement is used (left bar), the EERs are slightly worse compared to using data of only ideal sensor placement.

In a third experiment, the training set contained the data from the right side of the body and tests were executed with the corresponding left side of the body. 4-fold cross-validation yielded EERs of approximately 45%. The explanation is probably similar to the one for the first experiment: there is not enough training data for this type of drastic sensor displacement.

In a fourth experiment, the model was trained using data from both the right and left sensor. The tests were conducted using 4-fold cross-validation. The results are shown in Table 5. We conclude that this type of sensor displacement has no additional measurable impact.

4.4 Impact of Other Motion Activities

Earlier we argued that having a model for every activity is infeasible. Moreover, engineering optimal features for each activity under different circumstances is impossible. We therefore conduct experiments where we train our model using different activities, as sketched below. In a first experiment we use walking, jogging and running data, for both self and ideal sensor placements. The results, using 4-fold cross-validation, are shown in Table 6. Compared to training the model using only one activity, as shown in Table 4, the results have improved: the best EER is \({\approx }2\%\), while the worst is \({\approx }5.5\%\). We assume that using training data at different speeds improves the EER; this needs to be verified using more fine-grained data w.r.t. speed.
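The pooling step referred to above can be sketched as follows, assuming the extracted feature windows are stored in a dictionary keyed by (subject, activity, placement); this data layout is hypothetical and only serves to illustrate how a single activity-agnostic model per subject is trained.

```python
# Illustrative sketch of pooling feature windows across activities and
# placement conditions; `features` is an assumed dict keyed by
# (subject, activity, placement) mapping to 2-D arrays of feature vectors.
import numpy as np

def pool_activity_windows(features, subject,
                          activities=("walking", "jogging", "running"),
                          placements=("ideal", "self")):
    """Stack one subject's feature windows across activities and placements."""
    return np.vstack([features[(subject, act, plc)]
                      for act in activities for plc in placements])

# The pooled positive samples (and negatives pooled the same way from the
# other subjects) are then fed to the per-subject evaluation loop of Sect. 4.1.
```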

In the second experiment we added cycling to the dataset, which does not seem to affect the results significantly (see Table 6). The EERs are lower than in the first experiment, but cycling gave better results in previous experiments as well. Furthermore, only 9 subjects were available for cycling.

4.5 Resilience Against Observation Attacks

To improve the EER and the resilience against observation and spoofing attacks, we combined the data of two accelerometers, obtaining a feature vector of length 448 (2 * 224). The experiment is similar to the previous ones, using walking data for both self and ideal sensor placement. The results are shown in Table 7. As expected, combining a sensor with itself yields no new information: the values on the diagonal of Table 7 are similar to the results shown in Table 4. Moreover, the order in which sensors are combined does not matter, since EER\(_{i,j} \approx \) EER\(_{j,i}\). The best result is achieved by combining the sensors placed on the back and the left thigh, which gives an EER of \({\approx }3\%\). This is an improvement of \({\approx }2\%\) compared to using both sensors separately (see Table 4). However, if we consider, for each body position, the results for the left and right sensors together, the combination of a sensor placed on a calf with a sensor on the back yields the best results. Furthermore, for each sensor placement, a combination with the back sensor is among the best scoring. Combining two sensors does not always lead to an improvement in performance: e.g. combining the right upper arm and back sensors leads to an EER of \({\approx }7\%\), which is higher than the \({\approx }5\%\) EER obtained using the back sensor by itself.
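The feature-level fusion used in this experiment boils down to concatenating the per-window feature vectors of the two sensors; a minimal sketch, with assumed array names, is shown below.

```python
# Minimal sketch of two-sensor feature fusion: windows from two body positions
# are aligned in time and concatenated into 448-dimensional vectors (2 * 224).
import numpy as np

def fuse_sensors(feat_a, feat_b):
    """Concatenate per-window feature vectors of two sensors column-wise."""
    n = min(len(feat_a), len(feat_b))           # align on the common number of windows
    return np.hstack([feat_a[:n], feat_b[:n]])  # shape (n, 448) for two 224-dim inputs
```

Fusing at the feature level, before training, keeps a single classifier per subject; decision-level fusion [23] is mentioned in Sect. 5 as an alternative.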

For completeness, we investigated what would happen if three sensors were used together, leading to a feature vector of length 672 (3 * 224). We conclude that adding a third sensor leads to a minor improvement, but definitely not in all circumstances. This is illustrated in Fig. 2. The left bar shows the EER corresponding to each body position (as shown in Table 3). For each body position we add the sensor that leads to the best combined EER; the EERs for two sensors are shown by the middle bar. Then the third sensor leading to the best EER is added, which is illustrated by the right bar.

Fig. 2. Each bar represents the EER of one, two or three sensors. The left bar shows the EER when using only the first sensor from the description. The middle bar represents the EER of the first two sensors. The right bar is the EER of all three sensors.

An attacker can execute an observation attack by collecting accelerometer data through the HTML5 APIs of a mobile browser. An authentication system relying on only one sensor would then be compromised. When two sensors are used, we need to test the feasibility of misauthentication when the attacker constructs a trace using the obtained data and his own data for the second sensor. We assume that the attacker knows the location of the second sensor.

To test the above use case, we trained the system as before, using data from the walking activity with the sensors placed on the back and the left calf. Positive training samples are combinations of traces from the subject itself. Negative training samples consist of back and left calf data from other subjects. We test the system with genuine combinations of traces and with combinations of traces constructed by an attacker, who combines an observed back trace of the victim with his own left calf accelerometer data. This leads to poor results: an EER of \({\approx }17\%\). At a FRR (false rejection rate) of \({\approx }10\%\), a FAR (false acceptance rate) of \({\approx }37\%\) is achieved. At the threshold used to achieve the result in Table 4, the FAR is \({\approx }43\% \).
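The construction of the attacker's forged samples can be sketched as follows; the function and array names are assumptions, and the fusion layout (back features first, calf features second) mirrors the fusion sketch earlier in this subsection.

```python
# Hedged sketch of the observation attack: the attacker splices the victim's
# observed back-sensor feature windows with windows from his own left calf,
# and submits the fused vectors to the trained two-sensor classifier.
import numpy as np

def forge_fused_samples(victim_back_feats, attacker_calf_feats):
    """Combine stolen back-sensor windows with the attacker's own calf windows."""
    n = min(len(victim_back_feats), len(attacker_calf_feats))
    return np.hstack([victim_back_feats[:n], attacker_calf_feats[:n]])
```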

In a last experiment, we added some samples of the attack to the training data. This led to an EER of \({\approx }4\%\), which is only slightly higher than before (\({\approx }3.7\%\)). The FAR is \({\approx }1\%\) when the FRR is \({\approx }10\%\). When the old threshold is used, the FAR is \({\approx }4\%\). We conclude that the model has to be trained with the observation attack in mind in order to be resilient against it.

The above results were obtained with a public dataset to guarantee the reproducibility of our research results. However, this means there may be threats to the validity of generalizing our results. To address this concern, we are collecting larger datasets to confirm our findings.

5 Conclusion

In this work, we evaluated the resilience of the accelerometer as a behaviometric for gait authentication, and more specifically the effect on the equal error rate (EER) when using multiple sensors at different places on the body.

Our experiments on gait authentication with a single accelerometer, using Gradient-Boosted Trees and a fairly elaborate feature vector, showed low EERs and recognition accuracies that go beyond the state of the art. For data collected from 12 subjects and on 9 different places on the body, we obtained EER values between 2% and 8% for ideally positioned sensors. However, further experiments demonstrated that the accuracy dropped significantly after subtle sensor displacements, with the EER worsening from single digits to about 45%. We obtained similar results when sensors were displaced from one side of the body to the other. When we incorporate data from displaced sensors in our training data set, the accuracy improves again, with EERs between 5% and 10%, i.e. slightly worse compared to our first experiment. We also measured the impact of other motion activities, including jogging, running and cycling, and their effect on misauthentication resistance.

We evaluated the effectiveness of multiple accelerometers for gait authentication and the impact on the classification accuracy, demonstrating a further improvement in the EER of \({\approx }2\%\). Additionally, as an attacker can carry out an observation attack and obtain accelerometer data with the HTML5 APIs, we investigated the resilience of our multi-sensor scheme against spoofing attacks. Our experimental results show that our scheme is more robust against such attacks, provided that not all sensors are compromised.

As future work, we will investigate to what extent motion patterns are independent. We will execute additional attack experiments in which an adversary leverages his own sensor data to reproduce traces of the victim, in order to ascertain whether such attacks are feasible and practical. Furthermore, we will investigate whether it is feasible to modularize our multi-sensor behaviometric-based authentication scheme, not by fusing the different data sets before training but rather by fusing individual decisions based on each data set [23], allowing for more flexibility to combine different behaviometrics at runtime.