1 Introduction

As smartphones equipped with various sensors become increasingly prevalent, activity recognition is emerging as an important mobile application in the area of ubiquitous computing [1]. Research so far has focused on many real-life, human-centric tasks such as healthcare [3], harmful habits [4], socialization [5] and so on. However, few existing works have explored the possibility of using smartphones to detect or even prevent potential crime. According to the 2013 crime statistics from NIBRS released by the FBI [6], there were 1,120,614 incidents of crimes against persons with 1,289,799 victims. We therefore believe that crime detection and prevention using personal mobile devices is an important research issue that deserves more attention from researchers.

Despite the scarcity of research on personal security against criminal offences in the area of ubiquitous computing, there are plenty of commercial smartphone applications for citizen emergency management. For example, EmergenSee [7] can transmit live video, audio, location, etc. of the user’s incident directly to pre-selected emergency contacts with a single tap. iGoSafely [8] is another personal security alarm and emergency contact notification tool, which can be activated by simply plugging in headphones. However, in most circumstances, people encountering violence may not have the chance to take out their phone and open the app. A passive, real-time crime detection system is therefore highly desirable.

In this paper, for the first time, we propose a non-intrusive, real-time method for detecting ongoing aggravated assault using smartphones. The proposed solution leverages the accelerometer available on most off-the-shelf smartphones to capture the actions and movements of the human body under aggravated assault, and can accurately identify in real time whether the user is being assaulted. The main contributions of this work are as follows:

  1. To distinguish aggravated assaults from activities of daily living (ADLs), we extract several features from raw accelerometer readings, which our evaluation shows to be discriminative.

  2. Considering the difficulty of collecting training data of aggravated assaults from users and the imbalanced distribution of classes, we design a combinatorial classification scheme, on top of which a probabilistic threshold method makes the final judgement about the user’s status.

  3. To evaluate the performance of the proposed system, we collected sensor data of actions under aggravated assault by imitating real instances recorded in surveillance videos. The experimental results show that the proposed system correctly detects most instances while keeping a low false alarm rate and a short delay.

2 Background

To learn the recognizable characteristics of aggravated assault, we collected and analyzed, for the first time, 100 surveillance videos involving aggravated assault. For each video, we extracted the duration of the instance and the actions performed by the victims and the assailants respectively, each drawn from a vocabulary we defined to describe the possible actions in an instance of aggravated assault. The vocabulary contains 20 verbs or phrases, and the statistics are presented in Fig. 1.

Fig. 1. Statistics of 100 surveillance videos

Although these verbs and phrases cover most actions, it is extremely difficult, if not impossible, to define aggravated assault using a simple and common rule. We argue that fine-grained recognition of assaults is particularly hard because: (1) the actions in a real assault are fast, and each may last only a very short time, which makes it difficult to find the boundaries between actions; (2) the accelerometer reading at every moment is influenced concurrently by the victim’s own actions to fight or flee, the force applied by the assailant, and interference from the surroundings, which blend into the signal irreversibly and make it impossible to recover the individual actions from it. We therefore first extract some coarse-grained, general characteristics of aggravated assault by analyzing our surveillance videos. The characteristics are as follows:

  • Fierceness: The actions of the assailant in a real situation are typically extremely powerful and destructive, and as a result the victim has to act intensely to fight or flee. Thus, the first characteristic of actions under aggravated assault is fierceness. Of course, there are fierce actions in ADLs as well, such as running and jumping, so we extract a second characteristic to distinguish such fierce ADLs.

  • Irregularity: The fierce actions in ADLs mostly come from sports, such as running and jumping, and their shared characteristic is that they are repetitive. Actions under assault, however, are rarely repetitive, because the real situation is extremely complex and disordered: the victim acts to fight or flee while the assailant acts to maximize his force on the victim, and neither can act entirely on his own initiative.

The basic intuition behind our solution is twofold. First, ADLs are not simultaneously as fierce and as irregular as actions under aggravated assault. Second, although actions differ across instances of aggravated assault, they are consistent in terms of fierceness and irregularity.

3 Data Collection

To collect raw timestamped accelerometer data for analysis and experiments, we developed an Android application that keeps running in the background. The ADL data were collected from volunteers who were asked to install and start our application on their smartphones and then go about their day as if nothing had changed. Since we did not find any existing dataset containing smartphone sensor records from assault victims, we decided to obtain such data by imitating the real assault instances in our collected surveillance videos. To make the collected data more realistic, we asked the volunteers playing the assailant to drop all inhibitions when performing the assault actions, and asked the volunteers playing the victim to respond as realistically as possible. Both the assailants and the victims wore full-body armor to protect them from injury.

4 System Design

In this section, we first describe our segmentation method and then present the features used to represent segments. After that, we introduce the design of the combinatorial classification scheme, followed by the assault detection procedure.

4.1 Segmentation and Feature Extraction

Given the streaming accelerometer data, an interval-based sliding window is used to generate segments of equal time span, which are the inputs to the subsequent procedures. A sliding window is initialized once the system starts and is filled with real-time data until the time span of the current window reaches the preset interval threshold; the data in the current window are then output as a segment.
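
The segmentation step can be illustrated with a minimal Python sketch. It is not the implementation used in the system: it assumes non-overlapping windows, a single acceleration value per sample (for instance the magnitude of the three axes), and timestamps in seconds, none of which are specified above. The names `segment_stream` and `interval` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# One sample: (timestamp in seconds, acceleration value). Using a single
# value per sample is an assumption made for this sketch.
Sample = Tuple[float, float]


def segment_stream(samples: Iterable[Sample], interval: float) -> Iterator[List[Sample]]:
    """Yield consecutive, non-overlapping segments spanning `interval` seconds.

    A window is filled with incoming samples; as soon as a new sample lies
    at least `interval` seconds after the first sample of the window, the
    window is emitted as a segment and a new window is started.
    """
    window: List[Sample] = []
    for timestamp, value in samples:
        if window and timestamp - window[0][0] >= interval:
            yield window
            window = []
        window.append((timestamp, value))
    # A trailing, incomplete window is not emitted: it has not yet reached
    # the preset interval threshold.


# Usage: 10 samples at 2 Hz, cut into 2-second segments.
stream = [(0.5 * i, 9.8) for i in range(10)]
segments = list(segment_stream(stream, interval=2.0))
```

Because the function is a generator, it can consume a live sensor stream and hand each segment to feature extraction as soon as the window closes.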

To distinguish actions under aggravated assault from ADLs, we extract for each segment a feature vector containing several features that measure fierceness and irregularity. The features are as follows:

  • Intensity Features. These features measure the energy level in a segment to reflect the intensity of actions: the range and mean of the accelerometer values in the segment, the zero-crossing rate of the waveform taking the mean as the zero-crossing level, and the peak amplitude and spectral energy (excluding the DC component) in the frequency spectrum of the segment after performing a DFT.

  • Irregularity Features. Irregularity is the other discriminative characteristic that distinguishes actions under assault from ADLs, so we further compute the autocorrelation, Lempel-Ziv complexity and sample entropy to quantify the regularity of a segment.
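
As a rough illustration of these features, the sketch below computes the intensity features and a normalized autocorrelation for one segment in plain Python. The exact definitions are assumptions made here, not taken from the paper: the zero-crossing rate is the share of consecutive sample pairs that change sign about the mean, the spectral energy is the sum of squared DFT magnitudes over the non-DC bins up to the Nyquist bin, and the autocorrelation is normalized by the zero-lag energy. Lempel-Ziv complexity and sample entropy are omitted for brevity, and the function names are hypothetical.

```python
import cmath
import math
from typing import Dict, List


def intensity_features(x: List[float]) -> Dict[str, float]:
    """Intensity features of one segment of accelerometer values (len >= 2)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    # Zero-crossing rate with the mean as the zero level.
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    # Naive DFT magnitudes for bins 1..n//2; bin 0 (DC) is skipped.
    magnitudes = [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))
        for k in range(1, n // 2 + 1)
    ]
    return {
        "range": max(x) - min(x),
        "mean": mean,
        "zero_crossing_rate": crossings / (n - 1),
        "peak_amplitude": max(magnitudes),
        "spectral_energy": sum(m * m for m in magnitudes),
    }


def autocorrelation(x: List[float], lag: int) -> float:
    """Autocorrelation of the mean-removed segment at `lag`, divided by the
    zero-lag energy. Values near 1 at some lag suggest a repetitive signal."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        return 0.0  # constant segment: no structure to correlate
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy


# Usage: a periodic signal (period 16 samples) as a stand-in for a
# repetitive activity such as running.
periodic = [math.sin(2 * math.pi * 4 * i / 64 + 0.3) for i in range(64)]
features = intensity_features(periodic)
regularity = autocorrelation(periodic, lag=16)
```

The naive DFT is quadratic in the segment length, which is acceptable for short windows; an FFT routine would be the natural replacement for longer segments.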

4.2 Segment Recognition

This subsection details our segment recognition procedure. We use supervised classification, which constructs classifiers from training data and then labels incoming data with the learned model.

Classification Scheme: To provide general knowledge of the difference between ADLs and assaults for better recognition, we propose a public database of ADL and assault samples, which contains data collected from users performing ADLs and from victims being assaulted. Since no such database currently exists, a preliminary mechanism to initialize and update it is introduced in the discussion section. Given the database, a general binary classifier can be learned with the objective of maximizing the separation between ADL and assault samples.

The classification scheme we use combines an individual one-class classifier with the general binary classifier. The former describes the individuality of a user’s ADLs, while the latter describes the universal difference between ADLs and assaults that holds for most people. Figure 2 presents an example illustrating how the combinatorial classification scheme differs from the other two schemes. Our proposed combinatorial classifier works as follows:

Fig. 2. Example of the combinatorial classification scheme (only two features are used for visualization)

  • In the training phase, we first use SVDD (support vector data description) [9] to learn an individual one-class classifier \(C_1\) from datasets collected from users. Then the general binary classifier \(C_2\) is learned from the public database of ADL and assault samples using SVM. After that, each learned classifier is calibrated using Platt scaling to enable probabilistic output.

  • In the use phase, the feature vector \(F_t\) of the current segment \(S_t\) is first processed by each of the two calibrated classifiers \(C_i\) (\(i=1,2\)), which outputs a binary classification label \(L_{i,t}\) (0 for “ADLs” and 1 for “assaults”) and a prediction probability \(P_{i,t}\) (ranging from 0 to 1). The final output of our combinatorial classification scheme is chosen from the two outputs by maximum probability, i.e. the output with the higher probability becomes the final recognition result.
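
The max-probability combination in the use phase can be sketched in a few lines of Python. The tuple representation and the tie-breaking rule (preferring the one-class output when both probabilities are equal) are assumptions of this sketch; the text above does not specify how ties are resolved.

```python
from typing import Tuple

# A calibrated classifier output: (label, probability), where label is
# 0 for "ADLs" and 1 for "assaults".
Prediction = Tuple[int, float]


def combine(one_class: Prediction, binary: Prediction) -> Prediction:
    """Return whichever of the two outputs has the higher probability.

    Ties go to the one-class output (an arbitrary choice for this sketch).
    """
    return one_class if one_class[1] >= binary[1] else binary


# Usage: the individual classifier is fairly sure the segment is an ADL,
# but the general classifier is more confident that it is an assault.
label, probability = combine((0, 0.70), (1, 0.90))
```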

Calibration: As mentioned above, probabilistic outputs are required to estimate the credibility of a classification result; however, one-class classifiers such as SVDD and most binary classifiers do not output probabilities natively. The process of transforming the outputs of a classifier into a probability distribution over classes is called calibration, and the technique we use is Platt scaling [10], which fits a sigmoid to the classifier’s scores. In SVDD, the unthresholded classification score is the distance from an object to the center, while in SVM it is the signed distance from an object to the decision hyperplane.
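
To make the calibration step concrete, the following Python sketch fits the two parameters of the Platt sigmoid, \(P(y=1 \mid s) = 1/(1+\exp(As+B))\), by plain gradient descent on the negative log-likelihood. It is a simplification under stated assumptions: the original Platt procedure uses a second-order optimizer and smoothed targets, both omitted here, and the learning rate, epoch count and function names are hypothetical choices.

```python
import math
from typing import List, Tuple


def platt_probability(score: float, a: float, b: float) -> float:
    """P(label = 1 | score) under the sigmoid 1 / (1 + exp(a * score + b))."""
    z = max(-500.0, min(500.0, a * score + b))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores: List[float], labels: List[int],
              lr: float = 0.1, epochs: int = 5000) -> Tuple[float, float]:
    """Fit (a, b) by gradient descent on the mean negative log-likelihood.

    With z = a * s + b and p = 1 / (1 + exp(z)), the derivative of the
    negative log-likelihood with respect to z is (y - p).
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_probability(s, a, b)
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Usage: uncalibrated scores where larger values indicate "assaults" (1).
scores = [-2.0, -1.0, -0.5, 0.5, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
a, b = fit_platt(scores, labels)
p_assault = platt_probability(1.5, a, b)
```

The same routine applies to either classifier once its raw score is available; only the meaning of the score (distance to the center for SVDD, signed distance to the hyperplane for SVM) differs.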

4.3 Assault Detection

Once the recognition result for the current segment is generated, the assault detection procedure then decides whether to trigger the alarm according to the classification labels and prediction probabilities of the several most recent segments, including the current one. Let \(\{(L_{t-k+1},P_{t-k+1}),(L_{t-k+2},P_{t-k+2}),...,(L_t,P_t)\}\) be the sequence of the k most recent recognition results at time t. We then define a detection confidence at time t, denoted \(A_t\), which is calculated as

$$\begin{aligned} A_t&=sgn(\Vert {T}\Vert )-\prod _{j \in T}{(1-P_j)}, \end{aligned}$$

where

$$\begin{aligned}&T=\{j|L_j=1,{t-k+1}\le {j}\le {t}\} \end{aligned}$$

The formula gives the probability that at least one of the recognition results labeled “assaults” among the k most recent segments is correct, treating the results as independent. The larger the detection confidence, the more likely it is that the user is being assaulted. A detection confidence threshold \(A_\theta \) is used to decide whether to raise an alarm: only when the real-time detection confidence \(A_t\) exceeds \(A_\theta \) is an assault alarm generated to activate the related emergency applications.
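
The detection rule can be sketched in Python as follows. The sketch follows the formula for \(A_t\) literally and assumes the usual convention that an empty product equals 1, so that \(A_t=-1\) when none of the k most recent segments is labeled “assaults”, a value that can never exceed a threshold in the range 0 to 1. The class name `AssaultDetector` and the parameter values in the usage example are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Iterable, Tuple

Result = Tuple[int, float]  # (label, probability); label 1 means "assaults"


def detection_confidence(recent: Iterable[Result]) -> float:
    """A_t = sgn(|T|) - prod_{j in T} (1 - P_j), with T the assault-labeled results."""
    assault_probs = [p for label, p in recent if label == 1]
    sign = 1 if assault_probs else 0
    # math.prod of an empty sequence is 1 (empty-product convention).
    return sign - math.prod(1.0 - p for p in assault_probs)


class AssaultDetector:
    """Keeps the k most recent recognition results and applies the threshold."""

    def __init__(self, k: int, threshold: float) -> None:
        self.recent: Deque[Result] = deque(maxlen=k)
        self.threshold = threshold

    def update(self, label: int, probability: float) -> bool:
        """Add the newest result; return True if an alarm should be raised."""
        self.recent.append((label, probability))
        return detection_confidence(self.recent) > self.threshold


# Usage: two moderately confident "assaults" results push A_t over the threshold.
detector = AssaultDetector(k=3, threshold=0.85)
alarms = [detector.update(0, 0.9), detector.update(1, 0.5), detector.update(1, 0.8)]
```

In the usage example the last call sees probabilities 0.5 and 0.8 for the two assault-labeled segments, so the confidence is 1 - 0.5 × 0.2 = 0.9.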

5 Evaluation

In this section, we present the experimental results for the evaluation of iProtect. First, we introduce the datasets, metrics, and methodology of the evaluation. Then, we study the impact of classifier model complexity on the performance of segment recognition. After that, we evaluate the impact of the probability threshold and the time-related parameters on detection performance and delay.

5.1 Datasets, Metrics and Methodology

We selected 100 surveillance videos of aggravated assaults as the templates for our imitation; the events in all of these videos were re-enacted by 10 victim volunteers. Moreover, the ADL data of the 10 victim volunteers were also collected during their daily lives over 7 days. Since ADL data are easy to obtain, another 50 regular volunteers, who were not involved in our assault imitation, contributed samples of their ADLs, which are used to evaluate the false alarm rate.

We regard assaults as the positive class and ADLs as the negative class. Thus, a false positive at the segment recognition level means that a segment of ADLs is wrongly labeled as assault, while at the instance detection level it means that a false alarm is generated while the user is performing ADLs. Conversely, a false negative at the segment level means that a segment of a real assault is wrongly labeled as ADLs, and at the instance level it means that an instance of assault is not detected. The results of our experiments are reported in terms of false positive rate (FPR) and false negative rate (FNR) [11]. Note that the instance-level FPR is obtained by averaging the number of false alarms per day, so its unit is “times per day” rather than the percentage used for the segment-level FPR and FNR and the instance-level FNR.

In addition, delay is used as a performance metric in the evaluation of instance detection. We evaluate with cross validation, which partitions the dataset into training and validation subsets and averages the performance metrics over the different partitions.
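
For concreteness, these metrics could be computed as in the Python sketch below. The function names and the use of monitored seconds as the time base for the instance-level FPR are assumptions of this sketch; it also assumes that both classes occur in the ground truth.

```python
from typing import Sequence, Tuple


def segment_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Segment-level (FPR, FNR) with assaults = 1 (positive), ADLs = 0 (negative).

    Assumes at least one positive and one negative segment in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return fp / (fp + tn), fn / (fn + tp)


def false_alarms_per_day(num_false_alarms: int, monitored_seconds: float) -> float:
    """Instance-level FPR, expressed in "times per day"."""
    return num_false_alarms / (monitored_seconds / 86400.0)


# Usage: four ADL segments and two assault segments.
fpr, fnr = segment_rates([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0])
daily_rate = false_alarms_per_day(14, monitored_seconds=7 * 86400)
```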

Fig. 3. Impact of model complexity on the performance of segment recognition

5.2 Impact of Model Complexity

Since our classification scheme combines SVDD and SVM, it is important to evaluate how the parameter choices of the two classifiers influence performance. Fig. 3 presents the results of our parameter tuning experiments. The SVDD parameter we studied is FRACREJ, denoted F for short, which in the implementation by Duwin tells the learning algorithm what fraction of the target objects in the training dataset may be rejected. The SVM parameter we studied is the C-parameter, the penalty factor, which affects the model complexity. From the results, we find that: (1) the F-parameter of SVDD is the key factor for FPR; (2) when FPR takes its minimum value, FNR rises to its maximum level, and vice versa; (3) FPR and FNR can be balanced when the F-parameter is around 0.2 and the C-parameter is larger than \(10^{-2}\), where both metrics are below 5 %.

5.3 Impact of Probability Threshold

To evaluate the relationship between assault detection performance and the probability threshold, we calculate FPR, FNR, and delay at probability thresholds ranging from 0.8 to 0.999 in increments of 0.001, with the other parameters set to their default values. The results are plotted in Fig. 4, where we observe that as the probability threshold increases, the FPR decreases from 2 times per day to nearly 0 times per day, while the FNR increases from 1 % to above 10 %. This reflects that a smaller probability threshold tends to produce more false alarms, while a larger one tends to miss more real assaults. Delay is also clearly influenced by the probability threshold, and it follows the same trend as the FNR, i.e. a higher threshold leads to a longer delay.

Fig. 4. Impact of probability threshold on detection performance

Fig. 5. Impact of time-related parameters on detection performance

5.4 Impact of Time-Related Parameters

There are two time-related parameters in our system: the sliding window length and the number of most recent segments. We evaluate the impact of these two parameters on assault detection performance by fixing the other parameters and varying each of the two in turn. The overall results are presented in Fig. 5, where we observe that: (1) the FPR can be decreased by shortening the sliding window, which at the same time increases the FNR; (2) the impact of the number of most recent segments on the FPR is not obvious except when the sliding window length is set to 1 second; (3) surprisingly, when the sliding window length and the number of most recent segments are both set to their minimum, the delay reaches its maximum, which can be explained by the fact that more time is needed to make a credible decision when too little information is available.

6 Related Work

There is a significant body of prior work on activity recognition, which can first be classified according to the human activity domain of interest. Most previous works target the recognition of specific activities, such as activities of daily living [12, 13], healthcare [4], and so on. There are also works that recognize unseen or unspecified activities [14, 15] by incorporating human knowledge about the similarity between different activities. These approaches are best suited to activities with clear and simple patterns, and cannot be applied to the problem in this paper because of the extreme diversity and complexity of actions under assault.

In terms of the learning method, most previous work [4, 5, 13] used supervised classification [21] approaches, which require labeled data for classifier learning. To lessen the reliance on labeled training data, some work [12, 16, 17] used semi-supervised or unsupervised learning [20]. In addition, active learning is another emerging approach for activity recognition [16, 18], which actively requests labels from users when possible and thus avoids a tedious training phase before use. Our learning method differs from previous works because of the particular nature of assaults: it is impossible to require users to provide training data of assaults. For this situation, anomaly detection [19] is a natural choice, and it is commonly implemented as one-class classification. However, although one-class classification has the advantage that only data of the target class are required for learning, its performance is extremely sensitive to the comprehensiveness of the training data. Based on this consideration, we proposed our combinatorial classification scheme.

7 Conclusion

In this paper, we presented iProtect, a system that uses the accelerometer equipped in smartphones to detect aggravated physical assault on users. We analyzed the characteristics of actions under assault and proposed a practical detection system that processes the raw streaming accelerometer data and outputs the judgement of whether to trigger an alarm. We conducted experiments to demonstrate the practicality of our system and to evaluate the impact of different parameters on its performance. In future work, we will improve our detection system by incorporating acoustic recognition.