1 Introduction

The Internet of Things (IoT) is the new future of the Internet. It enables individuals, communities and societies to connect and obtain services over the Internet anytime and anywhere, and it also improves interactions between people and devices. In the context of the Smart City, the integration of IoT technologies allows municipalities to manage their infrastructure efficiently and to offer interactive reporting services. As smartphones have become more affordable and are the easiest way to connect, many town councils have adopted them to deal with city issues and make city life smarter. Mobile reporting services allow citizens to report issues with data captured through their smartphones [1]. Applications such as Find It, Fix It (USA) [2], Improve My City (Greece) [3] and CitiAct (Malaysia) [4] have been developed for citizens to participate and report information such as faulty traffic lights, vandalism, crime, natural disasters and other city-related issues.

Aggarwal and Abdelkader [5] identified several challenges in mobile reporting services, including user privacy and the trustworthiness of the reports. Sensors in mobile devices, such as the GPS, collect private information about users, for example their daily routes, which can invade their privacy if it is not protected. On the other hand, the credibility of the reports is also a major concern, as users may intentionally send misleading information to the system. False reports, such as offensive or threatening messages, should therefore be identified and filtered out before they are sent to the town councils. In this paper, we focus on the trustworthiness issue because false information costs the response team time, energy and other resources. This research treats false reports as digital evidence of illegal mobile reporting.

Researchers in the past have concentrated primarily on users' reputation and behavior to detect false information in reporting services [6,7,8,9,10,11,12]. For example, the trustworthiness of a report has been assessed based on the user's reputation [6]. The weakness of this technique is that a user with a high reputation may still deliberately send false reports to sabotage the system. Moreover, fault reporting systems are open to the public, so users' reputation or behavior cannot be known beforehand. However, elements of the text itself can also be used to identify false reports, because certain keywords in the text can indicate whether a report is true or false.

Thus, the aim of this research is to propose an evidence detection framework for mobile reporting services using a text classification technique. Text classification has been widely used for spam detection and sentiment analysis [13]. It is also applied in the field of digital forensics to analyze text-oriented digital evidence through keyword search analysis, aiding experts in their investigations [14]. We carried out an analysis comparing the performance of Naïve Bayes (NB), Support Vector Machines (SVM), Decision Tree (DT) and K-Nearest Neighbours (KNN) in terms of accuracy and robustness. This research addresses the false report recognition problem in the CitiAct application by introducing a suitable method that achieves 97% accuracy and by describing the phases involved in analyzing large volumes of text from the application.

2 Related Works

2.1 False Text Detection in Reporting Services

Firoozjaei et al. [15] proposed a score-based framework called the False Request Detection Algorithm (FRDA) to detect false emergency requests. According to the authors, false alarms can also arise from reports sent with good intentions, for example through misjudgement or out of kindness. FRDA evaluates the trustworthiness of callers based on their reported location and their trust score history; a caller with a high trust score is considered more trustworthy than a caller with a low trust score. This approach requires the trust score to be available beforehand. Secondly, Kantarci et al. [6] proposed a reputation-based framework named Trustworthy Sensing for Crowd Management (TSCM), in which users with high reputations are assumed to have a higher probability of sending true reports, whereas users with low reputations are assumed to have a lower probability of sending true reports. As discussed in the introduction, reputation and behavior alone cannot be used to predict the authenticity of reports. Besides that, researchers have also proposed several algorithms to identify false reports by analyzing users' behavior when reporting information. Wang et al. [7] stated that different degrees of confidence may affect the correctness of the data; they therefore proposed a confidence-aware truth estimation algorithm using a Maximum Likelihood Estimation (MLE) approach that measures the degree of users' confidence in their reports. Next, Marshall et al. [8] proposed another MLE-based algorithm called the emotional-aware truth discovery algorithm. They indicated that emotionally reported information (claims) should not be classified as true or false claims, because doing so can lead to inaccuracy in the truth discovery results. Furthermore, Marshall et al. [9] also stated that factors such as time, location and scale influence the degree of hardness of claims, and hence proposed a hardness-aware truth discovery algorithm.

Some applications implement credit-based rewards to encourage users to participate and provide information: users receive incentives if they actively participate through the application. As a result, some users misuse this mechanism by sending large numbers of spam reports to increase their participation and get rewarded. Ghosh et al. [10] proposed a probabilistic approach to identify and filter spam reports through automated confidence assignment. The proposed approach was later validated by Barnwal et al. [11] using Waze traffic alerts. Prandi et al. [12] developed a trustworthiness model for a system called Mpass (mobile Pervasive Accessibility Social Sensing). This system evaluates the trustworthiness of the collected information by combining the accuracy of the sensors, the user's source credibility and reports coming from authoritative data sources. The Mpass trustworthiness model was then assessed through an agent-based simulation.

A recent study by Bhatti et al. [16] proposed an online rescue operation that detects accidents and sends notifications. The physical IoT system relies on four sensor inputs to detect road accidents and is connected to a medical rescue system for immediate medical assistance. The authors claim that, using these sensor inputs, the system is able to reduce false positive rates in the reporting part; however, the proposed system still needs to be tested in a real-time scenario to confirm this reduction. Based on the previous studies, researchers have mainly focused on users' reputation and behavior when determining the trustworthiness and usefulness of a report. The textual content of the report should also be taken into consideration when identifying true and false reports. Text classification techniques have been widely used in various fields, mainly spam filtering and sentiment analysis, to analyze large amounts of data gathered from different sources. They are also applied in the field of digital forensics, for example in keyword analysis, detecting deceptive criminal identities, identifying groups of criminals engaged in various illegal activities, criminal profiling and evidence discovery. Al-Zaidy et al. [17] stated that digital evidence is usually in the form of textual data such as emails, chat messages from mobile phones, social media posts and web pages. Such textual data are unstructured, so investigators have traditionally had to extract information manually, which makes investigations tedious. Text classification techniques aid investigators in detecting and extracting useful information or digital evidence.

Application of Text (keyword) Classifiers in Various Domains.

Criminals often hide their evidence in a digital device by changing the file type, so it is necessary for the digital forensic investigator to identify file types and classify them as legitimate or illegitimate [18]. As the use of the Internet and email continues to grow, the number of spam emails continues to increase rapidly. Lakshmi and Radha [19] compared two classification techniques, Support Vector Machine (SVM) and K-Nearest Neighbours (KNN), to determine the better classifier for classifying spam messages. They collected 20 datasets for the experiment, and features such as the header, subject and body were used to evaluate the performance of the classifiers. The analysis was carried out using a data mining software package called Tanagra. The performance of the classifiers was evaluated based on ease of learning, error rate and predictive accuracy. The results showed that, compared to KNN, SVM consumed more time to learn and train on the data; however, SVM had the lowest error rate and achieved a higher predictive accuracy of 77% compared to 75% for KNN.

Likewise, Trivedi and Dey [20] conducted a comparative analysis of three classifiers, Naïve Bayes (NB), Support Vector Machine (SVM) and Decision Tree (DT), to evaluate their performance in spam email classification. The performance of each classifier was measured based on accuracy and false positive rate (FP rate), where the FP rate is the rate of misclassification of legitimate emails. Based on the overall results, SVM achieved an accuracy of 97.8%, followed by NB with 97.6% and DT with 96.6%. SVM also had the lowest FP rate compared to NB and DT.

Application of Text Classifier in Sentiment Analysis.

Mobile network traffic has also been treated as text: important semantic features were extracted from the text of the traffic flows using an N-gram method, and the researchers then fed these features to an SVM classifier for malware detection [21]. Gautam and Yadav conducted sentiment analysis on customers' reviews about their satisfaction with purchased products [22]. They performed a comparative analysis of Support Vector Machine (SVM) and Naïve Bayes (NB). The experiment was implemented in Python, and the results showed that SVM achieved a better result than NB, with an accuracy of 88.2% compared to 85.5% for NB. Based on the comparisons discussed above, we found that SVM performed better in all the experiments than the other three classifiers. Therefore, we apply the SVM text classifier in our proposed framework, because SVM not only achieves higher accuracy but is also robust, suitable for binary classification and easy to implement. Besides that, to the best of our knowledge, SVM has not previously been used for false report identification in the IoT environment.

3 Methodology

This section explains the method that we used to build the classifier. The Agile model is used in this study for the development of the false text classification engine: it starts with a simple implementation and iteratively enhances the evolving versions until the full system is implemented. This model was chosen because it is adaptive and easy to understand [23]. Based on the activities in Table 1, we first identified the requirements of the tools and the packages needed to build the classifier. Then an algorithm to extract suitable features was designed. The algorithm was designed by first gathering all the mechanics of how the CitiAct application works, since understanding the nature of the reports generated by CitiAct is essential. The data preparation steps were then identified. Following that, the procedures of suitable classifiers were studied and the comparison process was designed. After the algorithm was formed, it was implemented, and finally the source code was tested and its performance compared with that of the other classifiers.

Table 1. Activities conducted in each agile model phase.

A real-time dataset is collected from the CitiAct database. Four attributes are used from the dataset: report text, location, address and type of report. The tool used for programming and text classification is RStudio [24]; its advantage is that it is flexible and produces high-quality analyses. Continuous literature review and documentation were carried out for each part to keep up with recent work and to compare our results with previous work.
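As an illustration, the dataset can be loaded into RStudio and reduced to these four attributes as in the minimal sketch below; the file name and column names are hypothetical placeholders, since the actual CitiAct export schema may differ.

# Minimal sketch (assumed file and column names, not the actual CitiAct schema):
# load the exported reports and keep the four attributes used in this study.
reports <- read.csv("citiact_reports.csv", stringsAsFactors = FALSE)
reports <- reports[, c("report_text", "location", "address", "report_type")]
str(reports)  # inspect the structure of the four retained attributes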

4 Evidence Detection Framework

The report sent by the public through the CitiAct application is evidence that we need to obtain in order to investigate whether it is a true or false report. Our proposed False Report Evidence Detection Framework applies the SVM text classification technique to identify true and false reports (see Fig. 1). The framework and the experiment steps are described below.

Fig. 1. False report evidence detection framework.

4.1 Data Preparation

First, each report in the dataset has to be labelled as F (false report) or T (true report). The reports then undergo preprocessing: tokenizing the text of each report into words (tokens), removing stop words such as "a", "the", "and", "of" and "while", and converting words into their root forms. Next, features are extracted by identifying distinctive keywords that occur frequently in the dataset.

The selected features (keywords) are weighted based on the number of times the terms occur in the dataset. Table 2 shows the first six reports in the dataset.

Table 2. Example of a labelled and preprocessed dataset.

Once preprocessing is complete, the next step is to extract features and generate a list of terms from each document. Each term is given an index, and the terms are sorted in alphabetical order.
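A minimal sketch of this preparation step in R, using the tm package, is shown below. It continues from the hypothetical reports data frame loaded in Sect. 3; the label column is assumed to have been assigned manually, and the exact preprocessing options are assumptions rather than a definitive account of our implementation.

library(tm)

# Assumed: 'reports' has a report_text column and a manually assigned label (T/F).
reports$label <- factor(reports$label, levels = c("T", "F"))

# Build a corpus and apply the preprocessing described above:
# lower-casing, tokenization, stop-word removal and stemming.
corpus <- VCorpus(VectorSource(reports$report_text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)      # convert words to their root forms
corpus <- tm_map(corpus, stripWhitespace)

# Feature extraction: each term is indexed, sorted alphabetically and
# weighted by the number of times it occurs in each document.
dtm <- DocumentTermMatrix(corpus)
dtm <- removeSparseTerms(dtm, 0.99)         # keep frequently occurring keywords
head(Terms(dtm))                            # alphabetically ordered term list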

IoT Devices’ False Text Classification Engine.

The prepared dataset is sent for text classification analysis, which involves a series of steps. The first step is the separation of the dataset into training and testing sets in a 70:30 proportion. The training dataset has its reports labelled as T (true) or F (false) so that SVM can learn the properties of true and false reports. The testing dataset is used for prediction and is therefore not labelled; instead, SVM tries to predict and categorize its reports into their respective classes. The data are then converted into a term matrix in the form a:b, where a is the index of the term and b is the number of times the term occurs in that particular document. Table 3 shows some examples of the calculated term matrix.

Table 3. Calculated term matrix.
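The 70:30 separation and the a:b term-matrix notation can be sketched as follows, assuming the document-term matrix dtm and the labels from the preprocessing sketch above; the random seed is arbitrary.

set.seed(123)                               # arbitrary seed, for a reproducible split
m <- as.matrix(dtm)

# 70:30 separation into training and testing portions.
train_idx <- sample(seq_len(nrow(m)), size = floor(0.7 * nrow(m)))
train_x <- m[train_idx, ];  train_y <- reports$label[train_idx]
test_x  <- m[-train_idx, ]; test_y  <- reports$label[-train_idx]

# Illustrate the a:b notation for the first training document,
# where a is the term index and b is the term's count in that document.
nz <- which(train_x[1, ] > 0)
paste(nz, train_x[1, nz], sep = ":")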

In the text classifier application, the support vector machine is trained to learn the characteristics (features) of true and false data from the training dataset. It computes and assigns a feature vector to each report, so the data are separated into two classes, class 1 and class 2, representing true and false data respectively. The text classifier testing model is built when an optimal hyperplane is generated by identifying the support vectors and maximizing the margin; this hyperplane separates the two classes (true and false). In this paper, we calculated equal margins between class 1 and class 2, with the true class denoted as (+1) and the false class as (−1). The trained SVM then separates the reports into two classes, Class 1 (true reports) and Class 2 (false reports), as shown in Table 4. This model is then used for predicting the testing dataset.

Table 4. Classes assigned by SVM.
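A minimal training sketch with the e1071 package is given below, continuing from the split above; the linear kernel and the default cost are assumptions and not necessarily the exact settings used in our experiments.

library(e1071)

# Train a linear SVM (C-classification) on the labelled training portion.
# The separating hyperplane is defined by the support vectors with the margin
# maximized, and the two classes correspond to true (+1) and false (-1) reports.
svm_model <- svm(x = train_x, y = train_y,
                 type = "C-classification",
                 kernel = "linear", cost = 1, scale = FALSE)
summary(svm_model)                          # number of support vectors per class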

In the prediction part, the text classifier testing model is applied to the testing dataset. The number of reports that are correctly predicted can be determined by referring to the confusion matrix [25]. Finally, the list of false texts from the mobile reporting services in the IoT environment extracted by the false text classification engine is displayed.
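The prediction step and the confusion matrix can be sketched as follows, continuing from the objects above.

# Predict classes for the unlabelled testing portion.
pred <- predict(svm_model, test_x)

# Confusion matrix: correctly predicted reports lie on the diagonal.
conf_mat <- table(Predicted = pred, Actual = test_y)
print(conf_mat)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Display the reports that the engine classified as false (label "F").
false_reports <- reports$report_text[-train_idx][pred == "F"]
head(false_reports)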

5 Results and Discussion

Naïve Bayes (NB), Support Vector Machines (SVM), Decision Tree (DT) and K-Nearest Neighbours (KNN) are the most commonly used classifiers, and among these four, SVM has proven to be the most popular because of its good performance and higher accuracy [26, 27]. Therefore, we conducted an experiment to test our framework by applying the SVM text classifier, and the results are compared with those of NB, DT and KNN.
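A hedged sketch of how such a comparison can be set up in R is given below, reusing the training and testing objects from Sect. 4; the packages (e1071, rpart, class) and parameters such as k = 3 are our assumptions and may differ from the exact experimental configuration.

library(e1071)                              # svm(), naiveBayes()
library(rpart)                              # decision tree
library(class)                              # knn()

train_df <- data.frame(train_x, report_class = train_y)
test_df  <- data.frame(test_x)

acc <- function(pred) mean(pred == test_y)  # simple accuracy measure

results <- c(
  SVM = acc(predict(svm(report_class ~ ., data = train_df, kernel = "linear"), test_df)),
  NB  = acc(predict(naiveBayes(report_class ~ ., data = train_df), test_df)),
  DT  = acc(predict(rpart(report_class ~ ., data = train_df, method = "class"),
                    test_df, type = "class")),
  KNN = acc(knn(train_x, test_x, cl = train_y, k = 3))
)
round(results, 2)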

SVM produced consistent results with the highest accuracy, 89% and 97% with the 500-report and 1000-report datasets respectively, and also the lowest error rate, although the time taken to train and predict was higher than for the other three classifiers. This could be because, as the dimensionality grows, SVM consumes more time to train and test. However, SVM works well with larger datasets and is able to identify more distinct features, producing good classification results. On the other hand, with the shortest time taken to train and test, KNN also performed well, achieving the second highest accuracy of 85% in the first experiment, in which the 500-report dataset was used.

NB produced the lowest result, with an accuracy of 70% on the 500-report dataset, but outperformed DT and KNN on the 1000-report dataset by achieving the second highest accuracy of 96%. DT performed relatively well on the 500-report dataset, achieving the third highest accuracy of 83% in a shorter amount of time, but produced the lowest results on the 1000-report dataset. This is due to an overfitting issue that causes high generalization error. On the whole, based on the results produced in these experiments, SVM is found to be the most accurate and robust text classifier compared to NB, DT and KNN. Table 5 shows the results of the four classifiers; the highlighted row indicates the accuracy results, which determine the performance levels of the text classifiers.

Table 5. Text classifier results.

Compared with previous research in similar fields, such as the application of SVM classifiers to spam message prediction [19], mobile file type identification and classification as legitimate or illegitimate [18, 28], and spam email classification [20], SVM has been found to outperform other classification techniques and to have the lowest false positive rates. Accordingly, SVM is used in our proposed framework and is applied to a larger dataset to detect false reports. The framework manages to identify 85% of the false reports in the large dataset.

6 Conclusion and Future Work

The proposed work can be part of digital investigations into criminal cases such as misuse of online applications or security threats, for example the transmission of false information in the IoT environment. Thus, our proposed framework can also be applied in the digital investigation area, as it can help investigators obtain important information from unstructured data.

The proposed false report detection algorithm has yet to be tested on the real application to investigate real-time data and obtain performance results. The features used in this research come only from the static dataset collected from the CitiAct application. At the moment, the proposed framework is only able to read English-language text reports, while CitiAct may receive messages in both English and Malay; the algorithm could be enhanced to read both languages. Based on these limitations, the algorithm can be tested with larger datasets and with a wider variety of features. Since this algorithm was first developed for the CitiAct application in Malaysia, it should be further enhanced to read other languages so that the false report identification algorithm can be applied to other types of smartphone applications in the IoT environment.