1 Introduction

Credit card transactions have become increasingly frequent with the improvement of modern computing technology and global communication. At the same time, fraud is also increasing dramatically. According to the European Central Bank report [1], billions of Euros are lost in Europe because of credit card fraud every year. Credit cards are considered an attractive target for fraud since a significant amount of money can be obtained in a short period with low risk [2]. Credit card fraud takes different forms, such as application fraud [3], counterfeit cards [4], offline fraud and online fraud [5]. Application fraud is a common and dangerous form in which fraudsters acquire a credit card by using false personal information or another person’s information, with the intention of never repaying the purchases [3]. Counterfeit fraud occurs when the credit card is used remotely; only the credit card details are needed [6]. Offline fraud happens when the physical card is stolen and used in stores by fraudsters posing as the actual owner, while online fraud is committed via the web, phone shopping or other cardholder-not-present channels [5].

Two mechanisms are widely used to combat fraud: fraud prevention and fraud detection. Fraud prevention, as the first line of defense, filters high-risk transactions and stops them from occurring in the first place. There are numerous authorization techniques for credit card fraud prevention, such as signatures [7], credit card number, identification number, cardholder’s address and expiry date. However, these methods are inconvenient for customers and are not enough to curb incidents of credit card fraud. There is an urgent need for fraud detection approaches that analyze data to detect and eliminate credit card fraud [8].

However, there are many constraints and challenges that hinder the development of an ideal fraud detection system (FDS) for banks. Existing FDSs are often inefficient, have low accuracy, or raise many false alarms, for reasons such as dataset insufficiency, skewed distribution and limited detection time.

  • Dataset Insufficiency

    One of the main issues associated with FDS is the lack of publicly available datasets [9]. The increasing concern over data privacy imposes barriers to data sharing for banks. At the same time, most fraud detection systems are produced in-house, and their model details are concealed to protect data security. However, a reliable credit card FDS cannot be established in the absence of available datasets.

  • Skewed Distribution

    Credit card transactions are highly unbalanced in every bank: a few samples are fraudulent while the majority are legitimate transactions. In most circumstances, 99% of transactions are normal while fraudulent transactions are less than 1% [10]. In this case, it is very difficult for machine learning algorithms to discover the patterns in the minority class data. Furthermore, a skewed class distribution has a serious impact on the performance of classifiers, which tend to be overwhelmed by the majority class and ignore the minority class [11].

  • Limitation of Detection Time

    In some online credit card payment applications, a delay in detection can lead to intolerable losses or potential exploitation by fraudsters. Therefore, an online FDS that can work within limited time and detect fraudulent activities rapidly is extremely important [12]. A good fraud detection framework should be fast enough to be used in a real-time environment.

In this paper, we aim to address these issues with a novel fraud detection system. First, we focus on a fraud detection system that protects data privacy while still allowing the model to be shared among different banks. Then, we address the problem of skewed distribution of datasets. A federated fraud detection framework with a data balancing approach is proposed to construct a fraud detection model, which differs from previous FDS. The federated fraud detection framework enables different banks to collaboratively learn a shared model while keeping all the skewed training data in their own private databases. Furthermore, the accuracy, convergence rate, training time and communication cost of the FDS are comprehensively taken into consideration.

The main contributions of this paper are summarized as follows:

  1. To deal with the fraud detection problem and construct an effective FDS when data is insufficient, a decentralized machine learning approach, the federated fraud detection framework, is proposed to train a fraud detection model on the features of fraudulent and legitimate behavior. Our work takes a step forward by developing ideas that address the problem of dataset insufficiency for credit card FDS.

  2. Using a real-world dataset of European cardholders, experiments are conducted to demonstrate that our method is robust to unbalanced credit card data distributions. Experimental results show that a credit card FDS with federated learning improves on a traditional FDS by 10% in AUC and F1 score.

  3. From the experimental results, conclusions are drawn on how to coordinate the communication cost and accuracy of the FDS, which would be helpful for making a trade-off between computation resources and real-time detection in future fraud detection work.

The rest of the paper is organized as follows. In Sect. 2, related work about credit card fraud is discussed. Section 3 gives the details of federated fraud detection framework. Section 4 provides an analysis of the dataset and experimental results. Conclusions and future work are presented in Sect. 5.

2 Related Work

Although fraud detection in the credit card industry is a much-discussed topic that receives a lot of attention, the number of publicly available works is rather limited [14]. One of the reasons is that credit card issuers restrict the sharing of data sources to avoid revealing cardholders’ private information. In the literature on credit card fraud detection, the data mining technologies used to create credit card FDS can be categorized into two types: supervised methods and unsupervised methods.

Supervised learning techniques rely on datasets that have been labeled as ’normal’ and ’fraud’. This is the most prevalent approach for fraud detection. Recently, decision trees combined with contextual bandits were proposed to construct a dynamic credit card fraud detection model [15]. Adaptive learning algorithms can update the fraud detection model on evolving streaming data over time [16] to adapt to and capture changes in the patterns of fraudulent transactions. Data-level balancing techniques such as undersampling [17], SMOTE and EasyEnsemble are compared in [18] to find the most efficient mechanism for credit card fraud detection. A supervised ensemble method [19] was developed by combining bagging and boosting. Bagging reduces the variance of the classification model by resampling the original dataset, while boosting reduces the bias of the model on unbalanced data. In [8], an FDS was constructed with the scalable algorithm BOAT (Bootstrapped Optimistic Algorithm for Tree Construction), which builds several levels of the tree in one scan over the training database to reduce training time. Other supervised learning methods for fraud detection include Bayesian classifiers [20], artificial neural networks (ANN) [21, 22] and support vector machines [23, 24].

In unsupervised learning, no class labels are used for constructing the fraud detection model. For example, [25] proposed unsupervised methods that do not require accurate labels of fraudulent transactions but instead detect changes in behavior or unusual transactions. The K-means clustering algorithm, an unsupervised learning algorithm that groups a given set of data based on the similarity of their attributes, has been used to detect credit card fraud [26].

The advantage of supervised FDS over semi-supervised and unsupervised FDS is that its outputs are meaningful to humans and can easily be used to identify discriminative patterns. In this paper, a data-level balancing approach, SMOTE, is used to handle the problem of skewed distribution by oversampling fraud transactions. A supervised method with a deep network (CNN) is applied by the participating banks to detect fraud transactions. The federated fraud detection framework balances FDS performance and training time by controlling the deep network learning process. One of the biggest differences is that the fraud detection models described above are each trained by an individual bank, whereas the model described in this paper is trained collaboratively by different banks.

3 Methodology

3.1 Preliminaries

This section formalizes the problem setting discussed in this paper and introduces the federated fraud detection (FFD) framework.

Definition 1

(Transaction Dataset). Let \(D_i\) denote a credit card transaction dataset, and let (\(x_i,y_i \)) be a training data sample of \(D_i\) with a unique index i. Vector \(x_i \in R^ d \) is a d-dimensional real-valued feature vector, which is regarded as the input of the fraud detection model. Scalar \( y_i \in \{0,1\}\) is a binary class label, which is the desired output of the model. \( y_i \) = 1 denotes a fraud transaction and \( y_i \) = 0 denotes a normal transaction.

Definition 2

(Loss Function). To facilitate the learning process, every model has a loss function defined on its parameter vector w for each data sample. The loss function captures the error of the fraud detection model on the training data. We denote the loss of the prediction on a sample (\(x_i,y_i \)) made with the model parameters w by \(\ell (x_i,y_i;w)\).

Definition 3

(Learning Rate). The learning rate controls how fast the model converges to its best accuracy. We denote the learning rate by \(\eta \).

Fig. 1. Diagram of the federated learning framework. \(w_{t+1}\) represents the parameters that each bank uploads to the server; \(w_{t+1}'\) represents the parameters averaged by the server.

Machine learning algorithms typically centralize the training data in a data center. Credit card transaction information is sensitive both in its relationship to customer privacy and in its importance as a source of proprietary information for banks. Traditional machine learning models for credit card fraud detection are typically trained by individual banks on their own private datasets. Because such datasets are privacy sensitive and large in quantity, federated learning was presented by Google [27] in 2017. Unlike traditional machine learning, federated learning enables participants to collaboratively learn a shared model. This shared model is trained under the coordination of a central server using datasets distributed across the participating devices, with privacy by default [28]. A typical federated algorithm based on deep learning, the FederatedAveraging (FedAvg) algorithm, was introduced. FedAvg combines local stochastic gradient descent (SGD) on each client with a central server that performs model averaging [29]. Each client acts as a node that performs computation in order to update the global shared model maintained by the central server. Every client trains its own model on a local training dataset that is never uploaded to the central server; only the model update is communicated. Federated learning can be summarized in five steps [27]: (1) Each participating device downloads the common model from the central server. (2) The device improves the model by learning from its local data. (3) The device summarizes the changes to the model as a small focused update and sends it to the central server using encrypted communication. (4) The server immediately aggregates it with the updates of other devices to improve the shared model. (5) The process repeats until convergence. The structure of federated learning is illustrated in Fig. 1.
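The five steps above can be sketched as a single-process simulation. This is only a minimal illustration under stated assumptions, not the implementation of [27] or of this paper: the model is a one-parameter linear regression trained by full-batch gradient descent, the client data are synthetic, encrypted communication (step 3) is omitted, and the names `local_update` and `fedavg_round` are hypothetical.

```python
import random

def local_update(w, data, lr=0.3, epochs=5):
    """Step 2: improve the downloaded model on local data (full-batch gradient descent)."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """Steps 1, 3 and 4: broadcast the model, collect local models, average them by data size."""
    n_total = sum(len(d) for d in clients)
    local_models = [local_update(w_global, d) for d in clients]
    return sum(len(d) / n_total * w for d, w in zip(clients, local_models))

random.seed(0)
# Three hypothetical clients whose private data follow y = 3x plus small noise.
clients = [
    [(x, 3 * x + random.gauss(0, 0.01)) for x in (random.uniform(-1, 1) for _ in range(n))]
    for n in (20, 50, 30)
]

w = 0.0
for _ in range(30):  # Step 5: repeat until convergence
    w = fedavg_round(w, clients)
print(round(w, 2))
```

Only the scalar `w` crosses the client boundary in this sketch; the lists in `clients` are never merged, which mirrors the property that raw training data stays local.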

3.2 Federated Fraud Detection Framework

There is a fixed set of C banks (or financial institutions) as participants, and each bank possesses a fixed private dataset \(D_i= \{x_i^c,y_i^c\}\) (c=1,2,3,...,C). \(x_i^c\) is the feature vector, \(y_i^c\) is the corresponding label and \( n_c\) is the size of the dataset associated with participant bank c. Credit card transaction data is skewed: fraudulent transactions make up a very small percentage of the dataset, which can hinder the performance of a credit card FDS. A data-level method, SMOTE [30], is selected for data rebalancing at \(D_i\). SMOTE oversamples the minority class by generating synthetic minority examples in the neighborhood of observed ones. It is easier to implement and does not increase training time or resources compared to algorithm-level approaches [18].
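The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration rather than the reference procedure of [30] or the code used in our experiments: the function name `smote`, the toy data and the choice of squared Euclidean distance are assumptions made for the example.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples and their neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy minority class
new_points = smote(fraud, n_synthetic=6, k=2)
```

Every synthetic point lies on a segment between two observed fraud samples, so the minority class is enlarged without duplicating existing records.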

In our fraud detection system with federated learning, the goal is to allow different banks to jointly build an effective fraud detection model without revealing the private information of each bank’s customers. Before getting involved in training the fraud detection model, all banks first agree on a common fraud detection model (the architecture of the model, the activation function in each hidden layer, the loss function, etc.). For a non-convex neural network model, the objective is:

$$\begin{aligned} \min \limits _{w \in \mathbb {R}^d} \quad \ell (x,y;w) \qquad \text{where} \qquad \ell (x,y;w) \overset{\text{def}}{=} \frac{1}{n}\sum _{i=1}^n\ell (x_i,y_i;w). \end{aligned}$$
(1)

In the federated fraud detection model, there are C banks as participants, each with a fixed dataset of size \(|D_i|=n_c\). We use n to denote the number of data samples involved in the whole FDS, thus n= \(\sum _{i=1}^C |D_i|\) =\(\sum _{c=1}^C n_c\). We can rewrite the objective (1) as

$$\begin{aligned} \ell (x,y;w) = \sum _{c=1}^C \frac{n_c}{n} L_c(x_c,y_c;w) \qquad \text{where} \qquad L_c(x_c,y_c;w)= \frac{1}{n_c} \sum \limits _{i \in D_i} \ell (x_i^c,y_i^c;w) \end{aligned}$$
(2)
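
As a consistency check, and assuming that the banks' datasets are disjoint and together contain all n samples, the size-weighted sum of the local objectives \(L_c\) recovers the global objective in (1):

$$\begin{aligned} \sum _{c=1}^C \frac{n_c}{n} L_c(x_c,y_c;w) = \sum _{c=1}^C \frac{n_c}{n} \cdot \frac{1}{n_c} \sum \limits _{i \in D_i} \ell (x_i^c,y_i^c;w) = \frac{1}{n} \sum _{c=1}^C \sum \limits _{i \in D_i} \ell (x_i^c,y_i^c;w) = \frac{1}{n}\sum _{i=1}^n\ell (x_i,y_i;w). \end{aligned}$$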

The server first initializes the fraud detection model parameters. At each communication round t=1,2,..., a random fraction F of banks is selected. These banks communicate directly with the server. First, they download the current global model parameters from the server. Then, every bank computes the average gradient of the loss \(f_c\) on its own private dataset at the current fraud detection model parameters \(w_t\), \(f_c\) = \(\nabla L_c(x_c,y_c;w_t)\), and applies it with a fixed learning rate \(\eta \). These banks update their fraud detection models synchronously and send the updates to the server.

The server aggregates these updates and improves the shared model

$$\begin{aligned} w_{t+1} \leftarrow w_t- \eta \nabla \ell (x,y;w) \end{aligned}$$
(3)
$$\begin{aligned} w_{t+1} \leftarrow w_t- \eta \sum _{c=1}^C \frac{n_c}{n} \nabla L_c(x_c,y_c;w) \end{aligned}$$
(4)
$$\begin{aligned} w_{t+1} \leftarrow w_t- \eta \sum _{c=1}^C \frac{n_c}{n} f_c \end{aligned}$$
(5)

For every bank c, the local update is \(w_{t+1}^c \leftarrow w_t- \eta f_c \). Substituting it into (5) and using \(\sum _{c=1}^C \frac{n_c}{n} = 1\) gives

$$\begin{aligned} w_{t+1} \leftarrow \sum _{c=1}^C \frac{n_c}{n} w_{t+1}^c \end{aligned}$$
(6)

Considering the impact of skewed data on model performance, we use the combination of data size and detection model performance \(\alpha _{t+1}^c \) on each bank as the weight of its parameter vector. It can be written as

$$\begin{aligned} w_{t+1} \leftarrow \sum _{c=1}^C \frac{n_c}{n} \alpha _{t+1}^c w_{t+1}^c \end{aligned}$$
(7)

Increasing the weight of strong classifiers lets them play a more important role in forming a better global shared model. Each bank takes a step of gradient descent and evaluates the fraud detection model using its own credit card transactions. Then, the server combines the updates by taking a weighted average and makes the result available to all participating banks. The whole process goes on for T iterations.
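The performance-weighted averaging step can be illustrated as follows. This is a hedged sketch, not the paper's implementation: the function name `aggregate` and the example values are hypothetical, and rescaling the weights so that they sum to one is an assumption of the sketch that is not stated in the text.

```python
def aggregate(local_models, sizes, scores):
    """Average local parameter vectors, weighting bank c by (n_c / n) * alpha_c.

    Normalizing the weights so they sum to one is an assumption of this sketch.
    """
    n = sum(sizes)
    raw = [n_c / n * a for n_c, a in zip(sizes, scores)]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(local_models[0])
    return [sum(wt * m[j] for wt, m in zip(weights, local_models)) for j in range(dim)]

# Three hypothetical banks with two-parameter models.
models = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sizes = [100, 100, 200]
scores = [0.9, 0.6, 0.9]  # e.g. a validation metric per bank (hypothetical values)
w_global = aggregate(models, sizes, scores)
```

With equal scores the function reduces to plain size-weighted averaging; with the scores above, the second bank's parameters contribute less because its local model performs worse.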

The increasing concern over data privacy imposes restrictions and barriers on data sharing and makes it difficult to coordinate the large-scale collaboration needed to construct a reliable FDS. A credit card FDS based on federated learning is therefore proposed, which enables each bank to train a fraud detection model from data distributed across multiple banks. It not only helps the credit card FDS learn better patterns of fraud and legitimate transactions but also protects the privacy and security of the datasets. For federated optimization, communication cost is a major challenge. Banks must both download the current fraud detection model parameters from the server and upload their model updates to it, so the communication cost in the FDS is symmetric. It is influenced by upload bandwidth, but in our FDS the communication cost is related to three key parameters: F, the fraction of banks selected to perform computation in each round; B, the minibatch size used for the banks' updates; and E, the number of local epochs. Tuning these three parameters controls the communication cost: we can add computation either by having more banks participate, which increases parallelism, or by performing more computation on each bank between communication rounds. The details of our fraud detection model training process are described in Algorithm 1.
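To make the role of F, B and E concrete, the following sketch counts the work done in one communication round. It assumes that each local epoch is one pass over the bank's data in minibatches of size B, so a bank with \(n_c\) samples performs E times the ceiling of \(n_c/B\) SGD steps per round. The function name `round_workload` and the local dataset sizes are hypothetical.

```python
import math

def round_workload(num_banks, fraction, local_sizes, batch_size, epochs):
    """Return (banks selected per round, local SGD steps per round for each given bank size)."""
    selected = max(round(fraction * num_banks), 1)
    steps = [epochs * math.ceil(n_c / batch_size) for n_c in local_sizes]
    return selected, steps

# C = 100, F = 0.1, B = 80 and E = 5 follow the settings used in the experiments;
# the two local dataset sizes are made up for illustration.
selected, steps = round_workload(
    num_banks=100, fraction=0.1, local_sizes=[2000, 800], batch_size=80, epochs=5
)
print(selected, steps)
```

Halving B or doubling E doubles the number of local steps per round without changing the amount of data exchanged with the server, which is the trade-off explored in the experiments.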

4 Experimental Results

This section is organized into three parts. First, we introduce the dataset used in our FDS. Second, we present the performance measures for our fraud detection model. Finally, we report the results of our experiments.

Table 1. Credit card dataset

4.1 Dataset Description

We conducted a series of comprehensive experiments to demonstrate the superiority of our method. The experimental dataset consists of European Credit Card (ECC) transactions made in September 2013 by European cardholders and is provided by the ULB ML Group [31]. This anonymized dataset contains 284,807 transactions spanning a period of two days, of which only 492 are fraudulent, a ratio of 1:578. The dataset is highly imbalanced: only 0.172% of the transactions are fraudulent. Due to confidentiality issues, the original features and some background information about the dataset cannot be provided, so the dataset contains only 30 numerical input variables, most of which are the result of a Principal Component Analysis (PCA) transformation. It is described in Table 1. This is a classic example of an unbalanced credit card fraud dataset (Fig. 2), so it is necessary to rebalance the raw data to prevent the classifiers from over-fitting the legitimate class and ignoring the patterns of fraud.

Fig. 2. Dataset visualization via PCA.

4.2 Performance Measures

Measuring the success of a machine learning algorithm is a crucial task, so that the best parameters for a credit card fraud detection system can be selected [32]. When the dataset is significantly imbalanced, accuracy is not enough to measure the performance of an FDS: accuracy will be high even if the FDS predicts every instance as a legitimate transaction. Therefore, we take other measures into consideration, namely precision, recall, F1 and AUC, which are calculated based on Table 2, where Positive corresponds to fraud samples and Negative corresponds to legitimate samples. Accuracy indicates the proportion of all records that are classified correctly by the FDS. Precision measures the reliability of the fraud alarms raised by the FDS, while recall measures how effectively the FDS detects all fraudulent transactions. F1 is the harmonic mean of recall and precision. Additionally, Area Under Curve (AUC) refers to the area under the Receiver Operating Characteristic (ROC) curve, which can better describe the performance of classifiers trained with unbalanced samples.

$$\begin{aligned} Accuracy=\frac{TP + TN}{TP+ FP+ TN+ FN} \end{aligned}$$
(8)
$$\begin{aligned} Precision =\frac{TP}{TP + FP} \end{aligned}$$
(9)
$$\begin{aligned} Recall=\frac{TP}{TP + FN} \end{aligned}$$
(10)
$$\begin{aligned} F1=\frac{2 \times Precision \times Recall}{Precision + Recall} \end{aligned}$$
(11)
Table 2. Performance matrix
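
Equations (8) to (11) can be evaluated directly from the four confusion-matrix counts. The sketch below applies them to a trivial detector that labels every transaction in the ECC dataset as legitimate, using the class counts reported in Sect. 4.1. The function name `metrics` is hypothetical, and returning 0 when a denominator is zero is a convention chosen for this example.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (Eqs. 8-11)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# A detector that labels every transaction as legitimate:
# 492 frauds are missed, 284,315 legitimate transactions are classified correctly.
acc, prec, rec, f1 = metrics(tp=0, fp=0, tn=284_315, fn=492)
print(f"accuracy={acc:.4f} recall={rec:.1f} f1={f1:.1f}")
```

The accuracy of this useless detector exceeds 99.8% while its recall and F1 are zero, which is why accuracy alone is not used to compare models here.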

4.3 Results and Discussions

In this section, a series of experiments is conducted to show the advantages of our fraud detection system. All experiments were run on a standard server with an Intel E5 processor (28 CPU cores, 2.00 GHz) and 128 GB RAM. The shared global model is a CNN [33] with two convolution layers, the first with 32 channels and the second with 64 channels, each followed by max pooling, then a fully connected layer with 512 units and ReLU activation, and a final softmax output layer.

Fig. 3. Sensitivity test of the sampling ratio of fraud and legitimate transactions.

To minimize the impact of over-fitting, we split the dataset into 80% training data and 20% testing data. A data-level approach, SMOTE, is selected to rebalance the raw dataset. We conduct a series of experiments on different sampling ratios with defaults E = 5, B = 80 and \(\eta \) = 0.01. Figure 3 shows that the federated FDS with the data balancing mechanism outperforms the FDS trained on raw data. The FDS performs better with a higher proportion of fraud transactions, because it can learn better patterns of fraud and legitimate transactions when the data is more balanced. Figure 4(b) shows that when the sampling ratio of fraud to legitimate transactions reaches 1:1, the training time increases sharply while the FDS performance gains only a small advantage. Taking the training time and realistic application into consideration, we choose the sampling ratio of 1:100 to achieve an efficient FDS. From a real business perspective, the average cost of misjudging 100 normal transactions is approximately the same as the mean cost of missing one fraudulent transaction.
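The effect of the sampling ratio on the amount of synthetic data can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes that the 80% training split preserves the class proportions of the ECC dataset, which is not stated above, and the function name `synthetic_needed` is hypothetical.

```python
def synthetic_needed(n_fraud, n_legit, ratio):
    """Synthetic fraud samples required so that fraud:legit reaches `ratio` (e.g. 1/100)."""
    target = round(n_legit * ratio)
    return max(target - n_fraud, 0)

# Assumption: an 80% training split that preserves the class proportions.
n_fraud_train = round(0.8 * 492)      # 394
n_legit_train = round(0.8 * 284_315)  # 227452
for name, ratio in [("1:100", 1 / 100), ("1:10", 1 / 10), ("1:1", 1.0)]:
    print(name, synthetic_needed(n_fraud_train, n_legit_train, ratio))
```

Under these assumptions a 1:100 ratio requires fewer than two thousand synthetic fraud samples, whereas a 1:1 ratio roughly doubles the size of the training set, which is consistent with the sharp increase in training time at that ratio.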

Fig. 4. Efficiency of federated FDS.

After rebalancing the dataset, we also need to specify the data distribution across banks. The dataset is shuffled and then randomly partitioned among C = 100 banks. Because the number of transactions owned by each bank differs in reality, each bank receives a different number of transactions. Then, experiments are run with varying fractions of banks F, which controls the degree of parallelism across banks. Table 3 demonstrates the impact of varying F on the credit card fraud detection system. We calculate the number of communication rounds needed to reach a target AUC of 95.9%. The first row of Table 3 shows that as more banks are involved in parallel computation, the number of communication rounds required to reach the target AUC decreases and the performance of the FDS improves. Time efficiency is also essential to an effective FDS, which should be able to deal with limited time resources. In our FDS, the training time of every communication round (Fig. 4(a)) grows as the fraction of banks increases, while the advantage in performance is small. To keep a good balance between the performance of the FDS and computational efficiency, we fix \(F=0.1\) in the remaining experiments.
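One way to realize such a partition is sketched below. The text above does not specify how the per-bank sizes are drawn, so the random cut points used here are an assumption, and the names `partition` and `select_banks` as well as the sample count are hypothetical.

```python
import random

def partition(n_samples, n_banks, seed=0):
    """Shuffle sample indices and cut them into n_banks non-empty shards of varying size."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Distinct random cut points give the banks different numbers of transactions.
    cuts = sorted(rng.sample(range(1, n_samples), n_banks - 1))
    bounds = [0] + cuts + [n_samples]
    return [indices[a:b] for a, b in zip(bounds, bounds[1:])]

def select_banks(n_banks, fraction, rng):
    """Pick a random fraction F of banks for one communication round (at least one)."""
    m = max(round(fraction * n_banks), 1)
    return rng.sample(range(n_banks), m)

shards = partition(n_samples=10_000, n_banks=100)
chosen = select_banks(n_banks=100, fraction=0.1, rng=random.Random(1))
```

Each transaction index appears in exactly one shard, and with F = 0.1 ten of the 100 banks take part in each round.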

Table 3. Sensitivity test of the fraction of banks

Fig. 5. Sensitivity test of local batch size.

With \(F=0.1\), we add more computation per bank in each round by decreasing the batch size or increasing the number of epochs. For the batch size B, we calculate the number of communication rounds necessary to achieve a target recall of 78%, F1 of 87%, AUC of 89% and validation accuracy of 99%. The results are depicted in Fig. 5, where the grey lines stand for the targets. In Fig. 4(c), a smaller batch size leads to a longer training time per round on average. However, the number of communication rounds needed to reach the targets decreases as the computation per bank increases with a smaller local batch size, so the total time cost still decreases. A smaller batch size speeds up convergence and improves the performance of the FDS. For local epochs, Fig. 4(d) shows that a larger number of epochs increases the training time per communication round, but Table 4 shows that the number of communication rounds needed to reach the target AUC of 96% decreases. Figure 5 and Table 4 reveal that adding more local SGD updates per round to each bank, by decreasing the batch size or increasing the number of epochs, speeds up convergence and reduces the computation cost.

Table 4. Sensitivity test of the number of local epochs

5 Conclusion

This paper constructed a credit card FDS with federated learning. The results of our experiments show that federated learning brings a significant improvement to the credit card fraud detection system. The federated fraud detection framework enables banks to train a fraud detection system without sending their private data to a data center. This decentralized method protects the sensitivity and security of the datasets and alleviates, to some degree, the influence of unavailable datasets. There are still privacy problems in the federated fraud detection system. First, we should consider what information can be learned by inspecting the global shared model parameters. Second, we should consider what privacy-sensitive data can be learned by gaining access to the updates of an individual bank. In future work, we will take more reliable measures to protect the privacy of data. In addition, the system can be evaluated on non-IID datasets, and we will work to ensure that the credit card FDS communicates and aggregates model updates in a secure, efficient and scalable way.