DMDP: A Dynamic Multisource Default Probability Prediction Framework
Abstract
In this paper, we propose a dynamic forecasting framework, named DMDP (dynamic multisource default probability prediction), to predict the default probability of a company. The default probability is an important factor in assessing the credit risk of companies listed on a stock market. To aid financial institutions in decision making, our DMDP framework not only analyzes financial data to capture the historical performance of a company, but also uses a long short-term memory (LSTM) model to dynamically incorporate daily news from social media, so that the perceptions of market participants and public opinion are taken into consideration. This paper makes two key contributions. First, we make use of unstructured news crawled from social media to alleviate the impact of financial fraud on default probability prediction. Second, we propose a neural network method that integrates structured financial factors and unstructured social media data with appropriate time alignment for default probability prediction. Extensive experimental results demonstrate the effectiveness of DMDP in predicting default probability for listed companies in mainland China, compared with various baselines.
Keywords
Default probability prediction, Time series, Multisource data
1 Introduction
Granting loans to potential borrowers is one of the core business activities of financial institutions. Although loans help these institutions earn profits, they may also cause huge losses, a source of financial risk. For instance, the 2008 financial crisis resulted in huge losses globally. Hence, financial institutions now devote more and more attention to evaluating risks before granting loans. In particular, most financial institutions are now cognizant of the need to adopt rigorous credit risk assessment models when determining whether or not to grant loans to specific borrowers.
In the early stage, various classical statistical approaches [3, 7] such as logistic regression [10], multivariate adaptive regression splines [19, 20] and linear discriminant analysis [1, 2] were proposed for credit risk prediction. However, statistical approaches are typically based on certain assumptions, e.g., multivariate normality of the independent variables and absence of multicollinearity in the data, which make the proposed solutions theoretically invalid for finite samples [16]. With the advent of machine learning algorithms, many studies demonstrated that neural networks (NN) [19, 31, 38], support vector machines (SVM) [6, 11, 13, 15, 23], decision trees (DT) [4, 32], random forests (RF) [5, 29] and Naive Bayes (NB) [3, 25, 33] can be used to build credit scoring models that measure default risk with high accuracy. Some practical works [8, 26, 27, 30, 34] have focused on classifier ensembles and demonstrated that ensemble classifiers consistently outperform single classifiers in terms of prediction accuracy.
Although machine learning methods can automatically learn hidden and critical factors from past observations and do not require specific prior assumptions, the performance of these supervised methods relies greatly on the quality of the training data. More specifically, the accuracy of the risk evaluation results is typically affected by the trustworthiness and the comprehensiveness of the available historical data. At the corporation level, the data for default probability prediction involve basic financial indicators such as industry sector, geographical area and financial statements. One of the major problems encountered in adopting financial indicators for credit risk assessment is that companies might commit accounting fraud to artificially improve the appearance of their financial reports, which impedes the effectiveness of the learned prediction models. Furthermore, almost all prior works focus on static models that use the most recent indicator values for prediction and do not consider the temporal trend of the indicators, which is valuable for reflecting the long-term financial status of a company.
In practice, for credit risk assessment, most lenders take advantage of information from social media networks, such as Twitter and Facebook, to decide whether their potential borrowers are creditworthy. However, probably due to the difficulty of collecting daily financial news, we are not aware of any study that leverages social media analysis to improve the prediction of default probabilities for companies. It is important to note that, besides financial statements, social media data contain subjective appraisals of a firm's prospects, which are discriminative indicators for assessing the default probability of a publicly traded company.
In this paper, we propose a dynamic multisource default probability prediction framework, named DMDP, to predict the default probability of listed companies. To reduce the influence of potentially flawed financial data and enhance the accuracy of machine learning-based default probability prediction, we make use of social media data to trace the latest developments of a company. By mining public opinions (updated irregularly) as well as financial indicators (updated periodically), we can make a comprehensive evaluation of the observed companies in terms of their default probabilities. More importantly, aiming at prior evaluation for group loans, our framework is designed to handle evolving data and continuously produce default probability predictions based on up-to-date company statuses, thus allowing financial institutions to respond quickly when borrowers experience a drastic market decline.

This paper extends our preliminary work [37] in the following four respects.

We have changed the frequency of news text updates from quarterly to weekly. More frequent updates make us aware of fine-grained changes in public opinion on the target company, since subtle fluctuations are easier to identify within a shorter time window. Note, however, that zooming into weekly news updates results in longer sequences, which typically call for a more complicated model to learn the complex temporal dependencies.

We have improved the way the news text is represented. In the preliminary work, we first concatenated all the news text concerning the target company in a quarter. Then, we extracted the top 50 keywords and calculated the average word embedding of these keywords to represent the public opinion inherent in the news text of the target company during the quarter. In this work, we represent each news text by the average word embedding of its title, which we dub the title embedding. Then, we average the title embeddings of all the news texts concerning the target company during a week. This change is mainly motivated by the observation that the titles, produced by professional editors, are themselves good abstractions of the news text.

We have improved the dataset splitting mechanism. In the preliminary work, we randomly sampled 70%, 15% and 15% of the dataset as training, validation and test sets, respectively. In this work, we treat all data points before Quarter 3, 2016 as training data. Data points from Quarter 3, 2016 to Quarter 4, 2016 are treated as validation data, and data from Quarter 1, 2017 onward are treated as the test set. By splitting the data in time order, we avoid the problem of future information leakage.

To better model the asynchronous nature of the news series, we have replaced the LSTM in the original paper with the Phased LSTM [22], which extends the LSTM unit with a new time gate and can process asynchronous time series data. We also adopted the recently proposed CNN-based Wavenet [28] to encode the news text time series. The results show that Wavenet greatly improves AUC on our default behavior dataset.
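The time-ordered split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the sample layout (a dictionary with a hypothetical `quarter` key holding a `(year, quarter)` tuple) and the function name are assumptions; only the three cut-off dates come from the text.

```python
from typing import Dict, List, Tuple

Quarter = Tuple[int, int]  # (year, quarter), e.g. (2016, 3)


def split_by_time(
    samples: List[dict],
    val_start: Quarter = (2016, 3),
    test_start: Quarter = (2017, 1),
) -> Dict[str, List[dict]]:
    """Assign each sample to train/val/test by its quarter.

    Each sample is assumed to carry a hypothetical "quarter" key holding a
    (year, quarter) tuple; Python compares such tuples chronologically.
    """
    splits: Dict[str, List[dict]] = {"train": [], "val": [], "test": []}
    for sample in samples:
        quarter = sample["quarter"]
        if quarter < val_start:
            splits["train"].append(sample)   # before Q3 2016
        elif quarter < test_start:
            splits["val"].append(sample)     # Q3 2016 to Q4 2016
        else:
            splits["test"].append(sample)    # Q1 2017 onward
    return splits


# Toy data: one record per quarter from 2015 to 2017.
samples = [
    {"company": "A", "quarter": (year, q)}
    for year in (2015, 2016, 2017)
    for q in (1, 2, 3, 4)
]
splits = split_by_time(samples)
print({name: len(part) for name, part in splits.items()})
```

Because no validation or test sample precedes any training sample in time, the model never sees information from the future of the period it is evaluated on.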
2 Preliminaries
In general, the objective of this study is to effectively distinguish “bad” corporations from “good” ones, which can be considered as a classification problem where a company is assigned to class “1” if it is predicted to default, i.e., to receive a delisting risk warning (*ST). Otherwise, it is labeled “0.” In this work, we use sequences of historical financial indicators and unstructured news data from previous time periods to predict whether the target company will default in the future.
2.1 Definitions
We first introduce the definitions used in this paper. Because the financial indicators and the target value are updated at quarterly frequency, we index both by quarter. For each time period (quarter) t, the information of a company c can be recorded as a tuple \((FIN_t^{c}, {\textit{TEXT}}_t^{c}, y_t^{c})\), where \(FIN_t^{c}\) contains the values of a set of financial indicators collected at the end of period t. Because multiple financial indicators are observed, \(FIN_t^{c}\) is in general a multidimensional vector. Similarly, because the news text information is updated at a finer granularity (weekly), \({\textit{TEXT}}_t^{c}\) is a sequence in which each element represents the news text of one week within the quarter. \(y_t^{c}\) is the binary response variable indicating whether the company is labeled (*ST) in the period.
2.1.1 Financial Indicators (FIN)
Financial indicators are commonly included in financial statements, and they reflect important characteristics of a company. For instance, the indicator “cash flow to liabilities ratio” directly reflects a company’s ability to cover its liabilities within a time window and hence is critical when predicting the default probability of the company in the near future. By similar logic, we extract several key financial indicators from the financial statements of a company at the end of each financial period. Suppose we extract P indicators in each time period (quarter) t for a company c; we denote them by \(FIN_t^{c} = \{X_{1t}^{c}, X_{2t}^{c}, \ldots , X_{Pt}^{c}\}\).
2.1.2 News Data Representation (TEXT)
TEXT stands for the embedding-based representation of the news data that are relevant to the company during a period. The news is crawled from social media. Because each news item is already categorized by company at the source, we can easily match the items to the corresponding companies. As described earlier, in a specific time period (quarter) t for company c, we denote the finer-grained news text sequence as \({\textit{TEXT}}_t^{c} = \{{\textit{TEXT}}_{tk}^{c}\} = \{{\textit{TEXT}}_{t1}^c, {\textit{TEXT}}_{t2}^c, \ldots , {\textit{TEXT}}_{tK_t}^{c}\}\), where \(k \in \{1, 2, \ldots , K_t\}\) indexes the kth week of quarter t. \({\textit{TEXT}}_{tk}^{c}\) is the representation of all the news texts related to company c during the kth week of quarter t and is calculated as the average of the title embeddings of all news in that week.
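The two-level averaging behind \({\textit{TEXT}}_{tk}^{c}\) (words to title, titles to week) can be illustrated with a small sketch. The function names and the toy 2-dimensional vocabulary are hypothetical, and skipping out-of-vocabulary tokens and titles without any known token is an assumption rather than something stated in the paper.

```python
from typing import Dict, List, Optional

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Optional[Vector]:
    """Element-wise mean of equal-length vectors; None if the list is empty."""
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def title_embedding(tokens: List[str], word_vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Average the embeddings of the title tokens found in the vocabulary."""
    return mean_vector([word_vectors[t] for t in tokens if t in word_vectors])


def weekly_text(titles: List[List[str]], word_vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Weekly representation: average title embedding over one week's titles."""
    embeddings = [title_embedding(tokens, word_vectors) for tokens in titles]
    return mean_vector([e for e in embeddings if e is not None])


# Toy 2-dimensional vocabulary (the paper uses 300-dimensional vectors).
word_vectors = {"profit": [1.0, 0.0], "loss": [0.0, 1.0], "warning": [0.0, 3.0]}
week_titles = [["profit", "loss"], ["loss", "warning"]]
print(weekly_text(week_titles, word_vectors))  # [0.25, 1.25]
```

A week without any news yields `None` here, which leaves the choice of how to handle missing weeks to the alignment step.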
2.1.3 Class Label (y)
y stands for the class label of the target company in a financial period. That is, \(y = 1\) if the company receives delisting risk warning (*ST) during the period. Otherwise, \(y = 0\). In this work, we do not take the sequence of the previously observed label values as the input for our prediction model as the discriminative power of the previous *ST values is quite limited.
2.2 Problem Statement
We now formally define the problem studied in this paper as follows.
Definition 1
(Problem Statement) Given a target company, an observation financial period (quarter) t, sequences of financial indicators FIN and news TEXT before time period t (t is included) in chronological order, we want to develop a framework to predict the default probability of the company during time period \(t + 1\). The predicted value could be 0 or 1, representing relatively low default probability or high default probability, respectively.
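To make the input and output of Definition 1 concrete, the following sketch builds supervised samples for one company: the history of FIN and TEXT up to and including period t is paired with the label of period t + 1. The list-based layout and the name `build_samples` are illustrative assumptions.

```python
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[List[Vector], List[List[Vector]], int]


def build_samples(
    fin: List[Vector],          # fin[i] holds FIN for quarter i + 1
    text: List[List[Vector]],   # text[i] holds the weekly vectors of quarter i + 1
    labels: List[int],          # labels[i] holds y for quarter i + 1
) -> List[Sample]:
    """Pair the history up to quarter t with the label of quarter t + 1."""
    samples: List[Sample] = []
    for i in range(len(labels) - 1):
        # Inputs include quarter t itself (index i); the target is the next quarter.
        samples.append((fin[: i + 1], text[: i + 1], labels[i + 1]))
    return samples


fin = [[0.2, 0.7], [0.3, 0.6], [0.1, 0.9]]         # three quarters, P = 2
text = [[[0.1], [0.4]], [[0.2]], [[0.5], [0.3]]]   # weekly vectors per quarter
labels = [0, 0, 1]                                 # *ST label per quarter
samples = build_samples(fin, text, labels)
print(len(samples), samples[-1][2])
```

The last quarter never appears as an input here because its successor label is not yet observed.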
3 Methodology
3.1 Dynamic Multisource Data Alignment
To predict whether a company c will be labeled (*ST) or not at time period \(T + 1\), we extract the sequences of financial indicators \(\{FIN_1^c, \ldots , FIN_T^c\}\) and unstructured news representations \(\{{\textit{TEXT}}_1^c, \ldots , {\textit{TEXT}}_T^c\}\), where \(FIN_1^c\) and \({\textit{TEXT}}_1^c\) are the inputs of the first financial period after company c gets listed. Because a listed company is required by regulation to publish its financial statements for each financial period, the sequence of historical financial indicators is complete. We denote the sequence of financial indicators for a company c up to time period T by \(\{FIN_t^{c}\} = \{X_{it}^{c}\}\), where \(i \in \{1, \ldots , P\}\) and \(t \in \{1, \ldots , T\}\). For the sequence of news representations, we perform preprocessing and alignment over the raw news data to deal with missing data and irregularity. On the one hand, the number of news items within a specific financial period varies from one company to another. On the other hand, even for the same company, the number of relevant news items released in different financial periods varies greatly, from zero to several dozen. For instance, a company may receive multiple news items within one day, while in other periods it might take weeks for a single relevant news item to appear.
Note that not every company has related news in every week, and hence we have to decide how to impute or align the news text sequence. We have tried three ways to align such data. The first is to impute zero vectors for the missing weekly news representations of each company. To be specific, if a company c has news representations until week k in quarter t, we impute a zero vector for every missing week before week tk. We call this alignment method ZeroInputAlign. The second is to serialize the week numbers tk into a sequence and feed them into the model together with the news representations. Since Phased LSTM can take the time step as a direct input, we use it with this alignment, which we dub TimeInputAlign. The third is to squeeze the news text sequence directly without considering the potential sparsity of the input. For instance, for a company c that has only three records \(\{{\textit{TEXT}}_{11}^c, {\textit{TEXT}}_{35}^c, {\textit{TEXT}}_{T4}^c\}\), we directly input the sequence as \(\{{\textit{TEXT}}_1, {\textit{TEXT}}_2, {\textit{TEXT}}_3\}\). In this way, time only determines the position in the sequence and does not serve as an input to the model. We refer to this alignment method as SqueezeInputAlign. We compare the performance of the three methods in our experiments.
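The three alignment methods can be contrasted on a toy example. The sketch below is an illustration under stated assumptions: each record is a hypothetical `(week, vector)` pair with a global week index starting at 0, and the function names simply mirror the method names in the text.

```python
from typing import List, Tuple

Vector = List[float]
Record = Tuple[int, Vector]  # (global week index starting at 0, weekly embedding)


def zero_input_align(records: List[Record], dim: int) -> List[Vector]:
    """ZeroInputAlign: one slot per week up to the last observed week,
    with a zero vector for every week that has no news."""
    if not records:
        return []
    last_week = max(week for week, _ in records)
    sequence = [[0.0] * dim for _ in range(last_week + 1)]
    for week, vector in records:
        sequence[week] = list(vector)
    return sequence


def time_input_align(records: List[Record]) -> Tuple[List[Vector], List[float]]:
    """TimeInputAlign: observed weeks only, plus their time stamps as a
    second input (as consumed by a time-aware model such as Phased LSTM)."""
    ordered = sorted(records, key=lambda r: r[0])
    return [list(v) for _, v in ordered], [float(w) for w, _ in ordered]


def squeeze_input_align(records: List[Record]) -> List[Vector]:
    """SqueezeInputAlign: observed weeks only, in chronological order;
    the time stamps are discarded."""
    return [list(v) for _, v in sorted(records, key=lambda r: r[0])]


records = [(3, [2.0, 2.0]), (0, [1.0, 1.0])]
print(zero_input_align(records, dim=2))   # 4 weekly slots, 2 of them zero
print(time_input_align(records))          # 2 vectors plus weeks [0.0, 3.0]
print(squeeze_input_align(records))       # 2 vectors, no time information
```

The zero-imputed sequence grows with the observation window, whereas the other two grow only with the number of weeks that actually contain news.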
3.2 Neural Network-Based Default Probability Prediction Model
3.2.1 Input Layer
The first layer is the input layer, which contains the aligned sequences of financial indicators and news representations, during time periods \(1, \ldots , T\). Formally, the input layer is defined as \({\mathbf {X}}_{T}^{c} = (\{FIN_{t}^{c}\}, \{{\textit{TEXT}}_{t_{\rm text}}^{c}\})\).
3.2.2 Encode Layer
The second layer encodes the financial variable time series and the news text time series separately. Because the news representations are updated weekly while the financial variables are updated quarterly, the news sequence is much longer than the financial sequence and is much harder to encode. To resolve this problem, we introduce Phased LSTM and Wavenet into our model in addition to LSTM. Phased LSTM [22] introduces a time gate that takes time as a direct input and can thus deal with asynchronous time series. Wavenet [28] is a CNN-based model with dilated causal convolutions, which has shown strong expressive power in encoding audio and text. Next we briefly describe the three models; their results are compared in the experiments.
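As an illustration of the time gate, the sketch below evaluates the piecewise-linear openness function defined in the Phased LSTM paper [22]: the gate rises and falls during a short open phase of each oscillation period and leaks slightly otherwise. The function name and the parameter values are illustrative assumptions; in the actual model the period, shift and open ratio are learned per unit.

```python
def time_gate(t: float, tau: float, shift: float, r_on: float, alpha: float = 0.001) -> float:
    """Openness k_t of a Phased LSTM time gate at time stamp t.

    tau   : oscillation period
    shift : phase shift
    r_on  : fraction of the period during which the gate is open
    alpha : leak rate applied while the gate is closed
    """
    phase = ((t - shift) % tau) / tau      # position within the cycle, in [0, 1)
    if phase < 0.5 * r_on:
        return 2.0 * phase / r_on          # first half of the open phase: rising
    if phase < r_on:
        return 2.0 - 2.0 * phase / r_on    # second half of the open phase: falling
    return alpha * phase                   # closed phase: small leak


# With tau = 8 and r_on = 0.5 the gate is open for the first 4 time units.
openness = [time_gate(t, tau=8.0, shift=0.0, r_on=0.5) for t in range(8)]
print(openness)
```

The cell state is then updated as a mix of the proposed and previous states weighted by this openness, so a unit only changes its memory when an input arrives while its gate is open, which is what lets the model consume irregularly spaced news.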
After encoding the news time series using one of these three models, we then extract the representation of the final time step and combine it with the hidden states from the LSTM on financial variable time series and pass the concatenated vector to the next layer to predict the default probability in the next financial period. In the experiments, we evaluate the performance of the three models on a real dataset.
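The dilated causal convolution at the core of Wavenet can be sketched in a few lines. The single-channel setting, the kernel size of 2 and the doubling dilation schedule follow the original Wavenet paper [28] and are assumptions here; the configuration used in DMDP is not specified in the text.

```python
from typing import List


def causal_dilated_conv(x: List[float], w: List[float], dilation: int) -> List[float]:
    """y[t] = sum_j w[j] * x[t - j * dilation], with x treated as 0 before t = 0.

    Only current and past inputs contribute, so no future news leaks into y[t].
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, weight in enumerate(w):
            idx = t - j * dilation
            if idx >= 0:
                acc += weight * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input steps that can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = causal_dilated_conv(x, w=[1.0, 1.0], dilation=2)
print(y)                                   # [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(2, [1, 2, 4, 8]))    # 16
```

Stacking layers with doubling dilations makes the receptive field grow exponentially with depth, which is one reason such a model suits long weekly sequences. Taking the output at the final time step then gives a fixed-size summary of the news history.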
3.2.3 Prediction Layer
3.3 Learning and Optimization
4 Experiments
4.1 Experimental Settings
4.1.1 Datasets
The numerical financial indicators used in our experiments
Category  Financial indicators
Cash flow ability  Cash flow to sales ratio; cash flow to net profit margin ratio; cash flow to liabilities ratio; cash flow ratio
Operation ability  Account receivable turnover; account receivable turnover days; inventory turnover; inventory turnover days; current asset turnover; current asset turnover days
Profitability  Return on equity (ROE); EPS; net profit ratio; net profits; gross profit rate; business income; business income per share
Solvency  Current ratio; cash ratio; quick ratio; interest coverage ratio; shareholders equity ratio
Growth  Main business rate of growth; net profit rate of growth; net asset; total asset rate of growth; EPS rate of growth; shareholders equity rate of growth
4.1.2 Preprocessing News Text
In the preliminary paper [37], we concatenated the news in one quarter and extracted the top 50 keywords using TF-IDF for each company. In this paper, we make two changes. First, instead of concatenating the news of a quarter into a single text and extracting keywords, we use the titles rather than the bodies of the news. This is based on the observation that the titles, written by professional editors, are mostly good abstractions of the news. Second, we change the granularity from quarterly to weekly. That is, for each company, we calculate the average of all the news title embeddings to represent the news for the company in a week. By doing this, we expect the updates of news text to be more timely and the fluctuations of public opinion on the stock to be better modeled. When calculating the average word embedding of a title, we first adopt Jieba for Chinese word segmentation. Next we remove the stop words and look up the word embeddings in the pretrained Word2Vec model provided by Facebook’s FastText module [17]. We represent the news text of a week with a 300-dimensional embedding vector, which is the average over all the news concerning the company in that week. After that, we remove null values from the financial dataset and scale the numerical data into the range [0, 1] with a min-max scaler. We align the two sequences of financial indicators and news representations using Algorithm 1.
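The min-max scaling mentioned above maps each indicator x to (x − min) / (max − min). A minimal sketch follows; fitting the bounds on the training split only and mapping a constant column to 0 are assumptions made here, not details given in the paper.

```python
from typing import List, Tuple


def fit_min_max(train_column: List[float]) -> Tuple[float, float]:
    """Record the minimum and maximum of one indicator on the training data."""
    return min(train_column), max(train_column)


def apply_min_max(column: List[float], bounds: Tuple[float, float]) -> List[float]:
    """Scale values with previously fitted bounds: (x - min) / (max - min)."""
    lo, hi = bounds
    if hi == lo:
        # Assumption: a constant indicator carries no signal and is mapped to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


train_ratio = [2.0, 4.0, 6.0]            # e.g. one indicator over training quarters
bounds = fit_min_max(train_ratio)
print(apply_min_max(train_ratio, bounds))   # [0.0, 0.5, 1.0]
print(apply_min_max([5.0], bounds))         # [0.75]
```

Values in later quarters can fall outside [0, 1] when bounds come from the training split alone, which is the price of keeping test-period statistics out of the preprocessing.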
4.1.3 Implementation Details
Parameter ranges
Parameter  Description  Range 

Learning rate  learning rate  \(\{0.1, 0.01, 0.001\}\) 
Hidden units in LSTM (FIN)  Number of hidden units  {16, 32, 64, 128} 
Hidden units in LSTM/Phased LSTM (TEXT)  Number of hidden units  {16, 32, 64, 128} 
Resample ratio  The ratio of positive cases in a batch  {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} 
Dilation in Wavenet  Number of dilation layers in Wavenet  {2, 4, 8, 16} 
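The resample ratio in the table fixes the share of positive (*ST) cases in each training batch. One way to realize it is sketched below; sampling with replacement, the rounding rule and the function name `sample_batch` are assumptions, since the text only defines the ratio itself.

```python
import random
from typing import List, Sequence


def sample_batch(
    positives: Sequence,
    negatives: Sequence,
    batch_size: int,
    pos_ratio: float,
    rng: random.Random,
) -> List:
    """Draw a batch in which about pos_ratio of the cases are positive."""
    n_pos = round(batch_size * pos_ratio)
    n_neg = batch_size - n_pos
    # Sampling with replacement lets a small positive class fill its quota.
    batch = rng.choices(positives, k=n_pos) + rng.choices(negatives, k=n_neg)
    rng.shuffle(batch)
    return batch


positives = [("pos", i) for i in range(5)]      # rare defaulting cases
negatives = [("neg", i) for i in range(500)]    # abundant normal cases
batch = sample_batch(positives, negatives, batch_size=32, pos_ratio=0.4, rng=random.Random(0))
print(sum(1 for kind, _ in batch if kind == "pos"), len(batch))  # 13 32
```

Oversampling the rare class in this way changes only what the optimizer sees during training; the validation and test sets keep their natural class balance.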
4.1.4 Compared Methods
(1) GAM (generalized additive model) [24] is treated as the baseline in our study. In general, GAM is used to deal with time series data with a fixed window size. It enables the discovery of a nonlinear fit between a variable and the response, building on the idea that a time series can be decomposed into a number of individual trends, expressed as a sum of smooth functions [12].
(2) LSTM on FIN data only. The LSTM here is an ordinary LSTM with variable-length inputs. The input to this model is the 30-dimensional financial indicator sequence.
(3) LSTM on TEXT data only. The LSTM here is also an ordinary LSTM with variable-length inputs. The input to this model is the 300-dimensional embedding representation of the text sequence.
(4) Phased LSTM on TEXT only. The input to this model is the 300-dimensional embedding representation of the text sequence.
(5) Wavenet on TEXT only. The input to this model is the 300-dimensional embedding representation of the text sequence.
4.1.5 Metrics
We adopt the area under the receiver operating characteristic curve (AUC) as our evaluation criterion. Commonly used to select the classifier that best predicts the classes, AUC weights errors on the two classes separately and gives a more faithful picture on an imbalanced dataset. A random predictor yields an AUC of 0.5; the more powerful the classifier, the larger the AUC. We also report the prediction accuracy of the different methods on the test set.
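Why AUC is more informative than accuracy on imbalanced data can be seen in a small example. The sketch below computes AUC through its rank interpretation, i.e., the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half. The toy labels and scores are made up for illustration and are not drawn from our dataset.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(labels: List[int], scores: List[float], threshold: float = 0.5) -> float:
    """Share of cases whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# 98 normal companies and 2 defaulting ones; the model always outputs 0.
labels = [0] * 98 + [1] * 2
constant_scores = [0.0] * 100
print(accuracy(labels, constant_scores))  # 0.98
print(auc(labels, constant_scores))       # 0.5
```

A model that never flags a default reaches 98% accuracy in this example while its AUC stays at the level of a random predictor, which is why the accuracy columns in the tables below should be read together with AUC.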
4.2 Comparison Results
Comparison of different methods. Note that each method was trained 5 times and we report their average performance and standard deviations (in brackets) for comparison
Framework  Model  Data  Accuracy  AUC 

–  GAM  FIN  0.651(0.000)  0.550(0.000) 
–  LSTM  FIN  0.891(0.070)  0.662(0.033) 
–  LSTM  TEXT  0.991(0.010)  0.628(0.061) 
–  Phased LSTM  TEXT  0.963(0.032)  0.583(0.060) 
–  Wavenet  TEXT  0.979(0.016)  0.757(0.018) 
DMDP  LSTM (FIN) + LSTM (TEXT)  FIN+TEXT  0.981(0.009)  0.682(0.042) 
DMDP  LSTM (FIN) + Phased LSTM (TEXT)  FIN+TEXT  0.996(0.003)  0.607(0.133) 
DMDP  LSTM (FIN) + Wavenet (TEXT)  FIN+TEXT  0.982(0.009)  \(\mathbf {0.761}(0.013)\) 
From the table, we find that DMDP with LSTM on financial variables and Wavenet on the news time series achieves the highest AUC of 0.761, higher than all the baselines. Note that all the neural network-based methods outperform the GAM baseline. Also note that there are large differences among the models used to extract information from the news text data: Wavenet outperforms LSTM and Phased LSTM by a large margin. Besides, combining financial variables with text information usually improves on using financial variables or news text alone. For Wavenet, however, the improvement from introducing financial variables is small, which indicates the importance of the news text information.
4.3 Parameter Tuning
The results of different alignment methods
Alignment method  Model  Data  Accuracy  AUC 

ZeroInputAlign  LSTM (FIN) + Wavenet (TEXT)  FIN+TEXT  0.998(0.000)  0.672(0.027) 
TimeInputAlign  LSTM (FIN) + Phased LSTM (TEXT)  FIN+TEXT  0.996(0.003)  0.607(0.133) 
SqueezeInputAlign  LSTM (FIN) + Wavenet (TEXT)  FIN+TEXT  0.982(0.009)  \(\mathbf {0.761}(0.013)\) 
The results of different learning rates in DMDP with LSTM (FIN) + Wavenet (TEXT)
Learning rate  Accuracy  AUC 

0.1  0.850(0.104)  0.498(0.016) 
0.01  0.720(0.059)  0.579(0.054) 
0.001  0.982(0.009)  \(\mathbf {0.761}(0.013)\) 
The results of different numbers of hidden units of LSTM (FIN) in DMDP with LSTM (FIN) + Wavenet (TEXT)
#Hidden units  Accuracy  AUC 

16  0.982(0.009)  \(\mathbf {0.761}(0.013)\) 
32  0.986(0.016)  0.715(0.018) 
64  0.987(0.014)  0.742(0.040) 
128  0.988(0.000)  0.760(0.014) 
The results of different resampling ratios in DMDP with LSTM (FIN) + Wavenet (TEXT)
Resampling ratio  Accuracy  AUC 

0.1  0.997(0.001)  0.749(0.019) 
0.2  0.987(0.005)  0.729(0.022) 
0.3  0.994(0.005)  0.757(0.044) 
0.4  0.982(0.009)  \(\mathbf {0.761}(0.013)\) 
0.5  0.991(0.003)  0.723(0.019) 
0.6  0.968(0.013)  0.715(0.033) 
0.7  0.993(0.001)  0.696(0.012) 
0.8  0.974(0.020)  0.669(0.016) 
0.9  0.975(0.021)  0.698(0.045) 
The results of different numbers of dilation layers in Wavenet (TEXT) in DMDP with LSTM (FIN) + Wavenet (TEXT)
#Dilation layers  Accuracy  AUC 

2  0.792(0.155)  0.566(0.090) 
4  0.737(0.113)  0.598(0.041) 
8  0.982(0.009)  \(\mathbf {0.761}(0.013)\) 
16  0.973(0.017)  0.703(0.004) 
5 Related Work
Recently, RNN variants such as LSTM [14] have been very successful in modeling long-range sequential dependencies, and they have been applied to many time series forecasting and classification tasks. In [9], the author used LSTM to predict, for 6 typical stocks, whether a stock would increase by 0–1% (class 1), increase by more than 1% (class 2) or not increase (class 3) within the next three hours, with the highest accuracy at 59.5%. In [36], the authors proposed a novel SFM model, which incorporates the discrete Fourier transform (DFT) into LSTM, to predict future values of a series. They argued that by decomposing the hidden states of LSTM into multi-frequency components, they could capture different latent patterns behind the original time series. In [21], the authors combined LSTM and CNN into a single framework called TreNet, arguing that CNNs extract salient features from local raw data while LSTM captures long-term dependencies. The results demonstrated that the combined network outperforms both CNN and LSTM as well as various kernel-based models in predicting the trend of a time series. However, to the best of our knowledge, no prior work has studied the problem of default probability prediction, which is a critical task in risk assessment for listed companies. Moreover, none of the existing time series prediction methods leverages informative social media data to enhance prediction accuracy.
6 Conclusion
This paper has developed a default probability prediction framework, DMDP, which leverages both structured financial factors and unstructured news from social media to capture the default risk states of the observed corporations. DMDP involves a data alignment component to absorb multisource data with different timestamps. We further adopt LSTM to encode the financial variable time series and LSTM, Phased LSTM or Wavenet to encode the news text time series, so as to effectively extract the latent information. In the experiments, we considered about 30 financial indicators covering the profitability, solvency, operation ability, cash flow ability and growth ability of over 3000 listed corporations in mainland China. The results show that, compared with the existing risk assessment approach that only considers financial factors, our neural method with additional indicators from social media news improves the default probability prediction results. As future work, we will investigate the following research directions: (1) the effects of public opinions among affiliated companies on a company’s default value; (2) the importance of different features for default probability prediction performance.
Acknowledgements
This work is supported by the National Key Research and Development Program of China (No. 2018YFC0831604). Yanyan Shen is also supported by NSFC (No. 61602297).
Author Contributions
YS proposed the overall framework for default probability prediction and the initial model structure. YZ finalized the two neural network-based methods and conducted all the experiments. YH reviewed the related works and participated in collecting and cleaning the multisource data.
Compliance with Ethical Standards
Ethics approval and consent to participate
Yes.
Consent for publication
Yes.
Availability of data and material
Yes.
Conflict of interest
The authors declare that they have no conflict of interest.
References
 1. Altman EI (1968) Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J Finance 23(4):589–609
 2. Altman EI, Saunders A (1997) Credit risk measurement: developments over the last 20 years. J Bank Finance 21(11–12):1721–1742
 3. Baesens B, Van Gestel T, Viaene S, Stepanova M, Suykens J, Vanthienen J (2003) Benchmarking state-of-the-art classification algorithms for credit scoring. J Oper Res Soc 54(6):627–635
 4. Bijak K, Thomas LC (2012) Does segmentation always improve model performance in credit scoring? Expert Syst Appl 39(3):2433–2442
 5. Brown I, Mues C (2012) An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst Appl 39(3):3446–3453
 6. Chen W, Ma C, Ma L (2009) Mining the customer credit using hybrid support vector machine technique. Expert Syst Appl 36(4):7611–7616
 7. Desai VS, Crook JN, Overstreet GA (1996) A comparison of neural networks and linear scoring models in the credit union environment. Eur J Oper Res 95(1):24–37
 8. Doumpos M, Zopounidis C (2007) Model combination for credit risk assessment: a stacked generalization approach. Ann Oper Res 151(1):289–306
 9. Gao Q (2016) Stock market forecasting using recurrent neural network. Ph.D. thesis, University of Missouri–Columbia
 10. Hand DJ, Henley WE (1997) Statistical classification methods in consumer credit scoring: a review. J R Stat Soc Ser A Stat Soc 160(3):523–541
 11. Harris T (2015) Credit scoring using the clustered support vector machine. Expert Syst Appl 42(2):741–750
 12. Hastie T, Tibshirani R (1990) Generalized additive models. Wiley Online Library, Hoboken
 13. Hens AB, Tiwari MK (2012) Computational time reduction for credit scoring: an integrated approach based on support vector machine and stratified sampling method. Expert Syst Appl 39(8):6774–6781
 14. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
 15. Huang CL, Chen MC, Wang CJ (2007) Credit scoring with a data mining approach based on support vector machines. Expert Syst Appl 33(4):847–856
 16. Huang Z, Chen H, Hsu CJ, Chen WH, Wu S (2004) Credit rating analysis with support vector machines and neural networks: a market comparative study. Decis Support Syst 37(4):543–558
 17. Joulin A, Grave E, Bojanowski P, Douze M, Jégou H, Mikolov T (2016) Fasttext.zip: compressing text classification models. arXiv preprint arXiv:1612.03651
 18. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
 19. Lee TS, Chen IF (2005) A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Syst Appl 28(4):743–752
 20. Lee TS, Chiu CC, Chou YC, Lu CJ (2006) Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Comput Stat Data Anal 50(4):1113–1130
 21. Lin T, Guo T, Aberer K (2017) Hybrid neural networks for learning the trend in time series. In: Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI-17, pp 2273–2279
 22. Neil D, Pfeiffer M, Liu SC (2016) Phased LSTM: accelerating recurrent network training for long or event-based sequences. In: Advances in neural information processing systems, pp 3882–3890
 23. Schebesch KB, Stecking R (2005) Support vector machines for classifying and describing credit applicants: detecting typical and critical regions. J Oper Res Soc 56(9):1082–1088
 24. Sousa MR, Gama J, Brandão E (2016) A new dynamic modeling framework for credit risk assessment. Expert Syst Appl 45:341–351
 25. Tsai CF, Chen ML (2010) Credit rating by hybrid machine learning techniques. Appl Soft Comput 10(2):374–380
 26. Tsai CF, Wu JW (2008) Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Syst Appl 34(4):2639–2649
 27. Twala B (2010) Multiple classifier application to credit risk assessment. Expert Syst Appl 37(4):3326–3336
 28. Van Den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior AW, Kavukcuoglu K (2016) Wavenet: a generative model for raw audio. In: SSW, p 125
 29. Verikas A, Gelzinis A, Bacauskiene M (2011) Mining data with random forests: a survey and results of new tests. Pattern Recognition 44(2):330–349
 30. Wang G, Hao J, Ma J, Jiang H (2011) A comparative assessment of ensemble learning for credit scoring. Expert Syst Appl 38(1):223–230
 31. West D (2000) Neural network credit scoring models. Comput Oper Res 27(11):1131–1152
 32. Yap BW, Ong SH, Husain NHM (2011) Using data mining to improve assessment of credit worthiness via credit scoring models. Expert Syst Appl 38(10):13274–13283
 33. Yeh IC, Lien Ch (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36(2):2473–2480
 34. Yu L, Wang S, Lai KK (2008) Credit risk assessment with a multistage neural network ensemble learning approach. Expert Syst Appl 34(2):1434–1444
 35. Zaremba W, Sutskever I, Vinyals O (2014) Recurrent neural network regularization. arXiv preprint arXiv:1409.2329
 36. Zhang L, Aggarwal C, Qi GJ (2017) Stock price prediction via discovering multi-frequency trading patterns. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 2141–2149. ACM
 37. Zhao Y, Huang Y, Shen Y (2018) \(dmdp^2\): A dynamic multisource based default probability prediction framework. In: Asia-Pacific Web (APWeb) and Web-age information management (WAIM) Joint international conference on web and big data, pp 312–326. Springer
 38. Zhao Z, Xu S, Kang BH, Kabir MMJ, Liu Y, Wasinger R (2015) Investigation and improvement of multilayer perceptron neural networks for credit scoring. Expert Syst Appl 42(7):3508–3516
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.