Abstract
Real-world credit data often contain falsified and erroneous records, which degrade the performance of credit evaluation models. To address this, we propose an outlier detection algorithm that accounts for two characteristics of credit data: class imbalance and cost sensitivity. The anomaly detection model, called EIF, is used to optimize credit evaluation models. EIF uses the EasyEnsemble algorithm to construct balanced datasets and trains an Isolation Forest model for anomaly detection on these balanced datasets, each of which carries a different disturbance. On the one hand, undersampling yields balanced datasets, which addresses the class-imbalance problem; on the other hand, each sub-model learns from all of the minority-class samples, which addresses the cost-sensitive problem. Experiments were performed on the UCI German dataset, and a test set containing fake data was constructed by correlation. Compared with other anomaly detection algorithms applied to common credit evaluation models, the EIF-optimized model achieves a higher F1 score and a lower cost-sensitive error rate. In conclusion, the EIF model is effective in enhancing the performance of credit evaluation models on forged credit datasets.
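The balanced-subset construction described in the abstract can be illustrated with a minimal sketch. It assumes EasyEnsemble-style undersampling in which every subset keeps all minority-class samples and adds an equally sized random draw from the majority class. The function name `balanced_subsets`, the label convention (1 = minority class), and the toy data are hypothetical and are not taken from the paper. Fitting one Isolation Forest per subset (for example with scikit-learn's `IsolationForest`) would be the next step and is not shown here.

```python
import random
from typing import List, Sequence, Tuple

# A sample is (features, label); label 1 marks the minority class (assumption).
Sample = Tuple[Sequence[float], int]


def balanced_subsets(
    samples: List[Sample],
    n_subsets: int,
    minority_label: int = 1,
    seed: int = 0,
) -> List[List[Sample]]:
    """Build balanced subsets by undersampling the majority class.

    Every subset contains all minority samples plus a random draw of the
    same size from the majority class, so the subsets differ only in
    which majority samples they contain.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    majority = [s for s in samples if s[1] != minority_label]
    if not minority or len(majority) < len(minority):
        raise ValueError("need a non-empty minority class that is smaller than the majority class")

    subsets = []
    for _ in range(n_subsets):
        # Draw without replacement within one subset; draws are independent across subsets.
        drawn = rng.sample(majority, len(minority))
        subsets.append(minority + drawn)
    return subsets


# Toy data: 3 minority samples and 10 majority samples (illustrative only).
data: List[Sample] = [([float(i)], 1) for i in range(3)] + [([float(i)], 0) for i in range(3, 13)]
subsets = balanced_subsets(data, n_subsets=4)
```

Each element of `subsets` is class-balanced and could serve as the training set for one sub-model of the ensemble.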
Funding
This research was funded by the National Key R&D Program of China (Grant No. 2019YFB1404602).
Author information
Contributions
Zhang Xiaodong: final manuscript writing and checking; Yao Yuan: original draft preparation; Lv CongDong: manuscript formatting; Wang Tao: measurements, database, and visualization.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflict of interest
The authors declare no competing interests.
Additional information
This article is part of the Topical Collection: New Intelligent Manufacturing Technologies through the Integration of Industry 4.0 and Advanced Manufacturing
About this article
Cite this article
Zhang, X., Yao, Y., Lv, C. et al. Anomaly credit data detection based on enhanced Isolation Forest. Int J Adv Manuf Technol 122, 185–192 (2022). https://doi.org/10.1007/s00170-022-09251-8
DOI: https://doi.org/10.1007/s00170-022-09251-8