Anomaly credit data detection based on enhanced Isolation Forest

Zhang, Xiaodong; Yao, Yuan; Lv, Congdong; Wang, Tao

doi:10.1007/s00170-022-09251-8


ORIGINAL ARTICLE
Published: 26 April 2022

Volume 122, pages 185–192, (2022)

The International Journal of Advanced Manufacturing Technology

Xiaodong Zhang¹,
Yuan Yao¹,
Congdong Lv¹ &
…
Tao Wang²


Abstract

To address the real-world problem of falsified and erroneous credit data, and the performance degradation that such data cause in credit evaluation models, we propose an outlier detection algorithm that accounts for two characteristics of credit data: class imbalance and cost sensitivity. We use an anomaly detection model called EIF to optimize credit evaluation models. EIF uses the EasyEnsemble algorithm to construct balanced datasets and trains an Isolation Forest model for anomaly detection on these balanced datasets, each with a different disturbance. On the one hand, the balanced datasets are built by undersampling, which addresses the class-imbalance problem; on the other hand, each sub-model learns from all of the minority-class samples, which addresses the cost-sensitive problem. Experiments were performed on the UCI German dataset, and a test set containing fake data was constructed by correlation. Compared with other anomaly detection algorithms applied to common credit evaluation models, the EIF-optimized model achieves a higher F1 score and a lower cost-sensitive error rate. In conclusion, the EIF model is effective in enhancing the performance of credit evaluation models on forged credit datasets.
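The balanced-dataset step described in the abstract can be illustrated with a minimal sketch. It assumes the usual EasyEnsemble-style undersampling, in which every subset keeps all minority-class samples and adds an equally sized random draw from the majority class. The function name `balanced_subsets`, the toy records, and the class sizes are hypothetical and are not taken from the article; the article's actual sampling details and the way each Isolation Forest is trained on a subset are not specified in the abstract and are not shown here.

```python
import random


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Build EasyEnsemble-style balanced subsets (illustrative assumption).

    Each subset contains every minority sample plus an equally sized
    random sample, drawn without replacement, from the majority class.
    """
    if len(minority) > len(majority):
        raise ValueError("minority class must not be larger than majority class")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    subsets = []
    for _ in range(n_subsets):
        sampled_majority = rng.sample(majority, len(minority))
        subsets.append(sampled_majority + list(minority))
    return subsets


# Hypothetical toy records: (label, record_id). Sizes are made up.
majority = [("good", i) for i in range(70)]
minority = [("bad", i) for i in range(30)]

subsets = balanced_subsets(majority, minority, n_subsets=5, seed=1)
for idx, subset in enumerate(subsets):
    n_bad = sum(1 for label, _ in subset if label == "bad")
    print(f"subset {idx}: {len(subset)} records, {n_bad} minority")
```

In a fuller pipeline, each subset would then be handed to one anomaly detector (for example, scikit-learn's `IsolationForest`), so that every sub-model sees the whole minority class while the majority samples differ from subset to subset.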



Funding

This research was funded by the National Key R&D Program of China (Grant No. 2019YFB1404602).

Author information

Authors and Affiliations

School of Information Engineering, Nanjing Audit University, Nanjing, 211815, China
Xiaodong Zhang, Yuan Yao & Congdong Lv
JUSFOUN BIG DATA, Beijing, 10000, China
Tao Wang


Contributions

Zhang Xiaodong: final manuscript writing and checking; Yao Yuan: original draft preparation; Lv Congdong: manuscript formatting; Wang Tao: measurements, database, and visualization.

Corresponding author

Correspondence to Yuan Yao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection: New Intelligent Manufacturing Technologies through the Integration of Industry 4.0 and Advanced Manufacturing


About this article

Cite this article

Zhang, X., Yao, Y., Lv, C. et al. Anomaly credit data detection based on enhanced Isolation Forest. Int J Adv Manuf Technol 122, 185–192 (2022). https://doi.org/10.1007/s00170-022-09251-8


Received: 22 February 2022
Accepted: 20 April 2022
Published: 26 April 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s00170-022-09251-8
