
Anomaly credit data detection based on enhanced Isolation Forest

  • ORIGINAL ARTICLE
The International Journal of Advanced Manufacturing Technology

Abstract

False and erroneous credit data are a real-world problem that degrades the performance of credit evaluation models. To address it, we propose an outlier detection algorithm that accounts for two characteristics of credit data: class imbalance and cost sensitivity. We use an anomaly detection model called EIF to optimize credit evaluation models. EIF uses the EasyEnsemble algorithm to construct balanced datasets, and trains an Isolation Forest model for anomaly detection on these balanced datasets, each with a different disturbance. On the one hand, undersampling to a balanced dataset addresses the class-imbalance problem; on the other hand, each sub-model learns from all of the minority-class samples, which addresses the cost-sensitive problem. Experiments were performed on the UCI German dataset, and a test set with fake data was constructed by correlation. Compared with other anomaly detection algorithms applied to common credit evaluation models, the EIF-optimized model has a higher F1 score and a lower cost-sensitive error rate. In conclusion, the EIF model is effective in enhancing the performance of credit evaluation models on forged credit datasets.
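
The sampling-plus-ensemble idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `IsolationForest` as the sub-model, assumes the sub-model scores are combined by simple averaging (the abstract does not say how they are combined), and runs on synthetic data. The function names `balanced_subsets`, `fit_eif`, and `anomaly_scores` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def balanced_subsets(X, y, n_subsets, rng, minority_label=1):
    """EasyEnsemble-style undersampling: each subset keeps every minority row
    and adds an equally sized random draw (no replacement) from the majority."""
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    for _ in range(n_subsets):
        drawn = rng.choice(majority_idx, size=minority_idx.size, replace=False)
        yield X[np.concatenate([minority_idx, drawn])]


def fit_eif(X, y, n_subsets=5, seed=0):
    """Train one Isolation Forest per balanced subset (assumed ensemble layout)."""
    rng = np.random.default_rng(seed)
    models = []
    for i, subset in enumerate(balanced_subsets(X, y, n_subsets, rng)):
        model = IsolationForest(n_estimators=50, random_state=seed + i)
        models.append(model.fit(subset))
    return models


def anomaly_scores(models, X):
    """Average anomaly score over sub-models (averaging is an assumption).
    score_samples is higher for normal points, so it is negated here."""
    return np.mean([-m.score_samples(X) for m in models], axis=0)


# Synthetic stand-in for an imbalanced credit dataset: 700 majority rows
# (label 0) and 300 minority rows (label 1), four numeric features.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(700, 4)),
    data_rng.normal(1.0, 1.0, size=(300, 4)),
])
y = np.array([0] * 700 + [1] * 300)

models = fit_eif(X, y)

# Score five ordinary rows plus one row placed far from all training data.
X_test = np.vstack([X[:5], np.full((1, 4), 12.0)])
scores = anomaly_scores(models, X_test)
print(scores.round(3))
```

In this toy setup the last row, which lies far outside the training range, should typically receive a higher anomaly score than the ordinary rows. Records whose score exceeds a chosen threshold could then be filtered out before the credit evaluation model is trained or applied; the threshold itself is a tuning choice that the abstract does not specify.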

Funding

This research was funded by the National Key R&D Program of China (Grant No. 2019YFB1404602).

Author information

Contributions

Zhang Xiaodong: final manuscript writing and checking; Yao Yuan: original draft preparation; Lv CongDong: manuscript formatting; Wang Tao: measurements, database, and visualization.

Corresponding author

Correspondence to Yuan Yao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection: New Intelligent Manufacturing Technologies through the Integration of Industry 4.0 and Advanced Manufacturing

About this article

Cite this article

Zhang, X., Yao, Y., Lv, C. et al. Anomaly credit data detection based on enhanced Isolation Forest. Int J Adv Manuf Technol 122, 185–192 (2022). https://doi.org/10.1007/s00170-022-09251-8

  • DOI: https://doi.org/10.1007/s00170-022-09251-8
