Fraud detection for job placement using hierarchical clusters-based deep neural networks

  • Jeongrae Kim
  • Han-Joon Kim
  • Hyoungrae Kim


Fraud detection is becoming an integral part of business intelligence, because detecting fraud in a company's work processes is of great value. Fraud hinders accurate appraisal when an enterprise is evaluated, and it causes economic loss to the business. Previous studies on fraud detection have achieved limited performance gains because they learn a single fraud pattern from the whole data set. This paper proposes a novel method that uses hierarchical clusters with deep neural networks to detect finer-grained frauds, as well as frauds visible in the whole data, in the work processes of job placement. The proposed method, Hierarchical Clusters-based Deep Neural Networks (HC-DNN), uses the anomaly characteristics of hierarchical clusters, pre-trained through an autoencoder, as the initial weights of deep neural networks to detect various frauds. HC-DNN has the advantages of improving detection performance and explaining the relationships among fraud types. In a cross-validation evaluation of fraud detection performance, the proposed method outperforms conventional methods. From the viewpoint of explainable deep learning, the hierarchical cluster structure constructed through HC-DNN can represent the relationships among fraud types.
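
The core idea of reusing autoencoder weights as initial network weights can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy data, the `cluster_id` rule standing in for one cluster of a hierarchy, the layer sizes, and all function names are assumptions made purely for illustration. It pre-trains a one-hidden-layer autoencoder on the records of a single cluster and copies the encoder weights into the first layer of a classifier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """Fit a one-hidden-layer autoencoder (sigmoid encoder, linear decoder)
    with plain gradient descent. Returns encoder weights, bias, loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)       # hidden code
        err = H @ W_dec + b_dec - X          # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the per-record squared error, averaged over records.
        g_out = 2.0 * err / n
        g_hid = (g_out @ W_dec.T) * H * (1.0 - H)
        W_dec -= lr * (H.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_hid)
        b_enc -= lr * g_hid.sum(axis=0)
    return W_enc, b_enc, losses

def init_classifier(W_enc, b_enc, seed=1):
    """Build classifier parameters whose first layer starts from the
    pre-trained encoder; the output layer is randomly initialised."""
    rng = np.random.default_rng(seed)
    return {
        "W1": W_enc.copy(),
        "b1": b_enc.copy(),
        "w2": rng.normal(0.0, 0.1, W_enc.shape[1]),
        "b2": 0.0,
    }

def predict_proba(params, X):
    """Forward pass: fraud score in (0, 1) for each record."""
    H = sigmoid(X @ params["W1"] + params["b1"])
    return sigmoid(H @ params["w2"] + params["b2"])

# Toy data: 60 records with 4 numeric features (purely synthetic).
rng = np.random.default_rng(42)
X = rng.random((60, 4))
# Hypothetical cluster assignment; a real pipeline would take this
# from a hierarchical clustering of the records.
cluster_id = (X[:, 0] > 0.5).astype(int)

W_enc, b_enc, losses = pretrain_autoencoder(X[cluster_id == 1], n_hidden=2)
clf = init_classifier(W_enc, b_enc)
scores = predict_proba(clf, X)
```

In a full pipeline the classifier would then be fine-tuned on labelled fraud records; the sketch stops at initialisation because that is the step the abstract highlights.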


Fraud detection, Deep neural networks, Hierarchical cluster structure, Autoencoder, Explainable deep learning, Job placement



This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2018R1D1A1A02086148), and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-08-01417) supervised by the IITP (Institute for Information & communications Technology Promotion).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Electrical and Computer Engineering, University of Seoul, Seoul, Republic of Korea
  2. KEIS, Eumseong, Republic of Korea
