Abstract
A wide range of devices now generates abundant data, and many businesses want to pick out the cases they care about from that data, for applications such as targeted marketing and fraudulent-transaction detection. However, current classification algorithms in data mining perform poorly on imbalanced data.
To improve classification of the minority class in imbalanced datasets, we present an ensemble learning method that combines UnderBagging, majority voting, and a meta-classifier that gives higher decision priority to the classifier that predicts a class better, with a Deep Neural Network (DNN) as the base classifier. We tested our method on two imbalanced datasets from the UCI Data Repository and compared its performance with that of four other techniques. The results showed that our method classified minority-class instances better than the other four techniques.
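As a rough illustration of the UnderBagging and majority-vote steps named in the abstract, the Python sketch below builds balanced bags by keeping every minority instance and drawing an equal number of majority instances at random, trains one base learner per bag, and combines their predictions by vote. It is a minimal sketch under stated assumptions, not the paper's implementation: the nearest-centroid learner is a simple stand-in for the DNN base classifier, the meta-classifier is left out because the abstract does not say how decision priority is assigned, and all function names, parameters, and toy data are hypothetical.

```python
import random
from collections import Counter


def make_balanced_bags(X, y, n_bags, minority_label, seed=0):
    """Return n_bags training sets, each holding every minority row plus
    an equally sized random draw (without replacement) of majority rows."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    bags = []
    for _ in range(n_bags):
        idx = minority + rng.sample(majority, len(minority))
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags


class NearestCentroid:
    """Stand-in base learner (assumption): the paper uses a DNN here."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        # Per-class mean of each feature.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def majority_vote(models, row):
    """Return the label predicted by the most base learners."""
    votes = Counter(model.predict_one(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy imbalanced data (hypothetical): 40 majority points, 5 minority points.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(5)]
y = [0] * 40 + [1] * 5

bags = make_balanced_bags(X, y, n_bags=5, minority_label=1)
models = [NearestCentroid().fit(bag_X, bag_y) for bag_X, bag_y in bags]
print(majority_vote(models, [5.0, 5.0]), majority_vote(models, [0.0, 0.0]))
```

An odd number of bags avoids tied votes in the two-class case. Because each bag reuses all minority instances, the base learners differ only in which majority instances they see.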
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lee, Y.S. (2019). Ensemble Classification Method for Imbalanced Data Using Deep Learning. In: Xu, J., Zhu, B., Liu, X., Shaw, M., Zhang, H., Fan, M. (eds) The Ecosystem of e-Business: Technologies, Stakeholders, and Connections. WEB 2018. Lecture Notes in Business Information Processing, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-22784-5_16
DOI: https://doi.org/10.1007/978-3-030-22784-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22783-8
Online ISBN: 978-3-030-22784-5
eBook Packages: Computer Science, Computer Science (R0)