Abstract
Imbalanced learning, which addresses classification tasks in which examples of one class greatly outnumber those of another, has attracted considerable interest in the research community. The data gravitation-based classification (DGC) model, a recently developed physics-inspired classification method, performs well on many general classification problems. However, like other general classifiers, DGC's performance degrades on imbalanced tasks. We therefore develop a data-level imbalanced-learning DGC model, named SMOTE-DGC, in this paper. An oversampling technique, the Synthetic Minority Over-sampling Technique (SMOTE), is integrated with the DGC model to improve imbalanced-learning performance. A total of 44 imbalanced classification data sets and several standard and imbalanced learning algorithms are used to evaluate the proposal. Experimental results suggest that the adapted DGC model is effective for imbalanced problems.
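The data-level step described above can be illustrated with a minimal sketch of SMOTE oversampling: each synthetic minority sample is an interpolation between a real minority sample and one of its k nearest minority-class neighbors. This is an illustrative reimplementation of the generic SMOTE procedure (Chawla et al.), not the paper's SMOTE-DGC code; the `smote` function and its parameters are assumptions for demonstration.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen minority sample and one of its k nearest minority neighbors
    (generic SMOTE sketch; not the authors' implementation)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # a sample is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]   # k nearest neighbors per sample
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)            # random minority sample
        j = rng.choice(neighbors[i])   # one of its k nearest minority neighbors
        gap = rng.random()             # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

# Toy imbalanced set: 20 majority vs. 5 minority samples in 2-D.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(20, 2))
X_min = rng.normal(3.0, 0.5, size=(5, 2))
# Oversample the minority class up to the majority-class size.
X_syn = smote(X_min, n_new=len(X_maj) - len(X_min), k=3, rng=1)
print(X_syn.shape)  # (15, 2)
```

The rebalanced training set (original minority samples plus `X_syn`, alongside the majority samples) would then be fed to the DGC classifier; in SMOTE-DGC this oversampling happens before gravitation-based classification, so the classifier itself is unchanged.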
Acknowledgment
This research was partially supported by the National Natural Science Foundation of China under grants No. 61472164, No. 61373054, and No. 61203105, and by the Provincial Natural Science Foundation of Shandong under grants No. ZR2012FM010 and No. ZR2015JL025.
© 2016 Springer International Publishing Switzerland
Peng, L., Zhang, H., Yang, B., Chen, Y., Zhou, X. (2016). SMOTE-DGC: An Imbalanced Learning Approach of Data Gravitation Based Classification. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science(), vol 9772. Springer, Cham. https://doi.org/10.1007/978-3-319-42294-7_11
DOI: https://doi.org/10.1007/978-3-319-42294-7_11
Print ISBN: 978-3-319-42293-0
Online ISBN: 978-3-319-42294-7
eBook Packages: Computer Science (R0)