Abstract
Imbalanced learning, which addresses classification tasks in which examples of one class greatly outnumber those of another, has attracted considerable interest in the research community. The data gravitation-based classification (DGC) model, a recently developed physics-inspired classification method, performs well on many general classification problems. However, like other general classifiers, DGC's performance degrades on imbalanced tasks. We therefore develop a data-level imbalanced-learning DGC model, named SMOTE-DGC, in this paper. An oversampling technique, the Synthetic Minority Over-sampling Technique (SMOTE), is integrated with the DGC model to improve imbalanced-learning performance. A total of 44 imbalanced classification data sets and several standard and imbalanced learning algorithms are used to evaluate the proposal. Experimental results suggest that the adapted DGC model is effective for imbalanced problems.
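The data-level step described above can be illustrated with a minimal sketch of SMOTE oversampling: each synthetic minority sample is an interpolation between a real minority sample and one of its k nearest minority-class neighbors. This is an illustrative reimplementation of the generic SMOTE procedure (Chawla et al.), not the paper's SMOTE-DGC code; the `smote` function and its parameters are assumptions for demonstration.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen minority sample and one of its k nearest minority neighbors
    (generic SMOTE sketch; not the authors' implementation)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # a sample is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]   # k nearest neighbors per sample
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)            # random minority sample
        j = rng.choice(neighbors[i])   # one of its k nearest minority neighbors
        gap = rng.random()             # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

# Toy imbalanced set: 20 majority vs. 5 minority samples in 2-D.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(20, 2))
X_min = rng.normal(3.0, 0.5, size=(5, 2))
# Oversample the minority class up to the majority-class size.
X_syn = smote(X_min, n_new=len(X_maj) - len(X_min), k=3, rng=1)
print(X_syn.shape)  # (15, 2)
```

The rebalanced training set (original minority samples plus `X_syn`, alongside the majority samples) would then be fed to the DGC classifier; in SMOTE-DGC this oversampling happens before gravitation-based classification, so the classifier itself is unchanged.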
Acknowledgment
This research was partially supported by the National Natural Science Foundation of China under grants No. 61472164, No. 61373054, and No. 61203105, and by the Provincial Natural Science Foundation of Shandong under grants No. ZR2012FM010 and No. ZR2015JL025.
© 2016 Springer International Publishing Switzerland
Peng, L., Zhang, H., Yang, B., Chen, Y., Zhou, X. (2016). SMOTE-DGC: An Imbalanced Learning Approach of Data Gravitation Based Classification. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science(), vol 9772. Springer, Cham. https://doi.org/10.1007/978-3-319-42294-7_11
DOI: https://doi.org/10.1007/978-3-319-42294-7_11
Print ISBN: 978-3-319-42293-0
Online ISBN: 978-3-319-42294-7
eBook Packages: Computer Science (R0)