SMOTE-DGC: An Imbalanced Learning Approach of Data Gravitation Based Classification

  • Conference paper
  • In: Intelligent Computing Theories and Application (ICIC 2016)
  • Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9772)

Abstract

Imbalanced learning, an important technique for coping with cases where one class greatly outnumbers another, has attracted considerable interest in the research community. A recently developed physics-inspired classification method, the data gravitation-based classification (DGC) model, performs well on many general classification problems. However, like other general classifiers, DGC suffers on imbalanced tasks. In this paper we therefore develop a data-level imbalanced learning DGC model, named SMOTE-DGC. An over-sampling technique, the Synthetic Minority Over-sampling Technique (SMOTE), is integrated with the DGC model to improve imbalanced learning performance. A total of 44 imbalanced classification data sets and several standard and imbalanced learning algorithms are used to evaluate the proposed method. Experimental results suggest that the adapted DGC model is effective for imbalanced problems.
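The following is a minimal, self-contained Python sketch of the idea summarised in the abstract, not the authors' implementation: it hand-rolls SMOTE-style interpolation between minority-class neighbours and a simple gravitation-style classifier in which each class attracts a test point with a total force based on inverse squared distance. The function names (`smote`, `dgc_predict`), the neighbour count, the inverse-square force rule, and the two-class synthetic data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by interpolating between
    a chosen sample and one of its k nearest minority-class neighbours."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                  # pick a minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)  # distances within the class
        nn = np.argsort(d)[1:k + 1]                   # its k nearest neighbours (skip itself)
        j = rng.choice(nn)
        gap = rng.random()                            # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

def dgc_predict(X_train, y_train, X_test):
    """Assign each test point to the class whose training samples exert the
    largest total 'gravitational force', taken here as sum(1 / distance^2)."""
    preds = []
    for x in X_test:
        forces = {}
        for c in np.unique(y_train):
            d2 = np.sum((X_train[y_train == c] - x) ** 2, axis=1) + 1e-12
            forces[c] = np.sum(1.0 / d2)
        preds.append(max(forces, key=forces.get))
    return np.array(preds)

# Usage: oversample a hypothetical minority class to parity, then classify.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))   # majority class around the origin
X_min = rng.normal(3.0, 1.0, size=(20, 2))    # minority class shifted away
X_min_all = np.vstack([X_min, smote(X_min, n_new=180, rng=rng)])
X = np.vstack([X_maj, X_min_all])
y = np.array([0] * len(X_maj) + [1] * len(X_min_all))
print(dgc_predict(X, y, np.array([[3.0, 3.0], [0.0, 0.0]])))  # expected: [1 0]
```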



Acknowledgment

This research was partially supported by the National Natural Science Foundation of China under grants No. 61472164, No. 61373054, and No. 61203105, and by the Provincial Natural Science Foundation of Shandong under grants No. ZR2012FM010 and No. ZR2015JL025.

Author information

Corresponding author

Correspondence to Bo Yang.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Peng, L., Zhang, H., Yang, B., Chen, Y., Zhou, X. (2016). SMOTE-DGC: An Imbalanced Learning Approach of Data Gravitation Based Classification. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol. 9772. Springer, Cham. https://doi.org/10.1007/978-3-319-42294-7_11

  • DOI: https://doi.org/10.1007/978-3-319-42294-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42293-0

  • Online ISBN: 978-3-319-42294-7

  • eBook Packages: Computer Science, Computer Science (R0)
