Improving SVM Classification with Imbalance Data Set

  • Conference paper
Neural Information Processing (ICONIP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5863)

Abstract

The Synthetic Minority Over-sampling Technique (SMOTE) works in the input space, whereas the Support Vector Machine (SVM) works in the feature space, and this mismatch causes inconsistencies when the two are combined. To address it, this paper presents a kernel-based SMOTE approach for SVM classification on imbalanced data sets. The method first preprocesses the data by oversampling the minority instances in the feature space; the pre-images of the synthetic samples are then found based on a distance relation between feature space and input space. Finally, these pre-images are appended to the original data set to train an SVM. Experiments on real data sets indicate that, compared with the SMOTE approach, the samples constructed by the proposed method are of higher quality. As a result, SVM classification on imbalanced data sets is improved.
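The three steps in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure, whose details are not given here; it rests on several assumptions: an RBF kernel (so that feature-space and input-space distances are related in closed form), a plain least-squares multilateration for the pre-image step, and hypothetical function names (`feature_sq_dists`, `to_input_sq_dists`, `approx_pre_image`) and parameter values chosen only for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2) (assumed kernel)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * diff.dot(diff)))

def feature_sq_dists(x_i, x_j, delta, anchors, gamma):
    """Squared feature-space distances from the synthetic point
    phi_new = (1 - delta) * phi(x_i) + delta * phi(x_j)
    to phi(a) for every anchor a, using kernel values only."""
    k_ij = rbf_kernel(x_i, x_j, gamma)
    # ||phi_new||^2, using k(x, x) = 1 for the RBF kernel
    norm_new = (1 - delta) ** 2 + delta ** 2 + 2 * delta * (1 - delta) * k_ij
    dists = []
    for a in anchors:
        cross = ((1 - delta) * rbf_kernel(x_i, a, gamma)
                 + delta * rbf_kernel(x_j, a, gamma))
        dists.append(norm_new + 1.0 - 2.0 * cross)
    return np.array(dists)

def to_input_sq_dists(d2_feat, gamma):
    """Invert d2_feat = 2 - 2 * exp(-gamma * d2_input) (RBF only)."""
    k = np.clip(1.0 - np.asarray(d2_feat, dtype=float) / 2.0, 1e-12, 1.0)
    return -np.log(k) / gamma

def approx_pre_image(anchors, d2_input):
    """Least-squares point z with ||z - anchors[k]||^2 ~ d2_input[k].
    Subtracting the first distance equation from the others
    removes ||z||^2 and leaves a linear system in z."""
    A = np.asarray(anchors, dtype=float)
    d2 = np.asarray(d2_input, dtype=float)
    ref, rest = A[0], A[1:]
    lhs = 2.0 * (rest - ref)
    rhs = (rest ** 2).sum(axis=1) - (ref ** 2).sum() - d2[1:] + d2[0]
    z, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return z

# Illustrative minority points (made-up values) used as anchors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.2]]
gamma = 0.5
d2_feat = feature_sq_dists(minority[0], minority[3], 0.5, minority, gamma)
synthetic = approx_pre_image(minority, to_input_sq_dists(d2_feat, gamma))
print("approximate pre-image of the synthetic sample:", synthetic)
```

In this sketch the anchors must include at least one more affinely independent point than the input dimension for the linear system to determine the pre-image. The resulting points would then be appended to the minority class before training an SVM with any standard implementation; that final step is omitted here.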



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zeng, ZQ., Gao, J. (2009). Improving SVM Classification with Imbalance Data Set. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10677-4_44

  • DOI: https://doi.org/10.1007/978-3-642-10677-4_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10676-7

  • Online ISBN: 978-3-642-10677-4

  • eBook Packages: Computer Science, Computer Science (R0)
