Improving SVM Classification with Imbalance Data Set

Zeng, Zhi-Qiang; Gao, Ji

doi:10.1007/978-3-642-10677-4_44

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5863)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The Synthetic Minority Over-sampling Technique (SMOTE) generates samples in the input space, while a kernel Support Vector Machine (SVM) separates classes in a feature space, so combining the two can be inconsistent. To address this, this paper presents a kernel-based SMOTE approach for SVM classification on imbalanced data sets. The method first oversamples the minority instances in the feature space, then finds the pre-images of the synthetic samples from the relation between distances in the feature space and distances in the input space. Finally, these pre-images are appended to the original data set to train an SVM. Experiments on real data indicate that the samples constructed by the proposed method are of higher quality than those produced by SMOTE and that, as a result, the SVM classifies imbalanced data more effectively.
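The three steps in the abstract (oversample in feature space, recover pre-images from distances, append them to the training set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a Gaussian (RBF) kernel with a hypothetical width `SIGMA`, and it builds each synthetic point as the feature-space interpolation (1 - lam) phi(xi) + lam phi(xj) between a minority sample and one of its `k` nearest minority neighbours. It converts feature-space distances back to input-space distances using the Gaussian-kernel inversion described by Kwok and Tsang. Then, in place of their SVD-based formula, it recovers the pre-image with a simple mean-centred least-squares solve over the `n_anchor` minority points nearest to the synthetic point. All function names and parameters (`kernel_smote`, `pre_image`, `k`, `n_anchor`, etc.) are hypothetical.

```python
import math
import random

# Hypothetical Gaussian-kernel width; the kernel parameters used in the paper are not given here.
SIGMA = 1.0


def rbf(x, y, sigma=SIGMA):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def feature_dist2(x, y, sigma=SIGMA):
    """||phi(x) - phi(y)||^2 computed with the kernel trick."""
    return rbf(x, x, sigma) + rbf(y, y, sigma) - 2 * rbf(x, y, sigma)


def synthetic_dist2(xi, xj, lam, xm, sigma=SIGMA):
    """||phi_new - phi(xm)||^2 where phi_new = (1 - lam) phi(xi) + lam phi(xj)."""
    a, b = 1 - lam, lam
    return (a * a * rbf(xi, xi, sigma) + b * b * rbf(xj, xj, sigma)
            + 2 * a * b * rbf(xi, xj, sigma) + rbf(xm, xm, sigma)
            - 2 * a * rbf(xi, xm, sigma) - 2 * b * rbf(xj, xm, sigma))


def to_input_dist2(fd2, sigma=SIGMA):
    """Invert ||phi(z) - phi(x)||^2 = 2 - 2 exp(-||z - x||^2 / (2 sigma^2)) for the Gaussian kernel."""
    fd2 = min(max(fd2, 0.0), 2.0 - 1e-12)  # clamp rounding noise; the log needs fd2 < 2
    return -2 * sigma ** 2 * math.log(1 - fd2 / 2)


def solve(mat, vec):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(vec)
    m = [list(row) + [v] for row, v in zip(mat, vec)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for col in range(c, n + 1):
                m[r][col] -= f * m[c][col]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][col] * z[col] for col in range(r + 1, n))) / m[r][r]
    return z


def pre_image(anchors, d2):
    """Least-squares z with ||z - x_m||^2 close to d2[m] for every anchor x_m.

    Subtracting the mean of the equations gives the linear system
    (x_m - mean) . z = 0.5 * (||x_m||^2 - avg ||x||^2 - d2[m] + avg d2),
    solved via the normal equations (the anchors must span the input space).
    """
    n, dim = len(anchors), len(anchors[0])
    mean = [sum(x[c] for x in anchors) / n for c in range(dim)]
    sq = [sum(v * v for v in x) for x in anchors]
    sq_bar, d2_bar = sum(sq) / n, sum(d2) / n
    rows = [[x[c] - mean[c] for c in range(dim)] for x in anchors]
    rhs = [0.5 * (sq[m] - sq_bar - d2[m] + d2_bar) for m in range(n)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(dim)]
    return solve(ata, atb)


def kernel_smote(minority, n_new, k=3, n_anchor=5, sigma=SIGMA, seed=0):
    """Oversample the minority class in feature space; return input-space pre-images."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        xi = minority[i]
        # k nearest minority neighbours of xi, ranked by feature-space distance
        others = [x for t, x in enumerate(minority) if t != i]
        others.sort(key=lambda x: feature_dist2(xi, x, sigma))
        xj = rng.choice(others[:k])
        lam = rng.random()  # phi_new = (1 - lam) phi(xi) + lam phi(xj)
        # anchors: the minority points closest to phi_new in feature space
        near = sorted((synthetic_dist2(xi, xj, lam, x, sigma), x) for x in minority)[:n_anchor]
        anchors = [x for _, x in near]
        synthetic.append(pre_image(anchors, [to_input_dist2(d, sigma) for d, _ in near]))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
    # The returned points would be appended to the minority class before training the SVM.
    for point in kernel_smote(minority, n_new=3):
        print(point)
```

Because the Gaussian-kernel distance is a monotone function of input distance, ranking neighbours in feature space picks the same nearest neighbours as ranking in input space. The difference from plain SMOTE lies in where each synthetic sample lands: the recovered pre-image need not coincide with the input-space SMOTE sample (1 - lam) xi + lam xj.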

Author information

Authors and Affiliations

Department of Computer Science and Technology, Xiamen University of Technology, 361024, Xiamen, China
Zhi-Qiang Zeng
Department of Computer Science and Engineering, Zhejiang University, 310027, Hangzhou, China
Ji Gao

Editor information

Editors and Affiliations

Department of Electronic Engineering, City University of Hong Kong, Hong Kong
Chi Sing Leung
School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sankyuk-Dong, Puk-Gu, 702-701, Taegu, Korea
Minho Lee
School of Information Technology, King Mongkut’s University of Technology Thonburi, 126 Pracha-U-Thit Rd., Bangmod, Thungkru, 10140, Bangkok, Thailand
Jonathan H. Chan

About this paper

Cite this paper

Zeng, ZQ., Gao, J. (2009). Improving SVM Classification with Imbalance Data Set. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10677-4_44

DOI: https://doi.org/10.1007/978-3-642-10677-4_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10676-7
Online ISBN: 978-3-642-10677-4
eBook Packages: Computer Science, Computer Science (R0)
