An Optimized Cost-Sensitive SVM for Imbalanced Data Learning

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7819)

Abstract

Class imbalance is a challenging problem for machine learning in many real-world applications. Cost-sensitive learning has attracted significant attention in recent years as a way to address it, but the precise misclassification costs are difficult to determine in practice. Other factors also influence classification performance, including the input feature subset and the intrinsic parameters of the classifier. This paper presents an effective wrapper framework that incorporates the evaluation measures (AUC and G-mean) directly into the objective function of a cost-sensitive SVM, improving classification performance by simultaneously optimizing the feature subset, the intrinsic parameters, and the misclassification cost parameters. Experimental results on various standard benchmark datasets and real-world data with different imbalance ratios show that the proposed method is effective in comparison with commonly used sampling techniques.
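
The two evaluation measures named in the abstract can be illustrated with a minimal sketch, assuming binary labels with 1 as the minority (positive) class. The function names `g_mean` and `auc` are illustrative and are not taken from the paper. In a wrapper scheme of the kind described, a score like these would presumably serve as the fitness value for each candidate combination of feature subset, SVM parameters, and misclassification costs.

```python
from math import sqrt


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Unlike plain accuracy, G-mean is zero for a classifier that assigns every example to the majority class, which is why such measures are preferred as optimization targets on imbalanced data.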

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cao, P., Zhao, D., Zaiane, O. (2013). An Optimized Cost-Sensitive SVM for Imbalanced Data Learning. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_24

  • DOI: https://doi.org/10.1007/978-3-642-37456-2_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37455-5

  • Online ISBN: 978-3-642-37456-2

  • eBook Packages: Computer Science, Computer Science (R0)
