Abstract

An imbalanced training dataset can pose serious problems for many real-world data-mining tasks that conduct supervised learning. In this chapter,\(^\dagger\) we present a kernel-boundary-alignment (KBA) algorithm, which treats training-data imbalance as prior information with which to augment SVMs and thereby improve class-prediction accuracy. Using a simple example, we first show that SVMs can suffer a high incidence of false negatives when the training instances of the target class are heavily outnumbered by those of a non-target class. The remedy we propose is to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution. Through theoretical analysis backed by empirical study, we show that the kernel-boundary-alignment algorithm works effectively on several datasets.

© IEEE, 2005. This chapter is based on the author’s work with Gang Wu [1], published in IEEE TKDE 17(6). Permission to publish this chapter is granted under copyright license #2587680962412.
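
The false-negative phenomenon described above is easy to reproduce. Below is a minimal sketch, not the chapter's KBA algorithm: it trains a plain RBF SVM on a 50:1 imbalanced sample, counts false negatives on held-out target instances, and then applies the simplest boundary adjustment, class re-weighting, as a stand-in for KBA's kernel-matrix modification. All data, parameter values, and names are illustrative assumptions.

```python
# A minimal sketch (not the chapter's KBA algorithm): a plain RBF SVM trained
# on a 50:1 imbalanced sample misses many held-out target instances, and the
# simplest boundary adjustment, class re-weighting, recovers most of them.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 1,000 majority (negative) instances vs. 20 minority (target) instances.
X = np.vstack([rng.normal(0.0, 1.0, (1000, 2)),
               rng.normal(1.5, 1.0, (20, 2))])
y = np.array([0] * 1000 + [1] * 20)

X_target = rng.normal(1.5, 1.0, (200, 2))   # held-out target-class instances

# Plain SVM: the learned boundary skews toward the under-represented class.
plain = SVC(kernel="rbf", gamma=0.5, C=1000).fit(X, y)
fn_plain = int(np.sum(plain.predict(X_target) == 0))

# Re-weighting the minority class moves the boundary back; KBA instead
# achieves the adjustment by modifying the kernel matrix itself.
weighted = SVC(kernel="rbf", gamma=0.5, C=1000,
               class_weight={0: 1.0, 1: 50.0}).fit(X, y)
fn_weighted = int(np.sum(weighted.predict(X_target) == 0))

print(f"false negatives, plain SVM:       {fn_plain} / 200")
print(f"false negatives, re-weighted SVM: {fn_weighted} / 200")
```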


Notes

  1. Although our algorithmic approach focuses on aligning the class boundary, it can effectively remove redundant majority instances as a by-product.

  2. Given a kernel function \(K\) and a set of instances \(\fancyscript{X}_{\rm train} = \{({\mathbf{x}}_i, y_i)\}_{i=1}^{n}\), the kernel matrix (Gram matrix) is the matrix of all pairwise inner products of instances from \(\fancyscript{X}_{\rm train}\): \({\mathbf{K}} = (k_{ij})\) with \(k_{ij} = K({\mathbf{x}}_i, {\mathbf{x}}_j)\). (See the Gram-matrix sketch following these notes.)

  3. It is usually difficult to find a fully conformal mapping function with which to transform the kernel. As suggested in [19], we can instead choose a quasi-conformal mapping function for the kernel transformation (see the transformation sketch following these notes).

  4. In the KBA algorithm, if \({\mathbf{x}}\) is a support instance, we call both \({\mathbf{x}}\) and its image in the feature space \(\fancyscript{F}\) (embedded via \({\mathbf{K}}\)) support instances.

  5. In KBA, we consider only the misclassified test instances that lie within the margin, i.e., those whose SVM scores \(f({\mathbf{x}})\) range from \(-1\) to \(+1\); this reduces the influence of outliers (see the filtering sketch following these notes).

  6. We exclude from our testbed those categories that cannot be classified automatically, such as “industry”, “Rome”, and “Boston”. (The “Boston” category, for example, contains diverse subjects of Boston: architecture, landscapes, and people.)

  7. For the datasets in Table 9.2, from top to bottom, the optimal \(\gamma\) for SMOTE was 0.002, 0.003, 0.085, 0.3, 0.5, and 0.084, respectively. For SVMs, ACT, and KBA, the optimal \(\gamma\) was 0.004, 0.003, 0.08, 0.3, 0.5, and 0.086, respectively. All optimal C’s were 1,000.
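
To make note 2 concrete, here is a minimal sketch of the Gram-matrix definition with an RBF kernel; the toy data and the \(\gamma\) value are illustrative assumptions.

```python
# A minimal sketch of note 2's definition: the kernel (Gram) matrix collects
# K(x_i, x_j) for every pair of training instances; numpy only.
import numpy as np

def rbf(xi, xj, gamma=0.5):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
n = len(X_train)

K = np.array([[rbf(X_train[i], X_train[j]) for j in range(n)]
              for i in range(n)])

# K is symmetric and positive semi-definite, with ones on the diagonal.
print(np.round(K, 3))
```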
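Note 3's quasi-conformal transformation can be sketched in the spirit of [19]: scale each Gram-matrix entry as \(\tilde{k}_{ij} = D({\mathbf{x}}_i)\,D({\mathbf{x}}_j)\,k_{ij}\), where \(D\) is large near the class boundary so that region is spatially magnified. The particular choice of \(D\) below (Gaussian bumps centered on the support instances) and all parameter values are illustrative assumptions, not the chapter's exact formulation.

```python
# A sketch of a quasi-conformal kernel transformation in the spirit of [19]:
# k~_ij = D(x_i) * D(x_j) * k_ij, with D(x) large near the class boundary.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
               rng.normal(2.0, 1.0, (10, 2))])
y = np.array([0] * 200 + [1] * 10)

K = rbf_kernel(X, X, gamma=0.5)              # original Gram matrix

# Step 1: train on the original kernel to locate the support instances.
base = SVC(kernel="precomputed", C=1000).fit(K, y)
sv = X[base.support_]

# Step 2: D(x) sums Gaussian bumps centered on the support instances,
# so D is largest near the learned boundary (tau is an assumed width).
def D(x, tau=1.0):
    return np.exp(-np.sum((x - sv) ** 2, axis=1) / tau ** 2).sum()

d = np.array([D(x) for x in X])
K_tilde = np.outer(d, d) * K                 # conformally scaled kernel matrix

# Step 3: retrain on the transformed matrix. (Predicting on new points would
# require the same scaling applied to the test-vs-train kernel block.)
adjusted = SVC(kernel="precomputed", C=1000).fit(K_tilde, y)
```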
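Note 5's filtering rule can be expressed as a short sketch for a fitted scikit-learn classifier; `margin_misclassified` is a hypothetical helper, and labels are assumed to be in {0, 1}.

```python
# A minimal sketch of note 5's rule: among misclassified instances, keep only
# those whose SVM score f(x) lies strictly inside the margin, -1 < f(x) < 1,
# so that outliers (|f(x)| >= 1) do not drive the boundary adjustment.
import numpy as np

def margin_misclassified(clf, X_test, y_test):
    f = clf.decision_function(X_test)        # SVM scores f(x)
    y_pred = (f > 0).astype(int)             # predicted labels in {0, 1}
    wrong = y_pred != y_test                 # misclassified instances
    in_margin = np.abs(f) < 1                # strictly inside the margin band
    return X_test[wrong & in_margin]
```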

References

  1. G. Wu, E.Y. Chang, KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17(6), 786–795 (2005)

  2. G.M. Weiss, Mining with rarity: a unifying framework. SIGKDD Explor. 6(1), 7–19 (2004)

  3. T. Fawcett, F. Provost, Adaptive fraud detection. Data Min. Knowl. Disc. 1(3), 291–316 (1997)

  4. G. Wu, Y. Wu, L. Jiao, Y.F. Wang, E.Y. Chang, Multi-camera spatio-temporal fusion and biased sequence-data learning for security surveillance, in Proceedings of ACM International Conference on Multimedia, November 2003, pp. 528–538

  5. V. Vapnik, The Nature of Statistical Learning Theory (Springer, New York, 1995)

  6. S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, October 2001, pp. 107–118

  7. T. Joachims, Text categorization with support vector machines: learning with many relevant features, in Proceedings of ECML, 1998, pp. 137–142

  8. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edn. (Academic Press, Boston, 1990)

  9. F. Provost, Learning with imbalanced data sets. Invited paper, AAAI 2000 Workshop on Imbalanced Data Sets (2000)

  10. M. Kubat, S. Matwin, Addressing the curse of imbalanced training sets: one-sided selection, in Proceedings of the Fourteenth International Conference on Machine Learning (ICML), 1997, pp. 179–186

  11. P. Chan, S. Stolfo, Learning with non-uniform class and cost distributions: effects and a distributed multi-classifier approach, in Workshop Notes, KDD Workshop on Distributed Data Mining, 1998, pp. 1–9

  12. L. Breiman, Bagging predictors. Mach. Learn. 24, 123–140 (1996)

  13. N. Chawla, K. Bowyer, L. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  14. G.M. Weiss, F. Provost, Learning when training data are costly: the effect of class distribution on tree induction. J. Artif. Intell. Res. 19, 315–354 (2003)

  15. A. Nugroho, S. Kuroyanagi, A. Iwata, A solution for the imbalanced training sets problem by CombNET-II and its application to fog forecasting. IEICE Trans. Inf. Syst. E85-D(7), 1165–1174 (2002)

  16. C. Cardie, N. Howe, Improving minority class prediction using case-specific feature weights, in Proceedings of the Fourteenth International Conference on Machine Learning (ICML), 1997, pp. 57–65

  17. C. Drummond, R. Holte, Exploiting the cost (in)sensitivity of decision tree splitting criteria, in Proceedings of the Seventeenth International Conference on Machine Learning (ICML), 2000, pp. 239–246

  18. C. Ling, C. Li, Data mining for direct marketing: specific problems and solutions, in Proceedings of ACM SIGKDD, 1998, pp. 73–79

  19. S. Amari, S. Wu, Improving support vector machine classifiers by modifying kernel functions. Neural Netw. 12(6), 783–789 (1999)

  20. K. Crammer, J. Keshet, Y. Singer, Kernel design using boosting, in Proceedings of NIPS, 2002, pp. 537–544

  21. G. Karakoulas, J. Shawe-Taylor, Optimizing classifiers for imbalanced training sets, in Proceedings of NIPS, 1998, pp. 253–259

  22. Y. Lin, Y. Lee, G. Wahba, Support vector machines for classification in nonstandard situations. Mach. Learn. 46, 191–202 (2002)

  23. C.S. Ong, A.J. Smola, R.C. Williamson, Hyperkernels, in Proceedings of NIPS, 2003, pp. 478–485

  24. K. Veropoulos, C. Campbell, N. Cristianini, Controlling the sensitivity of support vector machines, in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1999, pp. 55–60

  25. X. Wu, R. Srihari, New ν-support vector machines and their sequential minimal optimization, in Proceedings of the Twentieth International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 824–831

  26. H.W. Kuhn, A.W. Tucker, Nonlinear programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, Berkeley, 1951), pp. 481–492

  27. N. Cristianini, J. Shawe-Taylor, A. Elisseeff, J. Kandola, On kernel target alignment, in Proceedings of NIPS, 2001, pp. 367–373

  28. B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002)

  29. G. Wu, E. Chang, Adaptive feature-space conformal transformation for imbalanced data learning, in Proceedings of the Twentieth International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 816–823

  30. J. Kandola, J. Shawe-Taylor, Refining kernels for regression and uneven classification problems, in Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2003

  31. C. Burges, Geometry and invariance in kernel based methods, in Advances in Kernel Methods: Support Vector Learning (MIT Press, Cambridge, 1999), pp. 89–116

  32. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York, 2001)

  33. C. Burges, A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)

  34. T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)

  35. A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30(7), 1145–1159 (1997)

  36. X. Wu, R. Srihari, Incorporating prior knowledge with weighted margin support vector machines, in Proceedings of ACM SIGKDD, Seattle, Washington, August 2004, pp. 326–333

Author information

Correspondence to Edward Y. Chang.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press

Cite this chapter

Chang, E.Y. (2011). Imbalanced Data Learning. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_9

  • DOI: https://doi.org/10.1007/978-3-642-20429-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20428-9

  • Online ISBN: 978-3-642-20429-6
