Abstract
An imbalanced training dataset can pose serious problems for many real-world data-mining tasks that employ supervised learning. In this chapter,\(^\dagger\) we present a kernel-boundary-alignment algorithm, which treats training-data imbalance as prior information with which to augment SVMs and improve class-prediction accuracy. Using a simple example, we first show that SVMs can suffer from high incidences of false negatives when the training instances of the target class are heavily outnumbered by the training instances of a non-target class. The remedy we propose is to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution. Through theoretical analysis backed by empirical study, we show that the kernel-boundary-alignment algorithm works effectively on several datasets.
†© IEEE, 2005. This chapter is written based on the author’s work with Gang Wu [1] published in IEEE TKDE 17(6). Permission to publish this chapter is granted under copyright license #2587680962412.
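The phenomenon the abstract describes is easy to reproduce. The sketch below is an illustrative toy, not the chapter's code: the two-Gaussian dataset, its 100:1 imbalance ratio, and the \(\gamma\) value are invented for illustration. It trains an RBF-kernel SVM on the imbalanced data and measures how often fresh minority-class (target) points land on the wrong side of the learned boundary.

```python
# Illustrative toy, not the chapter's code: all dataset parameters below are
# invented. Train an RBF-kernel SVM on a 100:1 imbalanced two-Gaussian
# problem and measure the false-negative rate on the minority (target) class.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Majority (negative) class: 1,000 points; minority (positive) class: 10.
X_neg = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
X_pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(10, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 1000 + [1] * 10)

clf = SVC(kernel="rbf", C=1000, gamma=0.5)  # C = 1,000, as in the chapter's experiments
clf.fit(X, y)

# Fresh minority-class sample: every prediction of 0 here is a false negative.
X_test_pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(500, 2))
fn_rate = np.mean(clf.predict(X_test_pos) == 0)
print(f"false-negative rate on the minority class: {fn_rate:.2f}")

# f(x) = clf.decision_function(x); note 5 below restricts attention to
# misclassified instances whose scores fall within the margin, -1 < f(x) < +1.
scores = clf.decision_function(X_test_pos)
in_margin_fn = np.mean((scores < 0) & (scores > -1))
print(f"fraction misclassified within the margin: {in_margin_fn:.2f}")
```

Because the majority class dominates the training objective, the learned boundary is pushed toward the minority class. Oversampling (e.g., SMOTE [13]), per-class penalties, or, as this chapter proposes, modifying the kernel matrix itself are all ways to counteract that skew.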
Notes
1. Although our algorithmic approach focuses on aligning the class boundary, it can effectively remove redundant majority instances as a by-product.
2. Given a kernel function \(K\) and a set of instances \(\fancyscript{X}_{\rm train} = \{({\mathbf{x}}_i, y_i)\}_{i=1}^{n}\), the kernel matrix (Gram matrix) is the matrix of all pairwise inner products over \(\fancyscript{X}_{\rm train}\): \({\mathbf{K}} = (k_{ij})\), where \(k_{ij} = K({\mathbf{x}}_i, {\mathbf{x}}_j)\). (A numerical sketch follows this list.)
3. Usually, it is difficult to find a totally conformal mapping function to transform the kernel. As suggested in [19], we can choose a quasi-conformal mapping function for kernel transformation; the sketch after these notes illustrates one such transformation.
4. In the KBA algorithm, if \({\mathbf{x}}\) is a support instance, we call both \({\mathbf{x}}\) and its embedding in \(\fancyscript{F}\) (the support vector induced via \({\mathbf{K}}\)) support instances.
5. In KBA, we consider only the misclassified test instances that fall within the margin, so as to reduce the influence of outliers. Their SVM scores \(f({\mathbf{x}})\) range from \(-1\) to \(+1\).
6. We exclude from our testbed those categories that cannot be classified automatically, such as “industry”, “Rome”, and “Boston”. (E.g., the Boston category contains various subjects of Boston: architecture, landscapes, and people.)
7. For the datasets in Table 9.2, from top to bottom, the optimal \(\gamma\) for SMOTE was 0.002, 0.003, 0.085, 0.3, 0.5, and 0.084, respectively; for SVMs, ACT, and KBA, it was 0.004, 0.003, 0.08, 0.3, 0.5, and 0.086, respectively. All optimal \(C\)'s were 1,000.
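To make notes 2 and 3 concrete, here is a minimal numerical sketch with assumed helper names, not the chapter's implementation. It builds the RBF Gram matrix \({\mathbf{K}} = (k_{ij})\) of note 2, then applies a quasi-conformal transformation \(\tilde{k}_{ij} = D({\mathbf{x}}_i)\,D({\mathbf{x}}_j)\,k_{ij}\) in the spirit of Amari and Wu [19]; the Gaussian form of the conformal factor \(D\), centered on support instances, is one simple choice from that line of work, not KBA's exact factor.

```python
# Minimal numerical sketch of notes 2 and 3 (assumed helper names, not the
# chapter's implementation). Build the RBF Gram matrix K = (k_ij) and apply a
# quasi-conformal transformation k~_ij = D(x_i) D(x_j) k_ij in the spirit of
# Amari and Wu [19]. The Gaussian conformal factor D is one simple choice.
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix K = (k_ij), k_ij = K(x_i, x_j) = exp(-gamma ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

def conformal_factor(X, support_instances, tau):
    """D(x) = sum_k exp(-||x - x_k||^2 / tau^2) over the support instances x_k."""
    diffs = X[:, None, :] - support_instances[None, :, :]
    return np.exp(-np.sum(diffs**2, axis=2) / tau**2).sum(axis=1)

def conformal_gram(K, D):
    """Entrywise k~_ij = D(x_i) k_ij D(x_j), i.e. diag(D) @ K @ diag(D)."""
    return K * np.outer(D, D)

# Toy usage: five training instances, the last two treated as support instances.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.5, 2.5]])
K = rbf_gram(X, gamma=0.5)
D = conformal_factor(X, support_instances=X[[3, 4]], tau=1.0)
K_tilde = conformal_gram(K, D)
print(K_tilde.shape)  # (5, 5)
```

Since \(\tilde{\mathbf{K}} = \mathrm{diag}(D)\,{\mathbf{K}}\,\mathrm{diag}(D)\), the transformed matrix stays symmetric and positive semidefinite, so it remains a valid kernel matrix; KBA's contribution is choosing the factor so that the induced boundary shifts in favor of the minority (target) class.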
References
G. Wu, E.Y. Chang, KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17(6), 786–795 (2005)
G.M. Weiss, Mining with rarity: a unifying framework. SIGKDD Explor. 6(1), 7–19 (2004)
T. Fawcett, F. Provost, Adaptive fraud detection. Data Min. Knowl. Disc. 1(3), 291–316 (1997)
G. Wu, Y. Wu, L. Jiao, Y.F. Wang, E.Y. Chang, Multi-camera spatio-temporal fusion and biased sequence-data learning for security surveillance, in Proceedings of ACM International Conference on Multimedia, November 2003, pp. 528–538
V. Vapnik, The Nature of Statistical Learning Theory (Springer, New York, 1995)
S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, October 2001, pp. 107–118
T. Joachims, Text categorization with support vector machines: learning with many relevant features, in Proceedings of ECML, 1998, pp. 137–142
K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edn. (Academic Press, Boston, 1990)
F. Provost, Learning with imbalanced data sets, invited paper for the AAAI 2000 Workshop on Imbalanced Data Sets
M. Kubat, S. Matwin, Addressing the curse of imbalanced training sets: one-sided selection, in Proceedings of the Fourteenth International Conference on Machine Learning (ICML), 1997, pp. 179–186
P. Chan, S. Stolfo, Learning with non-uniform class and cost distributions: effects and a distributed multi-classifier approach, in Workshop Notes of the KDD-98 Workshop on Distributed Data Mining, 1998, pp. 1–9
L. Breiman, Bagging predictors. Mach. Learn. 24, 123–140 (1996)
N. Chawla, K. Bowyer, L. Hall, W.P. Kegelmeyer, Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
G.M. Weiss, F. Provost, Learning when training data are costly: the effect of class distribution on tree induction. J. Artif. Intell. Res. 19, 315–354 (2003)
A. Nugroho, S. Kuroyanagi, A. Iwata, A solution for imbalanced training sets problem by CombNET-II and its application on fog forecasting. IEICE Trans. Inf. Syst. E85-D(7), 1165–1174 (2002)
C. Cardie, N. Howe, Improving minority class prediction using case-specific feature weights, in Proceedings of the Fourteenth International Conference on Machine Learning, 1997, pp. 57–65
C. Drummond, R. Holte, Exploiting the cost (in)sensitivity of decision tree splitting criteria, in Proceedings of the Seventeenth International Conference on Machine Learning, 2000, pp. 239–246
C. Ling, C. Li, Data mining for direct marketing—specific problems and solutions, in Proceedings of ACM SIGKDD, 1998, pp. 73–79
S. Amari, S. Wu, Improving support vector machine classifiers by modifying kernel functions. Neural Netw. 12(6), 783–789 (1999)
K. Crammer, J. Keshet, Y. Singer, Kernel design using boosting, in Proceedings of NIPS, 2002, pp. 537–544
G. Karakoulas, J. Shawe-Taylor, Optimizing classifiers for imbalanced training sets, in Proceedings of NIPS, 1998, pp. 253–259
Y. Lin, Y. Lee, G. Wahba, Support vector machines for classification in nonstandard situations. Mach. Learn. 46, 191–202 (2002)
C.S. Ong, A.J. Smola, R.C. Williamson, Hyperkernels, in Proceedings of NIPS, 2003, pp. 478–485
K. Veropoulos, C. Campbell, N. Cristianini, Controlling the sensitivity of support vector machines, in Proceedings of the International Joint Conference on Artificial Intelligence, 1999, pp. 55–60
X. Wu, R. Srihari, New ν-support vector machines and their sequential minimal optimization, in Proceedings of the Twentieth International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 824–831
H.W. Kuhn, A.W. Tucker, Nonlinear programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, 1951), pp. 481–492
N. Cristianini, J. Shawe-Taylor, A. Elisseeff, J. Kandola, On kernel target alignment, in Proceedings of NIPS, 2001, pp. 367–373
B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002)
G. Wu, E. Chang, Adaptive feature-space conformal transformation for imbalanced data learning, in Proceedings of the Twentieth International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 816–823
J. Kandola, J. Shawe-Taylor, Refining kernels for regression and uneven classification problems, in Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003
C. Burges, Geometry and invariance in kernel based methods, in Advances in Kernel Methods: Support Vector Learning (MIT Press, Cambridge, 1999), pp. 89–116
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York, 2001)
C. Burges, A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)
T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)
A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30(7), 1145–1159 (1997)
X. Wu, R. Srihari, Incorporating prior knowledge with weighted margin support vector machines, in Proceedings of ACM SIGKDD, Seattle, Washington, August 2004, pp. 326–333