Abstract
An imbalanced training dataset can pose serious problems for many real-world data-mining tasks that employ supervised learning. In this chapter,\(^\dagger\) we present a kernel-boundary-alignment algorithm, which treats training-data imbalance as prior information with which to augment SVMs and improve class-prediction accuracy. Using a simple example, we first show that SVMs can suffer from high incidences of false negatives when the training instances of the target class are heavily outnumbered by the training instances of a non-target class. The remedy we propose is to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution. Through theoretical analysis backed by empirical study, we show that the kernel-boundary-alignment algorithm works effectively on several datasets.
†© IEEE, 2005. This chapter is written based on the author’s work with Gang Wu [1] published in IEEE TKDE 17(6). Permission to publish this chapter is granted under copyright license #2587680962412.
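The phenomenon the abstract describes is easy to reproduce. The sketch below is an illustrative toy, not the chapter's code: the two-Gaussian dataset, its 100:1 imbalance ratio, and the \(\gamma\) value are invented for illustration. It trains an RBF-kernel SVM on the imbalanced data and measures how often fresh minority-class (target) points land on the wrong side of the learned boundary.

```python
# Illustrative toy, not the chapter's code: all dataset parameters below are
# invented. Train an RBF-kernel SVM on a 100:1 imbalanced two-Gaussian
# problem and measure the false-negative rate on the minority (target) class.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Majority (negative) class: 1,000 points; minority (positive) class: 10.
X_neg = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
X_pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(10, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 1000 + [1] * 10)

clf = SVC(kernel="rbf", C=1000, gamma=0.5)  # C = 1,000, as in the chapter's experiments
clf.fit(X, y)

# Fresh minority-class sample: every prediction of 0 here is a false negative.
X_test_pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(500, 2))
fn_rate = np.mean(clf.predict(X_test_pos) == 0)
print(f"false-negative rate on the minority class: {fn_rate:.2f}")

# f(x) = clf.decision_function(x); note 5 below restricts attention to
# misclassified instances whose scores fall within the margin, -1 < f(x) < +1.
scores = clf.decision_function(X_test_pos)
in_margin_fn = np.mean((scores < 0) & (scores > -1))
print(f"fraction misclassified within the margin: {in_margin_fn:.2f}")
```

Because the majority class dominates the training objective, the learned boundary is pushed toward the minority class. Oversampling (e.g., SMOTE [13]), per-class penalties, or, as this chapter proposes, modifying the kernel matrix itself are all ways to counteract that skew.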
Notes
1. Although our algorithmic approach focuses on aligning the class boundary, it can effectively remove redundant majority instances as a by-product.
2. Given a kernel function \(K\) and a set of instances \(\fancyscript{X}_{\rm train} = \{({\mathbf{x}}_i, y_i)\}_{i=1}^{n}\), the kernel matrix (Gram matrix) is the matrix of all pairwise inner products over \(\fancyscript{X}_{\rm train}\): \({\mathbf{K}} = (k_{ij})\), where \(k_{ij} = K({\mathbf{x}}_i, {\mathbf{x}}_j)\). (A numerical sketch follows this list.)
3. Usually, it is difficult to find a totally conformal mapping function to transform the kernel. As suggested in [19], we can choose a quasi-conformal mapping function for kernel transformation; the sketch after these notes illustrates one such transformation.
4. In the KBA algorithm, if \({\mathbf{x}}\) is a support instance, we call both \({\mathbf{x}}\) and its embedding in \(\fancyscript{F}\) (the support vector induced via \({\mathbf{K}}\)) support instances.
5. In KBA, we consider only the misclassified test instances that fall within the margin, so as to reduce the influence of outliers. Their SVM scores \(f({\mathbf{x}})\) range from \(-1\) to \(+1\).
6. We exclude from our testbed those categories that cannot be classified automatically, such as “industry”, “Rome”, and “Boston”. (E.g., the Boston category contains various subjects of Boston: architecture, landscapes, and people.)
7. For the datasets in Table 9.2, from top to bottom, the optimal \(\gamma\) for SMOTE was 0.002, 0.003, 0.085, 0.3, 0.5, and 0.084, respectively; for SVMs, ACT, and KBA, it was 0.004, 0.003, 0.08, 0.3, 0.5, and 0.086, respectively. All optimal \(C\)'s were 1,000.
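To make notes 2 and 3 concrete, here is a minimal numerical sketch with assumed helper names, not the chapter's implementation. It builds the RBF Gram matrix \({\mathbf{K}} = (k_{ij})\) of note 2, then applies a quasi-conformal transformation \(\tilde{k}_{ij} = D({\mathbf{x}}_i)\,D({\mathbf{x}}_j)\,k_{ij}\) in the spirit of Amari and Wu [19]; the Gaussian form of the conformal factor \(D\), centered on support instances, is one simple choice from that line of work, not KBA's exact factor.

```python
# Minimal numerical sketch of notes 2 and 3 (assumed helper names, not the
# chapter's implementation). Build the RBF Gram matrix K = (k_ij) and apply a
# quasi-conformal transformation k~_ij = D(x_i) D(x_j) k_ij in the spirit of
# Amari and Wu [19]. The Gaussian conformal factor D is one simple choice.
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix K = (k_ij), k_ij = K(x_i, x_j) = exp(-gamma ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

def conformal_factor(X, support_instances, tau):
    """D(x) = sum_k exp(-||x - x_k||^2 / tau^2) over the support instances x_k."""
    diffs = X[:, None, :] - support_instances[None, :, :]
    return np.exp(-np.sum(diffs**2, axis=2) / tau**2).sum(axis=1)

def conformal_gram(K, D):
    """Entrywise k~_ij = D(x_i) k_ij D(x_j), i.e. diag(D) @ K @ diag(D)."""
    return K * np.outer(D, D)

# Toy usage: five training instances, the last two treated as support instances.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.5, 2.5]])
K = rbf_gram(X, gamma=0.5)
D = conformal_factor(X, support_instances=X[[3, 4]], tau=1.0)
K_tilde = conformal_gram(K, D)
print(K_tilde.shape)  # (5, 5)
```

Since \(\tilde{\mathbf{K}} = \mathrm{diag}(D)\,{\mathbf{K}}\,\mathrm{diag}(D)\), the transformed matrix stays symmetric and positive semidefinite, so it remains a valid kernel matrix; KBA's contribution is choosing the factor so that the induced boundary shifts in favor of the minority (target) class.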
References
G. Wu, E.Y. Chang, KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17(6), 786–795 (2005)
G.M. Weiss, Mining with rarity: a unifying framework. SIGKDD Explor. 6(1), 7–19 (2004)
T. Fawcett, F. Provost, Adaptive fraud detection. Data Min. Knowl. Disc. 1(3), 291–316 (1997)
G. Wu, Y. Wu, L. Jiao, Y.F. Wang, E.Y. Chang, Multi-camera spatio-temporal fusion and biased sequence-data learning for security surveillance, in Proceedings of ACM International Conference on Multimedia, November 2003, pp. 528–538
V. Vapnik, The Nature of Statistical Learning Theory (Springer, New York, 1995)
S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, October 2001, pp. 107–118
T. Joachims, Text categorization with support vector machines: learning with many relevant features, in Proceedings of ECML, 1998, pp. 137–142
K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edn. (Academic Press, Boston, 1990)
F. Provost, Learning with imbalanced data sets, invited paper for the AAAI 2000 Workshop on Imbalanced Data Sets
M. Kubat, S. Matwin, Addressing the curse of imbalanced training sets: one-sided selection, in Proceedings of the Fourteenth International Conference on Machine Learning (ICML), 1997, pp. 179–186
P. Chan, S. Stolfo, Learning with non-uniform class and cost distributions: effects and a distributed multi-classifier approach, in Workshop Notes of the KDD-98 Workshop on Distributed Data Mining, 1998, pp. 1–9
L. Breiman, Bagging predictors. Mach. Learn. 24, 123–140 (1996)
N. Chawla, K. Bowyer, L. Hall, W.P. Kegelmeyer, Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
G.M. Weiss, F. Provost, Learning when training data are costly: the effect of class distribution on tree induction. J. Artif. Intell. Res. 19, 315–354 (2003)
A. Nugroho, S. Kuroyanagi, A. Iwata, A solution for imbalanced training sets problem by CombNET-II and its application on fog forecasting. IEICE Trans. Inf. Syst. E85-D(7), 1165–1174 (2002)
C. Cardie, N. Howe, Improving minority class prediction using case-specific feature weights, in Proceedings of the Fourteenth International Conference on Machine Learning, 1997, pp. 57–65
C. Drummond, R. Holte, Exploiting the cost (in)sensitivity of decision tree splitting criteria, in Proceedings of the Seventeenth International Conference on Machine Learning, 2000, pp. 239–246
C. Ling, C. Li, Data mining for direct marketing—specific problems and solutions, in Proceedings of ACM SIGKDD, 1998, pp. 73–79
S. Amari, S. Wu, Improving support vector machine classifiers by modifying kernel functions. Neural Netw. 12(6), 783–789 (1999)
K. Crammer, J. Keshet, Y. Singer, Kernel design using boosting, in Proceedings of NIPS, 2002, pp. 537–544
G. Karakoulas, J. Shawe-Taylor, Optimizing classifiers for imbalanced training sets, in Proceedings of NIPS, 1998, pp. 253–259
Y. Lin, Y. Lee, G. Wahba, Support vector machines for classification in nonstandard situations. Mach. Learn. 46, 191–202 (2002)
C.S. Ong, A.J. Smola, R.C. Williamson, Hyperkernels, in Proceedings of NIPS, 2003, pp. 478–485
K. Veropoulos, C. Campbell, N. Cristianini, Controlling the sensitivity of support vector machines, in Proceedings of the International Joint Conference on Artificial Intelligence, 1999, pp. 55–60
X. Wu, R. Srihari, New ν-support vector machines and their sequential minimal optimization, in Proceedings of the Twentieth International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 824–831
H.W. Kuhn, A.W. Tucker, Nonlinear programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, 1951), pp. 481–492
N. Cristianini, J. Shawe-Taylor, A. Elisseeff, J. Kandola, On kernel target alignment, in Proceedings of NIPS, 2001, pp. 367–373
B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002)
G. Wu, E. Chang, Adaptive feature-space conformal transformation for imbalanced data learning, in Proceedings of the Twentieth International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 816–823
J. Kandola, J. Shawe-Taylor, Refining kernels for regression and uneven classification problems, in Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003
C. Burges, Geometry and invariance in kernel based methods, in Advances in Kernel Methods: Support Vector Learning (MIT Press, Cambridge, 1999), pp. 89–116
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York, 2001)
C. Burges, A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)
T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)
A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30(7), 1145–1159 (1997)
X. Wu, R. Srihari, Incorporating prior knowledge with weighted margin support vector machines, in Proceedings of ACM SIGKDD, Seattle, Washington, August 2004, pp. 326–333