A Synthetic Minority Oversampling Method Based on Local Densities in Low-Dimensional Space for Imbalanced Learning

Xie, Zhipeng; Jiang, Liyang; Ye, Tengju; Li, Xiaoli

doi:10.1007/978-3-319-18123-3_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9050)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Imbalanced class distribution is a challenging problem in many real-life classification tasks. Existing synthetic oversampling methods suffer from the curse of dimensionality because they rely heavily on Euclidean distance. This paper proposes a new method, called Minority Oversampling Technique based on Local Densities in Low-Dimensional Space (MOT2LD for short). MOT2LD first maps each training sample into a low-dimensional space and clusters the resulting low-dimensional representations. It then assigns each minority sample a weight, the product of two quantities, its local minority density and its local majority count, which indicates the sample's importance for sampling. Synthetic minority class samples are then generated inside a minority cluster. MOT2LD has been evaluated on 15 real-world data sets. The experimental results show that the method outperforms several existing methods, including SMOTE, Borderline-SMOTE, ADASYN, and MWMOTE, in terms of G-mean and F-measure.
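
The weighting and generation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the two quantities, so the sketch assumes that local minority density is the number of other minority points within a fixed radius, and that local majority count is the number of majority points among the k nearest neighbours. The function names, the parameters `k` and `radius`, the precomputed embedding `Z` and cluster labels `clusters`, and the choice to interpolate in the original feature space are all hypothetical.

```python
import math
import random

def sampling_weights(Z, y, minority=1, k=3, radius=1.0):
    """Weight each minority sample by (local minority density) * (local majority count).

    Z holds low-dimensional points that are assumed to be embedded already.
    Both definitions are illustrative assumptions, not the paper's formulas:
      density        = number of other minority points within `radius`
      majority count = number of majority points among the k nearest neighbours
    """
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    weights = {}
    for i in minority_idx:
        # All other points, sorted by distance to point i in the low-dimensional space.
        others = sorted((math.dist(Z[i], Z[j]), j) for j in range(len(Z)) if j != i)
        density = sum(1 for d, j in others if y[j] == minority and d <= radius)
        majority_count = sum(1 for _, j in others[:k] if y[j] != minority)
        weights[i] = density * majority_count
    total = sum(weights.values())
    if total == 0:
        # Fall back to uniform weights when every product is zero.
        return {i: 1.0 / len(minority_idx) for i in minority_idx}
    return {i: w / total for i, w in weights.items()}

def generate(X, clusters, weights, n_new, rng):
    """Draw seeds by weight and interpolate towards a minority sample of the same cluster."""
    idx = list(weights)
    synthetic = []
    for _ in range(n_new):
        seed = rng.choices(idx, weights=[weights[i] for i in idx])[0]
        # Candidate partners: minority samples in the seed's cluster (may be the seed itself).
        mates = [j for j in idx if clusters[j] == clusters[seed]]
        mate = rng.choice(mates)
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(X[seed], X[mate])))
    return synthetic

# Toy data: X is the original feature space, Z a hypothetical 2-D embedding of it.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
     (3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 9.0, 9.0)]
Z = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (5.0, 5.0)]
y = [1, 1, 1, 0, 0, 1]          # 1 = minority, 0 = majority
clusters = [0, 0, 0, 0, 0, 1]   # hypothetical cluster labels from the embedding

weights = sampling_weights(Z, y, minority=1, k=2, radius=1.0)
new_samples = generate(X, clusters, weights, n_new=5, rng=random.Random(0))
```

In this toy example only the minority point at index 2 sits both in a dense minority region and next to a majority point, so it receives all of the sampling weight, while the isolated minority point at index 5 receives none. Every synthetic sample therefore lies on a segment between that point and a minority sample of the same cluster.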

References

Fawcett, T.E., Provost, F.: Adaptive Fraud Detection. Data Min. Knowl. Disc. 3(1), 291–316 (1997)
Mladenić, D., Grobelnik, M.: Feature selection for unbalanced class distribution and naive bayes. In: Proceedings of the 16th International Conference on Machine Learning, pp. 258–267 (1999)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority oversampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Han, H., Wang, W.Y., Mao, B.H.: Borderline-SMOTE: a new oversampling method in imbalanced data sets learning. In: Proceedings of International Conference on Intelligent Computing, pp. 878–887 (2005)
He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of IEEE International Joint Conference on Neural Networks, pp. 1322–1328 (2008)
Barua, S., Islam, M.M., Yao, X., Murase, K.: MWMOTE - majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 26(2), 405–425 (2014)
van der Maaten, L.J.P., Postma, E.O., van den Herik, H.J.: Dimensionality reduction: a comparative review. Tilburg University Technical Report, TiCC-TR 2009–005 (2009)
Hotelling, H.: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441 (1933)
Torgerson, W.S.: Multidimensional scaling I: theory and method. Psychometrika 17, 401–419 (1952)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by Locally Linear Embedding. Science 290(5500), 2323–2326 (2000)
Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems, vol. 15, pp. 833–840 (2002)
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008)
Liu, Y.: Distance metric learning: a comprehensive survey. Research Report, Michigan State University (2006)
Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf. Process. Manage. 22(6), 465–476 (1986)
Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344, 1492–1496 (2014)
MacQueen, J.: Some methods for classifications and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematics, Statistics and Probability, University of California Press, pp. 281–297 (1967)
Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, 2013 [http://archive.ics.uci.edu/ml]
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC press (1984)
Cao, H., Li, X.L., Woon, Y.-K., Ng, S.K.: SPO: structure preserving oversampling for imbalanced time series classification. In: Proceedings of IEEE International Conference on Data Mining (2011)
Cao, H., Li, X.L., Woon, Y.K., Ng, S.K.: Integrated oversampling for imbalanced time series classification. IEEE Trans. Knowl. Data Eng. 25(12), 2809–2822 (2013)
Pang, Z.F., Cao, H., Tan, Y.F.: MOGT: oversampling with a parsimonious mixture of Gaussian trees model for imbalanced time-series classification. In: Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6 (2013)

Author information

Authors and Affiliations

School of Computer Science, Fudan University, Shanghai, China
Zhipeng Xie, Liyang Jiang & Tengju Ye
Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
Zhipeng Xie
Institute of InfoComm Research, Fusionopolis Way, Singapore, Singapore
Xiaoli Li

Corresponding author

Correspondence to Zhipeng Xie.

Editor information

Editors and Affiliations

Universität München, München, Germany
Matthias Renz
University of Southern California, Los Angeles, USA
Cyrus Shahabi
University of Queensland, Brisbane, Australia
Xiaofang Zhou
Monash University, Clayton, Australia
Muhammad Aamir Cheema

About this paper

Cite this paper

Xie, Z., Jiang, L., Ye, T., Li, X. (2015). A Synthetic Minority Oversampling Method Based on Local Densities in Low-Dimensional Space for Imbalanced Learning. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9050. Springer, Cham. https://doi.org/10.1007/978-3-319-18123-3_1

DOI: https://doi.org/10.1007/978-3-319-18123-3_1
Published: 09 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18122-6
Online ISBN: 978-3-319-18123-3
eBook Packages: Computer Science, Computer Science (R0)
