
1 Introduction

The classification problem is a central part of machine learning and a first step toward artificial intelligence understanding human life. Most classifiers assume that the samples of different classes are evenly distributed and that the misclassification costs are the same. In reality, however, the data people care most about is often scarce, as in credit card fraud detection and medical disease diagnosis. In medical diagnosis, most results are normal and only a small proportion are diagnosed as diseased, which reflects the uneven distribution across classes. Moreover, if a healthy person is misdiagnosed as diseased, the error can be caught by further examinations and rarely causes serious harm; but if a diseased person is diagnosed as healthy, the patient may miss the best treatment window, with serious consequences. This is the second feature of imbalanced classification problems: the misclassification costs of different classes are inconsistent. At the same time, labeling as many samples as possible as diseased for fear of missing true cases would waste medical resources and intensify conflicts between doctors and patients. Declaring all samples diseased is therefore not feasible; the best approach is to separate the two outcomes as correctly as possible. Because of the scarcity of minority samples and the definition of global accuracy, classifiers pay less attention to the minority class, so its recognition performance is unsatisfactory. Imbalanced classification problems arise in many fields, such as bioinformatics [1, 2], remote sensing image recognition [3], and privacy protection in cybersecurity [4,5,6]. They are widespread and of great practical significance.

Traditional solutions to imbalanced problems fall into two groups: algorithm-level methods and data-level methods. Algorithm-level methods mainly address the different misclassification costs. For example, an improved neural network [7] uses an approximation of the minority-class F1 value as its cost function; a bagging algorithm [8] repeatedly boosts the misclassified minority samples to improve their recognition rate; and structured SVM [9] optimizes the F1 value of the minority class directly, giving better performance on minority-class classification.

Data-level methods address the imbalance in sample size, mainly by resampling the data to reduce its impact on classification performance. They can be divided into over-sampling, under-sampling, and hybrid sampling. Over-sampling adds minority samples to the training process; it can effectively improve minority-class performance but offers no guarantee that the added samples are reasonable. Under-sampling [10] removes majority samples before training, which reaches balance quickly but risks discarding valuable samples.

Over-sampling methods can be divided into random sampling and informed sampling. Random sampling creates new samples from the known ones, including simple repetition [11], linear interpolation [12], and nonlinear interpolation [13]. SMOTE [12], a classic over-sampling algorithm, interpolates linearly between minority samples; compared with simple repetition it increases the information content and plausibility of the synthesized samples and improves classification. Borderline-SMOTE [14] reduces the risk of overfitting by interpolating only the minority samples near the class boundary. These over-sampling methods consider only the sample size and the local sample distribution, ignoring the overall distribution of the samples, which carries more information for classification.
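To make the interpolation idea concrete, the following is a minimal sketch of SMOTE-style linear interpolation, not the reference implementation; the neighbor search and the function name `smote_sample` are simplifications introduced here for illustration.

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=100, rng=np.random.default_rng(0)):
    """Minimal SMOTE-style interpolation: pick a random minority sample,
    choose one of its k nearest minority neighbors, and interpolate linearly."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # distances to all minority samples
        neighbors = np.argsort(d)[1:k + 1]             # skip the sample itself
        j = rng.choice(neighbors)
        gap = rng.random()                             # uniform in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```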

Informed sampling [15] uses the distribution information in the samples to fit their probability distribution function (PDF) and then samples according to it. Chen [16] proposed a normal-distribution-based over-sampling approach that assumes the minority class follows a Gaussian distribution whose parameters are estimated from the minority samples with the EM algorithm; the experimental results are better than SMOTE and random over-sampling. Over-sampling algorithms based on various distributions have been proposed, such as the Gaussian distribution [16, 17] and the Weibull distribution [18]. Thanks to the distribution information, these algorithms improve on random over-sampling. However, their problems are also obvious: they require a prior assumption about the real distribution and typically assume the features are independent of each other. If the real distribution meets these hypotheses, the results are good; otherwise the improvement is limited, so their effect is inconsistent across datasets.
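The contrast can be illustrated with a small sketch in the spirit of the normal-distribution-based approaches cited above; this is not the algorithm of [16] or [17], just an example of the per-feature Gaussian assumption being criticized.

```python
import numpy as np

def gaussian_oversample(X_min, n_new, rng=np.random.default_rng(0)):
    """Fit an independent Gaussian to each feature of the minority class and
    draw new samples. This bakes in both assumptions discussed in the text:
    a fixed parametric form and feature independence."""
    mu = X_min.mean(axis=0)
    sigma = X_min.std(axis=0, ddof=1)
    return rng.normal(loc=mu, scale=sigma, size=(n_new, X_min.shape[1]))
```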

Data-level methods matter greatly in imbalanced classification: since they can be regarded as a data-preprocessing step, they have a direct effect on the final classification results. The factors that affect classification include not only the sample size but also the sample distribution, yet current over-sampling methods do not make full use of the distribution information and cannot guarantee that the generated samples are reasonable.

In this paper, we propose an over-sampling method based on the variational auto-encoder (VAE) [19] to generate minority samples. The method is motivated by the observation that distribution information plays an important role in over-sampling, and it aims at the rationality of the generated samples. We use a VAE to generate minority instances because, first, the output dimension of a neural network is not limited, so it can generate data of any dimension; and second, the strong fitting ability of neural networks can approximate arbitrary distribution functions without any prior knowledge. We use this model to capture the distribution of the minority samples and over-sample from it. The proposed method needs neither a prior distribution assumption nor a feature-independence assumption, and the experimental results demonstrate its effectiveness.

We organize the paper as follows. Section 2 describes related work. Section 3 presents and analyzes the proposed algorithm. Section 4 reports the experimental results. Section 5 concludes the paper.

2 Related Work

In 2013, Kingma and Welling [19] proposed the VAE: it adds variational inference to the auto-encoder and uses the reparameterization trick so that variational inference can be combined with stochastic gradient descent. The overall structure of the VAE network is shown in Fig. 1. Since it assumes the latent variables follow a standard Gaussian distribution, sampling is easy while the final probability distribution function is not fixed in advance, which coincides with the requirements of distribution-based over-sampling.

Fig. 1. Structure of the variational auto-encoder.

In a VAE, we assume the observed variables are determined by a hidden compression code z: the encoder maps X to z and constrains z to obey a particular distribution (such as a Gaussian), while the decoder maps z back to X. Knowing the distribution of z and the decoding function, we can sample z and decode it to obtain new x, generating an unlimited number of samples in theory.
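A minimal sketch of the encoder/decoder structure of Fig. 1, assuming PyTorch; layer sizes and the class name `VAE` are placeholders chosen here, not values from the paper. The encoder maps x to the mean and log-variance of Q(z|X), the reparameterization trick draws z = mu + sigma * eps, and the decoder maps z back to a reconstruction of x.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, d_in, d_hidden=64, d_z=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mu = nn.Linear(d_hidden, d_z)        # mean of Q(z|X)
        self.logvar = nn.Linear(d_hidden, d_z)    # log-variance of Q(z|X)
        self.dec = nn.Sequential(nn.Linear(d_z, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)                # reparameterization trick
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.dec(z), mu, logvar
```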

Assume z is a latent variable with distribution function p(z); marginalizing over z gives P(X):

$$ p(X) = \int p(X \mid z)\, p(z)\, dz $$
(1)

However, under the prior distribution of z, most values of z cannot generate reliable samples, that is, p(X|z) tends to 0 and so does p(X|z)p(z). To simplify the calculation, we only need to consider the values of z with large P(X|z), which are represented by P(z|X) from the encoder. But considering only this part of z cannot generate samples beyond the original data, so we assume a distribution for P(z|X) and let the decoder compensate for the error.

Q(z) is our assumption for the real posterior distribution; we use the KL divergence to measure the difference between the real distribution and the assumption:

$$ D(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx $$
(2)

Formula (2) shows that when two distributions are close, the KL divergence tends to 0. The objective of the VAE model is

$$ \arg\min\; D(Q(z) \,\|\, P(z \mid X)) $$
(3)

Applying formula (2) to formula (3):

$$ D[Q(z) \,\|\, P(z \mid X)] = \mathbb{E}_{z \sim Q}[\log Q(z) - \log P(z \mid X)] $$
(4)

Applying Bayes' rule to \( P(z \mid X) \) brings both \( P(X) \) and \( P(X \mid z) \) into the expression:

$$ D[Q(z) \,\|\, P(z \mid X)] = \mathbb{E}_{z \sim Q}[\log Q(z) - \log P(X \mid z) - \log P(z)] + \log P(X) $$
(5)

Rearranging (5) gives (6) below. Note that X is fixed and Q can be any distribution, not only one that maps X to the z values likely to reproduce X. Since we are interested in inferring P(X), it makes sense to construct a Q that depends on X and, in particular, one that makes \( D[Q(z) \,\|\, P(z \mid X)] \) small. Because \( \log P(X) \) is fixed, minimizing \( D[Q(z) \,\|\, P(z \mid X)] \) is equivalent to maximizing the right-hand side of (6). There, \( \log P(X \mid z) \) is the probability of X being decoded from z, computed as the cross-entropy or mean-squared error against the original sample, and the second term \( D[Q(z) \,\|\, P(z)] \) can be regarded as the difference between the assumed prior and the distribution of z produced by the encoder.

$$ \log P(X) - D[Q(z) \,\|\, P(z \mid X)] = \mathbb{E}_{z \sim Q}[\log P(X \mid z)] - D[Q(z) \,\|\, P(z)] $$
(6)
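Translating (6) into a training loss, a common sketch is shown below, assuming the PyTorch model sketched earlier and a standard-normal prior, for which the KL term has a closed form; the choice of mean-squared error for the reconstruction term is one of the two options mentioned above.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar):
    """Negative of the right-hand side of (6): reconstruction error for
    E[log P(X|z)] (here mean-squared error) plus the closed-form
    KL divergence D[Q(z|X) || N(0, I)]."""
    recon = F.mse_loss(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```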

3 The Proposed Method

In this paper, an over-sampling method based on the VAE is proposed, motivated by the idea that distribution information is important in over-sampling. Without any prior assumption on the real PDF of the minority samples, and without any independence assumption on the features, the proposed method automatically models the PDF from the original data. One practical issue remains: the data may contain discrete features, while the features generated by a network trained with stochastic gradient descent are continuous. These discrete features are therefore separated out before VAE training using formula (9); after the continuous features are generated, a 1-NN search matches each generated sample to its nearest original sample, and the generated continuous features are combined with the discrete features of that nearest sample into a new composite sample.

We do not have explicit information about whether a feature is discrete, so we treat a feature as discrete if it takes no more than 2 distinct values over the whole dataset. In fact, a feature with only one distinct value across the dataset is useless for classification anyway.

Given a training dataset \( X = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\} \), where \( x_i \in R^d \) is a d-dimensional sample and \( y_i \in \{0, 1\} \) is the label representing the negative or positive class. We use P and N to represent the positive and negative sample subsets, where P contains \( N_+ \) positive samples, N contains \( N_- \) negative samples, and \( N_+ + N_- = N \).

During the training of the VAE model, \( nelements_j \) represents the number of distinct feature values in the \( j \)-th dimension of the positive subset, as shown in (7):

$$ nelements_{j} = \left| \left\{ x_{ij} \mid 1 \le i \le N_{+} \right\} \right|, \quad 1 \le j \le d $$
(7)
$$ x_i = \{ x_{i1}, x_{i2}, \cdots, x_{ik} \} \cup \{ x_{i(k+1)}, \cdots, x_{id} \} $$
(8)
$$ \mathrm{s.t.} \quad \begin{cases} nelements_j > 2, & 1 \le j \le k \\ nelements_j \le 2, & k+1 \le j \le d \end{cases} $$
(9)

If \( nelements_j \) is no more than 2, feature j is discrete; otherwise it is continuous. The features in the positive subset are reordered into continuous features followed by discrete features, and the continuous features are used as the training set for the VAE:

$$ X_{trainvae} = \begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{N_+ 1} & \cdots & x_{N_+ k} \end{bmatrix} $$
(10)
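A small sketch of the split in (7)-(10), assuming the positive samples are stored in a NumPy array; the function name `split_features` is introduced here for illustration only.

```python
import numpy as np

def split_features(P):
    """Split positive samples P (N+ x d) into continuous and discrete columns
    following (7)-(9): a column with more than two distinct values is
    treated as continuous, otherwise as discrete."""
    nelements = np.array([len(np.unique(P[:, j])) for j in range(P.shape[1])])
    cont_idx = np.where(nelements > 2)[0]   # columns kept for VAE training
    disc_idx = np.where(nelements <= 2)[0]  # columns re-attached later by 1-NN
    X_trainvae = P[:, cont_idx]             # this is (10)
    return X_trainvae, cont_idx, disc_idx
```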

Train a VAE model with \( X_{trainvae} \) and sample from it randomly; let \( X_{new} \) denote a synthetic sample:

$$ X_{final_i} = X_{new_i} \cup \left\{ x_{lm} \mid k+1 \le m \le d \right\}, \quad \mathrm{where}\; l = \arg\min_{l} \sum_{j=1}^{k} \left( X_{new_{ij}} - x_{lj} \right)^2 $$
(11)

\( X_{final} \) is the final set of synthetic samples, and \( X \cup X_{final} \) is the final training set, called \( X_{ov} \).
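The recombination in (11) can be sketched as follows: for each generated continuous vector, find the nearest original positive sample in the continuous feature space and copy its discrete features. The helper name `attach_discrete` and the column-index arguments are conventions chosen for this sketch.

```python
import numpy as np

def attach_discrete(X_new, P, cont_idx, disc_idx):
    """For each synthetic sample (continuous features only), find its 1-NN
    among the original positive samples on the continuous features (the
    argmin in (11)) and append that neighbor's discrete feature values."""
    P_cont = P[:, cont_idx]
    final = []
    for x in X_new:
        l = np.argmin(np.sum((P_cont - x) ** 2, axis=1))  # nearest neighbor
        row = np.empty(P.shape[1])
        row[cont_idx] = x
        row[disc_idx] = P[l, disc_idx]
        final.append(row)
    return np.array(final)                                # X_final
```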

Algorithm 1. The proposed VAE-based over-sampling method.

The whole process is described as Algorithm 1. First, normalize the dataset to scale the range of the data; divide(X) is a function that splits the dataset into training and testing sets, and, to keep the class distribution unchanged in these subsets, the positive and negative samples are split separately. Second, select the features with more than two distinct values and use them as \( X_{trainvae} \). Third, train a VAE model and sample from the trained model; the generated samples are denoted \( X_{new} \). Finally, attach discrete features to the generated samples using their nearest neighbors' discrete features, giving \( X_{final} \) (Table 1).
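A high-level outline of Algorithm 1 as a sketch that ties the earlier helpers together (`split_features`, `attach_discrete`, the `VAE` class, and `vae_loss`); the normalization and train/test split steps are omitted, and the optimizer, learning rate, and epoch count are assumptions, not values from the paper.

```python
import numpy as np
import torch

def oversample_with_vae(X, y, n_new, epochs=200):
    """Sketch of Algorithm 1: train a VAE on the continuous minority features,
    sample new continuous vectors, and re-attach discrete features by 1-NN."""
    P = X[y == 1]                                   # positive (minority) subset
    X_trainvae, cont_idx, disc_idx = split_features(P)

    model = VAE(d_in=X_trainvae.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    data = torch.tensor(X_trainvae, dtype=torch.float32)
    for _ in range(epochs):                         # train the VAE on continuous features
        x_hat, mu, logvar = model(data)
        loss = vae_loss(data, x_hat, mu, logvar)
        opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():                           # sample z ~ N(0, I) and decode
        z = torch.randn(n_new, model.mu.out_features)
        X_new = model.dec(z).numpy()

    X_final = attach_discrete(X_new, P, cont_idx, disc_idx)
    return np.vstack([X, X_final]), np.concatenate([y, np.ones(n_new)])  # X_ov
```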

Table 1. Dataset.

4 Experiment

4.1 Dataset and Evaluation

All datasets used in this paper are from the UCI Machine Learning Repository [20]. Some of them are multi-class datasets, so we select one class as the minority class and treat the remaining samples as the majority class. Missing values are filled with the most frequent value. After that, the data are normalized using the formula shown in (12):

$$ x_{i,new} = \frac{x_i - \bar{x}}{s} $$
(12)
$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} $$
(13)
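A small sketch of the per-feature standardization in (12)-(13), assuming NumPy; computing the statistics on the training split and reusing them for the test split is a common convention, not something stated in the paper.

```python
import numpy as np

def standardize(X_train, X_test):
    """z-score each feature using the training statistics of (12)-(13)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)   # unbiased estimate, as in (13)
    return (X_train - mean) / std, (X_test - mean) / std
```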

In traditional classification, global accuracy is used as the evaluation metric, but in imbalanced problems this masks the performance on the minority class. In an extreme case, if the dataset contains only 1% minority samples and the classifier labels every sample as majority, the accuracy still reaches 99% even though the recognition rate of the minority class is 0. In binary imbalanced classification, the confusion matrix in Table 2 is often used to evaluate the performance of a classifier.

Table 2. Confusion matrix.

Here, FN is the number of positive samples misclassified as negative, and FP is the number of negative samples misclassified as positive. Several evaluation metrics based on the confusion matrix, such as the F-value and G-mean [21], measure the precision and recall on imbalanced data.

$$ precision\; = \;\frac{TP}{TP + FP} $$
(14)
$$ recall\; = \;\frac{TP}{TP + FN} $$
(15)
$$ F - value\; = \; \frac{{(1 + \beta^{2} )\, \times \,recall\, \times \,precision}}{{\beta^{2} \, \times \,recall\, + \,precision}} $$
(16)

where \( \beta \in [0, +\infty) \).

$$ Gmean\; = \;\sqrt {\frac{\text{TP}}{\text{TP + FN}}\, \times \,\frac{\text{TN}}{\text{TN + FP}}} $$
(17)

In this experiment, we choose \( \beta = 1 \) for the F-value, which makes it the harmonic mean of recall and precision. Gmean is the geometric mean of the classification accuracies on the minority and majority classes; it is high only when the accuracies of both classes are high at the same time.
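The metrics in (14)-(17) follow directly from the confusion-matrix counts of Table 2; a minimal sketch (the function name is introduced here for illustration):

```python
def imbalance_metrics(TP, FP, TN, FN, beta=1.0):
    """Precision, recall, F-value (14)-(16) and Gmean (17) from the
    confusion-matrix counts of Table 2."""
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f_value = (1 + beta ** 2) * recall * precision / (beta ** 2 * recall + precision)
    gmean = (TP / (TP + FN) * TN / (TN + FP)) ** 0.5
    return precision, recall, f_value, gmean
```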

4.2 Experiment Results

We compare the proposed method with other over-sampling algorithms: NDO-sampling [17] and the random interpolation algorithm SMOTE [12] (SMO). The classifier is naïve Bayes, chosen to reduce the impact of classifier parameters on classification performance. To reduce randomness in the final results, each algorithm reports the average of 10 runs of 10-fold cross-validation. The results of NDO are taken from the corresponding paper, k = 5 is used in SMOTE, the structure of the proposed method is that shown in Fig. 1, and new samples are generated by random sampling from the trained model.
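A sketch of this evaluation protocol, assuming scikit-learn's GaussianNB and StratifiedKFold; the `oversampler` argument is any callable that returns an augmented training set (for example, a wrapper around `oversample_with_vae` with a fixed over-sampling amount), which is an assumption of this sketch rather than the paper's exact setup.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

def evaluate(X, y, oversampler, n_repeats=10, rng_seed=0):
    """Average minority-class F1 over n_repeats runs of 10-fold CV,
    oversampling only the training folds."""
    scores = []
    for r in range(n_repeats):
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=rng_seed + r)
        for train_idx, test_idx in skf.split(X, y):
            X_tr, y_tr = oversampler(X[train_idx], y[train_idx])
            clf = GaussianNB().fit(X_tr, y_tr)
            scores.append(f1_score(y[test_idx], clf.predict(X[test_idx]), pos_label=1))
    return np.mean(scores)
```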

The results in Table 3 indicate that the VAE generates better samples than NDO and SMOTE at the same over-sampling amount, as it can generate more reasonable samples carrying more information. As the over-sampling rate grows, all sampling methods help to improve classification performance, which indicates that the original minority samples do not contain enough information for a classifier to separate them correctly from the negative samples.

Table 3. F1-min of different algorithms and oversampling rate.

Meanwhile, the results in Table 4 show that, compared with traditional over-sampling algorithms, which sacrifice some majority-class performance to improve the minority class, the proposed method preserves a reasonable distribution of synthetic samples and also improves the classification performance on the majority samples, indicating a stronger classifier.

Table 4. F1-maj of different algorithms and oversampling rate.

The proposed method produces more reasonable samples, as can be concluded from the results in Table 5: a classifier trained with samples generated by the proposed method achieves better overall classification performance, since Gmean is the geometric mean of the accuracies on the minority and majority samples, and at higher over-sampling rates the classifier obtains the best results with the proposed method.

Table 5. Gmean of different algorithms and oversampling rate.

The experimental results also show that, for all over-sampling methods, a higher over-sampling rate leads to better classification performance, with the best performance reached when the over-sampled minority class equals the majority class in size. This suggests that sample size alone has a limited effect on classification performance; more informative samples and a stronger classifier play the bigger role.

5 Conclusion

In this paper, we propose an over-sampling algorithm based on the VAE in order to make full use of the distribution information in the dataset. It generates more reasonable samples with no prior assumption on the real distribution and no assumption that the features are independent. Moreover, we separate the features into discrete and continuous ones and copy the discrete features of the nearest original sample onto each generated sample, so that the synthetic samples are as meaningful as possible. The experimental results prove the effectiveness of the proposed method: it improves the overall performance rather than only that of the minority class. The sampling procedure is still too rough to guarantee the generated samples' impact on the classifier, and overcoming this drawback is left for future work.