A Dynamic Decision-Making Method Based on Ensemble Methods for Complex Unbalanced Data
Class imbalance has been shown to seriously degrade the precision of many standard learning algorithms. A number of methods have been proposed to address this problem, for example the distance-based balancing ensemble method, which learns from an unbalanced dataset by converting it into multiple balanced subsets and building a sub-classifier on each. However, class imbalance is usually accompanied by other data-complexity problems such as class overlap, small disjuncts, and noisy instances. Current algorithms, developed primarily for the class-imbalance problem, cannot address these data-complexity problems at the same time. Some even exacerbate the class-overlap and small-disjunct problems in the attempt. To address this, this study proposes a dynamic ensemble selection decision-making (DESD) method. DESD first applies random splitting repeatedly to divide the dataset into multiple balanced subsets that contain few or no class-overlap and small-disjunct problems. Classifiers are then built on these subsets to form the candidate classifier pool. To select the most appropriate classifiers from the pool for each query instance, a weighting mechanism highlights the competence of classifiers that are better at classifying minority instances in the local region in which the query instance is located. Experiments on 15 standard datasets from public repositories demonstrate the effectiveness of DESD. The results show that DESD achieves higher precision than other ensemble methods.
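The three stages described above (balanced random splitting, a candidate classifier pool, and locally weighted selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the nearest-centroid base learner, the neighbourhood size `k`, the `minority_weight` value, the "at least average competence" selection rule, and all function names are assumptions chosen only for illustration. The sketch also assumes at least one minority instance is present.

```python
import math
import random


def balanced_subsets(y, minority=1, seed=0):
    """Randomly split the majority class into chunks and pair each chunk
    with all minority instances, giving roughly balanced index subsets."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    rng.shuffle(maj_idx)
    n_subsets = max(1, len(maj_idx) // len(min_idx))
    return [min_idx + maj_idx[k::n_subsets] for k in range(n_subsets)]


def fit_centroids(X, y, idx):
    """Train a nearest-centroid classifier (an assumed stand-in base learner)
    on the rows listed in idx. Returns {label: centroid}."""
    sums, counts = {}, {}
    for i in idx:
        acc = sums.setdefault(y[i], [0.0] * len(X[i]))
        for d, value in enumerate(X[i]):
            acc[d] += value
        counts[y[i]] = counts.get(y[i], 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda c: math.dist(model[c], x))


def dynamic_predict(X, y, pool, x, k=5, minority=1, minority_weight=2.0):
    """Score each classifier on the k training instances nearest to x,
    counting correct minority predictions more heavily (assumed weighting),
    keep classifiers with at least average competence, and take a
    competence-weighted vote among them."""
    neighbors = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
    scores = []
    for model in pool:
        score = 0.0
        for i in neighbors:
            if predict(model, X[i]) == y[i]:
                score += minority_weight if y[i] == minority else 1.0
        scores.append(score)
    threshold = sum(scores) / len(scores)
    votes = {}
    for model, score in zip(pool, scores):
        if score >= threshold:
            label = predict(model, x)
            votes[label] = votes.get(label, 0.0) + score
    return max(votes, key=votes.get)


# Toy data: 40 majority points near (0, 0), 8 minority points near (3, 3).
rng = random.Random(1)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(40)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(8)]
y = [0] * 40 + [1] * 8

subsets = balanced_subsets(y)
pool = [fit_centroids(X, y, idx) for idx in subsets]
print(dynamic_predict(X, y, pool, [3.0, 3.0]),
      dynamic_predict(X, y, pool, [0.0, 0.0]))
```

In this toy setting every subset reuses all minority instances and a different slice of the majority class, so each pool member sees a balanced view of the data. The per-query weighting only matters when pool members disagree, which is more likely on overlapping or noisy data than on these two well-separated clusters.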
Keywords: Dynamic ensemble selection, Unbalanced dataset, Classification
This work is supported by the National Natural Science Foundation of China (Nos. 61702070, 61751203, 61772100, 61672121, 61572093, 61802040), the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT_15R07), the Program for Liaoning Innovative Research Team in University (No. LT2015002), and the Basic Research Program of the Key Lab in Liaoning Province Educational Department (No. LZ2015004).