Abstract
This paper proposes new prototype-based methods for solving imbalanced classification problems. Similarity relations are used to granulate the universe into similarity classes, and a prototype is selected for each similarity class. Experimental results show that our methods perform statistically significantly better than other methods for imbalanced classification.
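The general idea of granulating the data with a similarity relation and keeping one prototype per similarity class can be illustrated with a small sketch. This is not the authors' method. The similarity function (one minus the mean range-normalized absolute difference), the threshold `eps`, the label-restricted similarity classes, and the greedy covering rule are all hypothetical choices made for illustration. The retained prototypes are then used by a plain nearest-prototype classifier.

```python
def feature_ranges(X):
    """Per-feature range, used to normalize differences (hypothetical choice)."""
    return [max(col) - min(col) for col in zip(*X)]


def similarity(a, b, ranges):
    """Hypothetical similarity: 1 minus the mean range-normalized absolute difference."""
    diffs = [abs(u - v) / r if r > 0 else 0.0 for u, v, r in zip(a, b, ranges)]
    return 1.0 - sum(diffs) / len(diffs)


def select_prototypes(X, y, eps=0.8):
    """Greedy sketch: granulate the universe into similarity classes, one prototype per selected class."""
    ranges = feature_ranges(X)
    n = len(X)
    # Similarity class of object i: itself plus every object with the same label
    # whose similarity to i reaches the (hypothetical) threshold eps.
    classes = [
        {i} | {j for j in range(n) if y[j] == y[i] and similarity(X[i], X[j], ranges) >= eps}
        for i in range(n)
    ]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Keep the object whose similarity class covers the most still-uncovered objects.
        best = max(sorted(uncovered), key=lambda i: len(classes[i] & uncovered))
        chosen.append(best)
        uncovered -= classes[best]
    return [X[i] for i in chosen], [y[i] for i in chosen], ranges


def classify(x, protos, labels, ranges):
    """Nearest-prototype rule: return the label of the most similar prototype."""
    k = max(range(len(protos)), key=lambda k: similarity(x, protos[k], ranges))
    return labels[k]


# Toy imbalanced data: four majority objects (label 0), two minority objects (label 1).
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [1.0, 1.0], [0.95, 0.9]]
y = [0, 0, 0, 0, 1, 1]
protos, labels, ranges = select_prototypes(X, y)
print(protos, labels)  # [[0.0, 0.0], [1.0, 1.0]] [0, 1]
print(classify([0.9, 0.95], protos, labels, ranges))  # 1
```

Because the similarity classes in this sketch are restricted to objects with the same label, every minority object ends up covered by a minority prototype, so the minority class keeps at least one representative per granule even when it is heavily outnumbered.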
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Alvarez, Y.R., Mota, Y.C., Cabrera, Y.F., Hilarión, I.G., Hernández, Y.F., Dominguez, M.F. (2019). Similar Prototype Methods for Class Imbalanced Data Classification. In: Bello, R., Falcon, R., Verdegay, J. (eds) Uncertainty Management with Fuzzy and Rough Sets. Studies in Fuzziness and Soft Computing, vol 377. Springer, Cham. https://doi.org/10.1007/978-3-030-10463-4_11
DOI: https://doi.org/10.1007/978-3-030-10463-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10462-7
Online ISBN: 978-3-030-10463-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)