Similar Prototype Methods for Class Imbalanced Data Classification

Alvarez, Yanela Rodríguez; Mota, Yailé Caballero; Cabrera, Yaima Filiberto; Hilarión, Isabel García; Hernández, Yumilka Fernández; Dominguez, Mabel Frias

doi:10.1007/978-3-030-10463-4_11

Yanela Rodríguez Alvarez, Yailé Caballero Mota, Yaima Filiberto Cabrera, Isabel García Hilarión, Yumilka Fernández Hernández & Mabel Frias Dominguez

Chapter
First Online: 23 January 2019

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 377)

Abstract

In this paper, new prototype-based methods for solving imbalanced classification problems are proposed. Similarity relations are used to granulate the universe into similarity classes, and a prototype is selected for each similarity class. Experimental results show that the proposed methods perform statistically better than other methods for imbalanced data.
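
The general idea in the abstract (granulate the data with a similarity relation, then keep one prototype per similarity class) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. The similarity measure (1 / (1 + Euclidean distance)), the threshold, the greedy covering order, the use of the class mean as prototype, and all function names are assumptions made only for illustration.

```python
from math import sqrt


def similarity(a, b):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))


def similarity_class(i, X, y, threshold):
    """Indices of objects with the same label as object i that are
    at least `threshold`-similar to it (object i itself is included)."""
    return [j for j in range(len(X))
            if y[j] == y[i] and similarity(X[i], X[j]) >= threshold]


def build_prototypes(X, y, threshold=0.5):
    """Greedy covering (an assumption): for each object not yet covered,
    form its similarity class and keep the class mean as a prototype."""
    covered = set()
    prototypes = []
    for i in range(len(X)):
        if i in covered:
            continue
        members = similarity_class(i, X, y, threshold)
        centroid = tuple(sum(X[j][k] for j in members) / len(members)
                         for k in range(len(X[i])))
        prototypes.append((centroid, y[i]))
        covered.update(members)
    return prototypes


def classify(x, prototypes):
    """Nearest-prototype rule: return the label of the most similar prototype."""
    return max(prototypes, key=lambda p: similarity(x, p[0]))[1]


if __name__ == "__main__":
    # Toy imbalanced data: five majority objects (label 0), two minority (label 1).
    X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4),
         (5.0, 5.0), (5.2, 5.1)]
    y = [0, 0, 0, 0, 0, 1, 1]
    protos = build_prototypes(X, y, threshold=0.5)
    print(len(protos), "prototypes")
    print(classify((4.5, 4.8), protos), classify((0.5, 0.5), protos))
```

In this toy setting each class collapses to a single prototype, so the minority class is represented by as many prototypes as the majority class. That is one intuition for why prototype-based approaches may help with imbalance, although the chapter's actual mechanism may differ.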

Author information

Authors and Affiliations

Departamento de Computación, Universidad de Camagüey, Circunvalación Norte Km 5 ½, Camagüey, Cuba
Yanela Rodríguez Alvarez, Yailé Caballero Mota, Yaima Filiberto Cabrera, Isabel García Hilarión, Yumilka Fernández Hernández & Mabel Frias Dominguez

Corresponding author

Correspondence to Yanela Rodríguez Alvarez.

Editor information

Editors and Affiliations

Department of Computer Science, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara, Cuba
Rafael Bello
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
Rafael Falcon
Department of Computer Science and Artificial Intelligence, Technical School of Informatics and Telecommunications Engineering, University of Granada, Granada, Spain
José Luis Verdegay

About this chapter

Cite this chapter

Alvarez, Y.R., Mota, Y.C., Cabrera, Y.F., Hilarión, I.G., Hernández, Y.F., Dominguez, M.F. (2019). Similar Prototype Methods for Class Imbalanced Data Classification. In: Bello, R., Falcon, R., Verdegay, J. (eds) Uncertainty Management with Fuzzy and Rough Sets. Studies in Fuzziness and Soft Computing, vol 377. Springer, Cham. https://doi.org/10.1007/978-3-030-10463-4_11

DOI: https://doi.org/10.1007/978-3-030-10463-4_11
Published: 23 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10462-7
Online ISBN: 978-3-030-10463-4
eBook Packages: Intelligent Technologies and Robotics (R0)
