Similar Prototype Methods for Class Imbalanced Data Classification

  • Yanela Rodríguez Alvarez
  • Yailé Caballero Mota
  • Yaima Filiberto Cabrera
  • Isabel García Hilarión
  • Yumilka Fernández Hernández
  • Mabel Frias Dominguez
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 377)


This paper proposes new prototype-based methods for imbalanced classification problems. Similarity relations are used to granulate the universe of objects into similarity classes, and one prototype is selected for each similarity class. Experimental results show that the proposed methods are statistically superior in performance to other methods for imbalanced data.
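
The general idea of granulating the data into similarity classes and keeping one prototype per class can be illustrated with a minimal sketch. This is not the chapter's algorithm: the similarity function, the threshold value, the greedy covering order, and the medoid rule for choosing a prototype are all assumptions made for illustration, and every function name is hypothetical.

```python
from math import sqrt


def similarity(x, y):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))


def similarity_class(i, X, labels, threshold):
    """Indices of objects with the same label as X[i] that are similar to it."""
    return [
        j for j in range(len(X))
        if labels[j] == labels[i] and similarity(X[i], X[j]) >= threshold
    ]


def select_prototypes(X, labels, threshold=0.5):
    """Greedy cover (an assumption): one prototype per similarity class."""
    covered = set()
    prototypes = []
    for i in range(len(X)):
        if i in covered:
            continue
        members = similarity_class(i, X, labels, threshold)
        # Assumed rule: keep the member most similar to the rest (a medoid).
        best = max(
            members,
            key=lambda j: sum(similarity(X[j], X[k]) for k in members),
        )
        prototypes.append((X[best], labels[best]))
        covered.update(members)
    return prototypes


def classify(x, prototypes):
    """Nearest-prototype rule: label of the most similar prototype."""
    return max(prototypes, key=lambda p: similarity(x, p[0]))[1]


# Toy imbalanced data: three majority objects, two minority objects.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
labels = ["maj", "maj", "maj", "min", "min"]
prototypes = select_prototypes(X, labels, threshold=0.5)
```

In this toy data each class collapses to a single prototype, so the reduced training set is balanced even though the original one is not. That effect is specific to the example and is not a result reported in the chapter.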


Keywords: Imbalanced classification; Prototype selection; Prototype generation; Classification; Similarity relations



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yanela Rodríguez Alvarez (1), corresponding author
  • Yailé Caballero Mota (1)
  • Yaima Filiberto Cabrera (1)
  • Isabel García Hilarión (1)
  • Yumilka Fernández Hernández (1)
  • Mabel Frias Dominguez (1)

  1. Departamento de Computación, Universidad de Camagüey, Camagüey, Cuba
