
Similar Prototype Methods for Class Imbalanced Data Classification


Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 377)

Abstract

In this chapter, new prototype-based methods for solving imbalanced classification problems are proposed. Similarity relations are used to granulate the universe (the set of training objects) into similarity classes, and a prototype is selected for each similarity class. Experimental results show that the proposed methods are statistically superior to other methods designed for imbalanced data.
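The abstract only outlines the procedure: a similarity relation granulates the universe into similarity classes, and one prototype is kept per class. The Python sketch below illustrates that idea under assumptions that the abstract does not state and that may differ from the authors' methods: the similarity measure (one minus the mean range-normalized absolute difference), the threshold `eps`, the requirement that related objects share the same decision class, the greedy covering order, the rule that picks the class member with the largest total similarity as prototype, and the nearest-prototype rule used for prediction. The functions `pairwise_similarity`, `build_prototypes` and `predict` are hypothetical names introduced for illustration only.

```python
import numpy as np


def pairwise_similarity(A, B, ranges):
    """Assumed similarity: 1 minus the mean range-normalized absolute difference."""
    diff = np.abs(A[:, None, :] - B[None, :, :]) / ranges
    return 1.0 - diff.mean(axis=2)


def build_prototypes(X, y, eps=0.9):
    """Greedy granulation into similarity classes, one prototype per class.

    Hypothetical relation: x_i R x_j iff sim(x_i, x_j) >= eps and y_i == y_j.
    eps is assumed to lie in (0, 1] so that every object is related to itself.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant features contribute zero difference
    S = pairwise_similarity(X, X, ranges)
    R = (S >= eps) & (y[:, None] == y[None, :])

    covered = np.zeros(len(X), dtype=bool)
    prototypes, labels = [], []
    for i in range(len(X)):
        if covered[i]:
            continue
        # Similarity class of x_i among objects not yet covered (always contains i).
        members = np.flatnonzero(R[i] & ~covered)
        # Assumed selection rule: the member most similar, in total, to the rest of its class.
        scores = S[np.ix_(members, members)].sum(axis=1)
        best = members[np.argmax(scores)]
        prototypes.append(X[best])
        labels.append(y[best])
        covered[members] = True
    return np.array(prototypes), np.array(labels), ranges


def predict(prototypes, labels, ranges, X_new):
    """Nearest-prototype rule: return the label of the most similar prototype."""
    S = pairwise_similarity(np.asarray(X_new, dtype=float), prototypes, ranges)
    return labels[np.argmax(S, axis=1)]


if __name__ == "__main__":
    X = [[0.0], [0.1], [0.2], [3.0], [3.1], [10.0]]
    y = [0, 0, 0, 0, 0, 1]  # label 1 is the minority class
    P, L, r = build_prototypes(X, y, eps=0.9)
    print(P.ravel(), L)
    print(predict(P, L, r, [[0.5], [9.0], [2.9]]))
```

In this toy example, the five majority-class objects are compressed into two prototypes (around 0.1 and 3.0), while the single minority-class object forms its own similarity class and therefore keeps a prototype of its own.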



Author information


Correspondence to Yanela Rodríguez Alvarez.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Alvarez, Y.R., Mota, Y.C., Cabrera, Y.F., Hilarión, I.G., Hernández, Y.F., Dominguez, M.F. (2019). Similar Prototype Methods for Class Imbalanced Data Classification. In: Bello, R., Falcon, R., Verdegay, J. (eds) Uncertainty Management with Fuzzy and Rough Sets. Studies in Fuzziness and Soft Computing, vol 377. Springer, Cham. https://doi.org/10.1007/978-3-030-10463-4_11
