Skip to main content

Improving SMOTE with Fuzzy Rough Prototype Selection to Detect Noise in Imbalanced Classification Data

  • Conference paper
Advances in Artificial Intelligence – IBERAMIA 2012 (IBERAMIA 2012)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7637))

Included in the following conference series:

Abstract

In this paper, we present a prototype selection technique for imbalanced data, Fuzzy Rough Imbalanced Prototype Selection (FRIPS), to improve the quality of the artificial instances generated by the Synthetic Minority Over-sampling TEchnique (SMOTE). Using fuzzy rough set theory, the noise level of each instance is measured, and instances for which the noise level exceeds a certain threshold level are deleted. The threshold is determined using a wrapper approach that evaluates the training Area Under the Curve of candidate subsets. This proposal aims to clean noisy data before applying SMOTE, such that SMOTE can generate high quality artificial data.

Experiments on artificial data show that FRIPS in combination with SMOTE outperforms state-of-the-art methods, and that it particularly performs well in the presence of noise.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Batista, G.E.A.P.A., Prati, R.C., Monard, M.C.: A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. SIGKDD Explorations 6(1), 20–29 (2004)

    Article  Google Scholar 

  2. Bradley, A.P.: The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognition 30(7), 1145–1159 (1997)

    Article  Google Scholar 

  3. Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-Level-SMOTE – Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 475–482. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  4. Chawla, N.W., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE – Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)

    Article  MATH  Google Scholar 

  5. Cover, T., Hart, P.: Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory 13(1), 21–27 (1967)

    Article  MATH  Google Scholar 

  6. Derrac, J., García, S., Molina, D., Herrera, F.: A Practical Tutorial on the Use of Nonparametric Statistical Tests as a Methodology for Comparing Evolutionary and Swarm Intelligence Algorithms. Swarm and Evolutionary Computation 1(1), 3–18 (2011)

    Article  Google Scholar 

  7. Dubois, D., Prade, H.: Rough Fuzzy Sets and Fuzzy Rough Sets. International Journal of General Systems 17(2-3), 191–209 (1990)

    Article  MATH  Google Scholar 

  8. García, S., Derrac, J., Cano, J.R., Herrera, F.: Prototype Selection for Nearest Neighbor Classification – Taxonomy and Empirical Study. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(3), 417–435 (2012)

    Article  Google Scholar 

  9. García, S., Fernández, F., Luengo, J., Herrera, F.: A Study of Statistical Techniques and Performance Measures for Genetics-Based Machine Learning – Accuracy and Interpretability. Soft Computing 13(10), 959–977 (2009)

    Article  Google Scholar 

  10. García, S., Alcalá Fernandez, J., Luengo, J., Herrera, F.: Advanced Nonparametric Tests for Multiple Comparisons in the Design of Experiments in Computational Intelligence and Data Mining – Experimental Analysis of Power. Information Sciences 180(10), 2044–2064 (2010)

    Article  Google Scholar 

  11. Han, H., Wang, W., Mao, B.: Borderline-SMOTE – A New Over-Sampling Method in Imbalanced Data Sets Learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  12. Napierala, K., Stefanowski, J., Wilk, S.: Learning from Imbalanced Data in Presence of Noisy and Borderline Examples. In: Szczuka, M., et al. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 158–167. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  13. Ramentol, E., Caballero, Y., Bello, R., Herrera, F.: SMOTE-RSB* – A Hybrid Preprocessing Approach Based on Oversampling and Undersampling for High Imbalanced Data-Sets Using Smote and Rough Sets Theory. Knowledge and Information Systems (2011) (in press)

    Google Scholar 

  14. Ramentol, E., Verbiest, N., Bello, R., Caballero, Y., Cornelis, C., Herrera, F.: Smote-frst – A New Resampling Method Using Fuzzy Rough Set Theory. In: 10th International FLINS Conference on Uncertainty Modeling in Knowledge Engineering and Decision Making, FLINS 2012 (in press, 2012)

    Google Scholar 

  15. Stefanowski, J., Wilk, S.: Selective Pre-processing of Imbalanced Data for Improving Classification Performance. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2008. LNCS, vol. 5182, pp. 283–292. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  16. Verbiest, N., Cornelis, C., Herrera, F.: Fuzzy Rough Prototype Selection (submitted)

    Google Scholar 

  17. Wilcoxon, F.: Individual Comparisons by Ranking Methods. Biometrics Bulletin 1(6), 80–83 (1945)

    Article  Google Scholar 

  18. Yager, R.R.: On Ordered Weighted Averaging Aggregation Operators in Multicriteria Decisionmaking. IEEE Transactions on Systems, Man and Cybernetics 18(1), 183–190 (1988)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Verbiest, N., Ramentol, E., Cornelis, C., Herrera, F. (2012). Improving SMOTE with Fuzzy Rough Prototype Selection to Detect Noise in Imbalanced Classification Data. In: Pavón, J., Duque-Méndez, N.D., Fuentes-Fernández, R. (eds) Advances in Artificial Intelligence – IBERAMIA 2012. IBERAMIA 2012. Lecture Notes in Computer Science(), vol 7637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34654-5_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-34654-5_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34653-8

  • Online ISBN: 978-3-642-34654-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics