A Classification Method for Imbalanced Data Based on SMOTE and Fuzzy Rough Nearest Neighbor Algorithm

  • Weibin Zhao (corresponding author)
  • Mengting Xu
  • Xiuyi Jia
  • Lin Shang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9437)

Abstract

The fuzzy rough nearest neighbor (FRNN) algorithm performs well when classifying data with inadequate features, but it does not perform well on imbalanced data. To overcome this problem, this paper introduces a combined method: an improved SMOTE method is adopted to balance the data, and FRNN is applied as the classifier. Experiments show that the combined method achieves better results than the classical FRNN algorithm.
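
The two-stage pipeline described in the abstract (oversample the minority class, then classify with FRNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the improved SMOTE variant, so classical SMOTE is used instead, and the FRNN scoring assumes crisp class labels, the Kleene-Dienes implicator, the minimum t-norm, and a range-normalised similarity averaged over attributes. The function names `smote` and `frnn_predict`, their parameters, and the toy data are all hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Classical SMOTE (assumed here; the paper uses an improved variant)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)                      # needs at least 2 minority samples
    k = min(k, n - 1)
    # Pairwise distances between minority samples; exclude self-matches.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    # Each synthetic point lies on the segment between a sample and a neighbour.
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def frnn_predict(X_train, y_train, X_test, k=10):
    """FRNN-style scoring: mean of fuzzy lower and upper approximations."""
    classes = np.unique(y_train)
    span = X_train.max(axis=0) - X_train.min(axis=0)
    span[span == 0] = 1.0               # avoid division by zero
    preds = []
    for x in X_test:
        # Similarity in [0, 1], averaged over attributes (an assumption).
        sim = np.clip(1.0 - np.abs(X_train - x) / span, 0.0, 1.0).mean(axis=1)
        idx = np.argsort(-sim)[:k]      # k most similar training samples
        scores = []
        for c in classes:
            in_c = y_train[idx] == c
            # Lower: min over neighbours outside c of 1 - similarity.
            lower = np.min(1.0 - sim[idx][~in_c], initial=1.0)
            # Upper: max over neighbours inside c of similarity.
            upper = np.max(sim[idx][in_c], initial=0.0)
            scores.append((lower + upper) / 2.0)
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

# Toy data (made up for illustration): 6 majority and 3 minority samples.
X_maj = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
X_min = np.array([[10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
X_syn = smote(X_min, n_new=len(X_maj) - len(X_min), rng=0)
X_bal = np.vstack([X_maj, X_min, X_syn])
y_bal = np.array([0] * len(X_maj) + [1] * (len(X_min) + len(X_syn)))
print(frnn_predict(X_bal, y_bal, np.array([[0.2, 0.2], [10.5, 10.5]]), k=3))
```

Oversampling is applied to the training data only, before the classifier sees it, so the neighbourhood of a minority-class query is no longer dominated by majority samples.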

Keywords

Imbalanced data, SMOTE, Fuzzy rough set, Nearest neighbor, Classification

Notes

Acknowledgements

We would like to acknowledge the support for this work from the National Natural Science Foundation of China (Grant Nos. 61403200, 61170180) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20140800).

Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Weibin Zhao (1, corresponding author)
  • Mengting Xu (1)
  • Xiuyi Jia (2)
  • Lin Shang (1)

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
