Multiclass Imbalanced Classification Using Fuzzy C-Mean and SMOTE with Fuzzy Support Vector Machine

  • Ratchakoon Pruengkarn
  • Kok Wai Wong
  • Chun Che Fung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10638)

Abstract

A hybrid sampling technique, FCMSMT, is proposed for tackling the multiclass imbalanced classification problem. It combines Fuzzy C-Means (FCM) clustering with the Synthetic Minority Oversampling Technique (SMOTE). The mean number of instances per class is used as the target size for every class: classes larger than the mean are undersampled and classes smaller than the mean are oversampled. Fixing each class at the mean helps prevent instances from under-represented regions within a class (within-class imbalance) from being erroneously eliminated during undersampling. The technique can therefore reduce both within-class and between-class errors and improve classification performance. The study used eight benchmark datasets from the KEEL and UCI repositories, and the results were compared across three major classifiers using G-mean and AUC. The results show that the proposed technique handles most of the multiclass imbalanced datasets in the experiments for all three classifiers while retaining the integrity of the original data.
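The resampling scheme summarised above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: every class is resampled to the mean class size, larger classes are reduced using fuzzy c-means (FCM) memberships, and smaller ones are grown with classic SMOTE interpolation. The abstract does not say how instances are selected from the FCM clusters, so the round-robin, highest-membership rule in `fcm_undersample` is an assumption. The parameter defaults (`c=3`, fuzzifier `m=2.0`, `k=5`) are also assumptions, and all function names (`fcmsmt_resample`, `fcm_undersample`, `fuzzy_c_means`, `smote`, `g_mean`) are hypothetical. `g_mean` uses the common multiclass definition of G-mean: the geometric mean of the per-class recalls.

```python
import math
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Standard FCM; returns the membership matrix u[i][j] (rows sum to 1)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    for _ in range(iters):
        # Cluster centres are means weighted by u_ij ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            dist = [math.sqrt(sq_dist(points[i], ctr)) + 1e-12 for ctr in centres]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
    return u


def fcm_undersample(points, target, c=3, seed=0):
    """Keep `target` points, taking turns across FCM clusters so that every
    sub-concept keeps representatives (assumed selection rule)."""
    c = min(c, target)
    u = fuzzy_c_means(points, c, seed=seed)
    clusters = defaultdict(list)
    for i, row in enumerate(u):
        j = max(range(c), key=lambda k: row[k])
        clusters[j].append((row[j], i))
    # Within each cluster, prefer points with the highest membership.
    queues = [sorted(members, reverse=True) for members in clusters.values()]
    kept = []
    while len(kept) < target:
        for q in queues:
            if q and len(kept) < target:
                kept.append(points[q.pop(0)[1]])
    return kept


def smote(points, n_new, k=5, seed=0):
    """Classic SMOTE: interpolate between a sample and one of its k nearest
    same-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(points) - 1)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(points)
        if k < 1:  # a single-instance class can only be duplicated
            synthetic.append(list(x))
            continue
        neighbours = sorted((p for p in points if p is not x),
                            key=lambda p: sq_dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def fcmsmt_resample(X, y, seed=0):
    """Resample every class to the mean class size (hypothetical helper)."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = round(len(X) / len(by_class))  # mean number of instances per class
    X_out, y_out = [], []
    for label, pts in by_class.items():
        if len(pts) > target:
            pts = fcm_undersample(pts, target, seed=seed)
        elif len(pts) < target:
            pts = pts + smote(pts, target - len(pts), seed=seed)
        X_out.extend(pts)
        y_out.extend([label] * len(pts))
    return X_out, y_out


def g_mean(y_true, y_pred):
    """Multiclass G-mean: geometric mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return math.prod(recalls) ** (1.0 / len(recalls))
```

Taking turns across clusters means that a small sub-concept inside a large class still contributes instances, which addresses the within-class concern raised in the abstract. A quota proportional to each cluster's size would be an equally plausible choice.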

Keywords

FCM, SMOTE, FSVM, Imbalanced data

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ratchakoon Pruengkarn (1)
  • Kok Wai Wong (1)
  • Chun Che Fung (1)
  1. School of Engineering and Information Technology, Murdoch University, Perth, Australia