
A Classifier Combining Local Distance Mean and Centroid for Imbalanced Datasets

  • Yingying Zhao
  • Xingcheng Liu
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 313)

Abstract

The K-Nearest Neighbor (KNN) algorithm is widely used in practice because of its simplicity and ease of understanding. However, the traditional KNN algorithm has shortcomings: it considers only the number of samples of each class among the k neighbors, ignoring the distances and spatial distribution of the unknown sample relative to those k nearest training samples. Moreover, the class imbalance problem remains a persistent challenge for the KNN algorithm. To address these problems, we propose an improved KNN classification method for class-imbalanced datasets based on the local distance mean and centroid (LDMC-KNN). In the proposed scheme, a number of nearest neighbor training samples is selected from each class, and the unknown sample is classified according to both the distances and the positions of these nearest training samples. Experiments are performed on UCI datasets. The results show that the proposed algorithm is strongly competitive and consistently outperforms the KNN algorithm and its variants.
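The decision rule described above can be sketched as follows. This is a minimal illustration based only on the abstract's description, not the paper's exact formulation: it uses the same neighborhood size `k` for every class (the paper selects class-dependent numbers), and the unweighted sum of the local distance mean and the centroid distance is an assumed combination.

```python
import math
from collections import defaultdict

def ldmc_knn_predict(X_train, y_train, x, k=3):
    """Classify x by combining, for each class, the mean distance to that
    class's k nearest training samples (local distance mean) and the
    distance from x to the centroid of those samples.

    Sketch only: the equal-weight sum of the two terms and the fixed
    per-class k are assumptions, not the paper's exact scheme.
    """
    # Group training samples by class so each class contributes its own
    # neighbors, which keeps minority classes from being outvoted.
    by_class = defaultdict(list)
    for xi, yi in zip(X_train, y_train):
        by_class[yi].append(xi)

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    best_class, best_score = None, float("inf")
    for cls, samples in by_class.items():
        # k nearest neighbors of x within this class only
        neigh = sorted(samples, key=lambda s: dist(s, x))[:k]
        local_mean = sum(dist(s, x) for s in neigh) / len(neigh)
        centroid = [sum(coord) / len(neigh) for coord in zip(*neigh)]
        score = local_mean + dist(centroid, x)  # assumed combination
        if score < best_score:
            best_class, best_score = cls, score
    return best_class
```

Because neighbors are drawn per class before any comparison is made, a minority class with few samples still supplies k candidates, which is the intuition behind using this family of rules on imbalanced data.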

Keywords

K-Nearest Neighbor (KNN) · Local distance mean · Centroid · Class imbalance · Classifier


Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2020

Authors and Affiliations

  1. School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
  2. School of Information Science, Xinhua College of Sun Yat-sen University, Guangzhou, China
