Abstract
In most real-life classification applications, the classes are imbalanced, usually because of the difficulty of data collection. Large-margin and instance-based classifiers suffer markedly from the sparsity of samples near the decision boundary. In this work, we propose an improvement to a recent technique developed for rare-class classification. The experimental results show a clear performance gain.
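The abstract concerns instance-based (nearest-neighbour) classifiers on imbalanced data. As a minimal illustration of the general idea behind minority-biased nearest-neighbour voting — not the authors' actual method — the sketch below weights each neighbour's vote by the inverse frequency of its class, so that sparse rare-class neighbours are not automatically outvoted by the majority class. The function name and the particular weighting scheme are illustrative assumptions.

```python
from collections import Counter
import math

def class_weighted_knn(train_X, train_y, query, k=3):
    """Predict the label of `query` by a k-nearest-neighbour vote in which
    each neighbour votes with weight n / count(class), i.e. inversely to
    its class frequency, favouring the rare class."""
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    counts = Counter(train_y)       # class frequencies in the training set
    n = len(train_y)
    votes = Counter()
    for _, y in neighbours[:k]:
        votes[y] += n / counts[y]   # inverse-frequency weighted vote
    # Return the class with the largest accumulated weight.
    return votes.most_common(1)[0][0]
```

With a training set of four majority points near the origin and one rare-class point at (5, 5), a query at (4, 4) is assigned to the rare class even though two of its three nearest neighbours are majority samples, because the single rare neighbour's vote carries five times the weight.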
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
László, Z., Török, L., Kovács, G. (2018). Improving the Performance of the k Rare Class Nearest Neighbor Classifier by the Ranking of Point Patterns. In: Ferrarotti, F., Woltran, S. (eds.) Foundations of Information and Knowledge Systems. FoIKS 2018. Lecture Notes in Computer Science, vol. 10833. Springer, Cham. https://doi.org/10.1007/978-3-319-90050-6_15
Print ISBN: 978-3-319-90049-0
Online ISBN: 978-3-319-90050-6