Improving the Performance of the k Rare Class Nearest Neighbor Classifier by the Ranking of Point Patterns

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2018)

Abstract

In most real-life classification applications, the samples are imbalanced, usually because of the difficulty of data collection. Large-margin and instance-based classifiers suffer considerably from the sparsity of samples close to the decision boundary (the dichotomy). In this work, we propose an improvement to a recent technique developed for rare-class classification. The experimental results show a clear performance gain.
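The abstract refers to instance-based (k-nearest-neighbour) classifiers degrading on imbalanced data. As a minimal illustrative sketch of that baseline behaviour only (not the KRNN method or the ranking improvement proposed in the paper), the following Python snippet runs plain majority-vote kNN on a synthetic imbalanced two-class problem and reports per-class recall; the data, parameters, and function names are assumptions introduced here for illustration.

# Minimal sketch (assumed setup, not the paper's method): plain majority-vote kNN
# on a synthetic imbalanced problem. Recall on the rare (positive) class is
# typically much lower than on the majority class, which motivates
# rare-class-aware neighbour methods such as KRNN.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 950 majority-class points, 50 rare-class points, overlapping.
X_maj = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
X_min = rng.normal(loc=1.5, scale=1.0, size=(50, 2))
X = np.vstack([X_maj, X_min])
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

# Random train/test split.
idx = rng.permutation(len(y))
train, test = idx[:800], idx[800:]

def knn_predict(X_train, y_train, X_query, k=7):
    """Plain kNN with Euclidean distance and unweighted majority voting."""
    preds = []
    for q in X_query:
        d = np.linalg.norm(X_train - q, axis=1)     # distances to all training points
        votes = y_train[np.argsort(d)[:k]]          # labels of the k nearest neighbours
        preds.append(np.bincount(votes, minlength=2).argmax())
    return np.array(preds)

y_hat = knn_predict(X[train], y[train], X[test], k=7)

for c, name in [(0, "majority"), (1, "rare")]:
    mask = y[test] == c
    print(f"{name} class recall: {(y_hat[mask] == c).mean():.2f}")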


Notes

  1. http://promise.site.uottawa.ca/SERepository/datasets-page.html.


Author information

Corresponding author

Correspondence to György Kovács.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

László, Z., Török, L., Kovács, G. (2018). Improving the Performance of the k Rare Class Nearest Neighbor Classifier by the Ranking of Point Patterns. In: Ferrarotti, F., Woltran, S. (eds.) Foundations of Information and Knowledge Systems. FoIKS 2018. Lecture Notes in Computer Science, vol. 10833. Springer, Cham. https://doi.org/10.1007/978-3-319-90050-6_15


  • DOI: https://doi.org/10.1007/978-3-319-90050-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90049-0

  • Online ISBN: 978-3-319-90050-6

  • eBook Packages: Computer Science, Computer Science (R0)
