Improving the Performance of the k Rare Class Nearest Neighbor Classifier by the Ranking of Point Patterns

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2018)

Abstract

In most real-life classification applications, the samples are imbalanced, usually because of the difficulty of data collection. Large-margin and instance-based classifiers suffer considerably from the sparsity of samples close to the decision boundary (the dichotomy). In this work, we propose an improvement to a recent technique developed for rare-class classification. The experimental results show a clear performance gain.
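The abstract refers to instance-based (k-nearest-neighbour) classifiers degrading on imbalanced data. As a minimal illustrative sketch of that baseline behaviour only (not the KRNN method or the ranking improvement proposed in the paper), the following Python snippet runs plain majority-vote kNN on a synthetic imbalanced two-class problem and reports per-class recall; the data, parameters, and function names are assumptions introduced here for illustration.

# Minimal sketch (assumed setup, not the paper's method): plain majority-vote kNN
# on a synthetic imbalanced problem. Recall on the rare (positive) class is
# typically much lower than on the majority class, which motivates
# rare-class-aware neighbour methods such as KRNN.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 950 majority-class points, 50 rare-class points, overlapping.
X_maj = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
X_min = rng.normal(loc=1.5, scale=1.0, size=(50, 2))
X = np.vstack([X_maj, X_min])
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

# Random train/test split.
idx = rng.permutation(len(y))
train, test = idx[:800], idx[800:]

def knn_predict(X_train, y_train, X_query, k=7):
    """Plain kNN with Euclidean distance and unweighted majority voting."""
    preds = []
    for q in X_query:
        d = np.linalg.norm(X_train - q, axis=1)     # distances to all training points
        votes = y_train[np.argsort(d)[:k]]          # labels of the k nearest neighbours
        preds.append(np.bincount(votes, minlength=2).argmax())
    return np.array(preds)

y_hat = knn_predict(X[train], y[train], X[test], k=7)

for c, name in [(0, "majority"), (1, "rare")]:
    mask = y[test] == c
    print(f"{name} class recall: {(y_hat[mask] == c).mean():.2f}")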


Notes

  1. http://promise.site.uottawa.ca/SERepository/datasets-page.html.


Author information

Corresponding author

Correspondence to György Kovács.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

László, Z., Török, L., Kovács, G. (2018). Improving the Performance of the k Rare Class Nearest Neighbor Classifier by the Ranking of Point Patterns. In: Ferrarotti, F., Woltran, S. (eds.) Foundations of Information and Knowledge Systems. FoIKS 2018. Lecture Notes in Computer Science, vol. 10833. Springer, Cham. https://doi.org/10.1007/978-3-319-90050-6_15


  • DOI: https://doi.org/10.1007/978-3-319-90050-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90049-0

  • Online ISBN: 978-3-319-90050-6

  • eBook Packages: Computer Science, Computer Science (R0)
