A Positive-biased Nearest Neighbour Algorithm for Imbalanced Classification

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7819)

Abstract

The k nearest neighbour (kNN) algorithm assigns a query instance to the most frequent class among its k nearest neighbours in the training instance space. When the class distribution is imbalanced and positive training instances are rare, a query instance is often overwhelmed by negative instances in its neighbourhood and is likely to be assigned to the negative majority class. In this paper we propose a Positive-biased Nearest Neighbour (PNN) algorithm, in which the local neighbourhood of a query instance is formed dynamically and the classification decision is adjusted according to the class distribution in that neighbourhood. Extensive experiments on real-world imbalanced datasets show that PNN performs well for imbalanced classification. PNN often outperforms recent kNN-based imbalanced classification algorithms while significantly reducing their extra computation cost.
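
The majority-vote problem described in the abstract can be illustrated with a small sketch. The abstract does not specify how PNN forms its neighbourhood or adjusts its decision, so the function `biased_predict` below is a hypothetical stand-in and not the authors' algorithm: it assumes a simple rule that predicts the positive class when the positive fraction among the k neighbours exceeds the positive fraction in the whole training set. The toy data, the function names, and the parameter `prior_pos` are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (point, label) pairs closest to query by Euclidean distance."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_predict(train, query, k):
    """Standard kNN: the most frequent class among the k nearest neighbours."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def biased_predict(train, query, k, prior_pos):
    """Hypothetical positive-biased rule (an assumption, NOT the paper's PNN):
    predict positive (1) when the local positive fraction exceeds the
    global positive fraction prior_pos."""
    labels = [label for _, label in k_nearest(train, query, k)]
    local_pos = labels.count(1) / len(labels)
    return 1 if local_pos > prior_pos else 0


# Toy imbalanced training set: 2 positives (label 1), 8 negatives (label 0).
train = [
    ((0.0, 0.0), 1), ((0.2, 0.1), 1),
    ((0.5, 0.5), 0), ((0.6, 0.4), 0), ((0.4, 0.6), 0), ((1.0, 1.0), 0),
    ((0.9, 1.1), 0), ((1.2, 0.8), 0), ((1.5, 1.5), 0), ((2.0, 1.0), 0),
]
prior_pos = sum(label for _, label in train) / len(train)  # 0.2

query = (0.1, 0.1)  # lies right next to both positive instances
print(knn_predict(train, query, k=5))                # 0: three negatives outvote two positives
print(biased_predict(train, query, 5, prior_pos))    # 1: local fraction 0.4 exceeds prior 0.2
```

With k = 5 the query's neighbourhood contains both positives and three negatives, so the plain majority vote returns the negative class even though the query sits beside the only positive instances. Comparing the local class distribution with the global one is only one of many possible ways to bias the decision toward the rare class.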

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, X., Li, Y. (2013). A Positive-biased Nearest Neighbour Algorithm for Imbalanced Classification. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_25

  • DOI: https://doi.org/10.1007/978-3-642-37456-2_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37455-5

  • Online ISBN: 978-3-642-37456-2

  • eBook Packages: Computer Science, Computer Science (R0)
