
Feature Extraction for the k-Nearest Neighbour Classifier with Genetic Programming

  • Conference paper
Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2038))

Abstract

In pattern recognition, the curse of dimensionality can be handled either by reducing the number of features, e.g. with decision trees, or by extracting new features.

We propose a genetic programming (GP) framework for automatic extraction of features with the express aim of dimension reduction and the additional aim of improving the accuracy of the k-nearest neighbour (k-NN) classifier. We will show that our system is capable of reducing most datasets to one or two features while k-NN accuracy improves or stays the same. Such a small number of features has the great advantage of allowing visual inspection of the dataset in a two-dimensional plot.
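For reference, the k-NN classifier that the evolved features feed into can be sketched in a few lines. This is an illustrative Python sketch of plain Euclidean-distance k-NN with majority voting, not the paper's implementation; the function and variable names are our own:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest
    training points under Euclidean distance."""
    preds = []
    for x in X_test:
        # Distance from x to every training point
        d = np.linalg.norm(X_train - x, axis=1)
        # Labels of the k closest training points
        nearest = y_train[np.argsort(d)[:k]]
        # Majority vote over those labels
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

Since every distance computation runs over all retained features, reducing a dataset to one or two evolved features makes both classification and visualisation of the decision regions cheap.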

Since k-NN is a non-linear classification algorithm [2], we compare several linear fitness measures. We will show that a very simple one, the accuracy of the minimal-distance-to-means (mdm) classifier, outperforms all other fitness measures.
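The mdm fitness measure is cheap to evaluate: assign each training sample to the class whose mean vector (in the evolved feature space) is nearest, and take the resulting training accuracy. A minimal sketch of this idea, assuming Euclidean distance; the function names are ours, not the paper's code:

```python
import numpy as np

def mdm_fit(X, y):
    """Compute the per-class mean vectors (the 'means' in
    minimal-distance-to-means)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def mdm_predict(classes, means, X):
    """Assign each sample to the class whose mean is closest."""
    # Pairwise distances: (n_samples, n_classes)
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def mdm_accuracy(X, y):
    """Training accuracy of the mdm classifier, usable as a GP
    fitness value for a candidate feature set."""
    classes, means = mdm_fit(X, y)
    return float(np.mean(mdm_predict(classes, means, X) == y))
```

Because fitting requires only one mean per class, this fitness measure costs far less per GP evaluation than running k-NN itself on the training set.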

We introduce a stopping criterion gleaned from numerical mathematics: a new feature is only added if the relative increase in training accuracy exceeds a constant d, which for the mdm classifier we estimate to be 3.3%.
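Read this way, the stopping criterion reduces to a one-line test on the relative accuracy gain. The function name and the exact formula below are our reading of the abstract, not the paper's code:

```python
def should_add_feature(acc_with, acc_without, d=0.033):
    """Accept the candidate feature only if the relative increase in
    training accuracy exceeds the threshold d (3.3% for mdm, per the
    abstract). Assumes acc_without > 0."""
    return (acc_with - acc_without) / acc_without > d
```

For example, improving training accuracy from 0.80 to 0.90 is a 12.5% relative gain and the feature is kept, while 0.80 to 0.81 is only 1.25% and feature extraction stops.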



References

  1. E.I. Chang and R.P. Lippman. Using genetic algorithms to improve pattern classification performance. In Advances in Neural Information Processing Systems, 1991.


  2. J. Friedman, J. Bentley, and R. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209–226, 1977.


  3. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, second edition, 1990.


  4. J. Edward Jackson. A User’s Guide to Principal Components. John Wiley & Sons, Inc, 1991.


  5. M. Kotani, M. Nakai, and K. Akazawa. Feature extraction using evolutionary computation. In CEC 1999, pages 1230–1236, 1999.


  6. H. Liu and R. Setiono. Feature transformation and multivariate decision tree induction. In Discovery Science, pages 279–290, 1998.


  7. B. Masand and G. Piatetsky-Shapiro. Discovering time oriented abstractions in historical data to optimize decision tree classification. In P. Angeline and E. Kinnear Jr, editors, Advances in Genetic Programming, volume 2, pages 489–498. MIT Press, 1996.


  8. T. Mitchell. Machine Learning. WCB/McGraw-Hill, 1997.


  9. M.L. Raymer, W.F. Punch, E.D. Goodman, and L.A. Kuhn. Genetic programming for improved data mining: An application to the biochemistry of protein interactions. In Proceedings GP 1996, pages 375–380. MIT Press, 1996.


  10. R. Setiono and H. Liu. Fragmentation problem and automated feature construction. In Proc. 10th IEEE Int. Conf on Tools with AI, pages 208–215, 1998.


  11. J. Sherrah. Automatic Feature Extraction for Pattern Recognition. PhD thesis, University of Adelaide, South Australia, 1998.


  12. Zijian Zheng. A comparison of constructive induction with different types of new attribute. Technical Report TR C96/8, Deakin University, Geelong, Australia, May 1996.


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bot, M.C.J. (2001). Feature Extraction for the k-Nearest Neighbour Classifier with Genetic Programming. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tettamanzi, A.G.B., Langdon, W.B. (eds) Genetic Programming. EuroGP 2001. Lecture Notes in Computer Science, vol 2038. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45355-5_20

  • DOI: https://doi.org/10.1007/3-540-45355-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41899-3

  • Online ISBN: 978-3-540-45355-0

  • eBook Packages: Springer Book Archive
