Instance Sampling for Boosted and Standalone Nearest Neighbor Classifiers

Chapter in: Instance Selection and Construction for Data Mining

Part of the book series: The Springer International Series in Engineering and Computer Science ((SECS,volume 608))

Abstract

Several previous research efforts have questioned the utility of combining nearest neighbor classifiers. We introduce an algorithm that combines a nearest neighbor classifier with a “small,” coarse-hypothesis nearest neighbor classifier that stores only one prototype per class. We show that this simple paired boosting scheme yields increased accuracy on some data sets.
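As a rough illustration of the coarse component only (the chapter is about how prototypes are chosen, so this is an assumed stand-in, not the chapter's construction), the sketch below builds a one-prototype-per-class nearest neighbor classifier using class centroids, with NumPy arrays X of shape (n, d) and integer labels y.

```python
import numpy as np

class OnePrototypePerClassNN:
    """Coarse nearest neighbor classifier that stores a single prototype per class.

    Illustrative sketch: the prototype is taken to be the class centroid,
    which is one simple way to realize "one prototype per class".
    """

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One prototype (the centroid) per class.
        self.prototypes_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Assign each point to the class of its nearest prototype (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - self.prototypes_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]
```

Such a coarse classifier could then be paired with a full nearest neighbor classifier in a boosting-style combination; the details of that pairing are what the chapter develops.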

The research presented in this article also extends previous work on prototype selection for a standalone nearest neighbor classifier. We show that in some domains, storing a very small number of prototypes can provide classification accuracy greater than or equal to that of a nearest neighbor classifier that stores all training instances. We extend previous work by demonstrating that algorithms that rely primarily on random sampling can effectively choose a small number of prototypes.
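One plausible reading of "algorithms that rely primarily on random sampling" is a simple Monte Carlo search over candidate prototype subsets; the sketch below assumes a fixed subset size and scores each candidate by 1-NN accuracy on the training set (all names and parameters here are illustrative, not the chapter's procedure).

```python
import numpy as np

def monte_carlo_prototype_selection(X, y, n_prototypes=5, n_samples=100, seed=None):
    """Repeatedly draw small prototype subsets at random and keep the best one.

    Sketch under assumptions: subset size and sample count are fixed, and
    candidates are scored by 1-NN accuracy over the full training set
    (prototypes trivially classify themselves, which inflates scores slightly).
    """
    rng = np.random.default_rng(seed)
    best_idx, best_acc = None, -1.0
    for _ in range(n_samples):
        idx = rng.choice(len(X), size=n_prototypes, replace=False)
        protos, labels = X[idx], y[idx]
        # Classify every training point by its nearest sampled prototype.
        d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
        acc = np.mean(labels[d.argmin(axis=1)] == y)
        if acc > best_acc:
            best_idx, best_acc = idx, acc
    return best_idx, best_acc
```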

Finally, we present a taxonomy of instance types that arises from statistics collected on how a set of sampled nearest neighbor classifiers performs on each individual instance. This taxonomy generalizes the idea of an outlier.
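The taxonomy itself is developed in the chapter; purely as an assumed illustration of the underlying statistic, one can record, for each instance, the fraction of sampled classifiers that label it correctly and bucket instances by that fraction. The bucket names and thresholds below are placeholders, not the chapter's categories.

```python
import numpy as np

def per_instance_accuracy(X, y, sampled_classifiers):
    """Fraction of the sampled classifiers that label each instance correctly."""
    votes = np.stack([clf.predict(X) == y for clf in sampled_classifiers])
    return votes.mean(axis=0)

def bucket_instances(acc, low=0.2, high=0.8):
    """Coarse, illustrative bucketing: instances that are rarely classified
    correctly behave like outliers, those almost always correct are typical,
    and the rest form a borderline middle band."""
    return np.where(acc < low, "outlier-like",
                    np.where(acc > high, "typical", "borderline"))
```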

Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Skalak, D.B. (2001). Instance Sampling for Boosted and Standalone Nearest Neighbor Classifiers. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_16

  • DOI: https://doi.org/10.1007/978-1-4757-3359-4_16

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4

  • eBook Packages: Springer Book Archive
