Abstract
Several previous research efforts have questioned the utility of combining nearest neighbor classifiers. We introduce an algorithm that combines a nearest neighbor classifier with a “small,” coarse-hypothesis nearest neighbor classifier that stores only one prototype per class. We show that this simple paired boosting scheme yields increased accuracy on some data sets.
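To make the pairing concrete, here is a minimal Python/NumPy sketch, not the chapter's exact algorithm: a full 1-NN classifier combined with a coarse learner that stores one prototype per class. Two assumptions are flagged in the comments: the coarse prototypes are taken to be class centroids, and the two members are combined by a fixed weighted vote rather than by boosting-derived weights.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X):
    """Full 1-NN: predict the label of the closest stored training instance."""
    dists = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]

def coarse_fit(X_train, y_train):
    """Coarse learner storing one prototype per class; using the class
    centroid is an assumption here -- the chapter selects prototypes."""
    classes = np.unique(y_train)
    protos = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    return protos, classes

def coarse_predict(protos, classes, X):
    dists = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def paired_predict(X_train, y_train, X, w_full=0.6, w_coarse=0.4):
    """Fixed weighted vote; a boosting scheme would learn these weights."""
    protos, classes = coarse_fit(X_train, y_train)
    p_full = one_nn_predict(X_train, y_train, X)
    p_coarse = coarse_predict(protos, classes, X)
    votes = (w_full * (p_full[:, None] == classes)
             + w_coarse * (p_coarse[:, None] == classes))
    return classes[np.argmax(votes, axis=1)]
```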
The research presented in this article also extends previous work on prototype selection for a standalone nearest neighbor classifier. We show that in some domains, storing a very small number of prototypes can provide classification accuracy greater than or equal to that of a nearest neighbor classifier that stores all training instances. We extend previous work by demonstrating that algorithms that rely primarily on random sampling can effectively choose a small number of prototypes.
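The claim that algorithms relying primarily on random sampling can choose a few effective prototypes suggests a simple Monte Carlo loop. The sketch below reuses `one_nn_predict` from the previous block; the parameter names and the use of training-set accuracy as the selection score are assumptions for illustration, not the chapter's stated procedure.

```python
import numpy as np

def select_prototypes(X, y, n_protos=3, n_samples=100, seed=0):
    """Monte Carlo prototype selection (a sketch): sample candidate
    prototype sets uniformly at random and keep the set whose 1-NN
    classifier scores best on the training data."""
    rng = np.random.default_rng(seed)
    best_idx, best_acc = None, -1.0
    for _ in range(n_samples):
        idx = rng.choice(len(X), size=n_protos, replace=False)
        acc = np.mean(one_nn_predict(X[idx], y[idx], X) == y)
        if acc > best_acc:
            best_idx, best_acc = idx, acc
    return best_idx, best_acc
```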
Finally, we present a taxonomy of instance types derived from statistics on how a set of sampled nearest neighbor classifiers performs on each individual instance. This taxonomy generalizes the idea of an outlier.
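One way to realize the per-instance statistics behind such a taxonomy: run many small randomly sampled classifiers over the training set and record, for each instance, the fraction that classify it correctly. Instances that nearly every sampled classifier gets wrong play the role of outliers. The bucket names and thresholds below are illustrative assumptions, since the abstract does not spell out the chapter's taxonomy; `one_nn_predict` is again reused from the first sketch.

```python
import numpy as np

def instance_types(X, y, n_protos=3, n_classifiers=200, seed=0):
    """Per-instance correctness rate across many sampled 1-NN classifiers.
    Thresholds and labels are illustrative, not the chapter's taxonomy."""
    rng = np.random.default_rng(seed)
    correct = np.zeros(len(X))
    for _ in range(n_classifiers):
        idx = rng.choice(len(X), size=n_protos, replace=False)
        correct += one_nn_predict(X[idx], y[idx], X) == y
    rate = correct / n_classifiers
    labels = np.where(rate > 0.9, "easy",
                      np.where(rate < 0.1, "outlier", "borderline"))
    return labels, rate
```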
Copyright information
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Skalak, D.B. (2001). Instance Sampling for Boosted and Standalone Nearest Neighbor Classifiers. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_16
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4